Improving Result Diversity using PLSA at DIR 2011

Monday, February 07, 2011

Katja and I presented a poster for our paper, "Improving Result Diversity using PLSA", at DIR 2011 on Friday. The poster is below:

DIR 2011 Diversity with PLSA Poster

The abstract is included below. The full paper is available for download as well.

IA-SELECT is a recently developed algorithm for increasing the diversity of a search result set by reordering an original document list based on manually generated clusters. In this paper we extend this approach to create a diversification framework in which arbitrary clustering methods can be used, and where the influence of clusters can be balanced against the original rank of documents. We study whether clusters that are automatically generated using probabilistic latent semantic analysis (PLSA) can compete with manually created clusters, and investigate how balancing the influence of clusters and original document rank affects diversity scores. As there are currently few datasets for evaluating diversity, we develop a new dataset, which is released with this paper. Our results show that diversification using PLSA can improve diversity, but that there is a large gap in performance between automatically and manually created clusters.
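
To give a rough idea of the kind of reranking the abstract describes, here is a minimal Python sketch of a greedy, IA-SELECT-style diversifier. It is an illustration under stated assumptions, not the implementation from the paper: the function name `diversify`, the linear rank score, and the interpolation weight `lam` are all hypothetical choices made for this example. The cluster memberships in `cluster_probs` could come from any clustering method, for instance the topic distributions P(z|d) of a PLSA model or manually assigned clusters.

```python
def diversify(ranking, cluster_probs, cluster_weights, lam=0.5, k=None):
    """Greedy IA-SELECT-style reranking (illustrative sketch only).

    ranking         -- document ids in their original order
    cluster_probs   -- doc id -> {cluster id: P(cluster | doc)}
    cluster_weights -- cluster id -> P(cluster | query)
    lam             -- assumed balance parameter: 0 keeps the original
                       ranking, 1 uses cluster coverage only
    k               -- number of documents to return (default: all)
    """
    n = len(ranking)
    k = n if k is None else min(k, n)

    # Assumption: relevance decays linearly with the original rank.
    rank_score = {d: (n - i) / n for i, d in enumerate(ranking)}

    # How much of each cluster is still "uncovered" by the selection.
    uncovered = dict(cluster_weights)
    remaining = list(ranking)
    selected = []

    while remaining and len(selected) < k:
        def gain(d):
            # Marginal utility of d over the clusters not yet covered.
            cluster_gain = sum(
                uncovered.get(c, 0.0) * p
                for c, p in cluster_probs.get(d, {}).items()
            )
            return (1 - lam) * rank_score[d] + lam * cluster_gain

        # max() keeps the first maximum, so ties follow the original order.
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)

        # Discount the clusters that the chosen document covers.
        for c, p in cluster_probs.get(best, {}).items():
            if c in uncovered:
                uncovered[c] *= 1 - p

    return selected


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    cluster_probs = {
        "d1": {"A": 1.0}, "d2": {"A": 1.0},
        "d3": {"B": 1.0}, "d4": {"B": 1.0},
    }
    cluster_weights = {"A": 0.6, "B": 0.4}
    print(diversify(ranking, cluster_probs, cluster_weights, lam=1.0))
```

With `lam=0` the sketch returns the original list unchanged; as `lam` grows, documents from clusters that are not yet represented are pulled towards the top.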