Certain information needs underlie user browsing behavior on a content-rich network. A content recommendation system aims to satisfy a user's information needs with high-quality recommendations. However, a recommendation system is limited in the number of recommendations that can be presented to the user. There is therefore a tradeoff between the relevance of the results and their diversity. Marginal relevance is related to redundancy: even if a news article is highly relevant to a user's interest, its information could be redundant with that of other suggestions, in which case the article has little, if any, marginal relevance. Redundant suggestions diminish a user's experience and reduce the user's satisfaction with the recommendation system.
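As a purely illustrative sketch of the redundancy effect described above, the following Python fragment discounts a candidate's relevance by its overlap with suggestions already shown. The word-overlap (Jaccard) similarity, the multiplicative discount, and the names `jaccard` and `marginal_relevance` are assumptions chosen for illustration; the text above does not prescribe any particular formula.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts, in [0, 1] (illustrative choice)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def marginal_relevance(relevance, candidate, shown):
    """Hypothetical discount: scale relevance by (1 - highest overlap with shown items)."""
    redundancy = max((jaccard(candidate, s) for s in shown), default=0.0)
    return relevance * (1.0 - redundancy)


shown = ["central bank raises interest rates"]
near_duplicate = "central bank raises interest rates again"
fresh = "local team wins championship final"

# The near-duplicate starts with the higher relevance score (0.9 vs 0.6)
# but keeps far less of it once redundancy is taken into account.
mr_dup = marginal_relevance(0.9, near_duplicate, shown)
mr_fresh = marginal_relevance(0.6, fresh, shown)
print(mr_fresh > mr_dup)
```

Under these assumptions, the less relevant but non-overlapping article ends up with the higher marginal relevance.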
Conventionally, recommendation systems operate in two separate stages. In a first stage, documents are retrieved based on relevance criteria. In a second and separate stage, the retrieved documents are clustered into several dissimilar groups. Recommendation results are then selected from these distinct groups to diversify the presented information. There are several problems with such a two-stage approach. For example, the clustering must be performed on every retrieval set, which imposes significant computational overhead on the online recommendation service. As another example, many factors affect the online clustering process and its output, e.g. the number of clusters, the cluster sizes, the stopping criterion, etc., all of which affect the final information presented. As yet another problem, clustering-based diversification and relevance ranking are usually carried out without considering the inherent multitude of user information needs and interests.
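The conventional two-stage flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any particular system's implementation: the greedy single-pass clustering, the word-overlap similarity, and the names `retrieve`, `cluster`, `recommend`, `threshold`, `n`, and `k` are all hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts, in [0, 1] (illustrative choice)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def retrieve(docs, scores, n):
    """Stage 1: keep the n documents with the highest relevance scores."""
    return sorted(docs, key=lambda d: scores[d], reverse=True)[:n]


def cluster(docs, threshold):
    """Stage 2: greedy clustering; a document joins the first cluster
    whose first member it resembles, otherwise it starts a new cluster."""
    clusters = []
    for d in docs:
        for c in clusters:
            if jaccard(d, c[0]) >= threshold:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters


def recommend(docs, scores, n, threshold, k):
    """Retrieve, cluster the retrieval set, then take one document per cluster."""
    retrieved = retrieve(docs, scores, n)
    clusters = cluster(retrieved, threshold)  # repeated for every retrieval set
    # Input is relevance-sorted, so each cluster's first member is its most relevant.
    return [c[0] for c in clusters][:k]


scores = {
    "central bank raises interest rates": 0.9,
    "central bank raises interest rates again": 0.85,
    "local team wins championship final": 0.6,
    "new phone model released today": 0.5,
    "weather forecast for the weekend": 0.1,
}
docs = list(scores)

loose = recommend(docs, scores, n=4, threshold=0.5, k=2)
strict = recommend(docs, scores, n=4, threshold=0.9, k=2)
print(loose)
print(strict)
```

In this toy data set, changing only the hypothetical `threshold` parameter changes which documents are presented: the looser setting merges the two near-duplicate articles into one cluster, while the stricter setting keeps them apart and returns both. This mirrors the sensitivity to clustering parameters noted above, and the call to `cluster` inside `recommend` shows why the clustering cost is incurred on every request.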