This specification relates to identifying particular data objects for recommendation.
The interconnection of the world's computers, made possible by the Internet, makes the information in millions of data stores available at the user interface of a connected device. Along with this abundance of data comes the associated problem of locating information of interest. In order to present contextually pertinent information, computer systems often provide a list of recommended data objects or content items, for example, results identified from among some quantity of candidate data objects. The identification can use any number of identification algorithms and criteria. Search engines provide a solution to this information finding problem in cases where a user knows, at least in some respects, what she is seeking. A query that includes one or more keywords related to the information being sought is entered, and the system returns a set of results, identifying data objects from which the user can choose. Other systems can provide recommendations based on a given context. A context can be, for example, a state of a collection of user data that evidences user interests through records of past activity. For example, a news web site can recommend news articles based on user interests evidenced by past Internet browsing activity, or as another example, a shopping web site can recommend products to a user based on product pages that a user has previously chosen to view.
Often, the set of data from which a given system identifies results includes two or more highly similar data objects. That is, the result set includes multiple results that are so similar that their inclusion in the result set might be considered by a user to be redundant. This often follows from the identification methods used to select results. If a first data object identified from a set of a candidate data scores highly according to the identification method used, then a similar data object is likely to score highly as well, leading to both being included in the set of results. In some cases, where a data set has many similar data objects, similar result listings might make up a majority, or an entire set of results.
In some situations, such as, for example, a search for a certain location of a retail chain, repetitive similar listings can be desirable. In other cases, however, such repetition obscures more dissimilar results that might be of interest to the user; or where the number of results is limited, the repetitive results may push any dissimilar results out of the result set completely.