The present invention relates generally to data processing systems, and more particularly to collaborative filtering and recommender systems.
Recommender systems predict the preferences of users based on attributes known about the user or a past history of preferences or consumption by the user. For example, a recommender system may predict that a user will like the movie "Titanic" because he previously indicated a liking for such other epic movies as "Lawrence of Arabia" or "Ben Hur".
It is important for recommender systems to consider the quality as well as the content of the items being recommended. However, there exists no reliable computerized process for estimating the quality of most items that are the subject of a recommender system, especially non-textual multimedia content such as movies and videos. For example, a computer program cannot tell whether a remade version of the movie "Casablanca" is well done. Collaborative filtering technology allows recommender systems to provide recommendations based on both content and quality by using quality and content judgments made by other users rather than automated computer analysis of content. By pooling the preferences evinced by a community of users, collaborative filtering allows that user community to share its preferences anonymously and on a large scale. A prediction of one user's preference for an item is computed by considering other people's preferences for that item, where those other people are chosen based on how similar their interests and expectations are to the user's. For example, a user might be given a high prediction for the movie "Sense and Sensibility" because other users in the community who shared the user's taste in Jane Austen movies thought "Sense and Sensibility" was a good movie.
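The neighbor-based prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: the ratings, the user names, the 1-to-5 scale, the choice of cosine similarity, and the helper names `cosine_similarity` and `predict` are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; users and values are made up.
ratings = {
    "alice": {"Emma": 5, "Persuasion": 4, "Speed": 1},
    "bob": {"Emma": 5, "Persuasion": 5, "Speed": 2, "Sense and Sensibility": 5},
    "carol": {"Emma": 1, "Persuasion": 2, "Speed": 5, "Sense and Sensibility": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        weight = cosine_similarity(ratings[user], prefs)
        weighted_sum += weight * prefs[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


prediction = predict("alice", "Sense and Sensibility", ratings)
```

In this sketch, the hypothetical user "bob" shares "alice"'s taste more closely than "carol" does, so his high rating carries more weight and the prediction falls nearer to his rating than to hers.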
A recommender system determines its recommendations by examining previous user preference data. The preference data can be unary, binary, or numerically valued. Unary preference data is a set of customer-item pairs, where each pair indicates that an event linking the customer to the item has occurred. No additional preference information is available to the recommender system about a customer-item event except that it happened. The absence of a pair (more generally known as a tuple) for a specific customer and item does not indicate a preference; it indicates only a lack of information. An example of unary customer data is purchase record data, where a customer-item pair indicates that the customer has purchased the indicated item. Another example of unary data is contained in web page logs, where a customer-item pair indicates that the customer has visited a specific web page.
Binary and numerically valued preference data are generally in the form of 3-tuples, where the three elements of the tuple are a customer identifier, an item identifier, and a preference value. The preference value indicates, for example, the strength of the user's preference for the item or whether the user's preference is for or against the item. To illustrate, where the preference is represented in binary form, a "0" may mean a preference against an item while a "1" means a preference for the item. Where the preference is presented as numerically valued data, the data value may represent a position on a one-dimensional axis of preference, with the midpoint indicating an ambivalent preference for the item, a low value indicating a strong dislike for the item, and a high value indicating a strong preference for the item.
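The three forms of preference data described above might be represented as follows. This is an illustrative sketch only; the identifiers, the 1-to-5 scale, the `Preference` type, and the helpers `has_event` and `interpret` are assumptions introduced for the example.

```python
from typing import NamedTuple


class Preference(NamedTuple):
    """A 3-tuple of customer identifier, item identifier, and preference value."""
    customer_id: str
    item_id: str
    value: float


# Unary data: a set of customer-item pairs. A missing pair means
# "no information", not "dislike".
unary = {("c1", "milk"), ("c1", "bread"), ("c2", "milk")}

# Binary data: 0 is a preference against the item, 1 a preference for it.
binary = [Preference("c1", "Titanic", 1), Preference("c2", "Titanic", 0)]

# Numerically valued data on an assumed 1-5 scale.
numeric = [Preference("c1", "Titanic", 4.5), Preference("c2", "Titanic", 3.0)]


def has_event(customer_id, item_id):
    """True if an event linking the customer to the item was recorded."""
    return (customer_id, item_id) in unary


def interpret(value, low=1.0, high=5.0):
    """Place a numeric preference relative to the midpoint of the scale."""
    midpoint = (low + high) / 2
    if value > midpoint:
        return "like"
    if value < midpoint:
        return "dislike"
    return "ambivalent"
```

A lookup such as `has_event("c2", "bread")` returning false says only that no purchase was recorded, which is why unary data cannot by itself express a negative preference.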
Preference data may be presented to the recommender system in explicit or implicit form. Explicit preference data are preference values that a user has supplied directly, for example by filling out a survey. Implicit preference data consist of preference values that have been inferred by observing actions that the user has taken. It can be inferred that the user has some preference for the item that she has just bought, although the act of purchasing the item is not an explicit statement of preference per se. A user's preference for a web page may be inferred, for example, by measuring the amount of time that the user spends reading the web page, or the number of times the user returns to that page.
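An implicit preference value of the kind described above could be derived from observed behavior along the following lines. The function `implicit_preference`, its caps of 300 seconds and 5 visits, and the equal weighting of the two signals are arbitrary assumptions chosen only to make the idea concrete.

```python
def implicit_preference(dwell_seconds, visits, max_dwell=300.0, max_visits=5):
    """Map observed reading time and return visits to a score in [0, 1].

    Each signal is capped and scaled to [0, 1], and the two are averaged.
    The caps and the equal weighting are illustrative assumptions.
    """
    dwell_part = min(dwell_seconds, max_dwell) / max_dwell
    visit_part = min(visits, max_visits) / max_visits
    return 0.5 * dwell_part + 0.5 * visit_part


# A page read for 150 seconds and never revisited.
score = implicit_preference(dwell_seconds=150, visits=0)
```

The resulting score is an inference rather than a statement by the user, so a system might reasonably treat it with less certainty than an explicit survey response.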
The inputs to a recommender system are typically preference values as described above. The outputs of the recommender system are predictions of preference values for items, particularly those for which the user has not already indicated a preference. Like the input values, the output preferences may be unary, binary, or numerically valued. A system that outputs unary recommendations predicts items that will be of interest to the user, but does not attempt to predict the strength of a user's preference for each item. Binary predictions indicate items that are likely to be of high preference to the user and items that are likely to be of low preference, but again cannot provide an estimate of preference strength. Numerically valued preferences indicate a preference for or against the item and also indicate the preference strength. Note that the domain of the preference input may be different from the domain of the output preference predictions. For example, the preference input may be unary, while the output preference predictions may be numerically valued.
While unary and binary preference values do not indicate the strength of the preference, some recommender systems may additionally rank the preference predictions being returned such that the highest-ranked predictions have the largest probability of being correct. Numerically valued predictions are implicitly ranked by their values.
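The ranking described above can be sketched as follows. This is a minimal illustration, assuming each unary prediction carries an estimated probability of being correct; the item names, the scores, and the helper `rank_predictions` are hypothetical.

```python
def rank_predictions(predictions):
    """Order (item, score) pairs from highest to lowest score."""
    return sorted(predictions, key=lambda pair: pair[1], reverse=True)


# Hypothetical unary predictions: (item, estimated probability of being correct).
unary_predictions = [("bread", 0.62), ("milk", 0.91), ("saffron", 0.35)]

# Hypothetical numerically valued predictions on a 1-5 scale. The predicted
# value itself serves as the ranking key, so no separate score is needed.
numeric_predictions = [("Titanic", 4.4), ("Speed", 2.1), ("Ben Hur", 4.8)]

ranked_unary = rank_predictions(unary_predictions)
ranked_numeric = rank_predictions(numeric_predictions)
```

The same helper serves both cases because a numerically valued prediction already supplies the quantity by which it is ordered.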
Existing recommender systems generate recommendations by selecting the highest-ranking positive preference values. However, this technique does not always produce a desirable result. In many cases, if the recommender system has sufficient data to have high confidence that a recommendation will be good, then the recommendation will be obvious to the user. If the recommendation is obvious, then the recommender system has provided no value. For example, in the context of a system that recommends items to a user for purchase at a grocery store, the recommender system may determine that the user is most likely to be interested in purchasing milk, and therefore will recommend that the customer purchase milk. However, a large fraction of grocery store shoppers buy milk without any recommendation to do so, and so a recommendation to purchase milk is obvious. Making such a recommendation to the customer is therefore not very helpful. Although accurate, it is not a useful or valuable recommendation, since it does not provide the customer with knowledge that he did not already have.
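The grocery store example can be made concrete with a small sketch. All data here are hypothetical: the purchase records, the predicted confidences, and the helper `purchase_rate` are assumptions introduced solely to illustrate the problem.

```python
# Hypothetical unary purchase records: (customer, item) pairs.
purchases = {
    ("c1", "milk"), ("c1", "bread"),
    ("c2", "milk"),
    ("c3", "milk"), ("c3", "saffron"),
    ("c4", "bread"),
}


def purchase_rate(item, purchases):
    """Fraction of all customers who have purchased the item."""
    customers = {customer for customer, _ in purchases}
    buyers = {customer for customer, bought in purchases if bought == item}
    return len(buyers) / len(customers)


# Hypothetical confidences that a given customer will want each item.
predicted_confidence = {"milk": 0.95, "saffron": 0.40}

# Selecting the highest-ranking prediction, as existing systems do.
top_item = max(predicted_confidence, key=predicted_confidence.get)
top_item_rate = purchase_rate(top_item, purchases)
```

In this made-up data the highest-confidence item is milk, which three of the four customers already buy, so the recommendation is accurate but tells the customer little.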
Therefore, there exists a problem with existing recommender systems: although able to recommend items with a high confidence level, they often recommend items that are obvious to the user. Consequently, the value of the recommendation is low. There exists a need to overcome the problem of making low-value recommendations.