With the escalating amount of data available online, recommender systems [P. Resnick and H. R. Varian, Recommender systems, Communications of the ACM, 40(3):56-58, 1997] have become very popular, especially on web sites. As known in the art, recommender systems are systems that recommend items to users. Such systems have various applications, such as helping users find web pages that interest them, recommending products to customers on e-commerce websites, recommending TV programs to users of interactive TV, and displaying personalized advertisements. While there are many types of recommender systems, ranging from manually predefined, non-personalized recommendations to fully automatic general-purpose recommendation engines, two dominant approaches have emerged: Collaborative Filtering (CF) and Content Based (CB) recommendations.
Both CF and CB are discussed by Montaner et al. [M. Montaner, B. López, and J. L. De La Rosa. A taxonomy of recommender agents on the internet. Artificial Intelligence Review, 19:285-330, 2003]. In summary, the CF approach identifies each item only by a unique identifier and recommends items that were purchased together, ignoring the attributes of the items. CB recommendations, on the other hand, are generated from an item profile (i.e., a set of attributes of an item) and discard purchase information.
Collaborative filtering stems from the assumption that people looking for recommendations often ask for the advice of friends. Since on the internet the population that can supply advice is very large, the part of the population which may be relevant for the current user must be identified.
CF methods identify similarity between users based on items they have rated, and recommend new items that similar users have liked. CF processes vary in the method they use to identify similar users. Originally, Nearest-Neighbor approaches based on the Pearson Correlation were implemented, computing similarity between users directly over the database of user-item ratings. However, most modern systems avoid querying the database (of either user-item ratings or item descriptions) directly. Instead, statistical models (e.g., Decision Tree, SVD matrix, Dependency Network) are adopted to allow scaling up to millions of users and items. Model-based approaches usually sacrifice some accuracy in favor of a rapid recommendation generation process [J. S. Breese et al., “Empirical analysis of predictive algorithms for collaborative filtering”, Uncertainty in Artificial Intelligence, 1998, pages 43-52], and therefore scale better to modern applications.
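By way of illustration only, the Nearest-Neighbor approach mentioned above may be sketched as follows. The function names `pearson` and `predict`, the sample users and the ratings are hypothetical and do not come from any cited system; restricting the prediction to positively correlated neighbors is a simplifying assumption of this sketch, and the predicted value is not clamped to the rating scale.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation computed over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def predict(ratings, user, item):
    """Predict a rating as the user's mean rating plus the weighted
    deviation of positively correlated neighbors from their own means."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = pearson(ratings[user], other_ratings)
        if weight <= 0:  # assumption: ignore dissimilar users
            continue
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        num += weight * (other_ratings[item] - other_mean)
        den += weight
    return user_mean if den == 0 else user_mean + num / den

# Hypothetical user-item ratings on a 1 to 5 scale.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
prediction = predict(ratings, "alice", "d")
```

Every call to `predict` scans the ratings of all other users, which illustrates why such direct computation over the ratings database scales poorly compared with the model-based approaches.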
CF is advantageous in that it is independent of the specification of the item and can therefore provide recommendations for complex items which are very different, yet are often used together. On the other hand, one major drawback of this approach is the inability to create good recommendations for new users that have not yet rated many items, and for new items that were not rated by many users (this problem is known in the art as the “cold-start” problem).
As mentioned above, CB recommendations relate to the attributes of the items. The CB approach originates in the field of information filtering, where documents are searched according to some given analysis of their text. Items are hence defined by a set of features or attributes. Such systems define a user by preferences over this set of features, and obtain recommendations by matching user profiles against item profiles and looking for the best matches. It should be understood that, although in the art methods that learn preferred attributes from rated items (referred to as content-based) are sometimes separated from methods that ask the user to specify preferences over item attributes (referred to as demographic filtering), herein all methods that base their recommendations on item attribute preferences are referred to as content-based recommendations.
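The matching of user profiles against item profiles described above may be sketched, for illustration only, as follows. Cosine similarity is an assumption of this sketch (the text does not prescribe a particular matching measure), and the names `cosine` and `rank_items`, the attribute names and the weights are hypothetical.

```python
from math import sqrt

def cosine(profile_a, profile_b):
    """Cosine similarity between two sparse attribute-weight profiles."""
    dot = sum(w * profile_b.get(attr, 0.0) for attr, w in profile_a.items())
    norm_a = sqrt(sum(w * w for w in profile_a.values()))
    norm_b = sqrt(sum(w * w for w in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_items(user_profile, item_profiles):
    """Return item names ordered from best to worst match."""
    scored = [(cosine(user_profile, p), name) for name, p in item_profiles.items()]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical profiles: preferences and item descriptions over genres.
user_profile = {"comedy": 1.0, "drama": 0.2}
item_profiles = {
    "tragedy": {"drama": 1.0},
    "sitcom": {"comedy": 1.0},
    "dramedy": {"comedy": 0.5, "drama": 0.5},
}
ranking = rank_items(user_profile, item_profiles)
```

Because the score depends only on the item description, an item that no user has rated can be ranked as soon as its attributes are known.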
CB systems can easily provide valid recommendations to new users, assuming that their profile is specified, even if they never used the system before. CB engines can provide recommendations for new items that were never rated before based on the item description and are therefore very useful in environments where new items are constantly added.
However, content-based approaches rarely implement statistical models and usually match user profiles and item profiles directly. User and item profiles are very sensitive to profile definitions, i.e., which attributes are relevant and which attributes should be ignored. It is also difficult to create an initial profile of a user by specifying the interests and preferences of that specific user. Users are usually reluctant to provide a thorough description of the things they like and do not like. In some cases users are unaware of their preferences. For example, a user cannot know whether he likes an actor he has never seen. In fact, the acquisition of user preferences is usually considered to be a bottleneck for practical use of these systems [Teixeira et al., “A method for speeding up user preferences acquisition in collaborative filtering systems”, SBIA Conference, pages 237-247, 2002].
A term used in both CF and CB methods is “stereotypes”, also known as “communities”. A stereotype defines an abstract user that has general properties similar to those of a set (community) of real users. As known to those familiar with the art, modeling users by stereotypes (or communities) is a well-studied concept [E. Rich, User modeling via stereotypes, Cognitive Science, vol. 3, pages 329-342, 1979].
In CF systems, stereotypes are described by a set of ratings over items, and similarity between users can be identified by their affinity to various stereotypes. In CB systems, a stereotype is a set of preferences over item attributes, and users can belong to a single stereotype [E. Rich, User modeling via stereotypes, Cognitive Science, vol. 3, pages 329-342, 1979] or to multiple stereotypes [Orwant, User Model. User-Adapt. Interact., 4(2):107-130, 1995]. Recommendations are computed based on the stereotype and then normalized according to the affinity of the user to that stereotype.
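For illustration only, scoring an item for a user who belongs to multiple CB stereotypes may be sketched as follows. The affinity-weighted average used here is one possible reading of "normalized according to the user affinity", not a formula taken from the cited works, and all names and values are hypothetical.

```python
def stereotype_score(stereotype, item):
    """Score an item against one stereotype: the sum of the stereotype's
    attribute preferences weighted by the item's attribute values."""
    return sum(w * item.get(attr, 0.0) for attr, w in stereotype.items())

def user_score(affinities, stereotypes, item):
    """Affinity-weighted average of the stereotype scores for an item."""
    total = sum(affinities.values())
    if total == 0:
        return 0.0
    weighted_sum = sum(
        affinity * stereotype_score(stereotypes[name], item)
        for name, affinity in affinities.items()
    )
    return weighted_sum / total

# Hypothetical stereotypes: preferences over item attributes.
stereotypes = {
    "action_fan": {"action": 1.0, "romance": -0.5},
    "romantic": {"romance": 1.0},
}
# Hypothetical user: mostly an action fan, somewhat a romantic.
affinities = {"action_fan": 0.75, "romantic": 0.25}

action_movie_score = user_score(affinities, stereotypes, {"action": 1.0})
romance_movie_score = user_score(affinities, stereotypes, {"romance": 1.0})
```

In this sketch the user is represented only by the small affinity mapping, while the attribute preferences are held by the stereotypes and shared among all users.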
Generally, in order to adapt and refine recommendations to changes in user tastes, most recommender systems, including CB and CF, rely on feedback from users. Feedback is usually in the form of a rating over an item that can be either numeric (on a scale of, e.g., 1 to 5) or binary (like/dislike).
As users are usually reluctant to rate items explicitly, some research has focused on obtaining implicit ratings (e.g., [R. Schwab, “How to learn more about users from implicit observations”, UM 2001 Conference Proceedings, pages 286-288, 2001]), i.e., estimating a user's ratings from the observable actions of that user. For example, in web browsing, if the user scrolled down an article or clicked on a link inside the article, it can be assumed that the article was useful to the user. If, however, the user only read the title and then went back to the previous page, it can be assumed that the web page was not useful to the user.
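The web-browsing example above may be sketched, for illustration only, as a simple rule over observed events. The event names ("open", "scroll", "click_link", "back") and the function name `implicit_rating` are hypothetical labels chosen for this sketch.

```python
def implicit_rating(events):
    """Estimate a binary rating from a page visit.

    Returns 1 (useful) if the user scrolled or followed a link,
    0 (not useful) if the user only went back, and None if the
    observed events give no evidence either way.
    """
    engaged = {"scroll", "click_link"}
    if engaged & set(events):
        return 1
    if "back" in events:
        return 0
    return None

# Hypothetical visits.
read_article = implicit_rating(["open", "scroll", "click_link"])
bounced = implicit_rating(["open", "back"])
unknown = implicit_rating(["open"])
```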
As mentioned above, although both CF and CB systems have several advantages, they also suffer from drawbacks. Furthermore, a more detailed study of both systems reveals that they are complementary to each other in many aspects. Therefore, there have been attempts to provide better recommendation systems based on a combination of both approaches [R. Burke, “Hybrid recommender systems: Survey and experiments”, User Modeling and User-Adapted Interaction, 12(4):331-370, 2002].
Many hybrid approaches use two recommendation processes and combine their results in some manner, such as combining the results by their relevance, mixing the output of the two processes, switching from CB to CF once the cold-start phase is over, or using the output of one process as an input to the second process. However, such ad-hoc combinations are not optimal in their performance.
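Two of the ad-hoc combinations listed above may be sketched, for illustration only, as follows. The function names, the fixed weight and the rating-count threshold are hypothetical values chosen for this sketch and are not taken from the cited survey.

```python
def weighted_hybrid(cf_score, cb_score, cf_weight=0.5):
    """Combine the two scores by a fixed relative weight."""
    return cf_weight * cf_score + (1.0 - cf_weight) * cb_score

def switching_hybrid(cf_score, cb_score, n_user_ratings, min_ratings=5):
    """Use the CB score during the cold-start phase and the CF score
    once the user has supplied at least min_ratings ratings."""
    return cf_score if n_user_ratings >= min_ratings else cb_score

# Hypothetical scores for one item from the two separate processes.
combined = weighted_hybrid(0.8, 0.4)
new_user = switching_hybrid(0.8, 0.4, n_user_ratings=2)
veteran = switching_hybrid(0.8, 0.4, n_user_ratings=10)
```

In both cases the two processes run separately and are joined only at the output, with a weight or threshold that must be tuned by hand.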
It would therefore be highly desirable to provide a recommendation system that overcomes the drawbacks of the existing systems. Such a system would be, by its nature, a hybrid of CF and CB, rather than an ad-hoc combination of the two.
It is the object of the present invention to provide a recommendation system that is, at its core, a hybrid of CF and CB.
It is a further object of the present invention to provide a recommendation system that implements a set of stereotype content-based profiles using an affinity vector of stereotypes as the user profile.
It is yet a further object of the present invention to provide a recommendation system that can be updated with any new, relevant data concerning the user.
Additional purposes and advantages of this invention will become apparent as the description proceeds.