This invention relates generally to automatic collaborative filtering systems for predicting a user's level of interest in new information, and more particularly to a system and method of bootstrapping or cold starting a collaborative filtering system.
The amount of information that is available globally, via the World Wide Web or the Internet, or locally on some Intranets, is so large that managing such information is critical. One way of managing and distributing information is to use a collaborative filtering system to predict a user's preference and use that information to distribute new information to the user.
Collaborative filtering, the sharing of knowledge through recommendations, is an important vehicle for distributing information. There are two distinct types of collaborative filtering mechanisms: those which enable active collaborative filtering by making it easier for people to share pointers to interesting documents and those which automate collaborative filtering by using statistical algorithms to make recommendations based on correlations between personal preferences.
Collaborative filtering systems are of particular value to suppliers of goods and services in that they can be used to enhance the distribution of their goods and services to customers. Automated collaborative filtering (ACF) is a general type of statistical algorithm that matches items (such as movies, books, music, news articles, etc.) to users by first matching users to each other. ACF uses statistical algorithms to make recommendations based on correlations between personal preferences. Recommendations usually consist of numerical ratings input manually by users, but can also be deduced from user behavior (e.g., time spent reading a document, actions such as printing, saving or deleting a document). The premise of such systems is that a user is going to prefer an item that is similar to other items chosen by the user and by other users.
U.S. Pat. No. 5,724,567 to Rose et al., entitled System for Directing Relevance Ranked Data Objects to Computer Users, describes a system for matching a user's interests by comparing the content of an item with a user's adaptive profile. Feedback is also available to enable the user to update his/her profile.
U.S. Pat. No. 5,704,017 to Heckerman et al., entitled Collaborative Filtering Utilizing a Belief Network, describes a form of adaptive ACF in which the system compares predictions with actual scores and performs adjustments to make the predictions come in line with the known scores.
However, automated collaborative filtering systems such as the above suffer from the cold-start problem: early users will receive inaccurate predictions until there is enough usage data for the algorithm to be able to learn their preferences. In prospective applications of ACF technology, such as knowledge management tools for organizations, consistent high quality service is key. Many existing systems that employ ACF (MovieLens, Amazon.com, BarnesandNoble, etc.) either require users to rate a number of items before they will provide recommendations, use data from purchases, or provide initial predictions which are not personalized (e.g., use the average rating).
Knowledge Pump is a Xerox system which employs a push methodology of sharing knowledge where users are connected by a central knowledge repository with software tracking their interests and building up information that is sent to appropriate users. In Knowledge Pump the system is seeded with a skeletal social network of the intended users, a map of the organization's domains of interest and a collection of recommended items. For example, user-provided lists of immediate contacts or friends (people whose opinion the user tends to particularly value) may be used. See Glance et al., "Knowledge Pump: Supporting the Flow and Use of Knowledge," in Information Technology for Knowledge Management, Eds. U. Borghoff and R. Pareschi, New York: Springer-Verlag, pp. 35-45, 1998.
Note that even when ACF is feasible, it does not necessarily yield accurate predictions. The accuracy of the prediction depends on the number of items rated in common by pairs of users X and Y, the number of ratings available for the item and the number of other items each rater of that item has rated.
In many systems such cold-starting techniques are not always acceptable to users. Few users want to take the time to provide initial ratings and thus may lose interest in using the system. In some systems using "average data" may not be useful in providing recommendations. Other systems, especially new systems, may have no related data from which to extrapolate a user's interests or no means of acquiring the seed data.
There is a need for a system and a method of bootstrapping an ACF system that provides accurate estimates beginning with initial operation of the system. There is also a need for a system and method of bootstrapping an ACF system that is easily updated as users continue to use the system and method. There is a need for a system and method of bootstrapping an ACF system that is particularly useful for Intranets.
A method of predicting a user's rating of a new item in a collaborative filtering system, according to the invention, includes providing an initial set of correlation coefficients for the intended users. The users are members of a predetermined organization and the correlation coefficient for each pair of users is based on the organizational relationship between the users. Once the system is seeded with a set of correlation coefficients for the intended users, when a new item is presented, the system calculates a prediction for the item. If other users in the system have rated the item, a predicted user rating is calculated. The predicted user rating calculation is the weighted average of all ratings for the item, using the correlation coefficients.
In an organizational setting, there are many kinds of prior organizational relationship information available concerning the population of users. One such predetermined organizational relationship includes the strength of ties between potential users. Examples of organizational data include a formal organization chart and social network maps built using interviews or deduced from observed (online and/or offline) interaction patterns. Such data is generally readily available in an Intranet setting, and may also be inferred for an Internet setting.
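As an illustration of seeding from a formal organization chart, the following minimal sketch assigns each pair of users an initial coefficient that shrinks with their distance in the chart. The mapping from distance to coefficient (a decay factor raised to the path length), the manager_of dictionary, and all function names are assumptions made for this example; the description above states only that the initial coefficients are based on the organizational relationship between users.

```python
from itertools import combinations


def chain_to_root(user, manager_of):
    """Return [user, user's manager, ...] up to the top of the chart.

    Assumes manager_of describes a tree (no reporting cycles).
    """
    chain = [user]
    while chain[-1] in manager_of:
        chain.append(manager_of[chain[-1]])
    return chain


def org_distance(a, b, manager_of):
    """Number of reporting links on the path between a and b, or None."""
    chain_b = chain_to_root(b, manager_of)
    depth_in_b = {person: depth for depth, person in enumerate(chain_b)}
    for depth_a, person in enumerate(chain_to_root(a, manager_of)):
        if person in depth_in_b:
            # First shared ancestor: add the two path lengths leading to it.
            return depth_a + depth_in_b[person]
    return None  # no common ancestor in the chart


def seed_correlations(users, manager_of, decay=0.5):
    """Hypothetical seeding rule: coefficient = decay ** chart distance."""
    seeds = {}
    for a, b in combinations(users, 2):
        distance = org_distance(a, b, manager_of)
        seeds[frozenset((a, b))] = 0.0 if distance is None else decay ** distance
    return seeds
```

With this rule, two people who report to the same manager start with a higher coefficient than two people in different branches of the chart. A social network map could be substituted for the chart by replacing org_distance with a tie-strength lookup.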
A collaborative filtering system for predicting a user's rating for an item, according to the invention, includes a memory and a processor. The memory stores a correlation coefficient for each user in the system or the data necessary for calculating the correlation coefficients. The correlation coefficient is a measure of the similarity in ratings between pairs of users in the system who have rated a particular item. The memory also stores ratings for the item made by other users in the system. The processor calculates the weighted average of all the ratings for the item, wherein the weighted average is the sum of the product of a rating and its respective correlation coefficient divided by the sum of the correlation coefficients to provide a predicted user rating. The users are members of a predetermined organization and the initial value of the correlation coefficient for each pair of users in the system comprises a predetermined organizational relationship among the users.
Once the collaborative filtering system is up and running, the initial values for the correlation coefficients can be updated as users provide ratings for items. To provide further accuracy in the correlation coefficients, and thus in the resulting prediction and recommendation, the correlation coefficients can be updated when a user changes his/her rating for a particular item. This is accomplished by backtracking, i.e., removing the prior rating and replacing it with the new rating, then recalculating the affected correlation coefficients.
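The weighted average described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the nested-dictionary layout of ratings, and the use of a frozenset of two user names as the key for a pair's coefficient are all assumptions made for the example. It also assumes the coefficients involved do not sum to zero; when they do, the sketch returns None rather than a prediction.

```python
def predict_rating(user, item, ratings, correlations):
    """Weighted average of other users' ratings for `item`.

    ratings:      {user_name: {item_name: rating}}          (assumed layout)
    correlations: {frozenset({user_a, user_b}): coefficient} (assumed layout)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue  # only other users who rated this item contribute
        weight = correlations.get(frozenset((user, other)), 0.0)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    if weight_total == 0:
        return None  # no usable ratings or weights: no prediction
    # Sum of (rating x coefficient) divided by the sum of the coefficients.
    return weighted_sum / weight_total
```

For example, if two colleagues rated an item 4 and 2 and their coefficients with the target user are 0.75 and 0.25, the predicted rating is (0.75 x 4 + 0.25 x 2) / 1.0 = 3.5.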
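One way to make backtracking inexpensive is to keep running sums for each pair of users, so that a changed rating can be undone and redone without revisiting every item. The sketch below assumes a correlation computed from deviations about a fixed reference constant; the class name PairStats, its methods, and the default reference of 2.5 are hypothetical choices for this example. How the seeded initial values are blended with these sums is not specified in this section and is left out.

```python
import math


class PairStats:
    """Running sums for one pair of users over the items both have rated."""

    def __init__(self, reference=2.5):
        self.reference = reference  # assumed constant for the deviations
        self.sxy = 0.0  # sum of products of the two users' deviations
        self.sxx = 0.0  # sum of squared deviations, first user
        self.syy = 0.0  # sum of squared deviations, second user

    def _apply(self, rx, ry, sign):
        dx = rx - self.reference
        dy = ry - self.reference
        self.sxy += sign * dx * dy
        self.sxx += sign * dx * dx
        self.syy += sign * dy * dy

    def add(self, rx, ry):
        """Include one co-rated item (ratings rx and ry)."""
        self._apply(rx, ry, +1)

    def remove(self, rx, ry):
        """Take one co-rated item back out of the sums."""
        self._apply(rx, ry, -1)

    def change_x(self, old_rx, new_rx, ry):
        """Backtrack: remove the prior rating, then add the new one."""
        self.remove(old_rx, ry)
        self.add(new_rx, ry)

    def correlation(self):
        denominator = math.sqrt(self.sxx * self.syy)
        return self.sxy / denominator if denominator > 0 else None
```

After change_x, the sums are the same as if the new rating had been the only one ever entered for that item, so only the coefficients of pairs involving the user who changed the rating need to be recalculated.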
Preferably, ratings are provided in the form of enumerated values (such as 0,1,2,3,4,5). This guarantees that correlations are always defined (no division by zero). Also, preferably, predictions are calculated about a threshold value or constant (such as the midpoint or average of the enumerated values, 2.5).
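A prediction calculated about a constant can be sketched as below. The exact formula is not given in this section, so the form used here is an assumption: each rating is expressed as a deviation from the reference constant, the deviations are averaged with the correlation coefficients as weights, and the constant is added back. Normalizing by the sum of absolute coefficients (rather than the plain sum) is a further assumption, chosen so that negative coefficients are handled; the function name and data layout are likewise hypothetical.

```python
def predict_about_constant(user, item, ratings, correlations, reference=2.5):
    """Prediction as `reference` plus a weighted average of deviations.

    ratings:      {user_name: {item_name: rating}}          (assumed layout)
    correlations: {frozenset({user_a, user_b}): coefficient} (assumed layout)
    """
    weighted_deviation = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        coefficient = correlations.get(frozenset((user, other)), 0.0)
        weighted_deviation += coefficient * (other_ratings[item] - reference)
        weight_total += abs(coefficient)  # assumption: normalize by |c|
    if weight_total == 0:
        return reference  # assumption: fall back to the constant itself
    return reference + weighted_deviation / weight_total
```

Because the enumerated values 0 through 5 never equal 2.5, every rating has a nonzero deviation from the midpoint. Under this reading, a low rating from a negatively correlated user moves the prediction above the midpoint, just as a high rating from a positively correlated user does.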