This invention relates generally to techniques for providing to individuals in a computer network environment information that is of interest to them. More particularly, this invention relates to a technique using a numerical method (a principal component analysis) to recommend to an individual selective information, such as books, movies, stocks, or toys, available within a computer network.
The World Wide Web and other networked computer infrastructure are repositories of vast amounts of data. As such, users face an arduous task of retrieving information that matches their requirements and preferences. Collaborative filtering technology provides an automated approach for retrieval of preference-based content in this networked environment.
The term "collaborative filtering" refers to a process in which individuals cooperate with one another to screen information by recording the reactions of individuals to material they review. This process is sometimes referred to as a "recommender system".
With content personalization becoming an essential component of many web-based consumer services and the number of Internet users growing daily, there is a demand for efficient collaborative filtering algorithms. A variety of methods have been applied to domains such as newsgroup postings, books, music, movies and web sites. Different domains have different properties, which motivate variations in methodology. What these systems have in common is the ability to record from many users numerical approval ratings for a domain of objects.
A new user enters the system and provides new ratings for an object set. These ratings are used to find a similar set of users. Ratings from that set of users are then used to predict the ratings that the new user will give to objects not yet considered.
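The conventional neighbor-based prediction described above can be sketched as follows. This is a minimal illustration and not the method of the invention: the use of Pearson correlation as the similarity measure, the function names `pearson` and `predict`, the `neighbors` parameter, and the sample ratings are all assumptions introduced here for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the objects that both users have rated."""
    common = [o for o in a if o in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[o] for o in common) / len(common)
    mean_b = sum(b[o] for o in common) / len(common)
    cov = sum((a[o] - mean_a) * (b[o] - mean_b) for o in common)
    var_a = sum((a[o] - mean_a) ** 2 for o in common)
    var_b = sum((b[o] - mean_b) ** 2 for o in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict(new_user, others, target, neighbors=2):
    """Predict new_user's rating of target from the most similar users."""
    # Every stored user who rated the target is compared with the new user.
    scored = [(pearson(new_user, r), r[target])
              for r in others.values() if target in r]
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:neighbors]
    top = [(w, r) for w, r in top if w > 0]  # keep positively correlated users
    if not top:
        return None
    # Similarity-weighted average of the neighbors' ratings.
    return sum(w * r for w, r in top) / sum(w for w, _ in top)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1, "D": 5},
    "u2": {"A": 4, "B": 5, "C": 2, "D": 4},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 1},
}
new_user = {"A": 5, "B": 5, "C": 1}
prediction = predict(new_user, ratings, "D")
```

Note that each prediction in this sketch scans every stored user, so the work per request grows with the number of users.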
A collaborative filtering system should be: (1) effective, such that recommended objects receive high ratings; and (2) efficient, such that recommendations can be generated quickly.
Collaborative filtering is particularly effective on the World Wide Web, where a large corpus of users is available. However, very large numbers of users introduce computational challenges.
In view of the foregoing, it would be highly desirable to provide a collaborative filtering technique with improved efficiency.
The method of the invention is executed by a computer under the control of a program stored in computer memory. The method includes the step of accumulating preference data for a set of individuals. The preference data is transformed from multi-dimensional data to lower-dimensional data using a principal component analysis. The lower-dimensional data is then converted into a recommendation map, which provides a hierarchy of values for content based upon previous ratings of the content. Preference information for a selected individual is then collected. The preference information is then mapped to the recommendation map. Content from a set of deliverable content is then routed to the selected individual.
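The steps of the method can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation of the invention: it keeps only the first principal component (found by power iteration), clusters users by the sign of their projection onto that component, and ranks content by mean rating within each cluster. The function names, the split into "calibration" and "candidate" objects, and the sample data are all hypothetical.

```python
from math import sqrt

def first_component(rows, iters=200):
    """Return the column means and leading principal axis of rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0 / (j + 1) for j in range(k)]  # arbitrary non-zero start vector
    for _ in range(iters):  # power iteration on the covariance matrix
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def cluster_of(ratings, means, axis):
    """Assumed clustering rule: the sign of the 1-D projection."""
    proj = sum((x - m) * a for x, m, a in zip(ratings, means, axis))
    return 0 if proj < 0 else 1

def build_map(calibration, candidates, means, axis):
    """Off-line step: rank candidate content by mean rating per cluster."""
    totals = {}
    for cal, cand in zip(calibration, candidates):
        bucket = totals.setdefault(cluster_of(cal, means, axis), {})
        for obj, rating in cand.items():
            s, cnt = bucket.get(obj, (0.0, 0))
            bucket[obj] = (s + rating, cnt + 1)
    return {c: sorted(b, key=lambda o: b[o][0] / b[o][1], reverse=True)
            for c, b in totals.items()}

def recommend(new_ratings, means, axis, rec_map):
    """On-line step: project, find the cluster, return its top content."""
    return rec_map[cluster_of(new_ratings, means, axis)][0]

# Hypothetical accumulated preference data (one row per individual).
calibration = [[5, 4, 1], [4, 5, 1], [5, 5, 2],
               [1, 2, 5], [2, 1, 4], [1, 1, 5]]
candidates = [{"X": 5, "Y": 1}, {"X": 4, "Y": 2}, {"X": 5, "Y": 2},
              {"X": 1, "Y": 5}, {"X": 2, "Y": 4}, {"X": 1, "Y": 5}]

means, axis = first_component(calibration)
rec_map = build_map(calibration, candidates, means, axis)
choice = recommend([5, 5, 1], means, axis, rec_map)
```

In this sketch the on-line call touches only the stored means, the principal axis, and the recommendation map, so its cost does not depend on how many individuals contributed to the accumulated data.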
The apparatus of the invention is a computer readable memory to direct a computer to function in a specified manner. The computer readable memory includes executable instructions forming a preference accumulation module to accumulate preference data for a set of individuals. A set of executable instructions forms a mapping module including a principal component analysis module to transform the preference data from multi-dimensional data to lower-dimensional data using a principal component analysis. A clustering module converts the lower-dimensional data into a recommendation map. A recommendation module coordinates preference information for a selected individual and maps the preference information to the recommendation map. The computer readable memory also includes a content delivery module to route to the selected individual chosen content from a set of deliverable content.
The invention utilizes a novel principal component analysis (PCA) and clustering-based collaborative filtering technique for efficient and effective personalized information retrieval. The invention avoids semantic categories by relying solely upon numerical ratings, such that each content block and each user is treated as a "black box" to which statistical pattern recognition techniques are applied. Using this approach, the technique of the invention is applicable to different domains of objects without customization for each domain. Thus, the technique is readily used for recommending diverse content, such as books, movies, toys, stocks, and music.
Preferably, the invention splits the prediction process into an off-line component and an on-line component. The following computations are performed off-line: the correlation matrix between users, the principal component analysis, the clustering, the prediction vectors for each cluster, and the formation of a recommendation map. These computations are performed in O(kn^2) time, where O denotes order of complexity, a standard measure of computational efficiency in computer science, n is the number of users in the database, and k is the number of objects in the prediction set. Whereas processing time in traditional prior art collaborative filtering methods scales as n^2, the present invention advantageously breaks the computation into off-line and on-line phases such that on-line processing time is independent of the number of users n in the database. The recommendation map is used on-line to identify objects to recommend, and this on-line process is achieved in O(k) time. Because k is a constant, O(k) time is equivalent to saying that on-line processing time is constant with respect to n. This feature of the present invention advantageously facilitates rapid computation of recommendations.
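One way to read the stated orders is sketched below. It assumes that the off-line cost is dominated by forming the user-to-user correlation matrix, and that the number of retained principal components d and the number of clusters c are fixed; the symbols d and c are introduced here for illustration and do not appear in the text above.

```latex
\begin{align*}
T_{\text{off-line}} &\approx
  \underbrace{\tfrac{n(n-1)}{2}}_{\text{user pairs}} \cdot
  \underbrace{O(k)}_{\text{one correlation}}
  = O(kn^{2}), \\
T_{\text{on-line}} &\approx
  \underbrace{O(kd)}_{\text{projection}} +
  \underbrace{O(cd)}_{\text{cluster lookup}}
  = O(k) \quad \text{for fixed } d \text{ and } c.
\end{align*}
```

Under these assumptions n appears only in the off-line term, which is consistent with the statement that on-line processing time is independent of the number of users.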