1. Field
The application relates to computer-based methods and apparatus for recommender systems.
2. Prior Art
A recommender system can be used to predict preferences of users for products. Consider a recommender system involving a plurality of products rated by a plurality of users. An observed rating is a rating of one of the products that has been made by one of the users. If a particular user has not yet rated a particular product then that rating is a missing rating. The product recommendation problem is to predict missing ratings.
For example, a user named Tony has rated three movies: Titanic as 4 stars, Star Wars as 2 stars, and The Godfather as 3 stars. Tony has not rated Independence Day and Jaws. A user named Mike has rated Titanic as 1 star and Jaws as 3 stars. Mike has not rated Star Wars, The Godfather or Independence Day. Tony's observed ratings thus consist of his ratings of Titanic, Star Wars, and The Godfather. Mike's observed ratings thus consist of his ratings of Titanic and Jaws. The goal of movie recommendation is to accurately predict the missing ratings, namely the ratings Tony would give to Independence Day and Jaws and the ratings Mike would give to Star Wars, The Godfather and Independence Day.
A set of product ratings can be represented as a data-matrix, some of whose entries are undefined. Let k denote the number of products and let n denote the number of users. In the movie example above, k=5 and n=2. Let y_i(j) denote the observed rating of the jth product by the ith user, and let y_i denote all observed ratings from the ith user. The data-matrix is the set of all observed ratings from all users and can be represented as 101 in FIG. 1. Positions in the data-matrix corresponding to missing ratings have no assigned value. A data-matrix for the movie example above is illustrated as 201 in FIG. 2.
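The movie example can be encoded directly as a data-matrix. The following is a minimal Python sketch, assuming that rows index products and columns index users (consistent with a k×n data-matrix) and that `None` marks a missing rating; the names `DATA`, `observed` and `missing` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Hypothetical encoding: rows are products (k = 5), columns are users (n = 2).
# Entry DATA[j][i] holds y_i(j); None marks a missing rating.
PRODUCTS = ["Titanic", "Star Wars", "The Godfather", "Independence Day", "Jaws"]
USERS = ["Tony", "Mike"]

DATA: List[List[Optional[float]]] = [
    # Tony, Mike
    [4.0, 1.0],    # Titanic
    [2.0, None],   # Star Wars
    [3.0, None],   # The Godfather
    [None, None],  # Independence Day
    [None, 3.0],   # Jaws
]


def observed(data):
    """Return (product j, user i, rating) for every observed entry."""
    return [(j, i, r) for j, row in enumerate(data)
            for i, r in enumerate(row) if r is not None]


def missing(data):
    """Return (product j, user i) for every missing entry to be predicted."""
    return [(j, i) for j, row in enumerate(data)
            for i, r in enumerate(row) if r is None]


if __name__ == "__main__":
    for j, i in missing(DATA):
        print(f"predict: {USERS[i]} -> {PRODUCTS[j]}")
```

Of the k×n = 10 positions, 5 are observed and 5 are missing, which matches the five missing ratings listed for Tony and Mike.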
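Performance of recommender systems is generally measured using the Root Mean Squared Error (RMSE). The RMSE is the square root of the mean of the squared differences between predicted ratings and the corresponding actual observed ratings, typically computed over observed ratings withheld from training; a lower RMSE indicates better performance.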
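As a concrete illustration of the RMSE, the Python sketch below computes it for paired predicted and observed ratings. The function name `rmse` is hypothetical; the formula is the standard one, sqrt((1/m) Σ (predicted − actual)²) over m rating pairs.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between paired predicted and observed ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Predictions of 3.5 and 2.0 against observed ratings of 4 and 1 stars.
    print(rmse([3.5, 2.0], [4.0, 1.0]))
```

Here the squared errors are 0.25 and 1.0, so the RMSE is sqrt(0.625), roughly 0.79 stars.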
In addition to movies, there are many other products whose ratings can be predicted by recommender systems, e.g. books and television shows. Recommender systems are applicable in Internet dating sites to predict how users will rate potential mates. Grocery stores can use recommender systems to predict the buying habits of shoppers. Recommender systems are also applicable in medicine. Consider a plurality of drugs and a plurality of patients, and a patient who has responded well to some drugs but not others. A patient's response to a drug is akin to a rating of that drug by the patient. Recommender systems can thus be applied to predict which drugs the patient will respond well to.
Recommender systems are currently an active area of research. A matrix factorization approach currently stands out for large-scale, real-world recommender systems. In the matrix factorization approach, an integer l is first chosen. A data-matrix 101 of size k×n is then approximated by the product of a left matrix 301 of size k×l and a right matrix 302 of size l×n. The left matrix 301 and the right matrix 302 are generally estimated to minimize an error measure between their product and the data-matrix 101. Once these matrices have been estimated, a missing rating is predicted by the inner product of the appropriate row of the left matrix 301 and the appropriate column of the right matrix 302. For example, since rows correspond to products and columns to users, y_i(j) is predicted by the dot product of the jth row of the left matrix 301 and the ith column of the right matrix 302.
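The factorization and the row-by-column prediction can be sketched in a few lines of Python. This is a minimal illustration, not the method of the application: it assumes stochastic gradient descent on squared error over the observed entries, with L2 regularization, as one of the many estimation approaches in use. The names `factorize`, `predict`, `train_rmse` and the hyperparameters `epochs`, `lr`, `reg` and `seed` are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Data = List[List[Optional[float]]]


def predict(left: Matrix, right: Matrix, j: int, i: int) -> float:
    """Predict y_i(j): inner product of row j of left and column i of right."""
    return sum(left[j][t] * right[t][i] for t in range(len(right)))


def factorize(data: Data, l: int, epochs: int = 2000, lr: float = 0.01,
              reg: float = 0.02, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Fit a k x l left matrix and an l x n right matrix to the observed
    entries of a k x n data-matrix (None = missing) by SGD with L2
    regularization, an assumed estimation choice."""
    rng = random.Random(seed)
    k, n = len(data), len(data[0])
    left = [[rng.uniform(0.1, 1.0) for _ in range(l)] for _ in range(k)]
    right = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(l)]
    obs = [(j, i, y) for j, row in enumerate(data)
           for i, y in enumerate(row) if y is not None]
    for _ in range(epochs):
        rng.shuffle(obs)
        for j, i, y in obs:
            err = y - predict(left, right, j, i)
            for t in range(l):
                lj, ri = left[j][t], right[t][i]
                # Gradient step on 0.5*err^2 + 0.5*reg*(lj^2 + ri^2).
                left[j][t] += lr * (err * ri - reg * lj)
                right[t][i] += lr * (err * lj - reg * ri)
    return left, right


def train_rmse(data: Data, left: Matrix, right: Matrix) -> float:
    """RMSE of the fitted product over the observed entries only."""
    obs = [(j, i, y) for j, row in enumerate(data)
           for i, y in enumerate(row) if y is not None]
    sq = sum((y - predict(left, right, j, i)) ** 2 for j, i, y in obs)
    return math.sqrt(sq / len(obs))


if __name__ == "__main__":
    # Movie example: rows Titanic, Star Wars, The Godfather,
    # Independence Day, Jaws; columns Tony, Mike.
    movies = [[4.0, 1.0], [2.0, None], [3.0, None], [None, None], [None, 3.0]]
    left, right = factorize(movies, l=2)
    print("training RMSE:", round(train_rmse(movies, left, right), 3))
    print("Tony -> Jaws:", round(predict(left, right, 4, 0), 2))
```

Note that the Independence Day row has no observed ratings, so its row of the left matrix is never updated and its predictions stay at their random initialization; this is one way the difficulty with new products and users discussed below shows up in practice.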
The matrix factorization approach has a number of disadvantages. A variety of approaches for estimating the left matrix 301 and the right matrix 302 are in general use. These approaches result in different values for the left matrix 301 and the right matrix 302, and hence their performance varies. Little guidance in selecting the integer l is available, so generally a variety of values of l need to be tried. In many cases a large l performs best. A large l, however, means that the left matrix 301 and the right matrix 302 are also very large, and finding these matrices becomes computationally expensive. Once the left matrix 301 and the right matrix 302 have been estimated, it is generally difficult to add new products or new users without re-estimating both matrices. The matrix factorization approach often yields performance that is too low for many applications. A number of heuristic methods have been proposed to increase performance. One is to average predictions obtained using slightly different training approaches and/or different values of l. This method is generally too cumbersome for practical application. Performance of matrix factorization may also be increased by applying a combination of pre-processing, post-processing and data manipulation steps. Such steps are generally heuristic in nature and may only work on particular data-sets. For example, subtraction of a global mean may improve performance for movie recommendation but not for a dating site.
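The two heuristics mentioned above, global-mean subtraction and averaging of predictions, are simple to express. The Python sketch below is illustrative only; the function names `center_by_global_mean`, `uncenter` and `average_predictions` are hypothetical, and it assumes a data-matrix stored as a list of rows with `None` for missing ratings. A model would be fitted to the centered matrix and the mean added back to each of its predictions.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

Data = List[List[Optional[float]]]


def center_by_global_mean(data: Data) -> Tuple[Data, float]:
    """Pre-processing: subtract the mean of all observed ratings.
    Missing entries (None) stay missing."""
    mu = mean(y for row in data for y in row if y is not None)
    centered = [[None if y is None else y - mu for y in row] for row in data]
    return centered, mu


def uncenter(prediction: float, mu: float) -> float:
    """Post-processing: add the global mean back to a prediction
    made by a model trained on centered data."""
    return prediction + mu


def average_predictions(predictions: Sequence[float]) -> float:
    """Average several models' predictions of the same missing rating,
    e.g. models trained with different values of l."""
    return mean(predictions)


if __name__ == "__main__":
    movies = [[4.0, 1.0], [2.0, None], [3.0, None], [None, None], [None, 3.0]]
    centered, mu = center_by_global_mean(movies)
    print("global mean:", mu)
    print("averaged prediction:", average_predictions([3.0, 3.5, 4.0]))
```

For the movie example the global mean of the five observed ratings is 13/5 = 2.6. Whether such centering helps depends on the data-set, which is exactly the limitation noted above.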
A need exists for a method and apparatus that achieves high performance without the disadvantages of the matrix factorization approach. The method and apparatus should be able to add new products and new users without requiring extensive re-calculations. The method and apparatus should be able to achieve high performance without requiring averaging of many predictions. The method and apparatus should not rely on heuristic data manipulations to achieve high performance.