This invention relates generally to recommender systems, and more particularly to retrofitting such systems so that they can be scaled to applications involving very large databases.
Recommender systems, also referred to as predictive or predictor systems, collaborative filtering systems, and document similarity engines, among other terms, typically aim to determine a set of items, such as products or articles, that match a user based on other users' preferences and selections. Usually, a query is stated in terms of what is known about a user, and recommendations are retrieved based on other users' preferences. Generally, a prediction is made by retrieving the set of users that are similar to the user, and then basing the recommendation on a weighted score of the matches.
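As a rough illustration of the similarity-weighted prediction just described, the following Python sketch implements simple user-based collaborative filtering. It is not taken from the invention. The binary item-set representation, the cosine similarity measure, and the names `cosine`, `recommend`, and `k` are illustrative assumptions.

```python
from math import sqrt

# Illustrative sketch only: each user (and the query) is a set of item IDs.


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary item sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(query: set[str], users: list[set[str]], k: int = 2) -> dict[str, float]:
    """Score items held by the k users most similar to the query.

    Each neighbour's items are weighted by that neighbour's similarity
    to the query; items already in the query are not recommended.
    """
    neighbours = sorted(users, key=lambda u: cosine(query, u), reverse=True)[:k]
    scores: dict[str, float] = {}
    for user in neighbours:
        weight = cosine(query, user)
        if weight == 0.0:
            continue  # a dissimilar user contributes nothing
        for item in user - query:
            scores[item] = scores.get(item, 0.0) + weight
    return scores
```

For example, with users `{"a", "b", "c"}`, `{"a", "b"}`, and `{"d", "e"}` and the query `{"a"}`, item `"b"` receives weight from both similar users and therefore outscores `"c"`, while the dissimilar third user is ignored.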
Recommender systems have traditionally been based on memory-intensive techniques, which assume that the data, or a large indexing structure over the data, is loaded into memory. Such systems are used, for example, by Internet web sites to predict what products a consumer will purchase, or what web sites a computer user will browse to next. With the increasing popularity of the Internet and electronic commerce, use of recommender systems will likely increase.
A difficulty with recommender systems, however, is that they do not scale well to large databases. Such systems may fail as the data grow: for example, as an electronic commerce store grows, as its inventory grows, or as the site adds more usage data to the prediction data. Growing data leads to prohibitively long load times, which may cause timeouts and other problems. Response times may also increase with the data, until performance requirements begin to be violated. For these and other reasons, therefore, there is a need for the present invention.
The invention relates to retrofitting recommender systems so that they can, for example, scale to voluminous data. The data is generally organized into records (also referred to as rows) and dimensions (also referred to as columns, or items). In one embodiment, a method first repeatedly reduces the data by a number of records, until either a predetermined accuracy threshold or a predetermined performance requirement is met. If the accuracy threshold is met first, the method then repeatedly removes the highest-frequency dimension from the data, until the performance requirement is also met. The reduced data is provided to the recommender system, which generates predictions based on the reduced data and on a query. Any dimension previously removed from the data is subsequently added back to the predictions produced by the recommender system, if the dimension is not already part of the query. In other embodiments of the invention, clustering of the data and/or of the query is also performed.
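The record-reduction and dimension-removal loops, together with the later add-back step, might be sketched in Python as follows. This is a hypothetical sketch, not the claimed implementation. The accuracy estimator and the performance check are supplied as caller-defined callbacks (`accuracy`, `fast_enough`); dropping trailing records and appending added-back dimensions at the end of the prediction list are assumptions; and "accuracy threshold met" is read here as the point at which removing more records would push estimated accuracy below `min_accuracy`.

```python
from collections import Counter
from typing import Callable

# Hypothetical sketch: each record (row) is the set of dimensions (items) it contains.
Data = list[set[str]]


def retrofit_reduce(
    data: Data,
    accuracy: Callable[[Data], float],   # hypothetical accuracy estimator
    fast_enough: Callable[[Data], bool], # hypothetical performance-requirement check
    min_accuracy: float,                 # hypothetical accuracy threshold
    records_per_step: int = 1,
) -> tuple[Data, list[str]]:
    """Drop records, then (if needed) the highest-frequency dimensions.

    Returns the reduced data and the removed dimensions in removal order.
    """
    data = [set(record) for record in data]

    # Phase 1: drop trailing records (an assumed policy) while the performance
    # requirement fails and accuracy would stay at or above the threshold.
    while not fast_enough(data):
        candidate = data[:-records_per_step]
        if not candidate or accuracy(candidate) < min_accuracy:
            break  # the accuracy threshold was reached first
        data = candidate

    # Phase 2: remove the most frequent dimension until performance is met.
    removed: list[str] = []
    while not fast_enough(data):
        counts = Counter(item for record in data for item in record)
        if not counts:
            break  # nothing left to remove
        top, _ = counts.most_common(1)[0]
        removed.append(top)
        data = [record - {top} for record in data]
    return data, removed


def add_back(predictions: list[str], removed: list[str], query: set[str]) -> list[str]:
    """Append removed dimensions that are not already in the query."""
    extra = [d for d in removed if d not in query and d not in predictions]
    return predictions + extra
```

In use, the reduced data would be passed unchanged to the existing recommender system, and its output would then be passed through `add_back` with the removed dimensions and the query, so the recommender itself is never modified. For instance, if dimension `"a"` was removed and the query is `{"b"}`, a prediction list `["c"]` becomes `["c", "a"]`. If the query already contains `"a"`, the list is left as `["c"]`.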
Embodiments of the invention provide advantages not found within the prior art. Records and/or dimensions are removed from the data so that both the accuracy threshold and the performance requirement are met. Thus, even if a database is very large, accurate predictions can still be made while performance is maintained. This is achieved without modifying an existing implementation of a recommender system, which may have been very difficult and/or time-consuming to set up, and which therefore may be undesirable to change. Rather, embodiments retrofit the recommender system by adding pre- and/or post-processing around the existing deployed recommender system: modifying the data that is input to the existing system, and/or modifying the predictions that are output from the existing system.