As shown in FIG. 1, consumers use a recommender system 100 to find certain items, e.g., products and services, that they might prefer over others. With a prediction function, the recommender system generally uses the ratings 101 of preferences on items made by other consumers to predict 120 recommendations 102, i.e., likes and dislikes, over a much larger set of items. This data processing application is also known as collaborative filtering.
The prediction function uses a tabular database, i.e., a preference matrix 103, of stored 110 customer-scores 101 to make recommendations. For example, consumers score items such as movies and books. It is not unusual for the number of entries in the table 103 to be enormous, e.g., 10^3 rows of items and 10^7 columns of consumers. Generally, most entries 104 in the table 103 are empty with unknown scores, i.e., unrated items. Hence, the table is “sparse.”
In an operational recommender system, the entries in the table 103 need to be revised, with rows, columns, and individual scores constantly being added, edited, or retracted as consumers indicate their preferences. These revisions can arrive asynchronously from many distributed sources at a very high rate. For example, movie ratings can evolve rapidly over a very short time, perhaps a single day, when a new movie is released.
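By way of illustration only, a sparse preference table that accepts added, edited, and retracted scores might be sketched as follows. The class name `SparsePreferenceTable` and its methods are hypothetical and are not taken from any particular system; the sketch simply stores the known scores and treats every absent entry as unrated.

```python
from collections import defaultdict


class SparsePreferenceTable:
    """Hypothetical sparse table: rows are items, columns are consumers.

    Only known scores are stored; an absent entry means "unrated".
    """

    def __init__(self):
        self._scores = defaultdict(dict)  # item -> {consumer: score}

    def rate(self, item, consumer, score):
        """Add a new score or edit an existing one."""
        self._scores[item][consumer] = score

    def retract(self, item, consumer):
        """Remove a score, dropping the row if it becomes empty."""
        row = self._scores.get(item)
        if row is not None:
            row.pop(consumer, None)
            if not row:
                del self._scores[item]

    def get(self, item, consumer):
        """Return the score, or None if the item is unrated by the consumer."""
        return self._scores.get(item, {}).get(consumer)

    def num_known(self):
        """Count the stored (known) scores."""
        return sum(len(row) for row in self._scores.values())

    def density(self, num_items, num_consumers):
        """Fraction of the full table that holds a known score."""
        return self.num_known() / (num_items * num_consumers)
```

Storage in such a structure grows with the number of known scores rather than with the full size of the table, which is the property that matters when most entries are empty.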
Efficient methods for storing, updating, and accessing preference tables are sought constantly. Estimating a reasonably efficient, compact, and accurate prediction function is an even harder problem that has attracted much attention in the fields of data mining and machine learning.
Nearest-neighbor search methods, which effectively match against raw data, remain popular despite high search costs and limited predictivity. More sophisticated prediction methods are often defeated by the very high dimensionality of the data, high computational costs of model fitting, and an inability to adapt to new or retracted data. Moreover, with extremely sparse tables, the data are often insufficient to support accurate parameter estimates of those methods.
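As a minimal sketch of a nearest-neighbor predictor of this kind, the following assumes that each consumer's ratings are held in a dictionary keyed by item, that similarity is a cosine measured over co-rated items, and that the prediction is a similarity-weighted mean. These choices, and the function names, are illustrative assumptions rather than a description of any specific prior art system.

```python
import math


def cosine_on_overlap(a, b):
    """Cosine similarity over items both consumers rated (0.0 if none)."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_knn(ratings, consumer, item, k=2):
    """Similarity-weighted mean score of the k nearest consumers who rated `item`.

    `ratings` maps consumer -> {item: score}. Returns None when no
    consumer with positive similarity has rated the item.
    """
    target = ratings[consumer]
    # Every other consumer is compared against the raw data on each query,
    # which is the source of the high search cost.
    neighbors = [
        (cosine_on_overlap(target, other), other[item])
        for name, other in ratings.items()
        if name != consumer and item in other
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = [(sim, score) for sim, score in neighbors[:k] if sim > 0]
    if not top:
        return None
    return sum(sim * score for sim, score in top) / sum(sim for sim, _ in top)
```

Each query in this sketch scans all consumers, and a consumer who shares no rated items with the target contributes nothing, which illustrates how sparsity limits predictivity.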
Typically, a dense subset of the table is constructed using responses of a focus group, and the prediction function is extrapolated from those responses. The very high dimensionality of the problem has partly motivated explorations of multi-linear models such as a singular value decomposition (SVD), both as a compressed representation of the data, and as a basis for predictions via linear regression. Linear regression models generally have lower sample complexity per parameter than non-linear models and can thus be expected to have better generalization.
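The use of the SVD both as a compressed representation and as a basis for regression can be sketched as follows, assuming NumPy, a fully dense focus-group matrix with items as rows, and a hypothetical new consumer who has scored only some of the items. The function names `fit_basis` and `predict_column` are illustrative only.

```python
import numpy as np


def fit_basis(dense, rank):
    """Rank-`rank` truncated SVD of a dense items-by-consumers matrix."""
    U, s, Vt = np.linalg.svd(dense, full_matrices=False)
    return U[:, :rank], s[:rank], Vt[:rank]


def predict_column(U_r, known_idx, known_scores):
    """Predict all item scores for one consumer from a few known scores.

    The known scores are regressed (least squares) against the matching
    rows of the basis U_r, and the fitted coefficients are then expanded
    over all rows. Assumes at least `rank` known scores.
    """
    coeffs, *_ = np.linalg.lstsq(U_r[known_idx], known_scores, rcond=None)
    return U_r @ coeffs
```

The regression has only `rank` free parameters per consumer, which reflects the low sample complexity attributed to linear models above.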
The SVD and related eigenvalue decomposition (EVD) lie at the heart of many data analysis methods. They are used for dimensionality reduction, noise suppression, clustering, factoring, and model-fitting. Several well-known recommender systems are based on the SVD.
Unfortunately, determining the SVD of a very large matrix is a treacherous affair. Most prior art recommender systems need to be taken off-line, as shown in FIG. 1, while the preference table 103 and SVD are updated by a batch process 130. The decomposition has a quadratic run-time, in terms of the size of the matrix 103. Therefore, the batch update is typically run at night. As a result, when the preferences are evolving rapidly during the day, accurate predictions may not be available until a day later.
In addition, the traditional SVD does not produce uniquely defined results when there are missing values 104, as is the case in most recommender systems. Adapting to new or retracted data is also an issue, though it is well known how to append or retract entire columns or rows, provided that they are complete. This is not the case when the data arrive in fragments of rows and columns.
Updating 130 the SVD is generally based on Lanczos methods, symmetric eigenvalue perturbations, or relationships between the SVD and the QR-decomposition, e.g., as computed by a modified Gram-Schmidt procedure. The last category includes some very fast methods, but is vulnerable to loss of orthogonality.
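For illustration, a QR-decomposition by a modified Gram-Schmidt procedure, together with one way of measuring how far the computed basis departs from orthogonality, might be sketched as follows. The sketch assumes NumPy and a matrix of full column rank; it shows only the QR step, not a complete SVD update, and the function names are hypothetical.

```python
import numpy as np


def modified_gram_schmidt(A):
    """Return Q (orthonormal columns) and upper-triangular R with A = Q R.

    Assumes A has full column rank, so no diagonal entry of R is zero.
    """
    V = np.array(A, dtype=float)  # working copy, deflated in place
    p, q = V.shape
    Q = np.zeros((p, q))
    R = np.zeros((q, q))
    for j in range(q):
        R[j, j] = np.linalg.norm(V[:, j])
        Q[:, j] = V[:, j] / R[j, j]
        # Remove the new direction from every remaining column at once;
        # this ordering is what distinguishes the modified procedure.
        for k in range(j + 1, q):
            R[j, k] = Q[:, j] @ V[:, k]
            V[:, k] -= R[j, k] * Q[:, j]
    return Q, R


def orthogonality_error(Q):
    """Frobenius norm of (Q^T Q - I); zero for exactly orthonormal columns."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))
```

For well-conditioned inputs the error reported by `orthogonality_error` stays near machine precision; for ill-conditioned inputs it typically grows, which is the vulnerability noted above.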
For example, for a p×q matrix of rank r, the left singular vectors can be determined in O(pqr^2) time. If p, q, and r are known in advance and p>>q>>r, then the expected complexity falls to O(pqr). However, the precision of the orthogonality is not preserved, and satisfactory results have only been demonstrated for matrices having a few hundred columns, which is too small for practical applications.
The prior art does not consider missing values 104, except insofar as they are treated as zero values. In the batch-SVD update context, missing values are usually handled via subspace imputation, using a computationally expensive expectation-maximization procedure.
First, an SVD of all complete columns is performed. Next, incomplete columns are regressed against the SVD to estimate missing values. Then, the completed data are refactored and re-imputed until a fixed point is reached.
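The three steps above can be sketched as follows, assuming NumPy, missing scores marked with NaN, a rank chosen in advance, at least that many complete columns, and at least that many known scores in every incomplete column. The function name, the tolerance, and the iteration cap are illustrative assumptions.

```python
import numpy as np


def impute_by_subspace(X, rank, max_iters=200, tol=1e-10):
    """Fill NaN entries of X by iterated regression against an SVD basis."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    complete_cols = ~missing.any(axis=0)
    incomplete = np.flatnonzero(~complete_cols)

    # Step 1: SVD of the complete columns only.
    U = np.linalg.svd(X[:, complete_cols], full_matrices=False)[0][:, :rank]

    # Zeros are placeholders; they are overwritten by the first regression.
    filled = np.where(missing, 0.0, X)
    for _ in range(max_iters):
        previous = filled.copy()
        # Step 2: regress each incomplete column on the basis, using only
        # its known rows, and write the fit into its missing rows.
        for j in incomplete:
            known = ~missing[:, j]
            coeffs, *_ = np.linalg.lstsq(U[known], X[known, j], rcond=None)
            filled[missing[:, j], j] = (U @ coeffs)[missing[:, j]]
        # Step 3: refactor the completed data, then repeat until the
        # imputed values stop changing (a fixed point).
        U = np.linalg.svd(filled, full_matrices=False)[0][:, :rank]
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Every pass of this sketch recomputes a full SVD of the completed table, which is why the procedure is costly for large tables.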
This is an extremely slow process that operates in quadratic time and only works when very few values are missing. It has the further demerit that the imputation does not minimize effective rank. Other heuristics simply fill missing values with row- or column-means.
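The row-mean heuristic mentioned above can be sketched in a few lines, again assuming NumPy and NaN for missing scores. Filling a row that has no known scores with 0.0 is an arbitrary assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np


def fill_with_row_means(X):
    """Replace each NaN with the mean of the known scores in its row."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    counts = (~missing).sum(axis=1)
    sums = np.where(missing, 0.0, X).sum(axis=1)
    # Rows with no known scores keep the 0.0 default (an assumption).
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    return np.where(missing, means[:, None], X)
```

Applying the same function to the transpose of the table gives the column-mean variant.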
In the special case where a preference matrix is nearly dense, its normalized scatter matrix may be fully dense, due to fill-ins. One heuristic interprets the eigenvectors of the scatter matrix as the right singular vectors of the preference matrix. This is strictly incorrect. There may not be any imputation of the missing values that is consistent with the eigenvectors of the scatter matrix. For very sparse problems, as considered by the invention, this objection is mooted by the fact that the scatter matrix is also incomplete, and its eigenvectors are undefined.
Therefore, there is a need to provide a method for revising preferences in a recommender system with incomplete data in an on-line manner.