Recommender systems have become popular through sites such as amazon.com and netflix.com These systems analyze a database of many users' ratings on products, for example, movies, and make predictions about users' ratings/preferences on unseen/un-bought products. Many state-of-the-art recommender systems rely on “collaborative filtering” to discover user preferences and patterns. The technique interprets user data as a large matrix, one dimension, say, rows are indexed by users, and then columns are indexed by products. Elements of the matrix are the ratings of users on products. This matrix is usually large, due to large numbers of users and products, and highly sparse, because each user gave ratings on a small number of products. Then collaborative filtering analyzes the patterns of those observed ratings and makes predictions on those unobserved/missing ratings. To discover the rich structure of a huge sparse rating matrix, empirical studies showed that the number of factors should be sufficiently large. In this sense, collaborative filtering is a matrix completion problem.
Conventional techniques include low-rank matrix factorization methods where data is stored in a database as an M by N matrix, where M represents the users' ratings on N products. Since only a small portion of the matrix's ratings are observed, this matrix is quite large and sparse. The Low-rank Factorization technique factorizes the matrix into a product of two matrices, one represent user-specific factors, and the other product-specific factors. An optimization procedure is employed to find the factors that can best explain the data. The number of either user factors or product factors is finite. The prediction of a user's rating on a product is done by retrieving the factor vector of the user and the factor vector of the product, and then calculating their inner product to give the result. With the sheer growth of online user data, it becomes a big challenge to develop learning algorithms that are accurate and scalable.
Recent advances in collaborative filtering has fueled the growth of low-rank matrix factorization methods. Low-rank matrix factorization algorithms for collaborative filtering can be roughly grouped into non-probabilistic and probabilistic approaches. In a recent Netflix competition, the matrix factorization technique is highly popular among the top participants. When matrix factorization is applied by multiplication of a number of user factors and movie factors, both sides of the factors are found by matrix factorization. For computational efficiency, all the techniques assume the number of factors is small. Therefore the matrix factorization is low-rank factorization. However, limiting the rank of the system often scarifies the predictive accuracy, as the large variety of user patterns needs a large number of factors to explain.