Conventional recommendation systems provide information about matches between users (e.g., shoppers) and items (e.g., books, videos, games) based on user interests, preferences, history, or other factors. For example, if a system has data that a user has previously accessed (e.g., purchased, rented, borrowed, played) a set of items, then a recommendation system may identify similar items and recommend them to the user based on the data about the user's own actions (e.g., “if you liked this, you might like that”).
There are two major types of conventional recommendation systems: collaborative-filtering-based systems and feature-based systems. Collaborative filtering depends on actual user events (e.g., a user who bought/watched/read an item). Matrix factorization over collaborative filters is regarded as providing superior results for a recommendation system. Matrix factorization over collaborative filters may employ two different techniques based on the type of data available. The available data may be binary (e.g., 0/1), one-sided (e.g., positive only), an explicit rating (e.g., 5-star scale), continuous, continuous within a range, or of another form. Binary or one-sided data allows embedding of items in a latent space so that the embedding reflects true relationships between items. For example, items that are similar to each other will be located near each other. Explicit rating data allows the use of multi-level and even continuous data, but at the cost of losing the true item-item relationship in which similar items are located close to each other. Maintaining the item-item relationship may facilitate generating worthwhile recommendations.
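The item-item relationship described above can be illustrated with a minimal sketch. The item names, the two-dimensional embedding values, and the helper `nearest` are all hypothetical and chosen only for illustration; Euclidean distance is one assumed notion of "near", not one the text prescribes.

```python
import math

# Hypothetical 2-D latent embedding; values are invented for illustration.
item_embedding = {
    "novel_a": [0.9, 0.1],
    "novel_b": [0.8, 0.2],
    "racing_game": [0.1, 0.9],
}


def nearest(item_id: str) -> str:
    """Return the other item whose embedding is closest (Euclidean) to item_id."""
    anchor = item_embedding[item_id]
    others = (other for other in item_embedding if other != item_id)
    return min(others, key=lambda other: math.dist(anchor, item_embedding[other]))


closest_to_novel_a = nearest("novel_a")
```

In this toy space the two novels sit close together, so a lookup from one novel returns the other rather than the game.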
Conventional matrix factorization models map users and items to a joint latent factor space and model user-item interactions as inner products in the joint latent factor space. An item may be associated with an item vector whose elements measure the extent to which the item possesses some factors. Similarly, a user may be associated with a user vector whose elements measure the extent of interest the user has in items that are high in corresponding factors. The dot product of the vectors may describe the interaction between the user and item and may be used to determine whether to make a recommendation to a user. More specifically, every user i may be assigned a vector ui in a latent space, and every item j may also be assigned a vector vj in the latent space, i and j being integers. The dot product ui·vj represents the score between the user i and the item j. The score represents the strength of the relationship between the user i and the item j and may be used to make a recommendation (e.g., recommend the item with the highest score). Conventional systems may rely on either binary data or non-binary (e.g., multi-valued) data, but not both, when creating the data or latent item space upon which a recommendation is based.
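As a minimal sketch of the scoring step, the snippet below computes the dot product ui·vj for one user against a small catalog. The identifiers (`user_vectors`, `item_vectors`, `score`) and all vector values are assumptions made for illustration; in practice the vectors would come from a trained factorization model.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two latent vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must share the same latent dimension")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-factor latent space; values are invented for illustration.
user_vectors: Dict[str, List[float]] = {"user_0": [1.0, 0.5]}
item_vectors: Dict[str, List[float]] = {
    "item_a": [2.0, 0.0],
    "item_b": [0.0, 2.0],
    "item_c": [-1.0, 1.0],
}


def score(user_id: str, item_id: str) -> float:
    """Score between user i and item j, i.e. the dot product ui·vj."""
    return dot(user_vectors[user_id], item_vectors[item_id])


# Score every item in the catalog for one user.
scores = {item_id: score("user_0", item_id) for item_id in item_vectors}
```

A larger score indicates a stronger modeled relationship between the user and the item; here `item_a` scores highest for `user_0`.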
When computing recommendations for a specific user i using matrix factorization, all the items j in the catalog may be scored. Typically, matrix factorization requires that there be some positive scores and some negative scores; otherwise the solutions may be trivial and of no practical use. Systems where, for example, users provide a discrete value (e.g., numerical score, number of stars) for an item may be well suited to matrix factorization. However, users typically only provide ratings for items they have accessed. Similarly, in binary usage systems, there may only be positive indications (e.g., indication that user watched a movie, indication that user played a game, indication that user read a book). There may not be any strength associated with a like, and there may not be any negative indications (e.g., user did not watch movie, user did not play game, user did not read book, user did not access/acquire/use item). Thus, in either binary or non-binary systems, data may not be available for all combinations of i and j, and only one type of data (e.g., binary, one-sided, discrete, continuous) may be considered. Discrete data refers to data that may take on a finite or countably infinite number of values. Continuous data may take on any value within a range. Discrete data may be produced by counting; continuous data may be produced by measuring.
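The gap between observed and possible user-item combinations can be made concrete with a small sketch. The usage log below is hypothetical, and each entry is assumed to mean only that a user accessed an item (a one-sided, positive-only signal).

```python
# Hypothetical one-sided usage log: (user, item) means "user accessed item".
events = [("u1", "book"), ("u1", "game"), ("u2", "movie")]

users = sorted({user for user, _ in events})
items = sorted({item for _, item in events})
observed = set(events)

# Pairs with no data at all: neither a positive nor a negative indication.
unobserved = [
    (user, item)
    for user in users
    for item in items
    if (user, item) not in observed
]

# Fraction of all (i, j) combinations for which any data exists.
coverage = len(observed) / (len(users) * len(items))
```

Even in this tiny example only half of the combinations carry data, and none of the missing pairs can be read as a negative indication.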
After all the items j have been scored, the highest-scoring items may be selected and recommended. This may be represented as: given i, find j = arg max_j ui·vj. In mathematics, arg max is the argument of the maximum, which is defined as the set of points of the given argument for which the given function attains its maximum value.
$$\operatorname*{arg\,max}_{x} f(x) := \{\, x \mid \forall y : f(y) \le f(x) \,\}$$
In other words, arg max_x f(x) is the set of values of x for which f(x) attains its largest value M. For example, if f(x) is 1−|x|, then it attains its maximum value of 1 at x=0 and only there, so arg max_x (1−|x|)={0}.
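The set-valued definition of arg max can be sketched in a few lines. The helper `arg_max`, the integer grid used to approximate the 1−|x| example, and the item vectors are all assumptions for illustration; the sketch evaluates a finite candidate list rather than a continuous domain.

```python
from typing import Callable, Dict, Iterable, List, Set, TypeVar

T = TypeVar("T")


def arg_max(candidates: Iterable[T], f: Callable[[T], float]) -> Set[T]:
    """Return every candidate x at which f(x) equals the maximum over candidates."""
    scored = [(x, f(x)) for x in candidates]
    if not scored:
        return set()
    best = max(value for _, value in scored)
    return {x for x, value in scored if value == best}


# The 1 - |x| example, restricted to a small integer grid.
peak = arg_max([-2, -1, 0, 1, 2], lambda x: 1 - abs(x))

# Recommendation for one user: the items maximizing the dot product ui·vj.
u_i: List[float] = [1.0, 0.5]
item_vectors: Dict[str, List[float]] = {
    "item_a": [2.0, 0.0],
    "item_b": [0.0, 2.0],
    "item_c": [-1.0, 1.0],
}
recommended = arg_max(
    item_vectors,
    lambda j: sum(a * b for a, b in zip(u_i, item_vectors[j])),
)
```

Returning a set rather than a single element mirrors the definition above: if two items tie for the highest score, both are returned.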