Conventional content recommendation systems maintain graph structures, such as matrices, that include values associated with a group of users' interactions with content items (e.g., videos, books, and products). For example, a movie recommendation system may construct a recommendation matrix in which columns correspond to movies, rows correspond to users, and each entry holds either the score (e.g., on a scale of 1-5) that a user assigned to a movie or a value of “0” if the user has not scored or viewed the movie. In another example, a product recommendation system may construct a recommendation matrix in which columns correspond to available products (e.g., products having an associated product listing), rows correspond to users, and each entry holds a binary value indicating whether a particular user owns a particular product (e.g., “1” indicates that the user owns the product and “0” indicates that the user does not). As another example,
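The two matrix layouts described above can be sketched in a few lines. This is a minimal illustration, assuming users and items are already mapped to integer indices; the function names and the sample records are hypothetical and are not taken from any particular system.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]

def build_rating_matrix(num_users: int, num_items: int,
                        ratings: Iterable[Tuple[int, int, int]]) -> Matrix:
    """Dense user-by-item matrix: rows are users, columns are movies.

    Each entry holds a 1-5 score, or 0 if the user has not scored the movie.
    """
    matrix = [[0] * num_items for _ in range(num_users)]
    for user, item, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 scale")
        matrix[user][item] = score
    return matrix

def build_ownership_matrix(num_users: int, num_items: int,
                           owned: Iterable[Tuple[int, int]]) -> Matrix:
    """Dense binary matrix: 1 if the user owns the product, else 0."""
    matrix = [[0] * num_items for _ in range(num_users)]
    for user, item in owned:
        matrix[user][item] = 1
    return matrix

# Hypothetical (user, movie, score) records.
ratings = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 3, 1)]
print(build_rating_matrix(3, 4, ratings))  # [[5, 0, 3, 0], [0, 4, 0, 0], [0, 0, 0, 1]]
```

Even in this tiny example, most entries are the placeholder “0”, which foreshadows the storage problem discussed below.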
The primary task, or goal, of such recommendation systems is to predict how a user will interact with an item that the user has not yet interacted with (e.g., items whose entry holds a “0” value). In the example of a product recommendation system, the goal is to identify which products the user will purchase based on the products the user has already purchased and the products other users have purchased. Similarly, in the example of a movie recommendation system, the goal is to identify movies that a user is likely to enjoy based on how the user has rated other movies and how other users have rated movies.
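One common way to make this kind of prediction is user-based collaborative filtering, which estimates an unknown entry from the scores that similar users gave the same item. The sketch below illustrates that general idea only; it is not presented as the method of this document, and it assumes cosine similarity over full user rows (with “0” entries treated as zeros) as the similarity measure. The function names and sample matrix are hypothetical.

```python
import math
from typing import List, Optional, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two user rows (0 entries count as 0)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_score(matrix: List[List[int]], user: int, item: int) -> Optional[float]:
    """Estimate an unknown entry as a similarity-weighted average of the
    scores other users gave the same item. Returns None if no other user
    has a known score for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, row in enumerate(matrix):
        if other == user or row[item] == 0:
            continue  # skip the target user and other unknown entries
        sim = cosine_similarity(matrix[user], row)
        weighted_sum += sim * row[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total > 0 else None

movies = [
    [5, 3, 0],  # user 0 has not rated movie 2
    [5, 3, 4],
    [1, 1, 2],
]
print(predict_score(movies, user=0, item=2))
```

Here the estimate for user 0 falls between the two known scores for movie 2 (2 and 4), pulled toward user 1's score because user 1's ratings are more similar to user 0's.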
When considering large systems with millions of items and millions of users, the storage space needed to store the recommendation matrix is very large and can reach hundreds of thousands of terabytes. Though these recommendation matrices are very large, they are also typically very sparse, because each user interacts with only a small fraction of the many available items. A user may have watched hundreds of movies or purchased hundreds of products, but in recommendation systems that involve millions of movies or products, most entries in the recommendation matrix will still hold unknown values (e.g., a null value). Storing each of these unknown values still requires at least one byte, so storage space is used inefficiently: a large portion of the space needed to represent a recommendation matrix goes simply to storing unknown values.
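The arithmetic behind this inefficiency can be made concrete. The sketch below compares a dense layout, which spends at least one byte on every entry, with a coordinate-style layout that stores only known entries as (row, column, value) triples. The sizes are purely illustrative assumptions (one million users, one million items, 200 known ratings per user, 4-byte indices, 1-byte values), not figures from any real system, and the byte counts cover only the payload, not container overhead.

```python
from typing import Dict, Tuple

def dense_bytes(num_users: int, num_items: int, bytes_per_entry: int = 1) -> int:
    """Every entry is stored, including the unknown (0/null) ones."""
    return num_users * num_items * bytes_per_entry

def coordinate_bytes(num_known: int, index_bytes: int = 4, value_bytes: int = 1) -> int:
    """Only known entries are stored, each as (row index, column index, value)."""
    return num_known * (2 * index_bytes + value_bytes)

# Illustrative sizes: one million users, one million items,
# and (hypothetically) 200 known ratings per user.
users, items, per_user = 1_000_000, 1_000_000, 200
print(dense_bytes(users, items))           # 1000000000000 bytes (1 TB)
print(coordinate_bytes(users * per_user))  # 1800000000 bytes (1.8 GB)

# A dictionary keyed by (user, item) is one simple sparse layout:
# absent keys are treated as unknown entries.
SparseRatings = Dict[Tuple[int, int], int]

def get_rating(ratings: SparseRatings, user: int, item: int) -> int:
    """Return the stored score, or 0 for an unknown entry."""
    return ratings.get((user, item), 0)
```

Under these assumptions, roughly 99.98% of the dense matrix is spent on unknown entries, which is the waste the paragraph above describes.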
The storage space utilization issue is further compounded when the recommendation system also attempts to track when unknown entries change to known entries (e.g., when a user rates a previously unrated movie). In these instances, the recommendation system must routinely create a new version of the recommendation matrix to track changes to its entries, and even a single additional version doubles the space needed to represent the recommendation matrix and the record of changes to its entries.
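The doubling effect follows directly from storing each version as a full copy. The following minimal sketch assumes naive full-snapshot versioning of a dense matrix; the `SnapshotHistory` class is hypothetical and exists only to count stored entries.

```python
import copy
from typing import List

Matrix = List[List[int]]

class SnapshotHistory:
    """Naive versioning: keep a full copy of the matrix after every update."""

    def __init__(self, initial: Matrix) -> None:
        self.versions: List[Matrix] = [copy.deepcopy(initial)]

    def update(self, user: int, item: int, value: int) -> None:
        # Copy the whole latest version, even though only one entry changes.
        new_version = copy.deepcopy(self.versions[-1])
        new_version[user][item] = value
        self.versions.append(new_version)

    def stored_entries(self) -> int:
        """Total number of entries held across all versions."""
        return sum(len(row) for version in self.versions for row in version)

history = SnapshotHistory([[5, 0, 3, 0], [0, 4, 0, 0], [0, 0, 0, 1]])
print(history.stored_entries())  # 12
history.update(user=1, item=0, value=2)  # one unknown entry becomes known
print(history.stored_entries())  # 24
```

A single changed entry causes all 12 entries, including the unchanged unknown ones, to be stored a second time.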