Hard-disk drives and digital video compression technologies have created the possibility of time-shifting live television (TV) and recording a large number of TV shows in high quality without having to worry about the availability of tapes or other removable storage media. At the same time, digitalization of audiovisual signals has multiplied the number of content sources for an average user. Hundreds of channels are available using a simple parabolic antenna and a TV receiver. Large numbers of video clips are published daily on the Internet across various services, and all major content producers are already making their entire content libraries available online. As a consequence, thousands of potentially interesting programs are made available every day and can be recorded and stored locally for later access.
However, given this enormous number of offered content items, individual content selection becomes an important issue. Information that does not fit a user profile should be filtered out, and the content item that matches a user's needs and preferences (e.g. a user profile) should be selected.
Recommender systems address these problems by estimating a like-degree of a certain content item for a certain user profile and automatically ranking the content item. This can be done by comparing a content item's characteristics (e.g. features, metadata, etc.) with a user profile or with similar profiles of other users. Thus, recommender systems can be seen as tools for filtering out unwanted content and bringing interesting content to the attention of the user.
Recommender technology is steadily being introduced into the market. Among various examples, websites offer a recommender to support users in finding content items (e.g. movies) they like, and electronic devices (e.g. personal video recorders) use recommenders for automatic filtering of content items. Recommender systems are increasingly being applied to individualize or personalize services and products by learning a user profile, wherein machine learning techniques can be used to infer the ratings of new content items.
Recommenders are typically offered as stand-alone services or units, or as add-ons (e.g. plug-ins) to existing services or units. They increasingly appear in consumer devices, such as TV sets or video recorders. Recommenders typically require user feedback to learn a user's preferences. Implicit learning frees the user from having to explicitly rate items, and may be derived by observing user actions such as purchases, downloads, selections of items for playback or deletion, etc. Detected user actions can be interpreted by the recommender and translated into a rating. For example, a recommender may interpret a purchase action as a positive rating, or, in the case of video items, a total viewing duration of more/less than 50% may imply a positive/negative rating.
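The translation of detected user actions into ratings can be illustrated with a minimal sketch. The function name `implicit_rating`, the action labels "purchase" and "view", and the encoding of positive/negative ratings as +1/-1 are assumptions made for illustration only; the 50% viewing threshold is the example given above. Actions for which no interpretation is stated (e.g. deletion, or a viewing duration of exactly 50%) yield no rating in this sketch.

```python
from typing import Optional


def implicit_rating(action: str, watched_fraction: Optional[float] = None) -> Optional[int]:
    """Translate an observed user action into +1 (positive), -1 (negative) or None.

    `action` labels and the +1/-1 encoding are hypothetical choices for this sketch.
    """
    if action == "purchase":
        # A purchase is interpreted as a positive rating.
        return 1
    if action == "view" and watched_fraction is not None:
        # More than 50% of the total duration watched implies a positive rating,
        # less than 50% a negative one.
        if watched_fraction > 0.5:
            return 1
        if watched_fraction < 0.5:
            return -1
    # No interpretation is defined for any other case.
    return None
```

A recommender built this way could call, for instance, `implicit_rating("view", 0.8)` after playback ends and feed the resulting rating into the user profile.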
An example of a recommender is presented in US 2008 0104127 A1. There, a media guidance system is described which is capable of recommending content items to a user based on their relevancy. For retrieving content items, the system first generates search criteria, which are derived from personalisation data that have been generated by monitoring user behaviour and/or by receiving explicit user preferences. For instance, the search criteria can be the string “Sylvester Stallone” if the personalisation data indicate that the user likes this actor. Such a search criterion is sent to a media information database for retrieving matching content items. Matching content items are rated and, if the rated items are relevant, are eventually recommended to the user.
Broadly speaking, there are two types of recommender systems: those based on a community of users and those based on metadata.
The first type is known as collaborative filtering, where either (i) members of the community are characterized by the ratings they give to items or (ii) items are characterized by the ratings they receive from the members of the community. These characterizations are next used to define similarity among users or items, respectively. For a specific member of the community and a specific item that has not yet been rated by this member, these similarities are used to infer for this member a rating for this item by combining ratings of similar users or similar items, respectively.
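Variant (i), in which users are characterized by the ratings they give, can be sketched as follows. This is only one of many possible formulations: the choice of cosine similarity over co-rated items and of a similarity-weighted average for combining ratings are assumptions of this sketch, as are the function names and the layout of the `ratings` dictionary (user name mapped to a dictionary of item ratings).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over the items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Infer a rating for `item` by combining the ratings of similar users."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        # Only other users who have rated the item can contribute.
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    if weight_total == 0:
        return None  # no similar user has rated this item
    return weighted_sum / weight_total


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 4},
    "cat": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_rating(ratings, "ann", "c")
```

In this example the inferred rating for "ann" lies closer to the rating given by "bob", whose rating history resembles hers more than that of "cat". Variant (ii) follows the same pattern with the roles of users and items exchanged.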
The second type of recommender system uses available metadata about items, which typically comes in the form of features and associated values or lists of values. The rating history of a user is exploited to build a profile of this user in terms of feature-value pairs, indicating for these pairs a like-degree. For a new item that has not yet been rated by this user, its metadata is used, and the like-degrees of each feature-value pair present are combined to obtain an overall rating. A simple but popular algorithm in this context is naive Bayes, which employs Bayesian classification.
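A metadata-based profile of this kind can be sketched with a small naive Bayes classifier. The class name `NaiveBayesProfile`, the binary like/dislike rating, and the add-one smoothing of the counts are assumptions of this sketch rather than details taken from any particular recommender; metadata is represented as a dictionary mapping each feature to a list of values.

```python
import math
from collections import Counter


class NaiveBayesProfile:
    """User profile holding like/dislike counts per feature-value pair."""

    def __init__(self):
        self.class_counts = Counter()  # number of liked / disliked items
        self.pair_counts = {True: Counter(), False: Counter()}

    def rate(self, metadata, liked):
        """Update the profile with one rated item."""
        self.class_counts[liked] += 1
        for feature, values in metadata.items():
            for value in values:
                self.pair_counts[liked][(feature, value)] += 1

    def like_probability(self, metadata):
        """Combine the per-pair evidence into an overall probability of 'like'."""
        total = sum(self.class_counts.values())
        log_score = {}
        for liked in (True, False):
            n = self.class_counts[liked]
            # Class prior, with add-one smoothing (an assumption of this sketch).
            score = math.log((n + 1) / (total + 2))
            for feature, values in metadata.items():
                for value in values:
                    count = self.pair_counts[liked][(feature, value)]
                    # Smoothed estimate of P(pair present | class).
                    score += math.log((count + 1) / (n + 2))
            log_score[liked] = score
        # Normalize the two log scores into a probability.
        top = max(log_score.values())
        pos = math.exp(log_score[True] - top)
        neg = math.exp(log_score[False] - top)
        return pos / (pos + neg)


# Hypothetical rating history.
profile = NaiveBayesProfile()
profile.rate({"genre": ["action"], "actor": ["A"]}, liked=True)
profile.rate({"genre": ["drama"], "actor": ["B"]}, liked=False)
p_action = profile.like_probability({"genre": ["action"]})
p_drama = profile.like_probability({"genre": ["drama"]})
```

Here the smoothed ratios play the role of the like-degrees of the individual feature-value pairs, and multiplying them (adding their logarithms) under the naive independence assumption yields the overall rating for an unseen item.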
Users of personal video recorders would like to have access to any content available, independently of its source. No matter whether the content will be broadcast (and thus listed in an electronic program guide (EPG)), or is available in a video-on-demand library or somewhere else on the Internet, users would like to have access to it, and a recommender system should be able to provide recommendations for videos independently of their location or source. Regardless of its type, whether it is based on collaborative filtering or is content-based, a recommender system needs to have access to all the items for which a recommendation has to be generated. For example, a recommender for a video-on-demand library needs to access all the items of the video-on-demand library to be able to calculate for each item the probability that a given user would like it, and ultimately to select a list of top-rated items.
However, filtering entire databases and rating all items based on a user profile does not work for very large distributed databases, not only because it is inefficient and not scalable, but especially because it requires access to all the items of all the databases for which recommendations have to be generated.