There are many situations where it is useful to be able to distinguish and interpret patterns in user data that, at least to some extent, reflect the preferences of the user. For a number of users who have rated a number of items, such a recognised pattern may, e.g., be used to distinguish certain items or users from each other, in order to select or rank the items or users that are considered to have the most in common with a reference item or user under the present circumstances. In a typical situation, automatic predictions based on the interests or preferences of a number of users may be used to obtain some kind of ranking or intelligent selection. Such predictions typically rely on collected information that has been filtered by some filtering mechanism, and on the underlying assumption that users who have had similar tastes in the past often tend to agree also in the near future. This principle may be applied in various recommendation systems that are adapted to selectively distinguish, from a group of users, those users who have a similar “preference pattern”. Such a recommendation system may typically be directed to recommending an asset, such as, e.g., music, movies, restaurants or travel destinations.
Collaborative filtering is one of the most successful methods used in present-day commercial product recommendation systems. The collaborative filtering concept relies heavily on filtering, in a collaborative manner, information that can be collected from data sources and user profiles, in order to find correlations between users or items. The need for an automated system that provides accurate, scalable and efficient personalized recommendations has increased at practically the same rate as the amount of available data.
The main task in collaborative filtering is to predict a reference user's preference for a certain item on the basis of other users' preferences. Collected data of the reference user is matched against data of other users in order to identify the users whose preferences or tastes are similar to those of the reference user. These users are typically referred to as neighbours. Because of the discovered similarity in taste, items that are preferred by the neighbours but are new to the reference user will then be recommended to the reference user.
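The neighbour-based prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular system: similarity is assumed to be the Pearson correlation over co-rated items, only positively correlated users are treated as neighbours, the prediction is a mean-centred weighted average, and the toy data, the function names and the neighbourhood size `k` are all hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}; items not listed are unrated.
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob":   {"i1": 4, "i2": 2, "i3": 5, "i4": 4},
    "carol": {"i1": 1, "i2": 5, "i4": 2},
}

def mean_rating(user):
    """Average of all ratings given by user."""
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation between users u and v over their co-rated items."""
    common = ratings[u].keys() & ratings[v].keys()
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    mu = sum(ratings[u][i] for i in common) / len(common)
    mv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, item, k=2):
    """Predict user's rating of item from up to k positively correlated
    neighbours who have rated it (mean-centred weighted average)."""
    scored = [(pearson(user, v), v) for v in ratings
              if v != user and item in ratings[v]]
    neighbours = sorted(((s, v) for s, v in scored if s > 0), reverse=True)[:k]
    den = sum(s for s, _ in neighbours)
    if den == 0:
        return mean_rating(user)  # no neighbours: fall back to the user's mean
    num = sum(s * (ratings[v][item] - mean_rating(v)) for s, v in neighbours)
    return mean_rating(user) + num / den
```

Here `predict("alice", "i4")` uses only "bob" as a neighbour, since "carol" rates the co-rated items in the opposite direction to "alice" and therefore gets a negative correlation.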
In collaborative filtering the data to be processed is typically represented by a user-item matrix, R, as illustrated in FIG. 1. In the figure, matrix R comprises rating data, typically provided by m users, u1 . . . um, where each user is represented by a row vector in an n-dimensional space covering the n items. Each user may specify a rating for each of the items, giving the matrix entries R1,1 . . . Rm,n, and each item is represented by a column vector in an m-dimensional space. In a typical scenario, each position in the matrix will either comprise a rating that has been given to the respective item by the respective user, or be blank if the user, for some reason, has not rated that particular item.
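A small user-item matrix of this kind can be represented directly as a list of rows. The sketch below assumes `None` marks a blank cell and uses hypothetical toy ratings; it shows how a user's row vector and an item's column vector are read out, and how sparse the matrix is.

```python
# Hypothetical 3-user x 4-item rating matrix R; None marks a blank (unrated) cell.
R = [
    [5, 3, None, 1],     # u1
    [4, None, None, 1],  # u2
    [None, 1, 5, 4],     # u3
]
m, n = len(R), len(R[0])  # m users, n items

def user_vector(u):
    """Row vector of user u (0-based): one entry per item, n dimensions."""
    return R[u]

def item_vector(i):
    """Column vector of item i (0-based): one entry per user, m dimensions."""
    return [R[u][i] for u in range(m)]

# Fraction of cells that actually hold a rating.
rated = sum(cell is not None for row in R for cell in row)
density = rated / (m * n)
```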
Normally there are many more unrated items in a dataset than items that have actually been rated; thus, the co-rated item space between two users, which the recommender system has to consider, will have only a few dimensions.
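The co-rated item space between two users is simply the intersection of the sets of items each has rated. A minimal sketch, assuming each user's ratings are held as a hypothetical dictionary from item id to rating:

```python
def co_rated(ru, rv):
    """Return, sorted, the items rated by both users; ru and rv map item -> rating."""
    return sorted(ru.keys() & rv.keys())

# Hypothetical users: only two of their rated items overlap.
u = {"i1": 4, "i7": 2, "i9": 5}
v = {"i2": 3, "i7": 4, "i9": 1, "i12": 5}
print(co_rated(u, v))  # ['i7', 'i9']
```

Any similarity measured between these two users can therefore only draw on two dimensions, however many items exist in the dataset.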
The MovieLens® dataset is a publicly available dataset that can be used for testing and evaluating different recommender systems. The distribution of this dataset, which comprises 100 000 ratings of 1 682 items by 943 users, represented as the number of users having a certain number of co-rated items, is illustrated in the diagram of FIG. 2. According to this diagram, the most common number of co-rated items, given by the peak of the graph, is 3; i.e., out of 1 682 items, the largest group of users have co-rated only 3 items. A big challenge in collaborative filtering is therefore to handle such sparsely rated datasets and to obtain reliable predictions even from a very limited amount of rating data.
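A distribution like the one in FIG. 2 can be computed by counting, for every pair of users, how many items they have both rated, and then taking the most frequent count as the peak. The sketch below is one possible reading of that distribution (counting user pairs); it assumes the tab-separated `user id, item id, rating, timestamp` line layout of the MovieLens 100K `u.data` file, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def load_rated_items(lines):
    """Build user -> set of rated item ids from lines in the MovieLens 100K
    u.data layout: user id, item id, rating, timestamp, tab-separated."""
    rated = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, item, _rating, _ts = line.strip().split("\t")
        rated.setdefault(int(user), set()).add(int(item))
    return rated

def co_rated_histogram(rated_items):
    """Count user pairs by the size of their co-rated item set."""
    hist = Counter()
    for u, v in combinations(rated_items, 2):
        hist[len(rated_items[u] & rated_items[v])] += 1
    return hist

# Hypothetical toy data: user -> set of rated item ids.
toy = {
    "u1": {1, 2, 3, 4},
    "u2": {2, 3, 4, 5},
    "u3": {3, 9},
    "u4": {1, 9},
}
hist = co_rated_histogram(toy)
mode = max(hist, key=hist.get)  # most common co-rated count (the peak)
```

For the full dataset, `load_rated_items(open("u.data"))` would feed the same histogram; with 943 users this means roughly 444 000 user pairs, which is still cheap to enumerate.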
In “Empirical Analysis of Predictive Algorithms for Collaborative Filtering”, 1998, pages 43-58, by John S. Breese, David Heckerman and Carl Kadie, it is suggested that a default preference be inferred for those items of a sparse dataset for which no explicit preference, or rating, has been given by a user. It is suggested that, in such a case, a default preference is computed based on the union of the two users' preferences instead of their intersection. In addition to the union of the preferences, it is also suggested that a number of additional items be included in the rated data. To avoid biased results, the document also suggests that such a default preference be chosen as a neutral, or somewhat negative, preference.
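The union-based approach described above can be illustrated with a correlation computed over all items rated by either user, where an item that only one user has rated receives a default value for the other user, and `k` extra items that neither user has rated are added with the default value for both. This is a sketch under assumptions: the Pearson correlation, the function name and the example default of 2 on a 1-5 scale (a "somewhat negative" choice) are illustrative, not taken from the cited paper.

```python
from math import sqrt

def pearson_default(ru, rv, d, k=0):
    """Pearson correlation over the union of items rated by either user.

    ru, rv map item -> rating. An item missing from one user's ratings gets the
    default preference d; k extra items unrated by both users are appended with
    value d for both users.
    """
    items = ru.keys() | rv.keys()
    xs = [ru.get(i, d) for i in items] + [d] * k
    ys = [rv.get(i, d) for i in items] + [d] * k
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs)) * sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

# Hypothetical users sharing a single co-rated item: a Pearson correlation on the
# intersection alone is undefined, but the union with a default of 2 gives a value.
u = {"a": 5, "b": 1}
v = {"a": 5, "c": 1}
r = pearson_default(u, v, d=2)
```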
In “Collaborative filtering with decoupled models for preferences and ratings”, 2003, pages 309-316, by Rong Jin, Luo Si, ChengXiang Zhai and Jamie Callan, the possible difference in performance between using the item average rating as the item default preference and using the user average rating as the user default preference is examined. The results of these examinations indicate almost identical performance.
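The two default-preference variants compared above can be sketched as two ways of filling the blanks of the user-item matrix. A minimal sketch, assuming rows are users, `None` marks an unrated cell, and the function name and toy matrix are hypothetical:

```python
def fill_defaults(R, mode="user"):
    """Return a copy of R (rows = users, None = unrated) in which each blank is
    replaced by the user's average rating (mode="user") or the item's average
    rating (mode="item")."""
    if mode not in ("user", "item"):
        raise ValueError("mode must be 'user' or 'item'")

    def avg(values):
        rated = [x for x in values if x is not None]
        return sum(rated) / len(rated) if rated else None

    m, n = len(R), len(R[0])
    user_avg = [avg(row) for row in R]
    item_avg = [avg([R[u][i] for u in range(m)]) for i in range(n)]

    def default(u, i):
        return user_avg[u] if mode == "user" else item_avg[i]

    return [[R[u][i] if R[u][i] is not None else default(u, i) for i in range(n)]
            for u in range(m)]

R = [[5, None, 3],
     [None, 2, 4]]
by_user = fill_defaults(R, "user")  # blanks take the row (user) average
by_item = fill_defaults(R, "item")  # blanks take the column (item) average
```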
In “Evaluating collaborative filtering systems”, ACM Trans. Inf. Syst. 22(1):5-53, by Jonathan L. Herlocker, Joseph A. Konstan, Loren G. Terveen and John T. Riedl, the importance of having an algorithm and a dataset that are designed to support each other has been demonstrated. It is observed that a recommendation system designed to recommend items, such as, e.g., movies, that are appreciated by a reference user will be dominated by higher ratings, mainly because people tend to rate what they watch and to watch the movies that they like.
However, not much research has been done on how to improve an inferred default rating so that a user's rating profile is maintained after default ratings have been added to a dataset. Existing default preferences, such as the ones described in the documents referred to above, are based solely on either the user's average rating or the item's average rating. Presently used default rating schemes, however, take no account of how the user's ratings and the item's ratings are distributed.
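One way to see why an average-based default does not maintain a user's rating profile is to fill a user's blanks with that user's own average and compare the spread of the ratings before and after. The sketch below uses a hypothetical user whose ratings sit at the extremes of a 1-5 scale: the mean is preserved, but the population standard deviation shrinks, so the filled profile looks more moderate than the user actually is.

```python
from statistics import mean, pstdev

# Hypothetical user who rated 4 of 10 items on a 1-5 scale, all at the extremes.
observed = [5, 5, 1, 1]

# Fill the 6 unrated items with the user's own average rating.
filled = observed + [mean(observed)] * 6

same_mean = mean(filled) == mean(observed)  # the average is preserved
spread_before = pstdev(observed)            # 2.0
spread_after = pstdev(filled)               # sqrt(1.6), roughly 1.26
```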