The present invention concerns predicting and selectively collecting attribute values, such as a person's preferences, as might be indicated by item ratings for example. Such item ratings may be used for recommending items.
In the last decade or so, computers have become increasingly interconnected by networks, and via the Internet. The proliferation of networks, in conjunction with the increased availability of inexpensive data storage means, has afforded computer users unprecedented access to a wealth of data. Unfortunately, however, the very vastness of available data can overwhelm a user. Desired data can become difficult to find, and search heuristics employed to locate desired data often return unwanted data.
Various concepts have been employed to help users locate desired data. In the context of the Internet for example, some services have organized content based on a hierarchy of categories. A user may then navigate through a series of hierarchical menus to find content that may be of interest to them. An example of such a service is the YAHOO™ World Wide Web site on the Internet. Unfortunately, content, in the form of Internet "web sites" for example, must be organized by the service and users must navigate through menus. If a user mistakenly believes that a category will be of interest or include what they were looking for, but the category turns out to be irrelevant, the user must backtrack through one or more hierarchical levels of categories. Moreover, such services which provide hierarchical menus of categories are passive. That is, a user must actively navigate through the hierarchical menus of categories.
Again in the context of the Internet for example, some services provide "search engines" which search databased content or "web sites" pursuant to a user query. In response to a user's query, a rank ordered list, which includes brief descriptions of the uncovered content, as well as hypertext links (text, having associated Internet address information, which, when activated, commands a computer to retrieve content from the associated Internet address) to the uncovered content, is returned. The rank ordering of the list is typically based on a match between words appearing in the query and words appearing in the content. Unfortunately, however, present limitations of search heuristics often cause irrelevant content to be returned in response to a query. Again, unfortunately, the very wealth of available content impairs the efficacy of these search engines since it is difficult to separate irrelevant content from relevant content.
Moreover, as was the case with services which provide hierarchical menus of categories, search engines are passive. That is, a user must actively submit a query. To address this disadvantage, systems for recommending an item, such as content, to a user have been implemented.
§1.2.1 Recommender Systems
So-called "recommender systems" have been implemented to recommend an item, such as content, a movie, a book, or a music album for example, to a user. The growth of Internet commerce has stimulated the use of collaborative filtering algorithms as recommender systems. (See, e.g., the article, Schafer et al., "Recommender Systems in E-Commerce", Proceedings of the ACM Conference on Electronic Commerce, pp. 158-166 (November 1999), hereafter referred to as "the Schafer article".) Although collaborative filtering may be known to one skilled in the art, it is introduced below for the reader's convenience.
§1.2.2 Collaborative Filtering
In view of the drawbacks of the above discussed data location concepts, "collaborative filtering" systems have been developed. A goal of collaborative filtering is to predict the attributes of one user (referred to as "the active user"), based on the attributes of a group of users. Given the growth of Internet commerce, a valuable attribute to predict is an active user's preference for an item. For example, given the active user's ratings for several movies and a database of other users' movie ratings, a collaborative filtering system may be used to predict how the active user would rate movies not seen by the active user (but rated by the other users). More specifically, collaborative filtering systems have assumed that an active user will have attributes similar to those of similar users and, conversely, collaborative filtering systems may assume that an active user will have attributes dissimilar to those of dissimilar users. Again, in the context of preferences, similar users may prefer similar items and dissimilar users may prefer dissimilar items. Hence, the effectiveness of collaborative filtering methods has been predicated on the underlying assumption that human preferences are correlated.
Collaborative filtering techniques have been classified into one of two categories: memory-based and model-based. (See, e.g., the article, Breese et al., "Empirical Analysis of Predictive Algorithms for Collaborative Filtering", Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 43-52 (July 1998), hereafter referred to as "the Breese article".) Memory-based collaborative filtering techniques, and drawbacks of such techniques, are introduced in §1.2.2.1 below. Then, model-based collaborative filtering techniques, and drawbacks of such techniques, are introduced in §1.2.2.2 below.
§1.2.2.1 Memory-based Collaborative Filtering Techniques and Their Shortcomings
Memory-based collaborative filtering techniques maintain a database of all users' known attribute values (e.g., item ratings). Each predicted attribute value requires a computation using data from across the entire database.
Examples of memory-based collaborative filtering techniques may be found in the Breese article. Basically, collaborative filtering uses known attribute values (e.g., explicitly entered votes) of a new user (referred to as "the active case") and known attribute values of other users to predict values of attributes with unknown values of the new user (e.g., attribute values not yet entered by the new user). The mean vote $\bar{v}_i$ for an entity may be defined as:

$$\bar{v}_i = \frac{1}{M_i} \sum_{j \in I_i} v_{i,j}$$
where
$v_{i,j}$ ≡ A value of attribute j of entity i. Typically, an integer value.
$M$ ≡ The number of attributes (e.g., in a database).
$I_i$ ≡ A set of attribute indexes for which entity i has known values (e.g., based on an explicitly entered vote). For example, $I_2 = \{3, 4\}$ means that entity 2 has values for attributes 3 and 4.
$M_i$ ≡ The number of attributes for which entity i has known values, that is, the number of elements in $I_i$.
Denoting parameters for the active case (i.e., new entity) with subscript "a", a prediction $p_{a,j}$ of active case attribute values (e.g., item ratings) for attributes without known values (i.e., attributes not in $I_a$) can be defined as:

$$p_{a,j} = \bar{v}_a + K \sum_{i=1}^{n} w_{a,i} \left( v_{i,j} - \bar{v}_i \right)$$
where
K is a normalizing factor such that the absolute values of the weights sum to unity.
$n$ ≡ The number of entities (e.g., users in a database).
$w_{a,i}$ ≡ The estimated weight (or alternatively match) between entity i and entity a.
$p_{i,j}$ ≡ The predicted value of attribute j of entity i.
Hence, a predicted attribute value (e.g., item rating) is calculated from a weighted sum of the attribute values (e.g., votes) of each other user. The appearance of mean values in the formula merely serves to express values in terms of deviation from the mean value (i.e., defines a reference) and has no other significant impact.
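The mean vote and the weighted-sum prediction can be illustrated with a short sketch. The function names and the dictionary-based data layout are hypothetical, chosen only for illustration. The sketch also assumes that only entities with a known value for attribute j contribute to the sum, and that K is the reciprocal of the summed absolute weights of those contributors.

```python
from typing import Dict

Votes = Dict[int, float]  # attribute index -> known value (hypothetical layout)


def mean_vote(votes: Votes) -> float:
    """Mean of an entity's known attribute values (the mean vote)."""
    return sum(votes.values()) / len(votes)


def predict(active: Votes, others: Dict[str, Votes],
            weights: Dict[str, float], j: int) -> float:
    """Predict the active case's value for attribute j.

    The result is the active case's mean vote plus a normalized,
    weighted sum of the other entities' deviations from their own means.
    """
    # Assumption: only entities with a known value for attribute j contribute.
    contributors = [i for i, votes in others.items() if j in votes]
    total_weight = sum(abs(weights[i]) for i in contributors)
    if total_weight == 0:
        # No usable neighbors: fall back to the active case's mean vote.
        return mean_vote(active)
    k = 1.0 / total_weight  # normalizing factor K
    deviation = sum(weights[i] * (others[i][j] - mean_vote(others[i]))
                    for i in contributors)
    return mean_vote(active) + k * deviation
```

The weights are passed in rather than computed here because, as the following paragraphs note, collaborative filtering algorithms differ mainly in how the weights are calculated.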
The weights can reflect distance, correlation, or similarity between each user "i" and the active user. Many collaborative filtering algorithms differ in the details of the "weight" calculation. Two examples of weight determination techniques are correlation and vector similarity, each of which is briefly introduced below.
The use of correlation for a weight calculation appears in the article, Resnick et al., "Grouplens: An Open Architecture for Collaborative Filtering of Netnews", Proceedings of the ACM 1994 Conference on Computer Supported Cooperative Work, pp. 175-186 (1994) (hereafter referred to as "the Grouplens article"). If a Pearson correlation coefficient is defined as the basis for the weights, the correlation between users "a" and "i" can be expressed as:

$$w(a,i) = \frac{\sum_{j} \left( v_{a,j} - \bar{v}_a \right) \left( v_{i,j} - \bar{v}_i \right)}{\sqrt{\sum_{j} \left( v_{a,j} - \bar{v}_a \right)^2 \sum_{j} \left( v_{i,j} - \bar{v}_i \right)^2}}$$
where the summations over j are over the items for which both users "a" and "i" have recorded rating votes.
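A minimal sketch of a Pearson-style weight follows. The function name and data layout are hypothetical. The sketch assumes that each user's mean is taken over all of that user's known votes, consistent with the mean vote defined earlier, while the sums run only over co-rated items. Taking the means over the co-rated items alone is another common convention.

```python
import math
from typing import Dict

Votes = Dict[int, float]  # item index -> recorded vote (hypothetical layout)


def pearson_weight(a: Votes, i: Votes) -> float:
    """Correlation-based weight between users a and i over co-rated items."""
    common = set(a) & set(i)
    if not common:
        return 0.0  # no shared items, so no evidence of similarity
    # Assumption: means are over each user's full set of known votes.
    mean_a = sum(a.values()) / len(a)
    mean_i = sum(i.values()) / len(i)
    numerator = sum((a[j] - mean_a) * (i[j] - mean_i) for j in common)
    denominator = math.sqrt(
        sum((a[j] - mean_a) ** 2 for j in common)
        * sum((i[j] - mean_i) ** 2 for j in common)
    )
    # A zero denominator means one user's votes show no variation.
    return numerator / denominator if denominator else 0.0
```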
In the field of information retrieval, the similarity between two documents is often measured by treating each document as a vector of word frequencies and computing the cosine of the angle formed by the two frequency vectors. This concept can be adapted to collaborative filtering, where users correspond to documents, item titles correspond to words, and votes or ratings correspond to word frequencies. Observed votes indicate a positive preference; there is no role for negative votes, and unobserved items receive a zero vote. If a cosine distance between feature vectors is used as the basis for the weights, the similarity between users "a" and "i" can be expressed as:

$$w(a,i) = \sum_{j} \frac{v_{a,j}}{\sqrt{\sum_{k \in I_a} v_{a,k}^2}} \cdot \frac{v_{i,j}}{\sqrt{\sum_{k \in I_i} v_{i,k}^2}}$$
where the squared terms in the denominators serve to normalize votes so that users that vote on more titles will not, a priori, be more similar to other users. Other normalization schemes, including absolute sum and number of votes, are possible.
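The vector similarity weight can be sketched as follows. The function name and data layout are hypothetical. Unobserved items are treated as zero votes, so only co-rated items contribute to the numerator, while each norm is taken over all of a user's own votes.

```python
import math
from typing import Dict

Votes = Dict[int, float]  # item index -> observed vote (hypothetical layout)


def cosine_weight(a: Votes, i: Votes) -> float:
    """Cosine of the angle between two users' vote vectors."""
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_i = math.sqrt(sum(v * v for v in i.values()))
    if norm_a == 0 or norm_i == 0:
        return 0.0  # a user with no nonzero votes matches nobody
    # Items missing from either user count as zero and drop out of the sum.
    dot = sum(a[j] * i[j] for j in set(a) & set(i))
    return dot / (norm_a * norm_i)
```

Dividing by the norms is what keeps a user who has voted on many titles from appearing, a priori, more similar to everyone else.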
Memory-based collaborative filtering algorithms can be improved in a number of ways, as described in the Breese article, such as using default voting, considering inverse user frequency, and amplifying weights, for example.
Since each predicted attribute requires a computation using data from across the entire database, memory-based collaborative filtering techniques can become computationally expensive, in terms of both time and space, especially as the size of the database grows. More specifically, each predicted attribute may require a computation on the order of the number of users and the number of attributes (e.g., the number of items rated by any user).
On the positive side, memory-based methods are relatively simple and work reasonably well in practice. Unfortunately, however, their simplicity does not yield any insights into how a prediction was obtained. Thus, memory-based collaborative filtering techniques have a "black box" characteristic in that user attributes are provided and a predicted attribute is output.
Fortunately, with memory-based collaborative filtering techniques, new information, such as a user attribute (e.g., a user's preference for a particular item), may be added easily and incrementally.
In view of the foregoing, memory-based collaborative filtering techniques have a number of shortcomings. More specifically, each prediction may be computationally expensive in terms of memory and time, and insights into how a prediction was arrived at are not offered.
§1.2.2.2 Model-based Collaborative Filtering Techniques and Their Shortcomings
Model-based collaborative filtering techniques compile users' attributes (e.g., item preferences) into a descriptive model of users, attributes (e.g., items) and attribute values (e.g., item ratings). An unknown attribute value (e.g., an item rating) of a user can then be predicted based on the compiled model. That is, from a probabilistic perspective, collaborative filtering may be seen as determining the expected value of a vote, given what is known about a user. For an active user, assuming votes are integer values with a range from 0 to m, the expected vote of the active user for a particular item j may be expressed as:

$$p_{a,j} = E\left( v_{a,j} \right) = \sum_{i=0}^{m} \Pr\left( v_{a,j} = i \mid v_{a,k},\ k \in I_a \right) i$$
where the probability expression is the probability that the active user will have a particular vote value for item j given the previously observed votes. Cluster models and Bayesian networks may be used as probabilistic models for collaborative filtering. (See, e.g., the Breese article.) Each is briefly introduced below.
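As a small worked illustration, the expected vote is the probability-weighted sum of the possible vote values. The function name is hypothetical, and the conditional distribution is assumed to be supplied by whatever model is in use.

```python
from typing import Sequence


def expected_vote(probabilities: Sequence[float]) -> float:
    """Expected vote, where probabilities[i] is Pr(vote = i | observed votes).

    The sequence covers vote values 0 through m and should sum to one.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("vote probabilities must sum to 1")
    return sum(value * p for value, p in enumerate(probabilities))
```

For votes ranging from 0 to 3 with probabilities 0.1, 0.2, 0.3 and 0.4, the expected vote is 0(0.1) + 1(0.2) + 2(0.3) + 3(0.4) = 2.0.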
In the cluster model, the probabilities of votes are conditionally independent given membership in an unobserved class variable C which takes on some relatively small number of discrete values. That is, there are certain groups or types of users capturing a common set of preferences and tastes. Given the class, the preferences regarding the various items are independent. The probability model relating joint probability of class and votes to a tractable set of conditional and marginal distributions is the standard naive Bayes formulation, namely:

$$\Pr\left( C = c, v_1, \ldots, v_n \right) = \Pr\left( C = c \right) \prod_{i=1}^{n} \Pr\left( v_i \mid C = c \right)$$
The left-hand side of this expression is the probability of observing an individual of a particular class and a complete set of vote values. The parameters of the model, namely the probabilities of class membership and the conditional probabilities of votes given a class, are estimated from a training set of user votes. Since the class variables are not observed in the database of users, methods that can learn parameters for models with hidden variables, such as the EM algorithm, may be used. The number of classes may be selected by selecting the model structure that yields the largest (approximate) marginal likelihood of the data in the user database.
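The naive Bayes factorization can be sketched as below. The function names and the nested-dictionary parameter layout are hypothetical, and the parameters are assumed to have been estimated already (e.g., by the EM algorithm, which is not shown). The second function normalizes the joint probabilities to obtain a distribution over classes given a user's observed votes.

```python
from typing import Dict

ClassPrior = Dict[str, float]                          # class -> Pr(C = c)
VoteGivenClass = Dict[str, Dict[str, Dict[int, float]]]  # class -> item -> vote -> Pr


def joint_probability(c: str, votes: Dict[str, int],
                      class_prior: ClassPrior,
                      vote_given_class: VoteGivenClass) -> float:
    """Pr(C = c, votes): the class prior times a product of per-item terms."""
    p = class_prior[c]
    for item, vote in votes.items():
        p *= vote_given_class[c][item][vote]
    return p


def class_posterior(votes: Dict[str, int], class_prior: ClassPrior,
                    vote_given_class: VoteGivenClass) -> Dict[str, float]:
    """Pr(C = c | votes) for every class, by normalizing the joints."""
    joint = {c: joint_probability(c, votes, class_prior, vote_given_class)
             for c in class_prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("observed votes have zero probability under the model")
    return {c: p / total for c, p in joint.items()}
```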
Alternatively, a Bayesian network with a node corresponding to each item in the domain may be used for model-based collaborative filtering. The states of each node correspond to the possible vote values (which may include a "no vote" value) for each item. A learning algorithm is then applied. The learning algorithm searches over various model structures in terms of dependencies for each item. After the learning process, in the resulting Bayesian network, each item will have a set of parent items that are the best predictors of its votes. Each conditional probability table is represented by a decision tree encoding the conditional probabilities for that node.
As can be appreciated from the foregoing description, model-based collaborative filtering techniques may advantageously provide meaningful semantics and may yield insights into their predictions. Further, any assumptions in the model are explicit. Finally, compiled models take up relatively little storage and predictions based on compiled models are, relative to memory-based collaborative filtering techniques, computationally efficient, both from a time viewpoint and a memory viewpoint.
Unfortunately, compiling the model is often computationally expensive. This would not be a major drawback if the model did not need to be updated often. However, in order to account for new data, the model must be recompiled. In some applications, delaying the consideration of new data is not an option.
An example of a model-based collaborative filtering technique is described in U.S. Pat. No. 5,704,017, issued on Dec. 20, 1997 to Heckerman et al., and entitled, "Collaborative Filtering Utilizing a Belief Network" (incorporated herein by reference).
§1.2.2.3 Challenges for Gathering Data (e.g., Item Ratings)
Recommender systems having practical applications have been designed to acquire information (e.g., to populate the database of a memory-based collaborative filtering system or to compile the model of a model-based collaborative filtering system) by (a) explicitly asking users for information (e.g., item ratings) and/or (b) inferring attributes of users (e.g., based on hardware and/or software of the user's computer, based on Internet content browsing behaviors of the user, based on purchasing behaviors of the user, etc.). Unfortunately, both explicit and implicit data acquisition have their drawbacks.
Regarding some drawbacks of explicit data acquisition, users are forced to actively participate. In the context of predicting user preferences for items for example, users must explicitly enter ratings. Some users find it difficult to rate items, such as articles, books, movies, products, etc. In this regard, it is expected that predictions made by a collaborative filter will improve as more information (e.g., item ratings) is entered. Unfortunately, many users may become frustrated by poor predictions and/or with entering information (e.g., item ratings) before enough information is gathered to make the predictions good. Thus, collaborative filtering systems which rely on explicitly entered information have a bootstrapping problem. That is, many users will become frustrated with the predictions made by collaborative filtering systems, due, in part, to an initial scarcity of information. As a result of user frustration with initially poor predictions, such users may stop entering information. If this occurs, the predictions made by the collaborative filtering system will probably not improve because users will not provide it with enough information.
Implicitly acquired data does not require active user participation. Unfortunately, however, implicitly acquired information is often considered to be less reliable than information acquired explicitly. For example, one could infer that a user is relatively old if they often visit the web site of the American Association of Retired Persons (AARP), but an explicit entry of the user's age is more reliable.
§1.2.3 Unmet Needs
Given the great utility of recommender systems, particularly in the context of E-commerce, as well as the power of collaborative filtering techniques for making good recommendations, the inventors believe that collaborative filtering will be used increasingly. However, it would be useful to mitigate some of the disadvantages of pure memory-based and pure model-based collaborative filtering techniques. That is, it would be useful to provide a collaborative filtering technique that is simple and easy to update, as is the case with memory-based systems, while also offering meaningful semantics and explicit assumptions, as is the case with model-based systems.
Further, it would be useful to be able to predict the utility of having values (e.g., ratings or votes) for certain attributes (e.g., items). In this way, in the context of gathering data, queries seeking explicit values (e.g., votes or ratings) could be limited to avoid user frustration. That is, values would only be asked for if the benefit (e.g., an improvement to a recommendation) of having such a value would outweigh the cost (e.g., user annoyance) of asking for the value. Further, attributes (e.g., items) whose values (e.g., ratings) add little benefit to the accuracy of the recommendation could be removed from the database (thereby mitigating storage requirements which, under pure memory-based collaborative filtering techniques, are on the order of the number of attributes times the number of users) and/or ignored by the collaborative filtering technique when making a recommendation (thereby mitigating processing time which, under pure memory-based collaborative filtering techniques, is on the order of the number of attributes times the number of users).
The present invention provides new collaborative filtering techniques which meet at least some of the heretofore unmet needs introduced in §1.2.3 above. Basically, a new collaborative filtering technique, referred to as "personality diagnosis", that can be seen as a hybrid between memory-based and model-based collaborative filtering techniques, is described. More specifically, using the described personality diagnosis technique, all data may be maintained throughout the processes, new data can be added incrementally, and predictions have meaningful probabilistic semantics. Each user's reported attribute values (e.g., item ratings or preferences) may be interpreted as a manifestation of their underlying personality type. Personality type may be encoded simply as a vector of the user's "true" values (e.g., ratings) for attributes (e.g., items) in the database. It may be assumed that users report values (e.g., ratings) with a distributed (e.g., Gaussian) error. Given an active user's known attribute values (e.g., item ratings), the probability that they have the same personality type as every other user may be determined. The probability that they will have a given value (e.g., rating) for a valueless (e.g., unrated) attribute (e.g., item) may then be determined based on the user's personality type.
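One plausible reading of this description can be sketched as follows. This is an illustrative sketch under stated assumptions, not necessarily the method as claimed. The function names, the uniform prior over other users' personality types, the error parameter `sigma`, and the restriction to co-rated items are all assumptions introduced here.

```python
import math
from typing import Dict, List

Votes = Dict[str, int]  # item -> reported rating (hypothetical layout)


def gaussian_likelihood(reported: float, true: float, sigma: float) -> float:
    """Unnormalized Gaussian likelihood of a reported value given a true one."""
    return math.exp(-((reported - true) ** 2) / (2.0 * sigma ** 2))


def personality_posterior(active: Votes, others: Dict[str, Votes],
                          sigma: float) -> Dict[str, float]:
    """Probability that the active user shares each other user's type."""
    scores = {}
    for name, votes in others.items():
        score = 1.0  # assumption: uniform prior over the other users
        for item, value in active.items():
            if item in votes:  # assumption: only co-rated items are compared
                score *= gaussian_likelihood(value, votes[item], sigma)
        scores[name] = score
    total = sum(scores.values())
    if total == 0:  # every likelihood underflowed; fall back to uniform
        return {name: 1.0 / len(scores) for name in scores}
    return {name: s / total for name, s in scores.items()}


def rating_distribution(active: Votes, others: Dict[str, Votes], item: str,
                        values: List[int], sigma: float) -> Dict[int, float]:
    """Probability of each candidate rating for an item the active user has not rated."""
    raters = {n: v for n, v in others.items() if item in v}
    if not raters:
        raise ValueError("no other user has rated this item")
    posterior = personality_posterior(active, raters, sigma)
    dist = {x: sum(posterior[n] * gaussian_likelihood(x, raters[n][item], sigma)
                   for n in raters)
            for x in values}
    total = sum(dist.values())
    return {x: p / total for x, p in dist.items()}
```

Because the sketch reads the stored ratings directly, a newly entered rating takes effect on the next call without any recompilation, while the output remains a probability distribution over rating values.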
In one embodiment of the present invention, the probabilistic determinations may be used to determine the expected value of collecting additional information. Such an expected value of information could be used in at least two ways. First, an interactive recommender could use expected value of information to favorably order queries for attribute values (e.g., item ratings), thereby mollifying what could otherwise be a tedious and frustrating process. Such a value-of-information computation can balance the cost or difficulty of answering a question about preferences against the expected value of the information being acquired. Beyond ordering the queries to users about preferences, value of information could be used to generate the most valuable n questions to ask, should a system designer wish to limit the number of questions asked of users or accessed from a database of preferences. Second, expected value of information could be used to determine which entries of a database to prune or ignore, that is, which entries, if removed, would have a minimal effect on the accuracy of recommendations for a population of users.
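The query-ordering idea can be illustrated with a minimal sketch. The function name is hypothetical, and the per-item expected benefits and costs are assumed to be supplied by some separate value-of-information computation that is not shown here.

```python
from typing import Dict, List


def select_queries(expected_benefit: Dict[str, float],
                   cost: Dict[str, float], n: int) -> List[str]:
    """Return up to n items to ask about, ordered by net expected value.

    Items whose expected benefit does not exceed the cost of asking are
    never queried. Both inputs are assumed to be precomputed estimates.
    """
    net = {item: expected_benefit[item] - cost.get(item, 0.0)
           for item in expected_benefit}
    worthwhile = [item for item, value in net.items() if value > 0]
    worthwhile.sort(key=lambda item: net[item], reverse=True)
    return worthwhile[:n]
```

The same ranking, read from the bottom, suggests which database entries contribute least and are therefore candidates for pruning.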