Centuries ago the critical information problems were how to record information and how to distribute information. Today, computers have largely solved these problems. Most information is now recorded directly on computer media, and as a result thousands of pages of information are moved around the world in seconds via electronic data networks. In the process of solving information dissemination problems, however, people have overwhelmed themselves with the sheer volume of available information. The critical question is how to benefit from the richness of the available information without getting bogged down by the overwhelming volume.
One possibility is to make use of the opinions each person forms when perusing any piece of information. Taken together, the web of all of these opinions is a rich resource that could be used to sift through the available information for nuggets of value. This technique is already applied informally, through word-of-mouth in the physical world, and through forwarded mail, news, and uniform resource locators (URLs) in the virtual world. However, these informal processes are not powerful enough to deal with the millions of new documents being created every week. Computers helped create this problem; perhaps they can help solve it. A need exists for a solution that gathers this collective wisdom more formally, and applies it to the problem of selecting which of the available documents will be valuable to each person, individually.
These principles have been applied in one area of research, known as collaborative filtering. Collaborative filtering seeks to understand the relationships between people, and to use those relationships to help people meet their information needs more effectively. The user enters ratings to indicate his or her opinion of a document to the collaborative filtering system. From ratings previously entered by other users, the system predicts the value of an item to a given user. Ratings often represent the user's evaluation of the document along one or more dimensions. There are many possible dimensions, including overall enjoyment, value to the task at hand, interest in the topic, reputation of the author, appropriateness for the context, quality of writing, and amount of new material versus repeated material. Ratings along each of these dimensions can be either explicit, requiring special user interaction, or implicit, captured from ordinary user actions.
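As an illustration only, the idea of predicting a document's value to one user from ratings entered by other users can be sketched as follows. The sketch assumes a Pearson-correlation weighting between users and a single 1-to-5 rating dimension. The text above does not specify a particular algorithm, so the functions `pearson` and `predict` and the sample data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the documents both users have rated."""
    common = [doc for doc in a if doc in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[doc] for doc in common) / len(common)
    mean_b = sum(b[doc] for doc in common) / len(common)
    cov = sum((a[doc] - mean_a) * (b[doc] - mean_b) for doc in common)
    var_a = sum((a[doc] - mean_a) ** 2 for doc in common)
    var_b = sum((b[doc] - mean_b) ** 2 for doc in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, doc):
    """Predict `user`'s rating of `doc` from other users' ratings.

    Returns None when no correlated user has rated the document.
    """
    mine = ratings.get(user, {})
    if not mine:
        return None
    my_mean = sum(mine.values()) / len(mine)
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or doc not in theirs:
            continue
        weight = pearson(mine, theirs)
        if weight == 0.0:
            continue
        their_mean = sum(theirs.values()) / len(theirs)
        # Use the other user's deviation from their own mean, so that
        # habitually harsh and habitually generous raters are comparable.
        weighted_sum += weight * (theirs[doc] - their_mean)
        weight_total += abs(weight)
    if weight_total == 0.0:
        return None
    return my_mean + weighted_sum / weight_total


# Hypothetical sample data: user -> {document: rating on a 1-5 scale}.
ratings = {
    "ann": {"d1": 5, "d2": 1, "d3": 4},
    "bob": {"d1": 5, "d2": 1, "d3": 4, "d4": 5},
    "cat": {"d1": 1, "d2": 5, "d3": 2, "d4": 1},
}
print(predict(ratings, "ann", "d4"))
```

In this sample, one user agrees with the target user and another consistently disagrees. Both contribute evidence that the target user would rate the unseen document highly, because a negative correlation inverts the disagreeing user's low rating.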
The most common explicit rating methods in collaborative filtering systems are single keystrokes entered by users. The keystrokes usually represent values along a single ordered dimension, discretized for ease-of-entry. Ratings can also be entered through graphical sliders, which are similar, except that they often support more possible values. Another common rating method is textual ratings. Textual ratings are either keyword or free-form. Keyword textual ratings often focus on characterizing the topic. Keyword textual ratings that focus on measuring the quality are very similar to keystroke ratings. Free-form textual ratings can be valuable for users, but are difficult to process automatically. Free-form textual ratings are more common in domains in which the total number of documents is relatively low, so users can peruse a substantial fraction of them.
Implicit ratings are collected by non-intrusively watching the user read a document. Observations about what the user does with the document may lead to insights into the value of the document to the user. For instance, if a user reads the title or abstract of a document, but chooses not to read the document, that may indicate low interest in the topic of the document. On the other hand, if the user chooses to save a document to a file, or to forward it to a colleague, that may indicate higher interest in the document. The time that a user spends reading a document (time spent reading) is another implicit rating. Intuitively, users are likely to spend longer with documents they find valuable than with documents they find uninteresting.
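For illustration, the observations described above could be turned into a rating by a naive hand-written rule such as the following sketch. The `Observation` fields, the 1-to-5 scale, and the 60-second threshold are all hypothetical assumptions chosen for the example, and the sketch is not offered as an effective conversion method.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What the system saw a user do with one document (hypothetical)."""
    opened: bool              # False: saw title/abstract but did not read
    seconds_reading: float    # time spent reading the document
    saved: bool = False       # saved the document to a file
    forwarded: bool = False   # forwarded the document to a colleague


def implicit_rating(obs, long_read_seconds=60.0):
    """Map observed behavior to a 1-5 rating using assumed thresholds."""
    if not obs.opened:
        return 1  # skipping the document suggests low interest
    if obs.saved or obs.forwarded:
        return 5  # saving or forwarding suggests higher interest
    # Scale reading time linearly into the range 2..4, capped at the
    # assumed "long read" threshold.
    fraction = min(max(obs.seconds_reading, 0.0) / long_read_seconds, 1.0)
    return 2 + round(2 * fraction)


print(implicit_rating(Observation(opened=True, seconds_reading=30.0)))
```

A fixed rule of this kind ignores document length and differences in reading speed between users, which is one reason converting implicit measures into useful ratings is not trivial.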
Collaborative filtering systems have largely focused on explicit ratings. In small tightly focused groups with substantial shared interests, textual ratings have proven valuable. However, in larger groups with more diverse interests, automatic computation of personalized predictions would benefit from a more structured ratings system.
In a system using explicit ratings, the user responds to each rating prompt with a keystroke or other indication of preference. The system uses the user's response to influence its prediction algorithms for this user in the future. Users can informally combine their judgments along any of the possible ratings dimensions to create a single rating. Existing prediction algorithms do a good job of making predictions for users based on explicit ratings along this single dimension.
Although explicit ratings have worked well in practice, there are some significant advantages to implicit ratings. Most importantly, an implicit rating requires no effort on the part of the user, making the collaborative filtering system zero cost to users. This overcomes the problem of user resistance to using a collaborative filtering system. One source of this resistance is the fact that the collaborative filtering system returns little or no value to a user until the user has rated dozens of documents, thus generating enough information for the correlation algorithm to create a correlation group. Another source of resistance is that users can find it difficult to learn how to rate documents. The two sources of this difficulty are learning the interface, and learning to form mental ratings judgments while reading documents.
In contrast, implicit ratings would incur no cost for the user to try the system, and would have no learning curve for either the interface or for creating the ratings. These advantages may mean that implicit ratings would lead to more users, which in turn would lead to more effective correlations and predictions, potentially creating a positive feedback loop. Another advantage is that since implicit ratings do not require any interaction from users, and are fast enough to be transparent to users, they do not induce any hidden effects in how users read documents. For instance, the act of creating an explicit rating might change a user's reading style, changing the total value the user receives from the system. On the other hand, an implicit rating is unlikely to change a user's reading style because the user does not do anything different to produce the rating. However, if predictions generated from implicit ratings are presented to users, those predictions may themselves change how users read.
Therefore, implicit ratings will be valuable if they can be effectively generated from implicit measurements of user behavior. One of the problems of using implicit measurements is finding a way to convert these measures into ratings in a way that leads to effective predictions.
The present invention provides a solution to this and other problems, and offers other advantages over the prior art.