The present invention relates to a method and apparatus for presenting a plurality of objects in a ranking order according to a user's preferences with respect to those objects.
Society is experiencing explosive growth in the amount of information available through electronic media. The challenges raised by this growth are many; however, one of the most significant, if not the most significant, challenge facing the information technology industry is processing, organizing and presenting this mass of information to users efficiently. In most cases it is possible to treat the information as arbitrary types of objects, i.e., units of related data. Examples of such objects include electronic mail messages within an e-mail system, documents in a document organization system, the list of documents returned by a search engine, or the information retrieved by queries processed against a database.
Attempts have been made to address this challenge by presenting the set of objects, not in an arbitrary order, but instead in an order allowing a potential user to concentrate on "more important" objects while skimming through, or perhaps even neglecting, the "less important" objects.
Various technologies have been proposed for the classification and ranking of objects. For instance, the concept of an urgency flag, as used in present-day e-mail systems, is well known. An urgency flag is an indication attached to an object representing the importance of that object for the addressee. A disadvantage of this approach is that the degree of urgency is determined solely by the author, thus ignoring the importance that a reader/addressee may attach to the information. "Junk" mail is an example where the degree of urgency perceived by sender and receiver can vary greatly. Moreover, this technique does not easily allow multiple recipients of the information to be treated differently with respect to the urgency of the message (this flexibility could perhaps be achieved only by using separate mailing tasks).
Another common ranking/classification technique is that used in present-day search engines. Typically these engines sort a list of search "hits" based solely upon the search pattern entered into the engine. Because these search engines ignore the user's preferences among the various objects retrieved, such ranking techniques have the disadvantage of supplying a user with "hits" that may match the search pattern but do not fall within the context of the user's query.
Agent techniques are yet another commonly utilized method of preprocessing information, such as the filtering of unwanted incoming e-mail messages. Although somewhat effective, these filtering techniques have the disadvantage of requiring that the filter criteria be explicitly specified for each potential user of the information.
Sort options, such as the sorting of e-mail messages by author, subject, or other specified fields, are commonly applied in today's various systems and products. While these techniques do generally offer a rough ordering scheme, typically none of the offered sort criteria perfectly maintains a user's preferences with respect to the information being managed.
Data mining and text mining technologies exploit cluster techniques in order to segment documents into groups whose members share some common group characteristic. Again, however, the clusters typically employed in today's systems do not usually reflect a user's particular preferences with respect to the information being managed.
Another classification technique used today processes information by first executing some type of training sequence on the data. An example is a system that automatically transfers received e-mail messages into predefined folders. While effective, these systems typically require a significant amount of up-front training before they can effectively process a set of unknown objects. Furthermore, this time-consuming training sequence must be re-executed whenever it becomes outdated because new types of information need to be processed. Thus, the technique is rather inefficient, and the quality of the resulting classification is limited by the robustness of the training set.
Finally, an approach for optimizing the ranking order of a set of objects based on user preferences is proposed in D. E. Rose et al., U.S. Pat. No. 5,724,567. The disclosed method compares the results of content-based ranking algorithms (such as those employed in search engines) and/or collaborative filtering techniques (which are based on explicit input from other users of the information) to a user profile in order to generate the ultimate ranking order. The user profile is created using a relevance-feedback approach, which requires users to enter information into the system regarding the relative importance of the information being processed. A drawback of this approach is that the resultant ranking of information is only as accurate as the feedback provided by the users of the information. Furthermore, collecting this added information requires users to explicitly enter feedback in response to system queries, which detracts from the overall use of the application.
It is therefore a purpose of this invention to provide a method and apparatus for presenting a plurality of objects in a ranking order reflecting a user's preferences with respect to those objects, while easing and improving the task of describing the characteristics of the user preferences upon which the ranking order will be based.
The invention relates to a computerized method of presenting a plurality of objects in a ranking order. The objects are presented in a ranked order according to a calculated object preference. Object preferences are determined using a preference model that is based upon a user's access actions to a group of objects. This preference model is adaptively developed using the information resources associated with a user's normal interaction with the group of objects being ranked. Because the information gathered regarding object preferences is implicit to normal user activities, the adaptive development of the preference model and continual recalculation of object preferences is completely transparent to the user.
This approach offers advantages in productivity and ease-of-use over methods that require users to explicitly enter ranking information into the system during a so-called training phase. These types of applications, such as the collaborative filtering method discussed above, can often require users to invest more time and effort in training than the benefit they can expect to receive in the form of object organization. Moreover, the proposed method utilizes the most reliable information available to determine a user's preferences with respect to a group of objects, namely, the user's own access patterns to that particular set of objects.
Reliance on a user's opinion of the "importance" of an object, as opposed to their "preference" for that object as determined through actual patterns of use, often leads to misleading results. For example, an "out-of-the-office" message may certainly be important at the time of receipt; however, an analysis of the user's access patterns would likely show that their preference for accessing this type of information is actually quite low. In addition, using implicit information resources can reveal preferences for certain types of objects that a user may not even be aware of. Such preferences would be ignored by a system relying solely on explicit importance scores to determine object preferences, which illustrates the advantage of the proposed method and apparatus.
The foregoing and other purposes, aspects and advantages of the invention will be better understood from the following detailed description with reference to the drawing, in which a general flow chart of a preferred embodiment of the invention is shown.
The description of the present invention uses the term "object" in its most general meaning, namely that of representing any related data that is treated as a unit. The terms "object" and "document" are used interchangeably throughout the specification. Also, the present invention is being described in the context of objects that are electronic mail messages (e-mail) within an e-mail system for purposes of illustration only. Those skilled in the art will appreciate that the proposed invention is applicable to any other object system without limitation.
Moreover, the term "user" is not necessarily limited to a human user, although this often will be the case. Indeed, the proposed method could also be applied to a group of users, or perhaps to an automated user. In addition, while the proposed method focuses on the use of implicit user information, this type of information could be used in combination with explicit user preference feedback in order to provide the capability of explicitly forcing changes to a user's object preference model.
It has been observed that users resist ranking technologies that require explicit entry of object preference feedback in order to "train" their database systems. Furthermore, the most reliable source of information for determining object preference is the manner in which a user accesses a group of objects, not the user's own opinion of the importance of the objects within a particular group. Thus, there exists a need in today's document management systems for sorting functions that are capable of operating on implicit, rather than explicit, object preference information.
To this end, the method and apparatus of the preferred embodiment require no additional explicit information to be entered by a user in order to determine the user's preference with respect to a group of objects. Instead, implicit user information is derived from the actual handling and processing of information objects in order to develop a user preference model with respect to those objects. In particular, the order in which a user processes a group of documents is used. By keeping track of this order, an internal model (referred to as the preference model) can be constructed that represents the preferences of a user for a particular group of objects. Over time, the system adapts to the user's object preferences and presents newly received objects in a ranking order determined by the adaptive preference model.
To monitor the order in which a user processes a list of documents, one can measure, for each document, the deviation between its expected/presented position (known as the access hypothesis) and the actual position at which the user accessed it. A number can be computed to represent this deviation, which can then be tested to determine whether a particular document is more or less attractive to the user. For example, the deviation could be defined as the presented position minus the access position, so that a document opened earlier than expected yields a positive value; the greater this value, the more attractive the document is deemed to be. Thus, positive differences would represent attractive documents, negative differences unattractive documents, and a document with a difference of zero would be deemed to be of neutral interest.
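The deviation described above can be sketched as follows, assuming the presented order and the access order are available as lists of document identifiers and that the deviation is taken as presented position minus access position. The function names are hypothetical, and documents the user never opens are simply omitted.

```python
def access_deviations(presented: list[str], accessed: list[str]) -> dict[str, int]:
    """Deviation per opened document: presented position minus access position.

    Positive means the user opened the document earlier than presented
    (attractive), negative means later (unattractive), zero means neutral.
    Documents the user never opens are not included.
    """
    presented_pos = {doc: i for i, doc in enumerate(presented)}
    return {doc: presented_pos[doc] - i for i, doc in enumerate(accessed)}


def interest(deviation: int) -> str:
    """Map a deviation onto the three interest categories."""
    if deviation > 0:
        return "attractive"
    if deviation < 0:
        return "unattractive"
    return "neutral"


presented = ["a", "b", "c", "d"]  # order shown to the user
accessed = ["c", "a", "d", "b"]   # order in which the user opened them
for doc, dev in access_deviations(presented, accessed).items():
    print(doc, dev, interest(dev))
# c 2 attractive
# a -1 unattractive
# d 1 attractive
# b -2 unattractive
```

Note that "a", shown first but opened second, receives a negative value: the scheme measures deviation from the presented order, not absolute rank.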
By using a classification algorithm such as a naive Bayesian classifier, a system could learn to estimate the access hypothesis from the various attributes of the document. That is to say, any part of an object's contents can serve as the characteristic feature upon which the user's preference is based. In the case of e-mail messages, these attributes can be the author's name, the length of the document, the age of the document, the persons listed in the to-list and/or cc-list, the length of the to-list and/or cc-list, the position of the user's name in the to-list and/or cc-list, a list of words occurring in the document (using counts, or in combination with a "stop-word list", i.e., a list of words to be ignored), or the language of the document itself. Any number of combinations are possible. In addition, weighting factors can be used to increase or decrease the emphasis placed on certain attributes in the ranking (e.g., to place a higher emphasis on the "subject line" of an e-mail message).
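A minimal sketch of attribute extraction for e-mail objects follows. The `Email` structure, the stop-word list, the bucketing of list lengths and positions, and the subject-line weight of 2.0 are all illustrative assumptions rather than part of the described method; the output maps each feature to a weight that a classifier can consume.

```python
import re
from dataclasses import dataclass, field

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # hypothetical list
SUBJECT_WEIGHT = 2.0  # hypothetical: emphasize subject-line words
BODY_WEIGHT = 1.0


@dataclass
class Email:
    author: str
    subject: str
    body: str
    to: list[str] = field(default_factory=list)
    cc: list[str] = field(default_factory=list)


def words(text: str) -> list[str]:
    """Lower-cased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def extract_features(msg: Email, user: str) -> dict[str, float]:
    """Map an e-mail to weighted, discrete features describing it."""
    feats: dict[str, float] = {}

    def add(name: str, weight: float = 1.0) -> None:
        feats[name] = feats.get(name, 0.0) + weight

    add(f"author={msg.author}")
    add(f"to_len={min(len(msg.to), 5)}")  # lists of 5+ share one bucket
    add(f"cc_len={min(len(msg.cc), 5)}")
    if user in msg.to:
        add(f"to_pos={min(msg.to.index(user), 3)}")  # user's place in to-list
    elif user in msg.cc:
        add(f"cc_pos={min(msg.cc.index(user), 3)}")
    else:
        add("not_addressed")
    add("body=short" if len(msg.body) < 200 else "body=long")
    for w in words(msg.subject):
        add(f"subj:{w}", SUBJECT_WEIGHT)
    for w in words(msg.body):
        add(f"word:{w}", BODY_WEIGHT)
    return feats


msg = Email("alice", "Budget review", "Please review the budget.", to=["bob", "carol"])
print(extract_features(msg, "carol"))
```

Document age and language are omitted here for brevity; they could be added as further discrete features in the same way.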
Once the algorithm has associated the features or attributes of a document with a number representing its "attractiveness" to the user, the algorithm can reproduce this number for newly received similar documents as well as predict an attractiveness for never-before-seen documents. Upon computing an attractiveness for a set of received documents, the documents can be rearranged and presented to the user with those having the highest attractiveness listed first. Assuming that the observable attributes of a document are in some way related to its attractiveness, the result is a group of documents presented to the user in an improved sort order, with the most attractive documents listed first according to a preference model based upon that user's specific patterns of document use.
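The sketch below assumes a multinomial naive Bayes classifier with Laplace smoothing, trained on weighted feature maps labeled attractive, neutral or unattractive (for instance, by the sign of the deviation). The attractiveness number is taken here to be the expected class value (+1, 0, -1) under the posterior; that choice, and the class and function names, are hypothetical. Features never seen in training are ignored, so a never-before-seen document falls back toward the class priors.

```python
import math
from collections import defaultdict

CLASSES = ("attractive", "neutral", "unattractive")
CLASS_VALUE = {"attractive": 1.0, "neutral": 0.0, "unattractive": -1.0}


class NaiveBayesPreference:
    """Multinomial naive Bayes over weighted features, Laplace-smoothed."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.doc_counts: dict[str, float] = defaultdict(float)
        self.feat_counts = defaultdict(lambda: defaultdict(float))
        self.totals: dict[str, float] = defaultdict(float)
        self.vocab: set[str] = set()

    def learn(self, features: dict[str, float], label: str) -> None:
        self.doc_counts[label] += 1
        for f, w in features.items():
            self.feat_counts[label][f] += w
            self.totals[label] += w
            self.vocab.add(f)

    def posteriors(self, features: dict[str, float]) -> dict[str, float]:
        n = sum(self.doc_counts.values())
        v = max(len(self.vocab), 1)
        log_p = {}
        for c in CLASSES:
            lp = math.log((self.doc_counts[c] + self.alpha) / (n + self.alpha * len(CLASSES)))
            denom = self.totals[c] + self.alpha * v
            for f, w in features.items():
                if f in self.vocab:  # unseen features carry no evidence
                    lp += w * math.log((self.feat_counts[c].get(f, 0.0) + self.alpha) / denom)
            log_p[c] = lp
        top = max(log_p.values())
        exp = {c: math.exp(lp - top) for c, lp in log_p.items()}  # stable softmax
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

    def attractiveness(self, features: dict[str, float]) -> float:
        """Expected class value: +1 attractive, 0 neutral, -1 unattractive."""
        p = self.posteriors(features)
        return sum(p[c] * CLASS_VALUE[c] for c in CLASSES)


def rank(model: NaiveBayesPreference, docs: dict[str, dict[str, float]]) -> list[str]:
    """Document ids sorted with the most attractive first."""
    return sorted(docs, key=lambda d: model.attractiveness(docs[d]), reverse=True)


model = NaiveBayesPreference()
for feats, label in [({"author=alice": 1.0}, "attractive"), ({"author=alice": 1.0}, "attractive"),
                     ({"author=spam": 1.0}, "unattractive"), ({"author=spam": 1.0}, "unattractive"),
                     ({"author=bob": 1.0}, "neutral")]:
    model.learn(feats, label)
print(rank(model, {"m1": {"author=spam": 1.0}, "m2": {"author=alice": 1.0},
                   "m3": {"author=bob": 1.0}}))
# ['m2', 'm3', 'm1']
```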
This methodology can be applied iteratively to the documents within a group to be ordered: after each iteration, the algorithm recalculates a modified sort order, so that changes in the underlying preference model are supported and over-fitting of the data is avoided.
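One way the iterative recalculation within a group might look is sketched below. It substitutes a simple additive per-feature score for the classifier and assumes the access hypothesis is that the user opens the top remaining document next. The update rule (reward the opened document in proportion to how far down it sat, penalize the documents passed over) and the learning rate `lr` are hypothetical choices.

```python
from collections import defaultdict


def score(weights: dict[str, float], feats: set[str]) -> float:
    """Hypothetical additive preference score: sum of per-feature weights."""
    return sum(weights.get(f, 0.0) for f in feats)


def iterative_session(docs: dict[str, set[str]], accessed: list[str], lr: float = 0.5):
    """Re-sort the unopened documents after every access.

    docs maps document ids (in initial presentation order) to feature sets;
    accessed is the order in which the user opens them. Returns the learned
    weights and the order displayed before each access.
    """
    weights: dict[str, float] = defaultdict(float)
    remaining = list(docs)  # all-zero weights: keep the initial order
    displayed = []
    for doc in accessed:
        displayed.append(list(remaining))
        k = remaining.index(doc)  # hypothesis: user opens remaining[0] next
        for f in docs[doc]:
            weights[f] += lr * k  # reached down k places: more attractive
        for skipped in remaining[:k]:
            for f in docs[skipped]:
                weights[f] -= lr  # passed over: less attractive
        remaining.remove(doc)
        # Recalculate the sort order of what is left (stable for ties).
        remaining.sort(key=lambda d: score(weights, docs[d]), reverse=True)
    return weights, displayed


docs = {"a": {"author=x"}, "b": {"author=y"}, "c": {"author=z"}, "d": {"author=z"}}
weights, displayed = iterative_session(docs, ["c", "a", "d", "b"])
print(displayed)
# [['a', 'b', 'c', 'd'], ['d', 'a', 'b'], ['d', 'b'], ['b']]
```

Opening "c" from third place immediately lifts "d", which shares its author, to the top of the remaining list.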
Referring to the drawing, the following describes the various phases executed by the proposed ranking methodology of the preferred embodiment.
1. The Initial State
The system is ready to start from scratch 100.
2. Present First Set of Documents
After detecting that new documents have arrived 102-104, N new documents 106 are presented 108 in the usual state-of-the-art fashion, such as ordered by date, arrival time, author, . . .
3. Observation of the User's Selections
The system keeps track of how the user deviates from the presented order 110 by monitoring the user's selections. The system calculates the deviation between an object's actual access order and its corresponding access hypothesis as determined by the current preference model 112.
4. Creation of the Preference Model
A model 112 reflecting the user""s preferences is constructed 114 based upon key features of the selected documents.
5. Wait for Further Documents to Arrive
The system waits until new documents have arrived to be presented 116-118.
6. Computation of the Assumed Sort Order
Each new document in the group is assigned a value 122 according to the preference model 114. The M new documents received 120 are then sorted according to the computed value 122 and displayed to the user in order of preference 124.
7. Observation of the User's Selections
Once again, the system monitors 110 how the user's selections differ from the order predicted by the preference model 112.
8. Update of the Preference Model
The preference model 112 is then updated 114 based upon these observed deviations.
9. Continual Update of the Preference Model
The preference model is continually adaptively updated by the system as new documents are received, ordered, presented and then subsequently processed by the user.
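The phases above can be tied together in a short loop. This sketch assumes documents arrive in batches, uses a hypothetical stand-in model that records the mean observed deviation per feature (rather than a naive Bayesian classifier), and replaces the waiting of phase 5 with iteration over a list; phase numbers appear in the comments.

```python
from collections import defaultdict


class PreferenceModel:
    """Hypothetical stand-in model: mean observed deviation per feature."""

    def __init__(self) -> None:
        self.total: dict[str, float] = defaultdict(float)
        self.count: dict[str, int] = defaultdict(int)

    def update(self, feats: set[str], deviation: int) -> None:
        for f in feats:
            self.total[f] += deviation
            self.count[f] += 1

    def value(self, feats: set[str]) -> float:
        known = [self.total[f] / self.count[f] for f in feats if self.count.get(f)]
        return sum(known) / len(known) if known else 0.0


def process_batch(model: PreferenceModel, batch: list[tuple[str, set[str]]],
                  access_order: list[str]) -> list[str]:
    """Present one batch, observe the user, update the model; return the presented order."""
    feats = dict(batch)
    # Phases 2/6: order by model value; an empty model keeps arrival order.
    presented = sorted(feats, key=lambda d: model.value(feats[d]), reverse=True)
    # Phases 3/7: deviation = presented position - access position.
    pos = {d: i for i, d in enumerate(presented)}
    for i, d in enumerate(access_order):
        model.update(feats[d], pos[d] - i)  # Phases 4/8: build/update the model
    return presented


model = PreferenceModel()  # Phase 1: start from scratch
batches = [  # Phases 5/9: each entry stands for a set of newly arrived documents
    ([("m1", {"from=news"}), ("m2", {"from=boss"}), ("m3", {"from=news"})], ["m2", "m1", "m3"]),
    ([("m4", {"from=news"}), ("m5", {"from=boss"})], ["m5", "m4"]),
]
for batch, access_order in batches:
    print(process_batch(model, batch, access_order))
# ['m1', 'm2', 'm3']
# ['m5', 'm4']
```

Because the user opened the "boss" message first in the initial batch, the next batch presents the "boss" message at the top.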