The present invention generally relates to information search and retrieval systems. More particularly, the present invention relates to implicitly establishing a relative ranking among information objects retrieved as a result of an information search in an information search and retrieval system.
A database is useful only if a desired item can be efficiently found and retrieved therefrom. To locate and retrieve a desired information item in an information database, a search of the database, e.g., based on a keyword or a text string, may be required. The search typically involves finding entries matching the keyword (or string) in an index created by parsing the information items into searchable words and the locations at which each word appears in the database. For example, the Internet, or the world wide web (WWW), may be considered a very large database of information items, in the form of web pages, distributed over a very wide network. Currently available search engines, e.g., YAHOO™, EXCITE™, and the like, maintain an index of the entire content of the WWW parsed into searchable words and corresponding locations, e.g., the Uniform Resource Locators (URLs).
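By way of illustration only, the kind of word-to-location index described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the index of any particular search engine: the function names `build_index` and `search`, the whitespace-based parsing, and the example.com locations are all hypothetical.

```python
from collections import defaultdict


def build_index(items):
    """Parse each item into words and record the locations where each word appears.

    `items` maps a location (e.g., a URL) to the text of the information item.
    Splitting on whitespace is a simplifying assumption.
    """
    index = defaultdict(set)
    for location, text in items.items():
        for word in text.lower().split():
            index[word].add(location)
    return index


def search(index, keyword):
    """Return the sorted locations of all items containing the keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical information items keyed by location.
pages = {
    "http://example.com/a": "database search and retrieval",
    "http://example.com/b": "ranking search results",
}
index = build_index(pages)
hits = search(index, "search")
```

In this sketch a search returns every matching location without any ordering by usefulness, which is the situation the following paragraphs address.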
At the conclusion of a search, all matching entries are returned to the user, who selects therefrom the one particularly desired information item. Often, however, as the size of a database becomes very large (e.g., the number of web pages in the WWW is currently in the hundreds of millions, and growing fast), a search may return more matching entries than a typical user can ever review in a reasonable time. Thus, even if the search was effective in finding every matching entry, a user must still sift through an excessive number of returned entries to find the one desired information item. This problem, referred to as the "information overload" problem, diminishes the usefulness of the database.
Conventional search mechanisms, e.g., a web search engine, attempt to address the above information overload problem by presenting the matching entries in a more useful form thereby making it easier for the user to select therefrom. To this end, typically, each of the matching entries is ranked in terms of its relevance or usefulness. The matching entries are sorted according to, and presented to the user in the order of, the usefulness ranking. Thus, the user is first presented with information items that are purported to be the most useful and relevant. Obviously, the usefulness of the above relevancy rating would be largely dependent on how accurately the ratings can be made.
Conventional methods of relevancy rating rely on explicit feedback from users of the information items, i.e., by requesting the user to explicitly answer at least one question regarding the usefulness or the relevance of the retrieved information. For example, a user may be asked to answer either "yes" or "no" to a question "Was the information helpful?". Alternatively, the user may be asked, e.g., to choose from "very useful", "somewhat useful", "not useful", and the like. Thus, the accuracy of conventional relevancy ratings depends largely on the explicit inputs from the users of the information items.
Unfortunately, in practice, only a small number (e.g., fewer than 10 percent) of users even bother to respond to the rating requests, and conventional relevancy ratings are thus often not accurate predictions of the usefulness or the relevance of an information item. Accordingly, in a conventional informational database search, the order in which the retrieved information items are sorted and presented to the user is often nonsensical and still requires the user to sift through an excessive number of items, and thus fails to effectively address the information overload problem.
Moreover, usefulness or relevance of an informational item may change over time as, for example, the information contained within the item may become outdated. However, once a relatively high relevancy rating is attributed to an informational item, the rated informational item may continue to appear in the earlier portion of the search result presented to the user. That is, a conventional rating method biases the database system to present retrieved information items in the order of a high overall historical rating, but without regard to the datedness of informational items or temporal preference.
Thus, what is needed is an efficient system for and method of rating the usefulness or the relevance of a retrieved informational item without requiring explicit user feedback.
What is also needed is an efficient system and method for determining a temporally accurate usefulness or relevance rating of a retrieved informational item.
In accordance with the principles of the present invention, a method of, and an apparatus for, temporally updating relevancy ratings of a plurality of informational items in an information retrieval system comprises detecting an access of at least one of the plurality of informational items, the at least one of the plurality of informational items having a most recently accessed time, the most recently accessed time indicating the time at which the at least one of the plurality of informational items was last accessed, determining an elapsed time since the most recently accessed time, comparing the elapsed time with a predetermined stale access time threshold value, and adjusting a relevancy rating of the at least one of the plurality of informational items if the elapsed time exceeds the predetermined stale access time threshold value.
In addition, in accordance with the principles of the present invention, a computer readable storage medium has stored thereon a computer program for implementing a method of temporally updating relevancy ratings of a plurality of informational items in an information retrieval system, the computer program comprising a set of instructions for detecting an access of at least one of the plurality of informational items, the at least one of the plurality of informational items having a most recently accessed time, the most recently accessed time indicating the time at which the at least one of the plurality of informational items was last accessed, determining an elapsed time since the most recently accessed time, comparing the elapsed time with a predetermined stale access time threshold value, and adjusting a relevancy rating of the at least one of the plurality of informational items if the elapsed time exceeds the predetermined stale access time threshold value.
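For illustration only, the steps recited above (detecting an access, determining the elapsed time, comparing it with the stale access time threshold, and adjusting the rating) may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the summary above does not specify the form of the adjustment, so the 30-day threshold, the multiplicative `DECAY_FACTOR`, the `Item` record, and the choice to record the new access time after the comparison are all hypothetical.

```python
from dataclasses import dataclass

# Assumed values; the text only calls for "a predetermined stale access
# time threshold value" and an unspecified adjustment.
STALE_ACCESS_THRESHOLD = 30 * 24 * 3600  # seconds (hypothetical: 30 days)
DECAY_FACTOR = 0.5                       # hypothetical adjustment


@dataclass
class Item:
    rating: float          # current relevancy rating
    last_accessed: float   # most recently accessed time, in seconds


def on_access(item, now):
    """Handle a detected access to `item` at time `now` (seconds).

    Returns True if the relevancy rating was adjusted.
    """
    # Determine the elapsed time since the most recently accessed time.
    elapsed = now - item.last_accessed
    # Compare the elapsed time with the stale access time threshold.
    adjusted = elapsed > STALE_ACCESS_THRESHOLD
    if adjusted:
        # Adjust the rating (assumed here to be a multiplicative decay).
        item.rating *= DECAY_FACTOR
    # Record this access as the new most recently accessed time (assumption).
    item.last_accessed = now
    return adjusted


item = Item(rating=10.0, last_accessed=0.0)
first = on_access(item, now=31 * 24 * 3600)        # stale: rating adjusted
second = on_access(item, now=31 * 24 * 3600 + 60)  # recent: rating unchanged
```

Passing `now` explicitly rather than reading a clock keeps the sketch deterministic; a deployed system would presumably obtain the access time from the retrieval system itself.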