The invention relates generally to techniques for analyzing queries submitted to databases. More particularly, the invention provides techniques to retrieve, rank and display selected data objects using a ranking scheme based on each object's textual relevance to the query and any linking relationships that exist between the various retrieved data objects.
As the size of the World-Wide Web (the “Web”) has increased, so has its importance as a data repository. It is currently estimated that the Web comprises approximately 150 million hosts and more than two billion web pages and is growing at a rate of approximately 100% per year. One consequence of this growth is that users can no longer browse multiple sources for the same or related information; there is simply too much of it. Thus, any search and retrieval technique applied to such a large and highly interconnected database must return only relevant results. The more relevant the returned results, the “better” the search.
Current search engines use a variety of techniques to determine which retrieved objects (e.g., documents) are relevant and which are not. For example, documents can be ranked based on (1) how many times a user's search terms appear in the document, and/or (2) how close the search terms are to the beginning of the document, and/or (3) the presence or absence of the search terms in the document's title or other specified sections. More recent search engines assign a rank to each page identified by a search based on a vector-space analysis scheme. Such schemes cluster groups of retrieved pages based on the number of references those pages receive (in-bound links) and/or the number of pages those pages reference (out-bound links). Recent improvements to these basic techniques assign a rank value to each page in terms of both the number of in-bound links it has and the importance of the pages providing those in-bound links (i.e., the quality of the out-bound links from predecessor documents). The “Google” search engine at http://www.google.com is one search engine employing this method.
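The two families of prior techniques described above can be illustrated with a minimal sketch. It is offered only as an example under stated assumptions and is not the method of any particular search engine: the function names text_score and link_rank, the title bonus of 2.0, the damping value of 0.85, and the iteration count are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def text_score(query_terms: List[str], title: str, body: str) -> float:
    """Score one document with the three text heuristics (weights are assumed)."""
    words = body.lower().split()
    title_words = set(title.lower().split())
    score = 0.0
    for term in (t.lower() for t in query_terms):
        count = words.count(term)
        if count:
            score += count                               # (1) number of occurrences
            score += 1.0 - words.index(term) / len(words)  # (2) closeness to the beginning
        if term in title_words:
            score += 2.0                                 # (3) presence in the title
    return score


def link_rank(out_links: Dict[str, List[str]], damping: float = 0.85,
              iterations: int = 50) -> Dict[str, float]:
    """Rank pages by in-bound links, weighting each link by the linking page's rank."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    if not pages:
        return {}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if targets:
                # A page divides its rank among the pages it references.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-bound links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank
```

Note that link_rank takes no query as input: its values depend only on the link structure and can therefore be computed before any query is submitted.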
While these techniques provide ranking metrics that are an improvement over prior text-only weighting methods, they are typically static. That is, they are computed a priori and, as a result, cannot adapt to the variety of queries submitted by real users. Thus, it would be beneficial to provide a mechanism to dynamically rank a retrieved data object based on its textual relevance to the submitted query and its interconnectivity to other retrieved data objects.
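By way of illustration only, the difference between a static rank and a query-dependent rank can be sketched as follows. This is not the mechanism of the invention, which the passage above does not specify; the function name dynamic_rank, the additive combination, and the link_weight parameter are assumptions introduced solely for this example. The sketch counts only links between documents that were themselves retrieved for the query, so the ordering changes as the retrieved set changes.

```python
from typing import Dict, List, Set


def dynamic_rank(text_scores: Dict[str, float],
                 out_links: Dict[str, List[str]],
                 link_weight: float = 0.5) -> List[str]:
    """Order retrieved documents by text score plus links from other retrieved documents.

    text_scores maps each retrieved document to its textual relevance for
    the current query; out_links maps a document to the documents it references.
    The additive formula and link_weight are illustrative assumptions.
    """
    retrieved: Set[str] = set(text_scores)
    in_links = {doc: 0 for doc in retrieved}
    for source in retrieved:
        for target in set(out_links.get(source, [])):
            # Ignore self-links and links that leave the retrieved set.
            if target in retrieved and target != source:
                in_links[target] += 1
    combined = {doc: text_scores[doc] + link_weight * in_links[doc]
                for doc in retrieved}
    # Highest combined score first; ties are broken by document name.
    return sorted(retrieved, key=lambda doc: (-combined[doc], doc))


if __name__ == "__main__":
    scores = {"a": 1.0, "b": 1.2, "c": 0.2}
    links = {"b": ["a"], "c": ["a"], "a": ["x"]}
    print(dynamic_rank(scores, links))
```

In this example, document "a" has a lower text score than "b" but is referenced by both other retrieved documents, so it is ordered first when links are taken into account.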