The invention relates generally to techniques for analyzing database queries. More particularly, the invention provides techniques to assign a rank, weight, or strength metric to each data object based on that object's membership in multiple groups.
As the size of the World-Wide Web (the “Web”) has increased, so has its importance as a data repository. It is currently estimated that the Web comprises approximately 150 million hosts and more than two billion web pages and is growing at a rate of approximately 100% per year. One consequence of this growth is that users can no longer browse multiple sources for the same or related information; there is simply too much of it. Thus, any search and retrieval technique applied to such a large and highly interconnected database must return only relevant results. The more relevant the returned results, the “better” the search.
Current search engines use a variety of techniques to determine which retrieved objects (e.g., documents) are relevant and which are not. For example, documents can be ranked based on (1) how many times a user's search terms appear in the document, and/or (2) how close the search terms are to the beginning of the document, and/or (3) the presence or absence of the search terms in the document's title or other specified headings. More recent search engines assign a rank to each page (that is, each page identified by a search) based on a vector space analysis scheme. Such schemes cluster groups of retrieved pages based on the number of references those pages receive (in-bound links) and/or the number of pages those pages reference (out-bound links). Recent improvements on these basic techniques assign a rank value to each page in terms of both the number of in-bound links it has and the importance of the pages providing those in-bound links (i.e., the quality of the out-bound links from predecessor documents). The “Google” search engine at http://www.google.com is one search engine employing this method.
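By way of illustration only, the link-based ranking described above, in which a page's rank depends on both the number of its in-bound links and the importance of the pages providing them, may be sketched as a simple iterative computation. The sketch below is a simplified example under stated assumptions and is not the implementation used by any particular search engine. The function name `link_rank`, the `out_links` mapping, the `damping` value of 0.85, and the fixed iteration count are all hypothetical choices made for this example.

```python
def link_rank(out_links, damping=0.85, iterations=50):
    """Score pages by in-bound links, weighted by the rank of each linking page.

    out_links maps a page name to the list of pages it links to.
    All parameter names and default values are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if targets:
                # A page divides its rank equally among its out-bound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-bound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c"; "c" links to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = link_rank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this example, page "c" has two in-bound links and page "a" has one in-bound link from the highly ranked "c", so "c" ranks first, "a" second, and "b", which has no in-bound links, last. Note that such ranks are computed from link structure alone, independent of any particular query.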
While these techniques provide ranking metrics that are an improvement over prior text weighting methods, they are typically static (that is, they are computed a priori) and fail to account for the importance of documents that participate in multiple groups. Thus, it would be beneficial to provide a mechanism that dynamically determines the relevancy of a retrieved data object based not only on its membership in one group, but also on the importance it derives from its membership in multiple groups.