The present invention relates to the field of electronic data storage and retrieval. More specifically, the present invention relates to selecting categories of information in an Internet search engine database, based on a user's query.
The Internet is a vast store of information, permitting access to tens of millions of web sites. Although the ever-increasing number of web sites is creating record access to information, the sheer number of sites available makes it difficult for an Internet user to locate desired information. As a result, Internet search engines have become a necessary and valuable tool for locating information on the Internet.
Not all search engines employ the same search strategy. Some Internet search engines return a "flat" list of results indexed according to a web site's similarity to a user's query. Although these lists are useful, the vast expanse of the Internet has reduced their effectiveness. Other Internet search engines take a different approach and catalog individual web sites into hierarchical taxonomies of categories based on each site's content. These category-based search engines return not only the most relevant web sites but also lists of matching categories that describe and encompass relevant web sites, in order to help users focus their query. In addition, these category-based search engines often display each matching category along with its hierarchically related categories in "category paths" in order to place the matching category in a proper context.
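To make the notion of a category path concrete, the following minimal Python sketch assumes the taxonomy is stored as a child-to-parent map and builds the root-first path for a matching category. The category names and the `category_path` and `format_path` helpers are hypothetical illustrations, not taken from any particular search engine.

```python
from typing import Dict, List, Optional

# Hypothetical taxonomy: each category maps to its parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Top": None,
    "Computers": "Top",
    "Internet": "Computers",
    "Search Engines": "Internet",
    "Recreation": "Top",
    "Travel": "Recreation",
}


def category_path(category: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent links from a category up to the root; return the path root-first."""
    path: List[str] = []
    node: Optional[str] = category
    while node is not None:
        path.append(node)
        node = parents[node]
    path.reverse()
    return path


def format_path(path: List[str], sep: str = " > ") -> str:
    """Render a category path as a single display string."""
    return sep.join(path)


print(format_path(category_path("Search Engines", PARENT)))
# Top > Computers > Internet > Search Engines
```

Displaying the full path rather than the bare category name is what gives the matching category its context; the same walk would, however, produce one path per match, which is how result lists grow to tens or hundreds of paths in a large taxonomy.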
For some time, these categorization techniques were sufficient to provide search engine users with intelligible suggestions. However, just as previous Internet growth necessitated categorization over "flat" result lists, the Internet's recent exponential growth has limited the effectiveness of these taxonomy techniques. In particular, the rapid growth of the Internet has caused a corresponding expansion in the number of categories found in today's search engine taxonomies. As a result, simply categorizing web sites and displaying category paths has become unwieldy and unintelligible, often resulting in tens or hundreds of returned category paths. Moreover, the lack of differentiation among the many returned category paths often results in logical redundancies and even irrelevant search results. Therefore, although today's category paths are better than their predecessor "flat" lists of web sites, they have been rendered ineffective by the Internet's exponential growth.
Therefore, it would be beneficial to provide a logical distinction among the many possible matching categories and their category paths. In this way, the user can more easily sort through the returned results and more quickly focus the search to obtain the desired results.
Large stores of information are often organized in a hierarchical taxonomy to aid the search and retrieval of the information. The hierarchical taxonomy generally consists of related categories of information, called "nodes," each of which may contain information relevant to the search. Each node is addressable according to its path in the hierarchical taxonomy. In information stores where the number of nodes having relevant information is extremely large, such as the Internet, providing a cohesive, intelligent, and organized display of the search results becomes extremely important to the success of a user traversing the store to find relevant information. The invention provides such search results by ranking each node of the taxonomy to determine which nodes are most likely to be relevant to the search request. The invention then creates a conceptually related "cluster" of nodes by selecting a relevant "seed" node and relevant nodes related to the "seed" node.
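The ranking step can be illustrated with a minimal Python sketch. It assumes that each node is addressed by a "/"-separated path and carries a set of descriptive terms, and it uses Jaccard similarity between node terms and query terms as a stand-in for the unspecified ranking criterion. The `NODES` data, the `score` and `rank_nodes` helpers, and the choice of Jaccard similarity are all hypothetical assumptions, not the invention's actual method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical taxonomy: nodes addressed by path, each with descriptive terms
# (an assumption; the text does not specify what a node contains).
NODES: Dict[str, Set[str]] = {
    "Top/Computers": {"computers", "hardware", "software"},
    "Top/Computers/Internet": {"internet", "web", "online"},
    "Top/Computers/Internet/Search Engines": {"search", "engines", "web", "query"},
    "Top/Recreation/Travel": {"travel", "vacation", "flights"},
}


def score(node_terms: Set[str], query_terms: Set[str]) -> float:
    """Jaccard similarity of node terms and query terms (an assumed criterion)."""
    union = node_terms | query_terms
    return len(node_terms & query_terms) / len(union) if union else 0.0


def rank_nodes(query: str, nodes: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score every node against the query and sort from most to least relevant."""
    query_terms = set(query.lower().split())
    scored = [(path, score(terms, query_terms)) for path, terms in nodes.items()]
    # Highest score first; ties are broken alphabetically by path for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


for path, s in rank_nodes("web search engines", NODES):
    print(f"{s:.2f}  {path}")
```

For the query "web search engines", the Search Engines node scores 0.75 (three shared terms out of four distinct terms) and the Internet node scores 0.20, so they rank first and second; the two unrelated nodes score zero.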
More specifically, the invention provides a method, system and computer-readable medium for selecting nodes in a hierarchical taxonomy. The method comprises the steps of receiving a query and comparing characteristics of the nodes with the query. The method ranks the nodes based on predetermined criteria and selects a first node based on the ranking. The predetermined criteria may be based on common structure between the nodes and the query. Also, based on the ranking, the method selects one or more nodes that are hierarchically related to the first node. The method may then display the first node and the related nodes in such a way that the relation of the nodes is apparent. The method may be repeated to select additional relevant nodes and their related nodes.
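The seed-and-cluster steps summarized above might be sketched as follows. The sketch assumes an already-ranked list of `(path, score)` pairs, treats two nodes as hierarchically related when one path is an ancestor of the other, and uses a hypothetical `min_score` threshold and `max_clusters` limit to stand in for "based on the ranking" and for the number of repetitions. All names and the example data are illustrative assumptions.

```python
from typing import List, Tuple


def is_ancestor(a: str, b: str) -> bool:
    """True if path a is a proper ancestor of path b ('/'-separated segments)."""
    return b.startswith(a + "/")


def related(a: str, b: str) -> bool:
    """Hierarchically related here means one node is an ancestor of the other."""
    return is_ancestor(a, b) or is_ancestor(b, a)


def build_clusters(ranked: List[Tuple[str, float]], min_score: float = 0.0,
                   max_clusters: int = 3) -> List[List[str]]:
    # Keep only nodes above the (hypothetical) threshold, preserving rank order.
    candidates = [path for path, s in ranked if s > min_score]
    clusters: List[List[str]] = []
    while candidates and len(clusters) < max_clusters:
        seed = candidates.pop(0)  # highest-ranked remaining node becomes the seed
        members = [p for p in candidates if related(seed, p)]
        candidates = [p for p in candidates if p not in members]
        clusters.append([seed] + members)
    return clusters


def display(cluster: List[str]) -> None:
    """Print a cluster shallowest-first, indenting by depth to show the hierarchy."""
    for path in sorted(cluster, key=lambda p: p.count("/")):
        print("  " * path.count("/") + path.rsplit("/", 1)[-1])


# Hypothetical ranking output for some query.
EXAMPLE_RANKED = [
    ("Top/Computers/Internet/Search Engines", 0.75),
    ("Top/Recreation/Travel/Search", 0.5),
    ("Top/Computers/Internet", 0.2),
    ("Top/Recreation/Travel", 0.1),
    ("Top/Science", 0.0),
]

for cluster in build_clusters(EXAMPLE_RANKED):
    display(cluster)
```

With this input the first seed, Search Engines, absorbs its ancestor Internet, and the loop then repeats with the next-highest remaining node, Travel/Search, which absorbs Travel. The zero-scored Science node is never selected. Grouping by ancestry is only one plausible reading of "hierarchically related"; siblings or shared ancestors could be used instead.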