Computer networks used for knowledge searching and retrieval are well known. For example, see U.S. Pat. No. 5,873,080 to Coden et al., issued on Feb. 16, 1999; U.S. Pat. No. 5,875,446 to Brown et al., issued on Feb. 23, 1999; U.S. Pat. No. 5,913,208 to Brown et al., issued on Jun. 15, 1999; and U.S. Pat. No. 5,819,265 to Ravin et al., issued on Oct. 6, 1998, the disclosures of which are hereby incorporated by reference. In all of these patents, a primary purpose is to provide better methods for retrieving relevant documents in response to user queries.
To improve relevance, information is categorized into subject groups. Categorizing text documents is one of the most effective ways to help users organize information. In general, the content of a text document is analyzed to determine the words and phrases that contribute to its context. The document is then associated, based on its context and content, with one or more categories of a given taxonomy. Once a document is associated with a category of a taxonomy, users can easily navigate within the part of the taxonomy that interests them to find relevant information. The “Yahoo!” search Web site is designed around this concept of categorization: each document is listed under one category or subcategory.
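The categorization step described above can be illustrated with a minimal sketch. It assumes a simple keyword-overlap scheme, which is only one of many possible approaches and is not a method taken from the cited patents; the taxonomy `TAXONOMY`, its category names and keywords, and the functions `tokenize` and `categorize` are all hypothetical. A document is associated with every category whose keywords occur in it at least `min_hits` times.

```python
import re
from collections import Counter

# Hypothetical toy taxonomy: each category path maps to indicative keywords.
TAXONOMY = {
    "Computers/Programming": {"java", "compiler", "code", "software"},
    "Food/Coffee": {"java", "coffee", "espresso", "roast"},
    "Business/Finance": {"stock", "market", "revenue", "investor"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text, taxonomy=TAXONOMY, min_hits=2):
    """Return the categories whose keywords occur at least min_hits times
    in the text, ordered from most to fewest keyword occurrences."""
    counts = Counter(tokenize(text))
    scores = {cat: sum(counts[w] for w in words) for cat, words in taxonomy.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [cat for cat, score in ranked if score >= min_hits]
```

For example, `categorize("A dark Java roast brewed as espresso.")` returns `["Food/Coffee"]`, whereas `categorize("Java code compiled by the Java compiler")` returns both `"Computers/Programming"` and `"Food/Coffee"`, showing how a single word such as Java can pull a document toward several categories.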
Once information is categorized according to a taxonomy, users can narrow their search scope to a category or subcategory, which increases the relevance of the documents they retrieve. Today, most search Web sites are based on this principle.
In general, the context of a query is essential for returning relevant results. For example, if the word Java is used in the context of a type of coffee, this information must be communicated to the search engine. Otherwise, the search engine would return out-of-context results, such as references to a computer language, a germ, or a lyric. One way to associate a context with a query is to consult a user profile. A user profile contains a set of categories, or a taxonomy, that identifies the user's interests. When the user forms a query, the query can be associated with one or more of these categories, which helps to determine and possibly expand its context.
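Associating a query with a user profile can be sketched in the same spirit. In this hypothetical example, a profile is simply a list of category names, `CATEGORY_TERMS` is an assumed mapping from categories to context terms, and `contextualize` is an illustrative function: it keeps the profile categories that share a term with the query and expands the query with a few additional terms from each. This is one simple form of query expansion, not a prescribed method.

```python
# Hypothetical mapping from profile categories to context terms.
CATEGORY_TERMS = {
    "Food/Coffee": ["coffee", "java", "espresso", "roast"],
    "Computers/Programming": ["java", "programming", "software", "compiler"],
}


def contextualize(query, profile, category_terms=CATEGORY_TERMS, extra=2):
    """Associate the query with the profile categories that share a term
    with it, then expand the query with up to `extra` new terms per match.

    Returns (matched_categories, expanded_terms)."""
    terms = query.lower().split()
    matched = [c for c in profile if set(terms) & set(category_terms.get(c, []))]
    expansion = []
    for cat in matched:
        # Skip terms already in the query or already added by another category.
        new_terms = [t for t in category_terms[cat] if t not in terms and t not in expansion]
        expansion.extend(new_terms[:extra])
    return matched, terms + expansion
```

With a coffee-oriented profile, `contextualize("Java", ["Food/Coffee"])` yields `(["Food/Coffee"], ["java", "coffee", "espresso"])`, while a programmer's profile `["Computers/Programming"]` turns the same query into `["java", "programming", "software"]`. A query that matches no profile category is returned unchanged.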
The amount of information published over the Internet grew so rapidly that finding specific information became very difficult. To make searching Internet content more practical, categorization of Web content was proposed, and unstructured Web content was categorized using specific taxonomies. Today, many search engines expect users to know and select the category of the information they are looking for. In general, the categories are organized in a tree structure: there are seven to 15 main categories, such as Art, Business, Computers, Education, and Entertainment, and subcategories are organized under these main categories. To cover the information space properly, seven to 15,000 subcategories were proposed.
For a user with a very specific area of interest, these categories are at times of little use. For instance, a category that is useful to a particular user may be too specific to belong to a general taxonomy, or it may make navigation difficult because the user might have to traverse much of the taxonomy tree to reach it. As an example, “Think Pad Model 600” is a very specific category that is not part of a general taxonomy. Another problem with generalized taxonomy trees is that they are ever growing and constantly need pruning: new categories are added over time, and old categories are deleted. Users are expected to keep up with these changing taxonomy trees as they perform their searches. Yet another problem is that not all users are familiar with the categorization scheme. Navigating the taxonomy tree and finding the desired information within a category takes effort on the part of the user. Among other problems, this can result in too little information being returned.
Metasearch systems help to alleviate the problem of insufficient returned information. A metasearch system is not itself a search engine; rather, it merges results from a multitude of search engines. In a metasearch system, a query is sent to many information sources, and the results are grouped and merged. Although a metasearch system retrieves more information, the amount of returned information can be overwhelming.
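The merging step of a metasearch system can be sketched as follows. The sketch assumes that each search engine is represented by a function that takes a query and returns a ranked list of URLs; `engine_a`, `engine_b`, and their results are hypothetical stand-ins. Merging uses reciprocal rank fusion with the commonly used constant k = 60, a heuristic chosen here for illustration rather than one named in the text, and the optional `limit` parameter shows one simple way to keep the merged list from becoming overwhelming.

```python
from collections import defaultdict


def metasearch(query, engines, k=60, limit=None):
    """Send the query to every engine and merge the ranked lists using
    reciprocal rank fusion: each URL scores the sum of 1 / (k + rank) over
    the lists that contain it, so duplicates collapse into a single entry."""
    scores = defaultdict(float)
    for engine in engines:
        for rank, url in enumerate(engine(query), start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return merged if limit is None else merged[:limit]


# Hypothetical stand-ins for two independent search engines.
def engine_a(query):
    return ["a.example/1", "shared.example", "a.example/2"]


def engine_b(query):
    return ["shared.example", "b.example/1"]


print(metasearch("java coffee", [engine_a, engine_b]))
```

Here `shared.example`, returned by both engines, rises to the top of the merged list, followed by `a.example/1`, `b.example/1`, and `a.example/2`; passing `limit=2` would keep only the first two entries.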
Therefore, a need still exists for a way to let users search through massive amounts of information while providing them with more meaningful results than are currently presented.