The explosive growth of the Internet and email has led to explosive growth in the amount of data available. Because this massive amount of raw information is not meaningful on its own, there is a need for tools that classify, analyze and organize the available information. There are many tools for organizing the information as well as tools for searching through that information. There are many different well known search tools and search engines, such as the very popular Google search engine (www.google.com/). There are also tools designed to permit the user to understand the data collected during the search. Some of the existing tools are textual. For example, a typical search engine might provide search results in a textual form. Alternatively, some tools provide a visual display (and clustering) of the data collected during a search/mining operation. For example, U.S. Pat. No. 5,963,965 describes a system in which the text in a corpus of documents and the relationships between the various words in the corpus of documents are displayed visually in a map-type structure. The map structure permits the user to identify words that appear more often in the corpus of documents. The map also permits the user to drill down through the map structure and, at the lowest level, look at the actual documents that are associated, for example, with a particular word.
Most conventional search engines do not provide an “authoritative search” when a query is entered into the system. In particular, most conventional search engines, such as Google, generate results for a search but do not apply further processing to understand the data being retrieved, nor do they use such processing to assist the user's understanding of the data. The search engine will match the query against an index and return documents that match one or more of the query terms. Typically, the results are organized according to relevance so that the most relevant document, such as the one with the most terms that match the query terms, is presented before the less relevant documents. However, the search engine does not attempt to further analyze the results. The problem with such an approach to search is best illustrated with an example. Assume that a user is looking for documents about John Adams (the composer) and enters the query “John Adams” into the search engine. The search results will likely include documents that contain the words “John” and “Adams” but are not actually about the composer, for example documents about the U.S. president of the same name. Thus, it is desirable to provide a search engine that performs additional processing, such as the identification and disambiguation of specific named entities, to provide a more authoritative search to the user.
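The term-matching behavior described above can be illustrated with a minimal sketch. It is not taken from any particular search engine: the toy corpus, the document ids, and the function names (tokenize, build_index, search) are all hypothetical, and relevance is assumed to be nothing more than the count of distinct query terms a document contains.

```python
from collections import defaultdict

# Hypothetical toy corpus; ids and texts are invented for illustration only.
DOCS = {
    "d1": "John Adams composed the opera Nixon in China",
    "d2": "John Adams was the second president of the United States",
    "d3": "Adams Street is closed and John will take a detour",
    "d4": "A guide to minimalist composers",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index mapping each term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return (doc_id, score) pairs, where score is the number of distinct
    query terms found in the document, highest score first (ties by doc id)."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


INDEX = build_index(DOCS)
RESULTS = search(INDEX, "John Adams")
for doc_id, score in RESULTS:
    print(doc_id, score, DOCS[doc_id])
```

Under these assumptions, the documents about the composer, the president, and the street closure all receive the same score, because each contains both query terms. Nothing in the index records which entity a document is about, which is the gap that named-entity identification and disambiguation would address.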
In addition, most conventional search engines do not consolidate or index content from heterogeneous sources. Nor do these search engines present results that are then ranked according to the relevance of the content from the heterogeneous sources. For example, most conventional search engines do not blend the results of a web-based search with the results of an intranet search so that the results of the user's search include both content from the web that meets the query criteria and content from the intranet that meets the query criteria. It is desirable to provide a search engine that provides this “blending” of content from the heterogeneous sources.
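One way to picture the blending described above is as a merge of separately ranked result lists into a single ranking. The following is only a sketch under stated assumptions: the Hit record, the blend function, and the sample scores are hypothetical, and the relevance scores from the two sources are assumed to be directly comparable, which in practice would likely require normalization.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical search hit; the fields are assumptions for illustration."""
    source: str       # e.g. "web" or "intranet"
    doc_id: str
    relevance: float  # assumed comparable across sources


def blend(*result_lists, limit=10):
    """Merge result lists that are each already sorted by descending
    relevance into one list sorted by descending relevance."""
    # heapq.merge expects inputs sorted ascending by key, so negate relevance.
    merged = heapq.merge(*result_lists, key=lambda hit: -hit.relevance)
    return list(merged)[:limit]


# Invented sample results from two heterogeneous sources.
WEB = [Hit("web", "w1", 0.9), Hit("web", "w2", 0.4)]
INTRANET = [Hit("intranet", "i1", 0.7), Hit("intranet", "i2", 0.5)]

BLENDED = blend(WEB, INTRANET)
for hit in BLENDED:
    print(hit.source, hit.doc_id, hit.relevance)
```

With the sample data, the blended list interleaves web and intranet content by relevance rather than presenting one source after the other.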
It is also desirable that a search engine provide additional features including a long-term archive of search queries and results, vertical content that may be provided with semantic indexing, localization of search results, multimedia display of the search results, mining tools and personalization of the search experience for the user. Thus, it is desirable to provide a search system and method that overcome the limitations of the conventional systems and provide the desirable features set forth above, and it is to this end that the present invention is directed.