Information retrieval systems and associated methods search and retrieve information in response to user search queries. As a result of any given search, vast amounts of data may be retrieved. These data may include structured and unstructured data, free text, tagged data, metadata, audio, imagery, and motion imagery (video), for example. To compound the problem, information retrieval systems are searching larger volumes of information every year. A study conducted by the University of California at Berkeley concluded that the production of new information nearly doubled between 1999 and 2002.
When an information retrieval system performs a search in response to a user search query, the user may be overwhelmed by the results. For example, a typical search provides the user with hundreds or even thousands of items. The retrieved information includes both relevant and irrelevant information, and the user then bears the burden of separating the relevant information from the irrelevant information.
One approach to this problem is to build a taxonomy. A taxonomy is an orderly classification scheme for dividing a broad topic into a number of predefined categories, which are in turn divided into sub-categories. This allows a user to navigate through the available data to find relevant information while at the same time limiting the documents to be searched. However, creating a taxonomy and assigning each document its correct classification is very time consuming. Moreover, a taxonomy requires continued maintenance to categorize new information as it becomes available. U.S. Pat. No. 6,938,046 discloses a taxonomy that includes polyarchical coding, in which multiple higher-level codes are applied to the same lower-level code. With polyarchical coding, for example, a coder need only enter one lower-level code for a piece of data, and the corresponding higher-level polyarchical codes are applied automatically.
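The automatic application of higher-level codes can be illustrated with a small graph traversal. This is a minimal sketch under stated assumptions, not the scheme disclosed in U.S. Pat. No. 6,938,046: the taxonomy `PARENTS` and the function `expand_codes` are hypothetical, and each code is assumed to list its immediate higher-level codes, with more than one parent allowed (which is what makes the coding polyarchical rather than a strict tree).

```python
from typing import Dict, List, Set

# Hypothetical polyarchical taxonomy: each code maps to its immediate
# higher-level codes. A code may have several parents, so the taxonomy
# is a directed acyclic graph rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "tablets": ["computers", "mobile devices"],
    "smartphones": ["mobile devices", "telephones"],
    "computers": ["electronics"],
    "mobile devices": ["electronics"],
    "telephones": ["electronics"],
    "electronics": [],
}


def expand_codes(code: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the entered lower-level code plus every higher-level code
    reachable from it, i.e. the codes applied automatically."""
    applied: Set[str] = set()
    stack = [code]
    while stack:
        current = stack.pop()
        if current in applied:
            continue  # a code reachable along two paths is applied once
        applied.add(current)
        stack.extend(parents.get(current, []))
    return applied


if __name__ == "__main__":
    # The coder enters only "tablets"; the higher-level codes follow.
    print(sorted(expand_codes("tablets", PARENTS)))
```

Entering the single code `tablets` yields `computers`, `mobile devices`, and `electronics` as well; `electronics` is reached by two paths but applied only once.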
Another approach is to use an information retrieval system that groups the results to assist the user. For example, the Vivisimo Clustering Engine™ made by Vivisimo, Inc. of Pittsburgh, Pa., automatically organizes search results into meaningful hierarchical folders on-the-fly. As the information is retrieved, it is clustered into categories that are intelligently selected from the words and phrases contained in the search results themselves. In particular, the Vivisimo Clustering Engine™ uses only the returned title and abstract for each result. The similarity between documents is based on this raw material (i.e., the visible text of the search result, not the entire article) and nothing else. The documents are then clustered together based on textual similarity. However, this raw similarity is augmented with human knowledge of what users wish to see when they examine clustered documents. As a result, the categories are as up-to-date and fresh as the contents therein.
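Vivisimo's clustering algorithm is proprietary and is not described here, so the following is only a generic stand-in for the idea of grouping results by the textual similarity of their title and abstract. It assumes bag-of-words term vectors, cosine similarity, and single-link clustering at a hypothetical threshold of 0.5; the function names, stopword list, and threshold are illustrative choices. Naming a cluster after the term shared by the most of its members is a crude substitute for the phrase selection and human knowledge the text mentions.

```python
import math
import re
from collections import Counter
from typing import AbstractSet, Dict, List

# Illustrative stopword list (an assumption, not Vivisimo's).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "with"}


def term_vector(text: str) -> Counter:
    """Bag-of-words vector from lowercase alphabetic tokens, minus stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_results(snippets: List[str], threshold: float = 0.5) -> List[List[int]]:
    """Single-link clustering of title+abstract snippets: results share a
    cluster if a chain of pairwise similarities >= threshold links them."""
    vectors = [term_vector(s) for s in snippets]
    parent = list(range(len(snippets)))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(snippets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def label_cluster(snippets: List[str], members: List[int],
                  exclude: AbstractSet[str] = frozenset()) -> str:
    """Name a cluster after the term found in the most member snippets,
    ignoring excluded terms (e.g. the query); ties go alphabetically."""
    counts: Counter = Counter()
    for i in members:
        counts.update(set(term_vector(snippets[i])) - exclude)
    if not counts:
        return ""
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]


if __name__ == "__main__":
    results = [
        "Jaguar habitat in the rainforest",
        "Rainforest habitat of the wild jaguar",
        "Jaguar car dealer prices",
        "Used jaguar car prices",
    ]
    for members in cluster_results(results):
        print(label_cluster(results, members, exclude={"jaguar"}), members)
```

For the query "jaguar", the animal results and the car results fall into separate folders labeled `habitat` and `car`, because their cross-group similarity (about 0.29 at most) stays below the threshold.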
Visual navigational search approaches are provided in U.S. Pat. Nos. 6,574,632 and 6,701,318 to Fox et al., the contents of which are hereby incorporated herein by reference. Fox et al. discloses an information retrieval and visualization system utilizing multiple search engines for retrieving documents from a document database based upon user input queries. Each search engine produces a common mathematical representation of each retrieved document. The retrieved documents are then combined and ranked. A mathematical representation for each respective document is mapped onto a display. Information displayed includes a three-dimensional display of keywords from the user input query. This three-dimensional visualization, based upon the mathematical representation of information within the system, gives users an intuitive understanding of the results, so that relevance feedback and query refinement techniques can be better utilized, resulting in higher retrieval accuracy.
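The combine, rank, and display steps can be sketched in a few lines. Fox et al.'s actual mathematical representation is not specified here, so this sketch assumes one: each document becomes a point whose three coordinates are the relative frequencies of three query keywords, and that point doubles as its position in a three-dimensional display. Duplicate hits from different engines are merged by document ID, and documents are ranked by the sum of their coordinates. The names `represent` and `combine_and_rank` and the sample data are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical common representation: one coordinate per query keyword,
# also used directly as the document's (x, y, z) display position.
Point3D = Tuple[float, float, float]


def represent(text: str, keywords: Tuple[str, str, str]) -> Point3D:
    """Relative frequency of each of three query keywords in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1  # avoid division by zero for empty documents
    x, y, z = (tokens.count(k) / n for k in keywords)
    return (x, y, z)


def combine_and_rank(engine_results: List[Dict[str, str]],
                     keywords: Tuple[str, str, str]) -> List[Tuple[str, Point3D]]:
    """Merge per-engine results (doc ID -> text), keep one entry per doc ID,
    and rank by summed keyword weight, highest first."""
    merged: Dict[str, Point3D] = {}
    for results in engine_results:
        for doc_id, text in results.items():
            # Every engine yields the same representation for the same
            # document, so the first occurrence is kept.
            if doc_id not in merged:
                merged[doc_id] = represent(text, keywords)
    return sorted(merged.items(), key=lambda item: sum(item[1]), reverse=True)


if __name__ == "__main__":
    engine_a = {"d1": "solar panel efficiency solar cells",
                "d2": "wind turbine blade design"}
    engine_b = {"d2": "wind turbine blade design",
                "d3": "solar wind hybrid storage"}
    for doc_id, point in combine_and_rank([engine_a, engine_b], ("solar", "wind", "storage")):
        print(doc_id, point)  # point = position on the solar/wind/storage axes
```

Here `d3` ranks first because it mentions all three keywords, and `d2`, returned by both engines, appears only once. A real system would likely use richer weights (for example, TF-IDF) and a projection from many dimensions down to three.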
Despite the continuing development of search engines and result visualization techniques, there is still a need to quickly and efficiently group together similar documents in a document database to present search results to the user in a meaningful manner.