Information retrieval systems and associated methods search and retrieve information in response to user search queries. As a result of any given search, vast amounts of data may be retrieved. These data may include structured and unstructured data, free text, tagged data, metadata, audio, imagery, and motion imagery (video), for example. To compound the problem, information retrieval systems are searching larger volumes of information every year. A study conducted by the University of California at Berkeley concluded that the production of new information nearly doubled between 1999 and 2002.
When an information retrieval system performs a search in response to a user search query, the user may be overwhelmed by the results. For example, a typical search provides the user with hundreds or even thousands of items. The retrieved information includes both relevant and irrelevant information. The user then has the burden of distinguishing the relevant information from the irrelevant information.
One approach to this problem is to build a taxonomy. A taxonomy is an orderly classification scheme that divides a broad topic into a number of predefined categories, with the categories being further divided into sub-categories. This allows a user to navigate through the available data to find relevant information while at the same time limiting the documents to be searched. However, creating a taxonomy and assigning each document to the correct classification is very time-consuming. Moreover, a taxonomy requires continued maintenance to categorize new information as it becomes available.
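As an illustration only, the category and sub-category structure of a taxonomy can be sketched as a small tree in Python. The `Category` class, its `add_path` and `collect` methods, and the category names and document identifiers below are hypothetical and are not taken from any particular system; the sketch merely shows how navigating to one category limits the documents to be searched.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of a hypothetical taxonomy: a category with sub-categories."""
    name: str
    children: dict = field(default_factory=dict)   # sub-category name -> Category
    documents: list = field(default_factory=list)  # documents filed directly here

    def add_path(self, path):
        """Create (or reuse) the chain of sub-categories named in `path`."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Category(name))
        return node

    def collect(self):
        """Return documents filed under this category and all sub-categories."""
        docs = list(self.documents)
        for child in self.children.values():
            docs.extend(child.collect())
        return docs


# Every document must be filed manually under a predefined category,
# which is the maintenance burden noted above.
root = Category("root")
root.add_path(["Science", "Physics"]).documents.append("doc-1")
root.add_path(["Science", "Biology"]).documents.append("doc-2")
root.add_path(["Sports"]).documents.append("doc-3")

# Navigating to "Science" restricts the search to two of the three documents.
science_docs = root.children["Science"].collect()
```

In this sketch, each newly available document requires another explicit `add_path` call with a category chosen in advance, which reflects why a taxonomy requires continued maintenance.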
Another approach is to use an information retrieval system that groups the results to assist the user. For example, the Vivisimo Clustering Engine™ automatically organizes search results into meaningful hierarchical folders on the fly. As the information is retrieved, it is clustered into categories that are intelligently selected from the words and phrases contained in the search results themselves. As a result, the categories are as up to date and fresh as the contents therein.
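The general idea of deriving categories from the words in the search results themselves can be sketched as follows. This is a deliberately simplified, flat grouping under stated assumptions and is not the algorithm used by the Vivisimo Clustering Engine, whose internals are not described here. The function `cluster_results`, the stopword list, and the sample snippets are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed, minimal stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cluster_results(snippets, query_terms, max_labels=3):
    """Group result snippets under labels drawn from the snippets themselves."""
    query = {t.lower() for t in query_terms}
    doc_freq = Counter()
    token_sets = []
    for snippet in snippets:
        tokens = set(tokenize(snippet)) - query  # query terms carry no contrast
        token_sets.append(tokens)
        doc_freq.update(tokens)

    # Candidate labels: terms shared by at least two snippets, most frequent
    # first, with ties broken alphabetically so the output is deterministic.
    shared = [t for t, n in doc_freq.items() if n >= 2]
    labels = sorted(shared, key=lambda t: (-doc_freq[t], t))[:max_labels]

    clusters = defaultdict(list)
    for snippet, tokens in zip(snippets, token_sets):
        label = next((l for l in labels if l in tokens), "other")
        clusters[label].append(snippet)
    return dict(clusters)


snippets = [
    "jaguar car dealer",
    "jaguar car review",
    "jaguar cat habitat",
    "jaguar cat diet",
    "jaguar software",
]
clusters = cluster_results(snippets, query_terms=["jaguar"])
```

Because the labels are computed from whatever results come back, no predefined category scheme has to be maintained, in contrast to a taxonomy.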
Visual navigational search approaches are provided in U.S. Pat. Nos. 6,574,632 and 6,701,318 to Fox et al., the contents of which are hereby incorporated herein by reference. Fox et al. discloses an information retrieval and visualization system utilizing multiple search engines for retrieving documents from a document database based upon user input queries. Each search engine produces a common mathematical representation of each retrieved document. The retrieved documents are then combined and ranked. A mathematical representation for each respective document is mapped onto a display. Information displayed includes a three-dimensional display of keywords from the user input query. The three-dimensional visualization capability, which is based upon the mathematical representation of information within the information retrieval and visualization system, provides users with an intuitive understanding of the results and allows relevance feedback and query refinement techniques to be better utilized, resulting in higher retrieval accuracy.
Despite the continuing development of search engines and result visualization techniques, there is still a need to quickly and efficiently search large document collections and present the results in a meaningful manner to the user.
This is particularly true when analyzing multi-lingual documents. For instance, analysts typically operate in a time-critical environment that is both multicultural and multi-lingual. The volumes of data that need to be analyzed are growing at ever-increasing rates. Analysts generally lack the time, and many lack the capability, to analyze multi-lingual data. Consequently, there is also a need to quickly and efficiently search large document collections containing multi-lingual information and present the results in a meaningful manner to the user.