1. Field of Invention
The present invention relates generally to information retrieval systems and methods, and more particularly, to the dynamic organization of content retrieved in response to user input queries.
2. Background of the Invention
Conventional information retrieval systems typically support one of two query paradigms, topic navigation or full text retrieval, or a limited combination of both. In a full text retrieval system, queries containing arbitrary keywords are processed to produce documents or other content that contain these keywords (or their synonyms and other variants) or that otherwise best satisfy the query. Typically, the output content is organized as a simple list, arranged alphabetically, chronologically, or by some other sort criterion. These types of information retrieval systems are common in every type of information domain, such as document management systems, library catalogs, search engines for the World Wide Web, relational databases, and the like.
The problem with this type of query and retrieval paradigm is that it fails to provide to the user a useful arrangement of the returned set of documents and content in terms of the meaning or nature of the content itself. More particularly, it fails to organize the content according to a set of topics pertinent to the returned content. The lack of a topic organization makes it difficult for the user to evaluate the overall query results, and to further navigate or explore the search results for content of interest. This problem is especially significant when dealing with novice or casual users of an information database. These users are unlikely to specify their queries with a high degree of precision, and are also unlikely to know the range and variety of different types of documents available in the database. The absence of a topic arrangement of query results makes it difficult for such users to explore both the documents that satisfy the query, and other documents which may be of interest but which did not satisfy the original query. At best, full text systems allow the user to refine or generalize the query by conjoining or disjoining additional keywords to the original query. However, the problem remains that the resulting documents will have no topic arrangement.
To overcome these types of problems, topic based query systems have been employed. In a topic system, a collection of documents is organized under a hierarchy of topics and subtopics. Each topic is associated with a number of documents that are about that topic. The user navigates the topic hierarchy in a strictly linear fashion from topic to subtopic. When a topic of interest is found, the user can review the documents associated with that topic.
The problem with this type of information retrieval system is that the selection of topics is unlikely to include topics that match every user's potential interests. In particular, users often search for documents that satisfy two or more unrelated concepts which have no equivalent topic in the topic hierarchy. For example, a general purpose document collection may contain groups of topics such as:
Topic      Subtopics
. . .
Art        American, Ancient Art, Asian, . . .
Museums    America, Asia, Europe, Louvre, . . .
Animals    Mammals, Insects, Reptiles, Crocodiles, Frogs, Snakes, . . .
. . .
Each of these topics would be associated with its own set of documents, which may or may not overlap with the documents associated with other topics. The user is typically constrained to view documents under a single topic at a time. However, the user may have an interest in finding documents that are about both art museums and snakes. Since the topic hierarchy does not contain this precise intersection of topics, the user is unable to easily locate documents of interest, and must instead review all of the documents associated with "museums" and separately all of the documents associated with "snakes" to determine if any of them match this particular combination of topics.
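By way of illustration only, the intersection the user wants in this example can be expressed as a set operation over the documents associated with each topic. The following is a minimal sketch, assuming each topic maps to a set of document identifiers; the index contents, the document identifiers, and the function name docs_for_all are hypothetical and are not taken from any particular system.

```python
# Hypothetical topic-to-document index; the document IDs are invented
# purely for illustration.
topic_docs = {
    "Museums": {"d1", "d2", "d3", "d4"},
    "Snakes": {"d3", "d5", "d6"},
    "Art": {"d1", "d3", "d7"},
}


def docs_for_all(topics, index):
    """Return the documents associated with every topic in `topics`."""
    sets = [index.get(topic, set()) for topic in topics]
    if not sets:
        return set()
    # A document qualifies only if it appears under each requested topic.
    return set.intersection(*sets)


if __name__ == "__main__":
    print(sorted(docs_for_all(["Museums", "Snakes"], topic_docs)))
```

A conventional topic hierarchy exposes only the individual sets, so the user must perform this comparison manually by reviewing each topic's documents in turn.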
One reason for this deficiency of conventional topic based systems is that the user is unable to specify a query which is the intersection of multiple topics in the topic hierarchy. For a topic hierarchy containing N topics, the number of possible topic intersections grows exponentially with N (there are 2^N - 1 non-empty combinations of topics). Since the more useful topic hierarchies will have hundreds or thousands of topics, it is computationally infeasible to enumerate a priori every possible topic intersection to determine which documents are associated with each intersection.
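As a rough illustration of the scale involved, the sketch below counts topic combinations under one assumed reading, namely that an intersection is any set of two or more distinct topics. The function name multi_topic_intersections is hypothetical.

```python
def multi_topic_intersections(n):
    """Count the subsets of n topics that contain at least two topics.

    Of the 2**n subsets in total, the empty subset and the n
    single-topic subsets are excluded.
    """
    return 2**n - n - 1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        count = multi_topic_intersections(n)
        # Print the number of decimal digits rather than the full value.
        print(f"{n} topics: count has {len(str(count))} decimal digits")
```

Under this assumption, even a hierarchy of a few hundred topics yields far more combinations than could be precomputed and stored.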
Other systems provide a combination of topic and full text retrieval. In these systems, a full text query is processed to identify various topics in the topic hierarchy that match the query, or portions of it, and these topics and their documents are displayed to the user. However, if the located topics are not actually what the user is interested in, then a new query must be specified, and the process repeated. The user has no ability to modify the topics of the query directly to obtain a more refined intersection of topics, again due to the problem of the large number of topic intersections.
Accordingly, it is desirable to provide a system and method of query analysis and information retrieval that dynamically generates a topic organization of the content located in response to a user query, allowing for navigation and exploration of that content. Further, it is desirable to provide a system that offers the flexibility of full text retrieval in its ability to generalize and refine a search, and the organizational benefits of navigation and querying in a topic hierarchy.