1. Field of the Invention
The invention relates generally to information retrieval systems, and more particularly, to information retrieval systems for retrieving information in multi-dimensional spaces, using 3-D spatial modeling of semantic entities, including topics and documents.
2. Background of the Invention
Most information systems organize documents into some type of information structure, such as a hierarchical, relational, or object-oriented database. For example, a file system on a personal computer or UNIX workstation consists of files organized into various topic-based directories, with documents placed into directories by topical indexing. The organizational structure of the hierarchy is typically such that a particular document or set of related documents can be found by accessing the appropriate directory or file.
However, when the number of documents in a database grows to be very large, or when the contents of the documents are semantically multi-dimensional, that is, each document is about many different topics or subjects, a hierarchical organization becomes an inefficient and cumbersome mechanism. This is because a given document may be properly related to a number of different topics; duplicating the document for storage in many different topic directories dramatically increases storage requirements for the database, and introduces additional problems of maintaining each of the duplicate copies. Further, finding related documents is difficult and time consuming, since a strict hierarchy prevents efficient linking of related topics.
Finding information in these large hierarchies is made still more difficult when the organizational structure of the hierarchy is unknown. Many users are unfamiliar with the hierarchical structure of the database they use, and therefore cannot readily identify documents or topics of interest. Current interfaces provide little support for navigating large, complex hierarchies in such databases. This is because the user is typically given only a static sense of where in the database or hierarchy they are searching, with no dynamic presentation of such context as the user changes their queries.
What is needed is the ability to control the amount of information in the hierarchy that is presented at any given time. An effective interface would allow the user to smoothly control both the amount of information presented and its level of granularity. Even so, a hierarchical structure is limited in its ability to express relationships between intermediate nodes and documents.
Another approach to storage and access of documents is to use a relational structure, such as that used by relational databases, where the fields in the database contain selected attributes about the document, such as the date published, the author, and so forth. This is an effective technique for storing documents, and is widely used in document databases.
However, accessing documents in a relational database can be difficult, particularly for novice or occasional users. The typical approach to accessing documents in a relational database is to enter a query using a form, which is then processed, and the results returned to the user as a long list of documents that match the query. This works reasonably well only if the documents are tagged clearly and precisely, when entered into the database, with the appropriate attribute information that users are interested in, and when the number of items returned from the query is fewer than about twenty. When the number of documents returned is greater than about twenty or thirty, the task of finding the document or set of documents of interest becomes increasingly unmanageable, since the user now has to scan or review these documents to determine if there is a more precise query that can be applied to the database. As a result, the user is required to reformulate the query to narrow the search. This is often a difficult task in that the user must match what they are looking for against what is actually available in the database. Since the user typically does not know what is in the database to begin with, this can lead to a very frustrating experience.
For example, if a user is looking for articles about "Siberian Huskies," she may type an initial query for information about "dogs." The result returned may be a list of hundreds or thousands of articles that are about dogs. A subsequent refinement may be to search for information about "Siberian Huskies" directly, which may result in zero articles. Such a result does not mean that the database does not have any useful information about "Siberian Huskies." Rather, the system likely does have some useful information for the user, only at this point the user does not know this fact, because she does not know how to specify an appropriate query that will result in a useful and manageable amount of information.
An ideal system would provide the ability to start off with a broad search such as "dogs," analyze the long list of returned documents, and place them in an organizational structure that would allow the user to effectively refine the original broad query down to a set of documents that may be applicable to her needs. Such a system would have the structure of a hierarchical system to help find information, and the fluidity of an interface to help control how much information is presented at any given time, while simultaneously providing the flexibility to store complex relational information.
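By way of illustration only, the refinement of a broad query described above can be sketched as follows. This is a minimal sketch and not a description of any particular system: the document identifiers, the topic tags, and the function names `search` and `group_by_cooccurring_topic` are all hypothetical. It assumes each document carries a set of topic tags, and groups the results of a broad query by the other topics those results share, so that the user is shown candidate refinements that actually exist in the database.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with several topics (assumed data).
documents = {
    "doc1": {"dogs", "sled dogs", "Alaska"},
    "doc2": {"dogs", "grooming"},
    "doc3": {"dogs", "sled dogs", "racing"},
    "doc4": {"cats", "grooming"},
}

def search(docs, topic):
    """Return the ids of all documents tagged with the given topic."""
    return sorted(doc_id for doc_id, topics in docs.items() if topic in topics)

def group_by_cooccurring_topic(docs, hits, query_topic):
    """Group query results by the other topics they are tagged with."""
    groups = defaultdict(list)
    for doc_id in hits:
        for topic in docs[doc_id]:
            if topic != query_topic:
                groups[topic].append(doc_id)
    return dict(groups)

# A broad query, followed by a grouping that suggests narrower queries.
hits = search(documents, "dogs")
groups = group_by_cooccurring_topic(documents, hits, "dogs")
for topic in sorted(groups):
    print(topic, groups[topic])
```

In this sketch a direct search for "Siberian Huskies" would return nothing, yet the grouping still surfaces "sled dogs" as a narrower topic that is present in the database.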
A third way of organizing information is by using hyperlinks to connect documents together. In this approach, navigable links from one document to another are stored as part of the originating, or source, document. This technique allows users to effectively follow a train of thought as expressed by the author of the document.
One drawback to this approach is that the user is totally at the mercy of the author and his or her selection of which items of information in a source document to provide links for, and which documents those links target. If the author fails to link a source document to another document that is related to the source document, the user will not find that other document through the source document. Further, hyperlinks are typically a one-to-one, one-directional link relationship, i.e., they allow a connection from a source document to a target document, but not in the opposite direction. Thus, they fail to fully capture the relationship between the documents, and fail to make this relationship fully useful in accessing the documents.
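The one-directional nature of such links can be illustrated with a short sketch. The code below is illustrative only and is not drawn from any particular hypertext system: the document names and the functions `reachable_by_links` and `build_backlinks` are hypothetical. It assumes links are stored only with the source document, shows that a reader at a target document cannot navigate back to the documents that reference it, and shows that the reverse relationship is recoverable only by building a separate index over all documents.

```python
from collections import defaultdict

# Hypothetical author-chosen links, stored with each source document.
forward_links = {
    "siberian_huskies": ["american_kennel_club"],
    "sled_dogs": ["siberian_huskies"],
    "american_kennel_club": [],
}

def reachable_by_links(links, start):
    """Documents a reader can reach by following forward links only."""
    seen, stack = set(), [start]
    while stack:
        doc = stack.pop()
        for target in links.get(doc, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def build_backlinks(links):
    """Invert the source -> targets map into a target -> sources map."""
    backlinks = defaultdict(list)
    for source, targets in links.items():
        for target in targets:
            backlinks[target].append(source)
    return dict(backlinks)

# From the target document, no related document can be reached.
print(reachable_by_links(forward_links, "american_kennel_club"))

# The reverse relationship exists only in a separately built index.
backlinks = build_backlinks(forward_links)
print(backlinks["american_kennel_club"])
```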
There are several additional problems associated with hyperlinks. For one, after the user arrives at the target document, the source context is often lost. These jumps are typically discontinuous, requiring the user to re-orient themselves after accessing the target document. For example, if a user is reading a document about Siberian Huskies, and accesses a hyperlinked document about the American Kennel Club, then the context of the source document is lost, and the user is now reading about the AKC. After a certain number of these semantic jumps, the user can lose the orientation of where they are in the conceptual structure, and the focus or purpose of their original query. Systems that address these problems would provide some visual cues to help the user maintain context as to where they are in the conceptual structure.
Accordingly, it is desirable to provide an information retrieval system that provides a dynamically constructed representation of the context resulting from each query, and which provides this information in a graphical environment where the amount or density of information visually presented to the user is controlled.