Information retrieval from a database of information is an increasingly challenging problem, particularly on the World Wide Web (WWW), as increased computing power and networking infrastructure allow the aggregation of large amounts of information and widespread access to that information. A goal of the information retrieval process is to allow the identification of materials of interest to users. The information provider, user, or a third party may also desire to supplement or otherwise manipulate the content presented to the user.
As the number of materials that users may search and navigate increases, identifying relevant materials becomes increasingly important, but also increasingly difficult. Challenges posed by the information retrieval process include providing an intuitive, flexible user interface and completely and accurately identifying materials relevant to the user's needs within a reasonable amount of time. Another challenge is to provide an implementation of this user interface that is highly scalable, so that it can readily be applied to the increasing amounts of information and demands to access that information. The information retrieval process comprehends two interrelated technical aspects, namely, information organization and access.
As the number of materials that users may search and navigate increases, manipulating content presentation also becomes more difficult. It becomes impractical to manipulate content by manually designing every possible view of the material presented to the users as they maneuver through the information retrieval process. In order to perform this design process efficiently, there is a need for a system that allows a party, such as the information provider, to specify more flexibly both what content is presented to users in response to a query and how that content is presented.
Current information search and navigation systems usually follow one of three paradigms. One type of information search and navigation system employs a database query system. In a typical database query system, a user formulates a structured query by specifying values for fixed data fields, and the system enumerates the documents whose data fields contain those values. PriceSCAN.com uses such an interface, for example. Generally, a database query system presents users with a form-based interface, converts the form input into a query in a formal database language, such as SQL, and then executes the query on a relational database management system. Disadvantages of typical query-based systems include that they allow users to make queries that return no documents and that they offer query modification options that lead only to further restriction of the result set (the documents that correspond to the user's specifications), rather than to expansion or extension of the result set. In addition, database query systems typically exhibit poor performance for large data sets or heavy access loads; they are often optimized for processing transactions rather than queries.
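The form-to-SQL conversion described above can be sketched as follows. This is a minimal illustration, not any particular system's implementation: the `products` table, its columns, the sample rows, and the helper names `build_query`, `run_query`, and `make_demo_db` are all assumptions introduced here. It shows how each additional form field only adds another `AND` restriction, so a query can easily end up returning no documents.

```python
import sqlite3

# Hypothetical form fields; each maps to a column of the same name.
ALLOWED_FIELDS = ("manufacturer", "product_type", "item_condition")

def build_query(form):
    """Convert non-empty form fields into a parameterized SQL query."""
    clauses, params = [], []
    for field in ALLOWED_FIELDS:
        value = form.get(field)
        if value:
            clauses.append(f"{field} = ?")
            params.append(value)
    sql = "SELECT name FROM products"
    if clauses:
        # Every filled-in field restricts the result set further.
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return sql, params

def run_query(conn, form):
    """Execute the query built from the form and return matching names."""
    sql, params = build_query(form)
    return [row[0] for row in conn.execute(sql, params)]

def make_demo_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products "
        "(name TEXT, manufacturer TEXT, product_type TEXT, item_condition TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [
            ("DVP-100", "Sony", "DVD player", "new"),
            ("KV-27", "Sony", "television", "used"),
            ("DV-300", "Pioneer", "DVD player", "new"),
        ],
    )
    return conn

conn = make_demo_db()
print(run_query(conn, {"manufacturer": "Sony"}))
# ['DVP-100', 'KV-27']
print(run_query(conn, {"manufacturer": "Pioneer", "item_condition": "used"}))
# [] : a valid-looking form submission that matches nothing
```

Nothing in such an interface tells the user in advance that the second combination is empty, which is the disadvantage noted above.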
A second type of information search and navigation system is a free-text search engine. In a typical free-text search engine, the user enters an arbitrary text string, often in the form of a Boolean expression, and the system responds by enumerating the documents that contain matching text. Google.com, for example, includes a free-text search engine. Generally a free-text search engine presents users with a search form, often a single line, and processes queries using a precomputed index. Generally this index associates each document with a large portion of the words contained in that document, without substantive consideration of the document's content. Accordingly, the result set is often a voluminous, disorganized list that mixes relevant and irrelevant documents. Although variations have been developed that attempt to determine the objective of the user's query and to provide relevance rankings to the result set or to otherwise narrow or organize the result set, these systems are limited and unreliable in achieving these objectives.
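A precomputed index of the kind described above might look like the following sketch. The sample documents and the function names `tokenize`, `build_index`, and `search_all` are assumptions made for illustration; real search engines use far more elaborate tokenization and ranking. The index maps each word to the documents containing it, with no regard for what a document is about.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Precompute a mapping from each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search_all(index, query):
    """Return ids of documents containing every query word (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)

# Made-up documents for illustration.
docs = {
    1: "Sony DVD player with remote",
    2: "How to clean a DVD",
    3: "Sony television stand",
}
index = build_index(docs)
print(search_all(index, "dvd"))       # [1, 2]
print(search_all(index, "sony dvd"))  # [1]
```

The query `dvd` returns both a product listing and a cleaning guide, because matching is purely textual; this is the kind of mixed result set the paragraph above describes.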
A third type of information search and navigation system is a tree-based directory. In a tree-based directory, the user generally starts at the root node of the tree and specifies a query by successively selecting refining branches that lead to other nodes in the tree. Shopping.yahoo.com uses a tree-based directory, for example. In a typical implementation, the hard-coded tree is stored in a data structure, and the same or another data structure maps documents to the node or nodes of the tree where they are located. A particular document is typically accessible from only one or, at most, a few paths through the tree. The collection of navigation states is relatively static: while documents are commonly added to nodes in the directory, the structure of the directory typically remains the same. In a pure tree-based directory, the directory nodes are arranged such that there is a single root node from which all users start, and every other directory node can only be reached via a unique sequence of branches that the user selects from the root node. Such a directory imposes the limitation that the branches of the tree must be navigationally disjoint, even though the way that documents are assigned to the disjoint branches may not be intuitive to users. It is possible to address this rigidity by adding links to convert the tree to a directed acyclic graph. Even so, updating the directory structure remains a difficult task, and leaf nodes are especially prone to end up with large numbers of corresponding documents.
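The data structure behind a pure tree-based directory can be sketched as below. The class `DirectoryNode`, the helper `navigate`, and the category and product names are hypothetical, chosen only to illustrate the point that a document filed under one branch is not reachable through another.

```python
class DirectoryNode:
    """One node of a hard-coded directory tree (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # branch name -> child DirectoryNode
        self.documents = []  # documents located at this node

    def add_child(self, name):
        """Return the child with this name, creating it if needed."""
        return self.children.setdefault(name, DirectoryNode(name))

def navigate(root, path):
    """Follow a sequence of branch selections from the root node."""
    node = root
    for branch in path:
        node = node.children[branch]
    return node

root = DirectoryNode("root")
dvd_players = root.add_child("Electronics").add_child("DVD Players")
dvd_players.documents.append("Sony DVP-100")
root.add_child("Brands").add_child("Sony")  # a disjoint branch with no documents

print(sorted(root.children))
# ['Brands', 'Electronics']
print(navigate(root, ["Electronics", "DVD Players"]).documents)
# ['Sony DVP-100']
print(navigate(root, ["Brands", "Sony"]).documents)
# []
```

A user who starts under `Brands` never reaches the Sony DVD player unless the directory maintainer files it in both places or adds a cross-link.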
In all of these types of search and navigation systems, it may be difficult for a user to revise a query effectively after viewing its result set. In a database query system, users can add or remove terms from the query, but it is generally difficult for users to avoid underspecified queries (i.e., too many results) or overspecified queries (i.e., no results). The same problem arises in free-text search engines. In tree-based directories, the only means for users to revise a query is either to narrow it by selecting a branch or to generalize it by backing up to a previous branch.
Having an effective means of revising queries is useful in part because users often do not know exactly what they are looking for. Even users who do know what they are looking for may not be able to express their criteria precisely. And the state of the art in information retrieval technology cannot guarantee that even a precisely stated query will be interpreted as intended by the user. Indeed, it is unlikely that a perfect means for formation of a query even exists in theory. As a result, it is helpful that the information retrieval process be a dialogue with interactive responses between the user and the information retrieval system. An effective query revision process may make this dialogue model easier to implement.
Some information retrieval systems combine a search engine with a vocabulary of words or phrases used to classify documents. These systems enable a three-step process for information retrieval. In the first step, a user enters a text query into a search form, to which the system responds with a list of matching vocabulary terms. In the second step, the user selects from this list, to which the system responds with a list of documents. Finally, in the third step, the user selects a document.
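The three-step process above can be sketched as follows. The vocabulary, the document names, and the functions `match_terms` and `documents_for` are assumptions for illustration, and simple substring matching stands in for whatever matching a real system would use.

```python
# Hypothetical vocabulary: each term classifies a list of documents.
VOCABULARY = {
    "Sony": ["Sony DVP-100", "Sony KV-27"],
    "DVD players": ["Sony DVP-100", "Pioneer DV-300"],
    "Televisions": ["Sony KV-27"],
}

def match_terms(query):
    """Step 1: list vocabulary terms that contain the query text."""
    q = query.strip().lower()
    return sorted(term for term in VOCABULARY if q and q in term.lower())

def documents_for(term):
    """Step 2: list the documents classified under the selected term."""
    return VOCABULARY[term]

terms = match_terms("dvd")            # step 1: the user enters a text query
print(terms)                          # ['DVD players']
documents = documents_for(terms[0])   # step 2: the user selects a term
print(documents)                      # ['Sony DVP-100', 'Pioneer DV-300']
chosen = documents[0]                 # step 3: the user selects a document
print(chosen)                         # Sony DVP-100

# In this sketch, a query that spans two terms matches no single term.
print(match_terms("Sony DVD players"))  # []
```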
A problem with such systems is that they typically do not consider the possibility that a user's search query may match a conjunction of two or more vocabulary terms, rather than an individual term. For example, in a system whose vocabulary consists of consumer electronics products and manufacturers, a search for Sony DVD players corresponds to a conjunction of two vocabulary terms: Sony and DVD players. Some systems may address this problem by expanding their vocabularies to include vocabulary terms that incorporate compound concepts (e.g., all valid combinations of manufacturers and products), but such an exhaustive approach is not practical when there are a large number of independent concepts in a system, such as product type, manufacturer, price, condition, etc. Such systems also may fail to return concise, usable search results, partially because the number of compound concepts becomes unmanageable. For example, a search for software in the Yahoo category directory returns 477 results, most of which represent compound concepts (e.g., Health Care>Software).
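A rough calculation shows why enumerating compound concepts does not scale. The concept names follow the example above, but the vocabulary sizes are invented for illustration, and the product is an upper bound, since not every combination would be valid in practice.

```python
from math import prod

# Hypothetical number of vocabulary terms per independent concept.
CONCEPT_SIZES = {
    "product_type": 200,
    "manufacturer": 500,
    "price_range": 10,
    "condition": 3,
}

def independent_term_count(sizes):
    """Terms needed when each concept keeps its own vocabulary."""
    return sum(sizes.values())

def compound_term_count(sizes):
    """Upper bound on terms needed when every combination becomes its own term."""
    return prod(sizes.values())

print(independent_term_count(CONCEPT_SIZES))  # 713
print(compound_term_count(CONCEPT_SIZES))     # 3000000
```

Under these assumed sizes, keeping the concepts independent requires hundreds of terms, while enumerating their combinations requires millions.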
Another disadvantage of current search and navigation systems is that available techniques for manipulating the content to be displayed have limited flexibility. For example, in some cases, in addition to providing a user with query results, the information provider finds it desirable to supplement those results with additional information or to sort or aggregate the results according to criteria that depend on the query or the user. Often, the information provider finds it desirable to guide users in particular directions. In some cases, this objective may be tied primarily to the interests of users (e.g., viewing popular content); in other cases, this objective may be tied primarily to the interests of the information provider (e.g., displaying sponsored content). Current information search and navigation systems usually follow one of four approaches for providing this function.
One approach is to associate supplemental content with individual materials. For example, an online DVD retailer might associate with a given DVD other DVDs likely to appeal to users who are interested in that given DVD. These other DVDs can then be displayed whenever the given DVD is displayed. These other DVDs might be determined through an editorial process or using an automated procedure like collaborative filtering.
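As one possible automated procedure, the sketch below uses plain co-occurrence counting, a very simple form of item-based collaborative filtering: DVDs that appear together in many users' histories are treated as related. The purchase histories, titles, and function names are assumptions for illustration; production systems typically normalize for popularity and use more robust similarity measures.

```python
from collections import Counter
from itertools import permutations

def co_occurrence(baskets):
    """Count, for each item, how often every other item appears alongside it."""
    related = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            related.setdefault(a, Counter())[b] += 1
    return related

def related_items(related, item, k=2):
    """Return up to k items most often seen with `item` (ties broken alphabetically)."""
    counts = related.get(item, Counter())
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]

# Made-up purchase histories, one list per user.
baskets = [
    ["Alien", "Aliens", "Blade Runner"],
    ["Alien", "Aliens"],
    ["Aliens", "Terminator"],
]
related = co_occurrence(baskets)
print(related_items(related, "Alien"))   # ['Aliens', 'Blade Runner']
print(related_items(related, "Aliens"))  # ['Alien', 'Blade Runner']
```

The resulting lists are attached to individual DVDs, which is exactly why this approach becomes cumbersome when the goal is to associate content with a whole class of materials.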
A second approach is to associate content with search key words. For example, some search engines auction keywords to advertisers and then show advertisements to users based on their search key words.
A third approach is to custom-design content for particular queries. For example, in a tree-based directory, each directory node could include supplemental content relevant to that node.
A fourth approach is to provide customized content based on the user's profile or history. For example, some online shopping sites remember what product a user has purchased or viewed and use this information in order to merchandise other products.
These approaches suffer from various limitations. Associating content with individual materials or search key words is a cumbersome approach when the goal is to associate content with general classes of materials, e.g., recent documents or documents with a particular classification. Custom-designing content for particular queries is only practical when the number of such queries is small. Pushing content based on a user profile raises numerous issues, among them accuracy, scalability, and privacy.
Various other systems for information retrieval are also available. For example, U.S. Pat. Nos. 5,715,444 and 5,983,219 to Danish et al., both entitled “Method and System for Executing a Guided Parametric Search,” disclose an interface for identifying a single item from a family of items. The interface provides users with a set of lists of features present in the family of items and identifies items that satisfy selected features. Other search and navigation systems include i411's Discovery Engine, Cybrant's Information Engine, Mercado's IntuiFind, and Requisite Technology's BugsEye.