Many Internet sites and computer systems provide search engines to help users find documents and other items of interest. Google and Yahoo are examples, as are Lexis and Westlaw. Many businesses also provide search engines to help customers and employees locate items of interest. In general, to use a search engine, a user creates and submits a query containing one or more search terms. The search engine processes the query and returns a list of items of interest, descriptions of and pointers to such items, or both. An information retrieval system seeks to match the terms contained in a user query, which is a request for information, to individual documents or information modules known to the system.
Information retrieval based on indexing the documents in a document collection is also known in the prior art. Typically, an index is created that records which documents correspond to specific words or terms of potential queries. When a user submits a query, the documents that contain or otherwise correspond to some or all of the query terms are identified.
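For illustration only, the indexing approach described above can be sketched as a simple inverted index in Python. The tokenizer, the function names `build_index` and `lookup`, and the sample documents are hypothetical and do not represent the method of any particular system; the `match_all` flag selects between requiring all of the query terms (AND) and any of them (OR).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each term to the set of document IDs whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def lookup(index: dict[str, set[str]], query: str, match_all: bool = True) -> set[str]:
    # Return documents containing all (AND) or some (OR) of the query terms.
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    return set.intersection(*postings) if match_all else set.union(*postings)


docs = {
    "d1": "Contract law and tort law",
    "d2": "Patent law for software",
    "d3": "Software testing practices",
}
index = build_index(docs)
print(sorted(lookup(index, "software law")))                   # ['d2']
print(sorted(lookup(index, "software law", match_all=False)))  # ['d1', 'd2', 'd3']
```

Requiring all terms favors precision, while accepting any term favors recall, which is one reason simple index lookups alone rarely satisfy both.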
Information retrieval is an imperfect art. A query often produces unsatisfactory results by returning too few of the relevant documents, by returning documents that are not relevant, or by some undesired combination of both. A standard means of assessing retrieval effectiveness is to measure “recall” and “precision”, which correspond respectively to the proportion of the desired documents that are returned and the proportion of the returned documents that are relevant.
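As a worked illustration, recall is the number of relevant documents returned divided by the total number of relevant documents, and precision is the number of relevant documents returned divided by the total number of documents returned. The Python sketch below computes both from sets of document identifiers; the function name and the sample sets are hypothetical, and returning 0.0 when a set is empty is an assumed convention.

```python
def recall_precision(relevant: set[str], retrieved: set[str]) -> tuple[float, float]:
    # Documents that are both desired and actually returned.
    hits = relevant & retrieved
    # Assumed convention: report 0.0 rather than dividing by zero.
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


relevant = {"d1", "d2", "d3", "d4"}   # documents the user actually wants
retrieved = {"d2", "d3", "d7"}        # documents the query returned
r, p = recall_precision(relevant, retrieved)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.50 precision=0.67
```

Here the query finds two of the four desired documents (recall 2/4) and two of its three results are relevant (precision 2/3).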
A user faces a number of problems in formulating a query that effectively retrieves the desired documents. First, the user may not know a subject area well enough to know specifically what s/he is looking for. Second, the user may have only general concepts in mind, with no means of mapping them effectively to terms that actually occur in the documents of interest. Third, even if the user has specific terms in mind, they may not be effective for query construction, since the user often does not know the terms, phraseology, and usage patterns current in the subject area of interest. Fourth, the user is unlikely to know which terms that seem associated with the desired information goal are in fact used more generally in documents that should be excluded. Finally, the user cannot normally know how the information retrieval system may weight, bias, favor, or exclude terms during its internal processing or its selection of returned documents.
To improve recall and/or precision, the user often needs to modify the original query. However, revising a query has the same limitations as formulating the original one: the user must estimate, with no reliable basis and against a ‘black box’ information retrieval system, which terms will correspond, with sufficient strength or frequency, to all and only the documents s/he wants to retrieve.
The prior art teaches several strategies to help the user construct queries. Some systems enhance the query by handling grammatical and other variants of words, so that the user need not attend to plurality, tense, or other variations irrelevant to the query. Other systems expand the query with entries from a thesaurus of synonyms for terms in the original query. Still other systems examine the documents returned by the original query to discover terms that can be used to refine it, either by the user or automatically. Finally, systems may attempt to narrow the query to one or more restricted topic areas so that only items within the selected topic are displayed.
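The first two strategies above, variant handling and thesaurus expansion, can be sketched as follows. This is a minimal illustration under stated assumptions: `THESAURUS` is a hypothetical hand-built synonym table, and `normalize` is a crude plural-stripping rule, not a real stemmer such as Porter's.

```python
# Hypothetical hand-built thesaurus; a real system would use a curated resource.
THESAURUS: dict[str, list[str]] = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}


def normalize(term: str) -> str:
    # Crude plural stripping only; NOT a real stemmer (no tense handling, etc.).
    term = term.lower()
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]
    return term


def expand_query(query: str) -> list[str]:
    # Normalize each query word, then append its thesaurus synonyms, without duplicates.
    expanded: list[str] = []
    for raw in query.split():
        base = normalize(raw)
        for term in [base, *THESAURUS.get(base, [])]:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(expand_query("Buy cars"))  # ['buy', 'purchase', 'car', 'automobile', 'vehicle']
```

Because the synonyms are added blindly, such expansion can raise recall while lowering precision when a synonym is also common in irrelevant documents.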
A number of prior systems attempt to facilitate information retrieval by determining a set of topics that may correspond to the user query. These systems provide, along with the search results, clickable links, which, if selected, lead to pages containing only items within the topic selected. The topics in these systems are not used to help the user construct queries. The use of topics is optional and the user is not directed into a refined search within a particular topic area. The topics are not associated with vocabularies of terms for use in query. These systems have not collected and appropriately analyzed a set of documents for a set of subject areas in order to make term usage and phraseology per subject area available to the user. They do not mandate restriction to a subject area as part of the query process.
Other prior systems process the vocabulary of the documents retrieved for a query to determine associations of retrieved document terms with query terms. This enables such systems to offer the user a very limited selection of terms for the user to consider adding to the query.
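One common way to associate retrieved-document terms with query terms is document-level co-occurrence counting, sketched below in Python. This is an assumed technique chosen for illustration, not a description of any specific prior system; the stopword list, the cutoff `k`, the sample documents, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "includes", "of", "the"}


def tokenize(text: str) -> list[str]:
    # Lowercase and split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def suggest_terms(query: str, retrieved_docs: list[str], k: int = 3) -> list[str]:
    """Rank non-query terms by how many retrieved documents they share with a query term."""
    query_terms = set(tokenize(query))
    counts: Counter[str] = Counter()
    for doc in retrieved_docs:
        terms = set(tokenize(doc))
        # Only documents that actually contain a query term contribute associations.
        if terms & query_terms:
            counts.update(terms - query_terms - STOPWORDS)
    # Highest co-occurrence count first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


retrieved = [
    "Jaguar speed in the wild",
    "Jaguar habitat and prey",
    "Jaguar prey includes deer",
]
print(suggest_terms("jaguar", retrieved))  # ['prey', 'deer', 'habitat']
```

The cutoff `k` is what keeps the offered selection small; the suggestions are limited to whatever vocabulary happens to appear in the few documents retrieved.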
One shortcoming is that these systems have not collected and appropriately analyzed a set of documents for a set of subject areas in order to make term usage and phraseology per subject area available to the user. Therefore, what is needed is a system that collects and appropriately analyzes a set of documents for a set of subject areas to generate term usage and phraseology per subject area and makes them available to the user.
Another shortcoming of these systems is that they do not mandate restriction to a subject area as part of the query process. Consequently, they cannot offer the same systematic, broad coverage of terms and phrases for potential enhancement of the query. Therefore, what is needed is a system that mandates restriction to a subject area as part of the query process, thereby providing systematic, broad coverage of terms and phrases for potential query enhancement.
In addition, these prior systems may offer advertisements or other ancillary material to the user and, in so doing, analyze either the query the user has submitted or the text content of the page the user has navigated to. This analysis enables the information system to present advertisements or ancillary materials that are relevant to the query or to the content of the page. Since advertisers pay for the display of their advertisements, this facility is part and parcel of the commercial business model of the information system. Pricing for advertisement placement in these systems may depend on, or be constrained by, the degree of said relevancy.
Therefore, there is a need for an improved process for enhancing queries for information retrieval that increases the accuracy and relevancy of the returned results.
There is also a need for an improved process for enhancing queries for information retrieval that collects and appropriately analyzes a set of document corpora in order to make term usage and phraseology available to the user.
What is also needed is an improved process for enhancing queries for information retrieval that automatically identifies primary and alternate terms for a user query and ranks them by relevance to that query, so that the user may select from a range of relevant alternative terms.
What is also needed is an improved process for enhancing queries for information retrieval that requires the user, passively or actively, to accept a default term-usage subject area or to choose one appropriate to the query from a list, so that lists of relevant additional query terms confined to that subject area can subsequently be used.
What is also needed is an improved process for enhancing queries for information retrieval that automatically draws on the discussions and expositions of a community of interlocutors and experts in order to present the user with that community's standard terminology and phraseology.
What is also needed is an improved process for enhancing queries for information retrieval that allows the user to construct and learn to construct more effective queries by presenting lists of terms that demonstrate the implications or connotations of particular search terms in the user's original query.
What is also needed is an improved process for enhancing queries for information retrieval that makes knowledge of a specialized query language unnecessary by presenting concrete terms to be added to the query.
What is also needed is an improved process for enhancing queries for information retrieval that offers the user concrete terminology options, which are more effective than artificial abstractions removed from the user's own words and interests, such as artificial or computer-oriented tags.