The quality of a query-based full text search can be defined in terms of the relevancy level of the top search results (e.g., the top 10-20 results). For the purposes of this patent application, the function of a search engine is to locate documents that contain one or more query terms supplied by a user, and to assign a highest score or rank to the document or documents that meet certain statistical or other criteria as applied to the query terms.
This particular technique is adequate for many applications so long as the query contains terms that can be used to unambiguously identify the subject of the search. Otherwise, the top search results may contain links to irrelevant documents.
As an example, reference can be made to FIG. 1, which shows the schema of a conventional search service. A search space 1 contains multiple document collections that may belong to different subject domains 1A (e.g., Domains1-3). A search engine 2 operates on a full text index 3 of the search space 1, created by indexing words in all document collections. A query processor 4 passes a given user query 5 to the search engine 2. The search engine 2 finds those documents containing the query terms, using the full text index 3 of the search space 1. The scores that are assigned to the found documents depend on certain statistical criteria as applied to the query terms. A results processor 6 renders the search results ordered by their score for presentation to the user.
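By way of a non-limiting illustration, the conventional search service described above can be sketched in a few lines of Python. The sketch assumes a simple TF-IDF weighting as the "statistical criteria"; the source does not specify which criteria are used. The function names (`build_index`, `search`), the sample documents, and the domain-prefixed document identifiers are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(documents):
    """Build a full text (inverted) index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    """Score each document containing a query term (assumed TF-IDF weighting)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur anywhere in the search space
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Results processor: order by descending score, ties broken by doc id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical search space; the prefix of each id names its subject domain.
documents = {
    "zoo/jaguar": "the jaguar is a large cat native to the americas",
    "cars/jaguar": "the jaguar sedan has a powerful engine",
    "cars/engine": "engine maintenance guide for sedan owners",
}
index = build_index(documents)
results = search(index, len(documents), "jaguar")
```

In this sketch the single-term query "jaguar" matches one document in each of two subject domains with identical scores, so the ranking alone cannot indicate which domain the user intended.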
Reference with regard to search engines can be had to the following exemplary publications: Michael W. Berry, Murray Browne, “Understanding Search Engines: Mathematical Modeling and Text Retrieval (Software, Environments, Tools)”, Society for Industrial & Applied Mathematics, June 1999, ISBN: 0898714370; and Berthier Ribeiro-Neto, Ricardo Baeza-Yates, “Modern Information Retrieval (ACM Press Series)”, Addison-Wesley Pub Co, May 1999, ISBN: 020139829X.
A problem results when some number of the returned top search results belong to different subject domains 1A in the search space 1, independent of the actual search subject. This is an undesirable situation, as it limits the usefulness of the returned search results.
This problem has been previously addressed by J. Cooper and R. Byrd in: “OBIWAN—A Visual Interface for Prompted Query Refinement”, HICSS (2), 1998, pp. 277-285. These authors propose various extensions to a traditional search service in order to avoid the problem of ambiguous search results. One extension is to provide additional sophisticated indices to document collections, based upon domain-specific vocabularies that contain multi-word names and terms. Another extension provides Context Thesauruses that specify relations between vocabulary items. The use of Lexical Networks is also proposed, where vocabulary items are network nodes and relations are links between the nodes. These authors further propose to create a mechanism that allows a look-up of vocabulary items related to the original query terms, and an ability to suggest additional terms that the user may employ to better focus the query. Also proposed is a Graphical User Interface (GUI) that allows the user to select one or more vocabulary items suggested by the Context Thesaurus in response to the user query. Selected items are then added to the query terms to focus the query. The user in this case needs to repeat the query refinement process for each new set of selected items until the user is satisfied with the results.
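The prompted query refinement loop described above can be illustrated with the following minimal sketch. It is not the OBIWAN implementation: the thesaurus is modeled, as an assumption, as a plain mapping from a vocabulary item to its related items, and the names `suggest_terms` and `refine_query`, together with the sample entries, are hypothetical.

```python
def suggest_terms(query_terms, thesaurus):
    """Look up vocabulary items related to the query terms (assumed dict thesaurus)."""
    suggestions = []
    for term in query_terms:
        for related in thesaurus.get(term, []):
            # Skip items already in the query and avoid duplicate suggestions.
            if related not in query_terms and related not in suggestions:
                suggestions.append(related)
    return suggestions


def refine_query(query_terms, selected_items):
    """Add the items the user selected to the query terms to focus the query."""
    return query_terms + [item for item in selected_items if item not in query_terms]


# Hypothetical context thesaurus relating vocabulary items.
thesaurus = {
    "jaguar": ["big cat", "sports car"],
    "engine": ["sports car", "motor"],
}
query = ["jaguar", "engine"]
suggestions = suggest_terms(query, thesaurus)
# The user picks one suggestion; a real interface would repeat this step
# until the user is satisfied with the results.
refined = refine_query(query, ["sports car"])
```

Each pass through this loop requires a user selection, which is the source of the user burden attributed to this approach.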
As can be appreciated, this approach adds complexity and cost to the search engine implementation, and furthermore requires the active participation of the user in the query refinement process, a requirement that some users, in particular unsophisticated users, may find burdensome.