Applications that service client, or end user, queries often identify not only exact matches for a user-specified query, if any exist, but also the closest non-exact matches, and return both to the end user. In this context, the term “query” is not limited to a conventional database query, such as a query in SQL (Structured Query Language). Generally, a query includes any search for information through any search mechanism, such as a conventional search engine or search function. Typically, the user's search request is eventually transformed into a structured database query.
One approach to servicing search requests or queries, in order to identify both exact matches and non-exact matches, involves: (1) rewriting or reconstructing the user query to include all allowable variations of the original query; (2) retrieving a “hit-list” for the reconstructed, concatenated query by submitting that query to a database server; and (3) ordering the hit-list based on relevance to the original search criteria (sometimes referred to as “relevance ranking”).
For example, if a user initiates a search for information on “cheap pen” on some form of information repository, such as a database or the collection of information that is accessible via the Internet, an “expanded query” is constructed to include both the original query and one or more sub-queries that relax the requirements of the original query. An expanded query associated with a search for “cheap pen” might include sub-queries for other allowable versions of “cheap pen,” such as “cheap NEAR pen,” “cheap AND pen,” “$cheap AND $pen” (where “$” represents a grammatical stem operation), “cheap OR pen” and the like. A hit-list is produced based on this set of sub-queries, and the hit-list is then ordered. The ordering may be based on, for example, the specific sub-query that produced a given hit and a speculative relevance to the end user who is requesting the information.
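The expansion and ordering described above can be sketched as follows. This is only an illustrative sketch, not an implementation taken from any particular search product: the function names `build_sub_queries` and `order_hit_list`, the document identifiers, and the rule of ranking a hit by the most restrictive sub-query that produced it are all assumptions made for the example.

```python
def build_sub_queries(terms):
    """Return relaxations of a multi-term query, most restrictive first.

    Hypothetical helper: the operator spellings (NEAR, AND, OR, and "$"
    for stemming) follow the examples in the text.
    """
    return [
        " ".join(terms),                       # original query: cheap pen
        " NEAR ".join(terms),                  # cheap NEAR pen
        " AND ".join(terms),                   # cheap AND pen
        " AND ".join("$" + t for t in terms),  # $cheap AND $pen
        " OR ".join(terms),                    # cheap OR pen
    ]


def order_hit_list(hits_by_sub_query, sub_queries):
    """Order document ids by the highest-priority sub-query that matched.

    Assumed ranking rule: a hit from an earlier (stricter) sub-query is
    treated as more relevant than a hit from a later (looser) one.
    """
    best_rank = {}
    for rank, sub_query in enumerate(sub_queries):
        for doc_id in hits_by_sub_query.get(sub_query, []):
            # Keep the first (best) rank seen for each document.
            best_rank.setdefault(doc_id, rank)
    return sorted(best_rank, key=lambda doc_id: (best_rank[doc_id], doc_id))


sub_queries = build_sub_queries(["cheap", "pen"])

# Made-up results, as if every sub-query had already been executed.
hits = {
    "cheap pen": ["d2"],
    "cheap AND pen": ["d1", "d2"],
    "cheap OR pen": ["d3", "d1", "d2"],
}
ordered = order_hit_list(hits, sub_queries)
```

In this sketch every sub-query is assumed to have been executed before any ordering takes place, which is the property of the approach that the following paragraph criticizes.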
In such an approach, useless work may be performed because all of the sub-queries are executed, whether or not they are needed to fulfill the user's request and interest. For example, the first sub-query executed may by itself produce a sufficient number of hits, or sufficiently relevant results, to satisfy the user's interest. Furthermore, if a given sub-query is particularly unselective, it may produce many more hits than are necessary to satisfy the user's interest, so that unnecessary work is performed in parsing the query statement, querying the information repository, and producing and ordering the results.
Another approach involves: (1) executing sub-queries associated with allowable variations of the original query, in series in descending order of priority; and (2) retrieving hits until enough hits have been located, based on some criteria. This approach involves an entity other than the database server, such as an end user or a search mechanism, issuing a query to the database server based on the original search criteria, receiving results from the query, issuing another query to the database server that expands the original search criteria, receiving results, and continuing this iterative process until the search request has been satisfied according to some quantitative criteria. Query response time and network performance suffer when using this approach due to the potential for multiple complete round-trip communications between the entity and the database server, which unnecessarily load the system. In this context, and throughout the specification, a complete round-trip communication refers to the network communication between a client entity and a database server, as well as the processing performed by the database server, which often includes: (1) parsing the query; (2) constructing a query execution plan; (3) optimizing the query execution; and the like. Secondary client-server communications refer to communications between client and server applications which do not incur the same processing overhead as complete round-trip communications.
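The serial approach and its round-trip cost can be illustrated with a minimal sketch. The class `CountingServer`, the function `progressive_search`, the document identifiers, and the stopping rule (stop once a wanted number of distinct hits has been collected) are hypothetical and serve only to make the iteration concrete; no real database client API is being modeled.

```python
class CountingServer:
    """Stand-in for a database server that counts complete round trips."""

    def __init__(self, hits_by_sub_query):
        self.hits_by_sub_query = hits_by_sub_query
        self.round_trips = 0

    def execute(self, sub_query):
        # Each call models one complete round trip: network transfer plus
        # parsing, planning, and optimizing the query on the server.
        self.round_trips += 1
        return list(self.hits_by_sub_query.get(sub_query, []))


def progressive_search(server, sub_queries, wanted):
    """Issue sub-queries in priority order until enough hits are found."""
    hits = []
    seen = set()
    for sub_query in sub_queries:
        for doc_id in server.execute(sub_query):
            if doc_id not in seen:  # skip hits already returned earlier
                seen.add(doc_id)
                hits.append(doc_id)
        if len(hits) >= wanted:
            break  # assumed quantitative criterion: enough distinct hits
    return hits[:wanted]


# Sub-queries in descending order of priority, as in the "cheap pen" example.
sub_queries = [
    "cheap pen",
    "cheap NEAR pen",
    "cheap AND pen",
    "$cheap AND $pen",
    "cheap OR pen",
]
server = CountingServer({
    "cheap pen": ["d2"],
    "cheap AND pen": ["d1", "d2"],
    "cheap OR pen": ["d1", "d2", "d3"],
})
results = progressive_search(server, sub_queries, wanted=2)
trips = server.round_trips
```

With this made-up data, two hits are collected only after the third sub-query has been issued, so the client pays for three complete round trips to satisfy a single search request.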
Both of the foregoing approaches are inefficient in terms of response time, processing, and network loading. Furthermore, these approaches are cumbersome for developers of search applications and mechanisms because they require such applications and mechanisms to speculatively relax the search requirements and to process results with respect to relevance ranking. In addition, they provide limited capabilities, if any, for end users to affect the priority of search term variations in the context of relaxation of the original search criteria and, consequently, the relevance of associated results.
Based on the foregoing, it is clearly desirable to provide an improved mechanism for servicing information searches. There is a more specific need to provide more control to an end user who is requesting a search for particular information, so that the search is serviced more efficiently and returns more relevant results.