1. Field of Invention
The present invention relates to query processing, and more specifically to techniques for facilitating the process of refining search queries.
2. Description of Related Art
With the growth of the Internet and the World Wide Web, it is common for on-line users to utilize search engines to search for desired information. Many web sites permit users to perform searches to identify a small number of relevant items among a much larger domain of items. As an example, several web index sites permit users to search for particular web sites among known web sites. Similarly, many on-line merchants permit users to search for particular products among all of the products that can be purchased from the merchant.
In order to perform a search, a user submits a search query containing one or more query terms. The search query may also explicitly or implicitly identify a record field or segment to be searched, such as title, author, or subject classification of the item. For example, a user of an on-line bookstore may submit a title-field-restricted search query containing terms that the user believes appear within the title of a book. A query server program of the search engine processes the search query to identify any items that match the terms of the search query. The set of items identified by the query server program is referred to as a "query result." In the on-line bookstore example, the query result is a set of books that satisfy the query, and in the web index site example, the query result is a set of web sites or web pages. In some implementations, the query result may include items that contain only a subset of the terms of the search query. In web-based implementations, the query result is typically presented to the user as a hypertextual listing of the located items.
If the scope of the search is large, the query result may contain hundreds, thousands, or even millions of items. If a user is performing the search in order to find a single item or a small set of items, conventional approaches to ordering the items within the query result often fail to place the sought item or items near the top of the query result. This deficiency often requires the user to read through many items in the query result before reaching the sought item.
Some search engines suggest related query terms to the user as part of the "search refinement" process. Through the search engine's user interface, the user can select one or more of these related terms to add to the query. The goal of this process is to produce a refined search query that more narrowly specifies the user's intended request. The related query terms can be generated by the search engine using the contents of the query result, such as by identifying the most frequently used terms within the located documents or other items.
The related query terms can also be generated by using query data that is based on historical query submissions to the search engine. A preferred scheme for generating and providing users with related query terms based on query data is disclosed in U.S. Appl. No. 09/145,360, filed Sep. 1, 1998, titled SYSTEM AND METHOD FOR REFINING SEARCH QUERIES, which is incorporated herein by reference. In this scheme, relatedness between terms is determined based on the frequency of co-occurrence of terms within the same query. Although this scheme represents a significant improvement over prior methods, in certain circumstances the related query terms may not accurately reflect historical query submissions. Thus, the related query terms do not always assist the user with refining the search query.
The present invention addresses this and other concerns by using information about historical query submissions to a search engine to suggest previously-submitted, related search phrases to users. The related search phrases are preferably suggested based on a most recent set of query submissions data (e.g., the last two weeks of submissions), and thus strongly reflect the current searching patterns or interests of users. The invention is preferably implemented within a search engine used to locate items that are available for electronic purchase, but may be implemented within other types of search engines.
In accordance with one aspect of the invention, a table generation component uses information about prior query submissions to generate a table or other data structure that links key terms to previously-submitted search phrases containing such key terms. These "related search phrases" are preferably selected for inclusion in the table using a scoring algorithm which scores the search phrases based on at least one of the following: (i) frequency of search phrase submission, (ii) number of matches found in response to search phrase submission, and (iii) actions performed by users with respect to search results of search phrase submission. In one embodiment, the scores are based solely on frequency of search phrase submission, not counting search phrases that produced a NULL query result. For each key term, the most highly scored N (e.g., 50) search phrases containing that key term are stored in the data structure for subsequent look up.
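By way of illustration only, the table generation described above may be sketched as follows for the embodiment in which scores are based solely on submission frequency. The log format (field, phrase, match count), the function name build_related_phrase_table, the lowercase and whitespace normalization, and the exclusion of single-term phrases are assumptions made for this sketch and are not taken from the disclosure.

```python
from collections import Counter, defaultdict


def build_related_phrase_table(query_log, top_n=50):
    """Map (field, key term) -> list of (score, phrase), highest score first.

    query_log is an iterable of (field, phrase, match_count) tuples;
    this record layout is hypothetical.
    """
    counts = Counter()
    for field, phrase, match_count in query_log:
        if match_count == 0:
            # Submissions that produced a NULL query result are not counted.
            continue
        normalized = " ".join(phrase.lower().split())
        if len(normalized.split()) < 2:
            # Assumption: a one-term query is not treated as a "phrase".
            continue
        counts[(field, normalized)] += 1  # score = submission frequency

    # Link every term of a phrase to that phrase, per search field.
    candidates = defaultdict(list)
    for (field, phrase), score in counts.items():
        for term in set(phrase.split()):
            candidates[(field, term)].append((score, phrase))

    # Keep only the N most highly scored phrases for each key term.
    table = {}
    for key, scored in candidates.items():
        scored.sort(key=lambda item: (-item[0], item[1]))
        table[key] = scored[:top_n]
    return table


log = [
    ("title", "computer science", 12),
    ("title", "Computer  Science", 8),
    ("title", "computer graphics", 5),
    ("title", "computer zzzz", 0),        # NULL query result, ignored
    ("subject", "computer programming", 30),
]
table = build_related_phrase_table(log)
print(table[("title", "computer")])
# [(2, 'computer science'), (1, 'computer graphics')]
```

A scoring function that also weighs the number of matches or subsequent user actions could replace the simple counter without changing the structure of the resulting table.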
In one embodiment, each table entry (keyword and related search phrase list) is specific to a particular search field of the search engine. For example, in the context of a search engine used to locate book titles, the key term "computer" may have one list of related search phrases generated from submissions within a "subject" field, and another related search phrases list generated from submissions within a "title" field. In other embodiments, the invention may be implemented without regard to search field identity.
In accordance with another aspect of the invention, when a user submits a search query, a query processing component uses the table to look up one or more related search phrases to suggest to the user as alternative queries. For single-term queries, this is preferably accomplished by looking up and displaying the most highly-scored related search phrases associated with the single term and its search field. For multiple term queries, the related search phrase lists associated with the multiple query terms may be appropriately combined, and the most highly scored search phrases then suggested from the combined list. In either case, each suggested search phrase is preferably presented on a search results screen as a respective link that can be selected by the user to submit the phrase as a substitute query.
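By way of illustration only, the look-up step may be sketched as follows. The text above does not specify how the lists for a multiple-term query are combined; this sketch assumes an intersection (only phrases appearing in the list of every query term are kept). The function name suggest_phrases, the table layout (a mapping from (field, term) to a list of (score, phrase) pairs), and the omission of the phrase the user just submitted are likewise assumptions of the sketch.

```python
def suggest_phrases(table, field, query, max_suggestions=3):
    """Return up to max_suggestions related phrases for a query in a field.

    table maps (field, term) -> list of (score, phrase); hypothetical layout.
    """
    terms = query.lower().split()
    lists = [table.get((field, term), []) for term in terms]
    if not lists:
        return []

    # Assumed combination rule: keep phrases present in every term's list.
    common = {phrase for _, phrase in lists[0]}
    for entries in lists[1:]:
        common &= {phrase for _, phrase in entries}

    # Record the highest score seen for each surviving phrase.
    scores = {}
    for entries in lists:
        for score, phrase in entries:
            if phrase in common:
                scores[phrase] = max(scores.get(phrase, 0), score)

    # Assumption: do not suggest the query the user has just submitted.
    submitted = " ".join(terms)
    ranked = sorted(
        (phrase for phrase in scores if phrase != submitted),
        key=lambda phrase: (-scores[phrase], phrase),
    )
    return ranked[:max_suggestions]


table = {
    ("title", "computer"): [
        (9, "computer science"),
        (4, "computer graphics"),
        (2, "computer science fiction"),
    ],
    ("title", "science"): [
        (9, "computer science"),
        (7, "science fiction"),
        (2, "computer science fiction"),
    ],
}
print(suggest_phrases(table, "title", "computer"))
# ['computer science', 'computer graphics', 'computer science fiction']
print(suggest_phrases(table, "title", "computer science"))
# ['computer science fiction']
```

Each returned phrase would then be rendered on the search results screen as a link that submits the phrase as a substitute query. Because the table is keyed by search field, the same term can yield different suggestions in a "subject" search than in a "title" search.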