1. Field of the Invention
The present invention relates in general to search query optimization, and more particularly, to the use of prior search data to recommend alternate search queries, keywords or keyword phrases to modify or to replace a search query.
2. Description of the Related Art
Today's computer networks allow interconnection of large numbers of information processing systems, storage devices and file servers so that databases can be shared across systems. As a result, users now have immediate access to enormous amounts of information. The internet is a prime example of such a computer network. In order to take advantage of the vast amount of information made available by technological advances, users must be able to identify, locate and retrieve desired information in a timely manner. To do this, information retrieval systems have been developed that allow users to quickly identify, locate and retrieve the best and most relevant information associated with a user request.
In an internet environment, an internet search engine serves as the information retrieval system. A typical internet search engine comprises a program that searches internet accessible documents, such as web pages, for specified keywords or keyword phrases and returns a list of the documents that include one or more of the keywords or keyword phrases. A search engine generally works by sending out a spider to fetch as many documents as possible. The spider ordinarily includes a program, called an indexer, that reads these documents and creates an index based on the words contained in each document. Each search engine uses different proprietary algorithms to create indices such that, ideally, only meaningful search results are returned in response to a user's submission of a keyword-based query. For example, in response to a user's keyword-based search query, a search engine might provide a list of URLs for web pages that contain one or more of the keywords in the user's search query. A Uniform Resource Locator (URL) is an address that uniquely specifies a resource accessible via the internet (e.g., http for web pages, ftp for file transfers, mailto for email, etc.).
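By way of illustration only, the indexing and keyword lookup described above can be sketched as follows. This is a minimal sketch rather than any particular search engine's algorithm: the function names `build_index` and `search`, the example URLs, and the whitespace-based word splitting are assumptions made for the example, and real engines apply proprietary ranking that is not shown.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return, in sorted order, the URLs that contain at least one query keyword."""
    hits = set()
    for keyword in query.lower().split():
        # .get() avoids creating empty entries for unknown keywords.
        hits |= index.get(keyword, set())
    return sorted(hits)


# Hypothetical documents standing in for pages fetched by a spider.
documents = {
    "http://example.com/a": "new york restaurants guide",
    "http://example.com/b": "city maps and directions",
    "http://example.com/c": "restaurants near city hall",
}
index = build_index(documents)
print(search(index, "restaurants"))
```

Because the lookup returns every document containing any one of the keywords, a broadly worded query can match a large share of the indexed documents.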
Selection of optimal keywords or keyword phrases is a central challenge of keyword-based searching for content available on a computer network such as the internet. A keyword is a word used in a search engine query. A keyword phrase is a phrase used in a search engine query. A search engine query comprises a term or, more often, a group of terms or a phrase used by a search engine to find web pages or sites with content and information identified through such a query.
For example, computer network users frequently search for information on topics about which they know very little. As a consequence, they often do not know the optimal keywords to use to search for content on a particular topic of interest. A poorly crafted search query may be too incomplete or inaccurate to efficiently locate the information the user really wants. For instance, a search engine may identify hundreds or even thousands of items in response to a broadly worded search query. If a user is performing a search to locate a particular item or set of items, then a search result with hundreds or thousands of items is far from optimal.
There are numerous prior solutions to the search query formulation problem. For example, a prior system has been disclosed that automatically expands a user-provided query string to include terms that do not appear in the query, but which may correspond to or be associated with user-provided query terms. A shortcoming of such an automatic query expansion system is that it allows too little opportunity for a user to participate in the query development process.
A prior alternative system that allows user-participation in a search query expansion process takes a user-provided query and uses it to locate a list of matched phrases from a corpus of documents. A user can elect to take words from returned phrases that are not included in the original query to refine the query. This process can be repeated to retrieve documents that are increasingly focused on a desired topic.
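One round of the phrase-based refinement described above might be sketched as follows. This is an illustrative sketch only: the names `matched_phrases` and `candidate_terms`, the sample phrases, and the rule that a phrase matches when it shares at least one word with the query are assumptions, not details taken from the prior system.

```python
def matched_phrases(corpus_phrases, query):
    """Return the corpus phrases that share at least one word with the query."""
    query_terms = set(query.lower().split())
    return [p for p in corpus_phrases if query_terms & set(p.lower().split())]


def candidate_terms(phrases, query):
    """Collect words from the matched phrases that are not already in the query."""
    query_terms = set(query.lower().split())
    new_terms = set()
    for phrase in phrases:
        new_terms |= set(phrase.lower().split()) - query_terms
    return sorted(new_terms)


# Hypothetical phrases extracted from a corpus of documents.
corpus_phrases = [
    "italian restaurants in new york",
    "new york city subway",
    "chicago deep dish pizza",
]
query = "new york restaurants"
phrases = matched_phrases(corpus_phrases, query)
print(candidate_terms(phrases, query))
```

The user would pick terms from the returned list, add them to the query, and repeat the process with the refined query.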
Another prior system that allows user participation in a search query refinement process takes a query term from a user-provided search query and uses query term correlation data to identify additional query terms that are deemed to be related to the query term. The additional query terms are presented to the user for selection to allow the user to refine the search query. The query term correlation data is developed over time from user search queries and reflects frequencies with which query terms appear together within the same search query.
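Query term correlation data of the kind described above can be illustrated with a simple co-occurrence count. The sketch below is a hypothetical illustration: the names `build_correlations` and `related_terms`, the sample query log, and the use of raw pair counts as the correlation measure are assumptions, not the prior system's actual method.

```python
from collections import Counter
from itertools import combinations


def build_correlations(query_log):
    """Count how often each ordered pair of distinct terms shares a query."""
    counts = Counter()
    for query in query_log:
        terms = sorted(set(query.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def related_terms(counts, term, limit=3):
    """Return up to `limit` terms that most often co-occur with `term`."""
    scored = [(other, n) for (t, other), n in counts.items() if t == term]
    # Highest count first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in scored[:limit]]


# Hypothetical log of past user search queries.
query_log = [
    "new york restaurants",
    "new york hotels",
    "chicago restaurants",
    "new york city restaurants",
]
counts = build_correlations(query_log)
print(related_terms(counts, "york"))  # ['new', 'restaurants', 'city']
```

Because the counts are built only from terms that appear together in the same query, a term that never co-occurs with the query term in the log is never suggested.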
Unfortunately, emphasis upon identification of related query terms risks missing keyword phrases (comprising more than one term) that may improve upon a search query. For instance, a search query, “new york restaurants” might be improved upon by substituting the search query, “new york city restaurants”. Both of these search queries contain the keyword “restaurants”, but the latter keyword phrase may improve the search query if the search is directed to finding restaurants in the City of New York.
While earlier query expansion and query completion systems of the general type described above generally have been successful, there have been shortcomings with their use. Specifically, these systems generally infer a relationship between query terms based upon their shared presence or the frequency of their shared presence within a single document or web page or search query. These systems tend to encourage a user to pursue the development or refinement of an initial search query.
As explained above, however, computer network users often search on topics about which they know very little. As a result, an initial search query may be far from optimal and require significant development. Indeed, an initial search query may be so poorly formulated that it cannot be readily optimized.
Significantly, mere search query refinement is unlikely to correlate alternative search keywords that do not typically occur together, even if it is likely that others interested in the same topic would have searched for it using one or the other of such alternative terms. For example, a person seeking a place to eat while out of town on a trip might enter a search query containing the keyword, “restaurant”. That person might not think to enter the keyword, “maps”, even though that may be the best way to find a restaurant, since a maps web site might link to local restaurant home pages. The prior search query refinement systems described above are unlikely to make such a restaurant-to-maps keyword correlation.
Thus, there exists a need for an improved system and method for optimal search query selection. The present invention meets this need.