The invention relates generally to searching for relevant data entities based on a search query, especially where the query is ambiguous or under-specified. More particularly, the invention relates to helping users refine their search queries by identifying search concepts related to a user's query, enabling the user to refine the query using these concepts and submit an enhanced query based on them, and thereby access information more specific to the user's needs.
One of the Internet's greatest strengths, and one of its greatest weaknesses, is the vast amount of information distributed across the computers connected to it. This is a strength in that individuals have access to great amounts of information on almost any topic imaginable. However, it is also a weakness in that, because of the vast amount of information, it is difficult to know what information on a desired topic is available and where to go to find it.
Search engine technology attempts to overcome this weakness of the Internet by providing indexed access to a collection of web pages that a user can search. The user typically enters a search query. The search engine then finds the web pages that contain or otherwise relate to the search query and presents this list of web pages to the user. Search engines use a number of different approaches to determine which web pages are relevant to a given search query and should therefore be presented to the user.
First, one type of search engine constantly scans the Internet in a process referred to as spidering. This type of search engine has been popularized by ALTA VISTA and GOOGLE, among others. Each page of a web site that the spider visits is cataloged according to the words that appear on that page. This information is indexed and stored in a search engine database. When a user enters a search query, the search engine matches the query against the search engine database to find the web pages that are most relevant to the query by some measure. For example, the search engine may count the number of times the query appears in a given web page to determine its relevance, or it may count the number of other web pages that link to a given web page in which the query appears.
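The cataloging and ranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any particular search engine: the page texts, URLs, and link graph are hypothetical, tokenization is a plain lowercase whitespace split, and the two scoring options correspond to the two example measures just described (query-term frequency and number of inbound links).

```python
from collections import Counter, defaultdict

# Hypothetical crawled pages: URL -> page text (illustrative data only).
PAGES = {
    "golf.example/tiger": "tiger woods wins the masters tiger woods golf",
    "zoo.example/tiger": "tiger tiger the tiger is a big cat tiger in the woods",
    "news.example/golf": "golf news tiger woods",
}

# Hypothetical link graph: URL -> set of URLs that page links to.
LINKS = {
    "news.example/golf": {"golf.example/tiger"},
    "zoo.example/tiger": {"golf.example/tiger"},
    "golf.example/tiger": {"news.example/golf"},
}


def build_index(pages):
    """Catalog each page by the words it contains (an inverted index)."""
    index = defaultdict(dict)  # word -> {url: occurrence count}
    for url, text in pages.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][url] = count
    return index


def inbound_links(links):
    """Count how many other pages link to each page."""
    counts = Counter()
    for src, targets in links.items():
        for dst in targets:
            if dst != src:
                counts[dst] += 1
    return counts


def search(query, index, link_counts, by="frequency"):
    """Return pages containing every query term, best-scoring first."""
    terms = query.lower().split()
    if not terms:
        return []
    # Only pages that contain all of the query terms are candidates.
    candidates = set.intersection(*(set(index.get(t, {})) for t in terms))
    if by == "frequency":
        def score(url):
            return sum(index[t][url] for t in terms)
    else:
        def score(url):
            return link_counts[url]
    # Sort by descending score, breaking ties alphabetically by URL.
    return sorted(candidates, key=lambda url: (-score(url), url))
```

With this toy data, ranking `tiger woods` by term frequency places the hypothetical animal page first, because a word-count measure cannot tell which sense of a query word the user intends; ranking by inbound links favors the golf page instead.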
This type of search engine is disadvantageous in that many search queries contain words with meanings other than the one the user intends. For example, the user may be looking for web pages regarding the golfer Tiger Woods. However, if the user enters just the word Tiger as the search query, the search engine is likely to return many web pages related to the animal tiger, as well as to the golfer Tiger Woods. Furthermore, if the user enters the words Tiger Woods, the search engine may also return web pages that include the words tiger and woods but do not necessarily relate to the golfer Tiger Woods.
Another type of search engine compares a search query to web pages cataloged in a topical directory. This type of search engine has been popularized by YAHOO! and LOOKSMART. A team of people assigns web sites to one or more different categories within the directory. When a user enters a search query, the search engine matches the query against the directory of web pages, and returns both the categories and the individual web pages that are relevant to the query. For example, in response to a Tiger Woods query, the search engine may return the category Sports:Golfers:Tiger Woods and the category Animals:Tigers, as well as web pages that contain both the words tiger and woods.
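The directory approach can be sketched in a similar way. The category names come from the example above, but the page titles and URLs are hypothetical, and matching by case-insensitive substring is a simplifying assumption; an actual directory service may match queries quite differently.

```python
# Hypothetical hand-curated directory: category -> {page URL: page title}.
# In a real directory, editors assign each site to one or more categories.
DIRECTORY = {
    "Sports:Golfers:Tiger Woods": {"golf.example/tiger": "Tiger Woods wins the Masters"},
    "Animals:Tigers": {"zoo.example/tiger": "Tigers in the woods"},
    "Sports:Golf:News": {"news.example/golf": "Golf news"},
}


def directory_search(query, directory):
    """Return (matching categories, matching pages) for a query."""
    terms = query.lower().split()
    # A category matches if any query term occurs in its name.
    categories = [c for c in directory if any(t in c.lower() for t in terms)]
    # A page matches if every query term occurs in its title.
    pages = sorted(
        url
        for listing in directory.values()
        for url, title in listing.items()
        if all(t in title.lower() for t in terms)
    )
    return categories, pages
```

For the query `tiger woods`, this sketch returns both the golf category and the animal category, together with both pages whose titles contain the two words, mirroring the example in the text.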
This type of search engine also has its disadvantages. If the user enters a query that is too broad to yield adequately specific and targeted results, it is often difficult for the user to devise a narrower query that accurately targets the desired area.
Other failings are common to all of these and other types of search engines. Most are unforgiving of misspelled words and abbreviated variants of desired topics. For example, if the user enters tigr woods instead of Tiger Woods, search engines are unlikely to return many relevant pages regarding the golfer. Search engines may also provide results that many users, or, in the case of children, their parents, consider inappropriate. For example, a user may enter as a query the name of his or her favorite singer. Besides web sites devoted to providing information about the singer, search engines may also return X-rated sites that claim to provide inappropriate pictures of the singer.
Another failing of existing search engines is that they assume a level of searching experience or sophistication that their users may not have. In other words, the quality of the search results they return frequently depends on the quality of the search query the user entered. Users who are less competent at formulating search queries are therefore likely to receive poorer search results than users who are more competent. For example, less knowledgeable users may enter queries that are overly broad or, alternatively, overly specific. Overly broad queries are likely to generate search results that contain many irrelevant web pages, whereas overly specific queries are likely to generate search results that omit many relevant web pages.
For these and other reasons, there is a need for the present invention.