Although the Internet traces back to the late 1960s, the widespread availability and acceptance of personal computing and internetworking have resulted in explosive growth and unprecedented advances in information sharing technologies. In particular, the World Wide Web (“Web”) has revolutionized access to untold volumes of information stored in electronic form for a worldwide audience, including written, spoken (audio) and visual (imagery and video) information, in both archived and real-time formats. In short, the Web has given every connected user desktop access to a virtually unlimited library of information in almost every language worldwide.
Search engines have evolved in step with the increased usage of the Web to enable users to find and retrieve relevant Web content in an efficient and timely manner. As the amount and types of Web content have increased, the sophistication and accuracy of search engines have likewise improved. Generally, search engines strive to provide the highest quality results in response to a search query. However, determining quality is difficult, as the relevance of retrieved Web content is inherently subjective and dependent upon the interests, knowledge and attitudes of the user.
Existing methods used by search engines are based on matching search query terms to terms indexed from Web pages. More advanced methods determine the importance of retrieved Web content using, for example, a hyperlink structure-based analysis, such as described in S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Search Engine,” (1998) and in U.S. Pat. No. 6,285,999, issued Sep. 4, 2001 to Page, the disclosures of which are incorporated by reference.
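By way of illustration only, the general idea of a hyperlink structure-based analysis can be sketched as a simplified power iteration over a small link graph. The sketch below is not the method of the cited references; the function name `link_scores`, the damping value of 0.85, the fixed iteration count, and the even redistribution of score from pages without outgoing links are all assumptions chosen for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Toy link-based importance scores computed by power iteration.

    `links` maps each page to the list of pages it links to.
    All parameter values are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its damped score equally to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores
```

In this sketch, a page linked to by many pages, or by pages that are themselves highly scored, accumulates a higher score, and the scores across all pages sum to one.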
A typical search query scenario begins with either a natural language question or individual terms, often in the form of keywords, being submitted to a search engine. The search engine executes a search against a data repository describing information characteristics of potentially retrievable Web content and identifies the candidate Web pages. Searches can often return thousands or even millions of results, so most search engines typically rank or score only a subset of the most promising results. The top Web pages are then presented to the user, usually in the form of Web content titles, hyperlinks, and other descriptive information, such as snippets of text taken from the Web pages.
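The scenario described above can be sketched, under simplifying assumptions, as a term-matching search over a small in-memory index. The names `tokenize`, `build_index`, `snippet` and `search`, the page dictionary layout, and the scoring rule (a count of distinct query terms matched) are hypothetical and chosen only for illustration; they do not describe any particular search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build a data repository mapping each indexed term to the pages containing it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for term in tokenize(page["title"] + " " + page["body"]):
            index[term].add(url)
    return index


def snippet(body, terms, width=60):
    """Return a short excerpt near the first query term found (substring match)."""
    lower = body.lower()
    positions = [lower.find(term) for term in terms if term in lower]
    start = max(min(positions) - width // 2, 0) if positions else 0
    return body[start:start + width].strip()


def search(query, pages, index, top_k=10):
    """Identify candidate pages, score them, and present only the top results."""
    terms = sorted(set(tokenize(query)))
    # Candidate pages are those containing at least one query term.
    matched = defaultdict(int)
    for term in terms:
        for url in index.get(term, ()):
            matched[url] += 1
    # Assumed score: number of distinct query terms matched; ties broken by URL.
    ranked = sorted(matched, key=lambda url: (-matched[url], url))[:top_k]
    return [
        {
            "url": url,
            "title": pages[url]["title"],
            "snippet": snippet(pages[url]["body"], terms),
            "score": matched[url],
        }
        for url in ranked
    ]
```

Given a dictionary of pages keyed by URL, each with a `title` and a `body`, a call such as `search("jaguar car", pages, build_index(pages))` would return the titles, hyperlinks and snippets of the highest-scoring candidate pages. A practical search engine would combine term matching with additional signals of importance.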
Providing quality search results can be complicated by the literal and implicit scope of the search query itself. A poorly framed search query could be ambiguous, or could be too general or too specific to yield responsive and high-quality search results. For instance, terms within a search query can be ambiguous at a syntactic or semantic level. A syntactic ambiguity can be the result of an inadvertent homonym, that is, an incorrect word having the same sound, and possibly the same spelling, as the word actually meant, but a different meaning. For example, the word “bear” can mean to carry or can refer to an animal, and sounds the same as “bare,” which refers to an absence of clothing. A semantic ambiguity can be the result of improper context. For example, the word “jaguar” can refer to an animal, a version of the Macintosh operating system, or a brand of automobile. Similarly, search terms that are too general result in overly broad search results, while search terms that are too narrow result in unduly restrictive and non-responsive search results.
Accordingly, there is a need for an approach to providing suggestions for search query refinements that will resolve ambiguities, over-generalities, or over-specificities occurring in properly framed search queries. Preferably, such an approach would provide refined search queries that, when issued, yield search results closely related to the actual topic underlying the intent of the original search query, and would provide suggestions that reflect conceptual independence and clear meanings as potential search terms.