Although the Internet traces back to the late 1960s, it was the widespread availability and acceptance of personal computing and internetworking that brought about explosive growth and unprecedented advances in information-sharing technologies. In particular, the World Wide Web (“Web”) has made untold volumes of electronically stored information accessible to a worldwide audience, including written, spoken (audio) and visual (imagery and video) information, in both archived and real-time formats. In short, the Web has given every connected user desktop access to a virtually unlimited library of information in almost every language.
Information exchange on the Web operates under a client-server model. Individual clients execute Web content retrieval and presentation applications, typically in the form of Web browsers. The Web browsers send request messages for Web content to centralized Web servers, which function as data storage and retrieval repositories. The Web servers parse the request messages and return the requested Web content in response messages.
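The request and response exchange described above can be sketched in a few lines. This is a minimal in-memory illustration only, assuming a simplified HTTP-style request line; the function name `handle_request` and the `content` dictionary standing in for the server's data store are hypothetical and do not come from any particular Web server.

```python
def handle_request(raw_request: str, content: dict) -> str:
    """Parse a request message and build the matching response message."""
    # The first line of the request has the form: METHOD PATH VERSION
    request_line = raw_request.split("\r\n", 1)[0]
    parts = request_line.split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        return "HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"

    # Look up the requested path in the (hypothetical) content store.
    body = content.get(parts[1])
    if body is None:
        return "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"

    length = len(body.encode("utf-8"))
    return f"HTTP/1.1 200 OK\r\nContent-Length: {length}\r\n\r\n{body}"


# A client would send a request message such as this one.
request = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
response = handle_request(request, {"/index.html": "hello"})
```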
Search engines have evolved in step with the increased usage of the Web to enable users to find and retrieve relevant Web content in an efficient and timely manner. As the amount and types of Web content have increased, the sophistication and accuracy of search engines have likewise improved. Generally, search engines strive to provide the highest quality results in response to a search query. However, determining quality is difficult, as the relevance of retrieved Web content is inherently subjective and dependent upon the interests, knowledge and attitudes of the user.
Existing methods used by search engines are based on matching search query terms to terms indexed from Web pages. More advanced methods determine the importance of retrieved Web content using, for example, a hyperlink structure-based analysis, such as described in S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Search Engine,” (1998) and in U.S. Pat. No. 6,285,999, issued Sep. 4, 2001 to Page, the disclosures of which are incorporated by reference.
A typical search query scenario begins with either a natural language question or individual keywords submitted to a search engine. The search engine executes a search against a data repository describing information characteristics of potentially retrievable Web content and identifies the candidate search results. Searches can often return thousands or even millions of results, so most search engines typically rank or score only a subset of the most promising results. Targeted search results can also be introduced, such as advertising or topical information content. The top search results are then presented to the user, usually in the form of Web content titles, hyperlinks, and other descriptive information, such as snippets of text taken from the search results.
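The scenario above can be sketched as a small pipeline: match query terms against an index, keep only a bounded subset of candidates, score them, and present titles and hyperlinks for the top results. The sketch below is illustrative only. The inverted index layout, the term-overlap score, and the names `search`, `max_candidates`, and `top_k` are assumptions made for the example, not a description of any actual search engine.

```python
from collections import Counter


def search(query, index, documents, max_candidates=1000, top_k=10):
    """Return (title, url) pairs for the best-matching documents."""
    # Count how many query terms each document matches (assumed score).
    hits = Counter()
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1

    # Score only a bounded subset of the most promising candidates.
    candidates = [doc_id for doc_id, _ in hits.most_common(max_candidates)]

    # Rank by descending match count, breaking ties by document id.
    ranked = sorted(candidates, key=lambda d: (-hits[d], d))
    return [(documents[d]["title"], documents[d]["url"]) for d in ranked[:top_k]]


# Hypothetical toy data: term -> set of document ids.
index = {"paris": {1, 2}, "hotel": {2, 3}}
documents = {
    1: {"title": "Paris travel guide", "url": "http://example.com/1"},
    2: {"title": "Paris hotel listings", "url": "http://example.com/2"},
    3: {"title": "Hotel reviews", "url": "http://example.com/3"},
}
results = search("Paris hotel", index, documents)
```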
Search engines are generally available to users located worldwide. Thus, part of providing high-quality search results is being able to provide those search results in languages acceptable to the requesting user. Acceptable languages include the languages specified by the user, as well as other languages that the user would accept. For instance, a French-preferring user might also accept search results in English. Acceptable languages can also include related languages and dialects. For example, Portuguese search results might be acceptable to a user who generally prefers Spanish. Finally, acceptable languages can include dead languages, such as classical Greek or Old English, or pseudo-languages, such as Klingon. Dead and pseudo-languages are typically not supported by search engines, but may nevertheless reflect the academic, historic, or personal interests of the requesting user.
Currently, most Web browsers, Web servers, and related Web applications use the Hypertext Transfer Protocol (HTTP) to transact Web information exchange. HTTP is a stateless protocol, and no state identifying user preferences, including language, is typically maintained. The only information available to indicate the languages acceptable to a user is found either in preferences maintained independently of each HTTP transaction or within the search query itself. First, user-provided preferences are specified either at the Web client or at the Web server. Client-side preferences, such as languages accepted by a Web browser, are communicated through request message headers. Server-side preferences are specified via search engine options and are maintained independently of each HTTP transaction using cookies, which must be retrieved from the Web client prior to executing a search, or via a log-in procedure.
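As an illustration of client-side preferences carried in request message headers, HTTP defines an Accept-Language header in which a browser lists language tags with optional quality weights. The following is a minimal parsing sketch, not a complete implementation of the header grammar; the function name `parse_accept_language` is hypothetical, and the handling of malformed weights is an assumption.

```python
def parse_accept_language(header: str) -> list:
    """Parse an Accept-Language value into (language, weight) pairs."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        lang, _, params = part.partition(";")
        weight = 1.0  # a tag without a q-value defaults to 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0  # assumption: treat malformed weights as 0
        prefs.append((lang.strip().lower(), weight))
    # Highest weight first; the sort is stable, so ties keep header order.
    prefs.sort(key=lambda p: p[1], reverse=True)
    return prefs


prefs = parse_accept_language("fr-CH, fr;q=0.9, en;q=0.8")
# prefs == [("fr-ch", 1.0), ("fr", 0.9), ("en", 0.8)]
```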
Although explicit language preferences are effective at specifying accepted languages, users seldom set them in practice. Moreover, language preferences are often too restrictive, presenting an all-or-nothing paradigm. The language preferences function as a search result filter, providing only those search results in the preferred language and disallowing those search results in related or alternate languages.
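The all-or-nothing behavior described above can be seen in a short sketch of a strict language filter. The result records, the `lang` field, and the function name `filter_by_language` are hypothetical and chosen only for illustration.

```python
def filter_by_language(results, preferred):
    """Keep only results whose language is in the preferred set."""
    return [r for r in results if r["lang"] in preferred]


# Hypothetical results for a user who has set Spanish as the only preference.
results = [
    {"url": "http://example.com/es", "lang": "es"},
    {"url": "http://example.com/pt", "lang": "pt"},
    {"url": "http://example.com/en", "lang": "en"},
]
kept = filter_by_language(results, {"es"})
# The Portuguese and English results are dropped outright, even though
# the user might have found them acceptable.
```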
Similarly, default settings for specifying accepted languages, whether client-side or server-side, can further complicate the task of providing suitable search results. Default settings are often incorrect. For instance, English could be specified as a default language preference by virtue of a Web browser option, but may be unsuitable for presenting search results to a user who is not proficient in English.
Second, query-based preferences are derived from the terms in a given search query. Search query terms, however, are not reliable for determining language preferences, for two reasons. First, proper nouns, such as the name of a person, place or thing, are often language-independent and are a poor indicator of the language desired for search result presentation. For instance, a search engine will be unable to determine accepted languages for a search query consisting of the proper name “Elvis.” Second, search queries, particularly when specified in keywords, often consist of only a few individual words, which generally fail to provide sufficient context from which to determine a language preference. Like proper names, individual words can be language-independent or language-misleading. For instance, a search engine could be misled by a search query consisting of the words “Waldorf Astoria.”
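The weakness of query-based detection can be illustrated with a toy vocabulary lookup. This is a deliberately naive sketch: the function name `guess_languages` and the tiny word lists are assumptions made for the example, and practical language identification uses far richer models.

```python
def guess_languages(query, vocabularies):
    """Score each language by the fraction of query words in its word list."""
    words = query.lower().split()
    scores = {}
    for lang, vocab in vocabularies.items():
        matched = sum(1 for w in words if w in vocab)
        if matched:
            scores[lang] = matched / len(words)
    return scores


# Hypothetical, tiny word lists used only for illustration.
vocabularies = {
    "en": {"the", "restaurant", "of"},
    "fr": {"le", "restaurant", "de"},
}

no_signal = guess_languages("Elvis", vocabularies)
ambiguous = guess_languages("restaurant", vocabularies)
```

In this sketch the proper name matches no word list at all, while the single shared word matches English and French equally, so neither query yields a usable language preference.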
Accordingly, there is a need to provide an approach to dynamically determining language preferences for the presentation of search results to a user. Preferably, such an approach would accommodate both preferred and less-preferred languages that are acceptable to the user, and would include both related and alternate languages within the language preferences.
There is a further need for an approach to presenting search results in an ordered fashion in accordance with user-preferred languages. Preferably, such an approach would order or score search results to favor those search results in preferred languages while accommodating those search results in other languages.
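By way of illustration only, ordering that favors preferred languages without excluding others could be sketched as a weighted re-ranking. The multiplicative weighting, the `default_weight` for unlisted languages, and the names `rerank` and `language_weights` are assumptions made for this sketch and are not a description of any particular approach.

```python
def rerank(results, language_weights, default_weight=0.5):
    """Order results by relevance score scaled by a per-language weight."""
    def adjusted(result):
        weight = language_weights.get(result["lang"], default_weight)
        return result["score"] * weight

    # Every result is retained; only the order changes.
    return sorted(results, key=adjusted, reverse=True)


# Hypothetical scores for a user who prefers French and also accepts English.
results = [
    {"url": "http://example.com/a", "lang": "en", "score": 1.0},
    {"url": "http://example.com/b", "lang": "fr", "score": 0.9},
    {"url": "http://example.com/c", "lang": "de", "score": 0.8},
]
ordered = rerank(results, {"fr": 1.0, "en": 0.8})
```

Unlike a strict filter, the result in the unlisted language is demoted rather than removed.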