As search engines and other applications that accept text-based query input become more ubiquitous, the chances for a number and variety of spelling errors in users' input increase as well. Queries entered with misspellings tend to return no matching results, or at least results that poorly reflect the user's intention because of the error(s) in the input.
The graph of FIG. 1 depicts an approximate number of unmatched queries detected in an exemplary user input data set. Examples of misspellings include: kareoke, abrahm lincoln, sophmore and pregnacy. As one might predict, the number of misspellings increases as the query log is delved into more deeply, towards less frequent queries.
An existing problem is that it is not easy for users to realize that they have made a mistake. As a result, a user may become dissatisfied, and users who spell poorly may not receive the best results. Users would have a better experience if a search engine or other service could correct their mistakes and return results corresponding to what the users meant or intended to ask.
There are other related issues, such as ease of extensibility. Given the dynamic nature of the Web, and the fact that users' queries reflect the changing world, it would be advantageous to update, regularly and in real time, the standard against which mistakes are tested.
The expanding reach of the Web and the diverse backgrounds and spelling abilities of users, coupled with the esoteric names and phrases for which people now search, make it easy to predict with high probability that many new unmatched queries will be input frequently. Many of these queries are misspellings of known words or phrases. Some are phrases with intervening blanks deleted, i.e., “squished words.” Some are URL fragments, and so on. For the convenience of the present discussion, any query entered having at least one error may be considered a misspelling. Based on preliminary empirical query log data, one study estimates that between 3% and 5% of all queries are misspelled. Queries with misspellings may return no matching results, or irrelevant results. Worse yet, the user may have no idea that an error occurred while inputting the query. At a minimum, a query having at least one error wastes computing time and resources, as well as the user's time from entry of the misspelled query to the discovery of the misspelling.
There are several spelling correction (‘speller’) programs for correction of text in word processors. However, the requirements for runtime spelling correction are different from those for word processors. In particular, word processors do not demand a real-time, up-to-date dictionary to function properly; because a “standard” language changes slowly, a dynamic dictionary is less necessary for a word processor, and the user may nonetheless enter words that the user truly wishes the word processing dictionary to accept as correct. For a word processing application, the user wishes to depend upon the word processor to indicate when, e.g., an English word is used incorrectly. Thus, expanding the Oxford English dictionary to include the term “STAIND®,” the name of a popular modern rock band, might undermine the ability of the dictionary to differentiate the correct English word “stained” from a misspelling. Any time an update truly needs to be made to the static dictionary used with a word processor, an update is delivered to the static store location of the dictionary, or, as mentioned, the user may manually add words to the dictionary. Since such an update does not need to be made frequently, and because the process does not attempt to include culturally based, fleeting or most Web-specific terminology, this process is not unduly burdensome and works for word processing applications.
Thus, the dictionary store for a word processor is not dynamically updated, and word processors store a static dictionary in a commonly accessible location, such as on the client computer. As one can see, the operation of automatic spelling correction in connection with a word processing application is different from that for corresponding runtime applications and services, such as Web searching. In short, to obtain a rich user experience, more than a primarily static, co-located standard written language dictionary is required for runtime services.
While spelling correction has been employed for word processors and other static contexts, there are many contexts that apply runtime techniques against dynamic data. Thus, for instance, the application of spelling correction to a word processor is different from the application of spelling correction to search engine queries, where proper names of people and companies (“siti nurhaliza”, “junglee”), squished words such as “marketingcompanies,” and foreign words and phrases (“Concierto de Aranjuez”) are common. Web queries are often devoid of any consistent case information, making it hard to detect proper names. Similarly, squished words are hard to detect without a lexicon. To further illustrate, a general dictionary used for word processing contains about 200,000 words, whereas a Web English lexicon has about 500,000 words, all of which are dynamically maintained.
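By way of illustration only, the role a lexicon plays in detecting squished words can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular system: the function name split_squished, the choice to prefer the segmentation with the fewest words, and the tiny sample lexicon are all hypothetical.

```python
from typing import List, Optional, Set


def split_squished(query: str, lexicon: Set[str]) -> Optional[List[str]]:
    """Split a query with deleted blanks into lexicon words.

    Returns the segmentation with the fewest words, or None when the
    query cannot be covered entirely by lexicon entries. Preferring the
    fewest words is an assumption made for this sketch.
    """
    text = query.lower()  # Web queries carry no reliable case information
    n = len(text)
    # best[i] holds the best known segmentation of text[:i], or None
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            piece = text[start:end]
            if prefix is None or piece not in lexicon:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[n]


# Hypothetical sample lexicon; a real Web lexicon would be far larger.
SAMPLE_LEXICON = {"marketing", "companies", "market", "ing", "company"}
print(split_squished("marketingcompanies", SAMPLE_LEXICON))
```

With the sample lexicon above, the query “marketingcompanies” is split into “marketing” and “companies”, while a string that no combination of lexicon entries covers yields no segmentation at all, which is why the quality of the result depends directly on how current the lexicon is.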
For instance, pop star Britney Spears may have many different associated misspellings, and pop culture is ever changing, so the Web English lexicon is a dynamically updated data store that reflects the dynamic nature of the data stored therein. In one informal test, it was observed that there are 88 known spelling variants of Britney Spears, which cover about 95% of the errors people make. Consequently, it would be advantageous to provide a runtime spell check mechanism that can adapt to names from popular culture and other popular Web-specific search items. Thus, there is a need for a more comprehensive implementation of a dynamic lexicon for spellchecking of Web queries and similar material, or other queries for other services.
In this regard, it would be desirable to provide a runtime mechanism for analyzing spelling and alerting a user that a misspelling has occurred. Some systems today, such as those found on Web sites such as GOOGLE.COM® or ALTAVISTA.COM®, include such a mechanism, although there are drawbacks associated with these present approaches. Other systems, such as MERRIAM WEBSTER®'s on-line dictionary, make spelling suggestions to a user, although there are drawbacks associated with these techniques as well.
With the systems of GOOGLE.COM® and ALTAVISTA.COM®, when a user wishes to perform a search for Web pages having a word X therein, the user enters the desired search terms in a text box. In the case of a misspelling, for instance if the user misspells the word X and spells it X1 instead, the search engine performs a search on X1 and returns the list of results having X1 therein. The search engine also analyzes the input, in this case X1, and, based upon a certain confidence calculation, determines that X may have been the appropriate input. Even if the search engine is correct in its determination, however, the user must still assent to the additional search on the correct word X by taking a further action, e.g., by re-inputting X for a further search. As mentioned, some runtime services do provide some level of spelling analysis. Thus, GOOGLE®, ALTAVISTA.COM®, etc. provide spelling correction at different levels, but they rely on users choosing a particular spelling alternative, or force the user to type in a new query based on spelling suggestions or tips provided. These services, however, do not automatically choose a high-confidence correction and then use that correction as the search query, while allowing the user to ignore the correction and search on the original query if that was indeed the intent.
With the system of MERRIAM WEBSTER®'s on-line dictionary, alternate spellings are proposed for misspelled words, but a user must select a correctly spelled word from a list of words within a threshold distance metric of the misspelled word in order to proceed. While this may be adequate for some purposes, e.g., perhaps when the user does not even know how to correctly spell the word, in other contexts it is clear not only that the user has made an error, but also what the error is. In those instances, a system that merely offers suggestions makes no such intelligent determination and requires further effort from the user.
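For purposes of illustration, a suggestion list of the kind described above, i.e., words within a threshold distance of the misspelled word, might be produced along the following lines. The text does not name the distance metric used by any existing system; Levenshtein edit distance is assumed here, and the function names, the default threshold of 2 and the sample word list are hypothetical.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word: str, lexicon: Iterable[str], max_distance: int = 2) -> List[str]:
    """Return lexicon words within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


# Hypothetical sample lexicon for the misspellings mentioned earlier.
print(suggest("pregnacy", ["pregnancy", "piracy", "privacy"]))
print(edit_distance("sophmore", "sophomore"))
```

A list produced this way still leaves the selection to the user, even when only a single candidate lies within the threshold and the intended word is evident.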
There is thus a need for a mechanism that may be used in connection with a query input operation to decipher when a misspelling or an error has occurred. There is a further need for a mechanism that may be used in connection with a text-based query input operation to decipher when a misspelling or an error has occurred with a high confidence. There is still further a need for a mechanism that performs the operation associated with the query input as if the misspelling or error had not occurred. There is still even further a need for a mechanism that determines whether a misspelling or an error has occurred in a query input operation vis-a-vis a plurality of dynamic data stores or sources, such as a dynamically updated Web-oriented dictionary combined with a static dictionary store.
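By way of a non-limiting sketch, the needs identified above can be pictured together: a query is checked against a static dictionary combined with a dynamically updated store, and a correction is applied automatically only when confidence is high, with the original query retained so that the user may still search on it. Everything in this sketch is an assumption made for illustration, including the names Decision and decide, the use of word frequencies as a confidence signal, the Levenshtein metric, and the threshold values.

```python
from dataclasses import dataclass
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (an assumed metric; the text names none)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class Decision:
    original: str     # the query exactly as the user typed it
    search_term: str  # the term the search is actually run on
    corrected: bool   # True only for a high-confidence correction


def decide(query: str,
           static_lexicon: Dict[str, int],
           dynamic_lexicon: Dict[str, int],
           max_distance: int = 1,
           min_confidence: float = 0.9) -> Decision:
    """Choose the search term for a query (hypothetical policy).

    Both lexicons map a word to a frequency count. Entries in the
    dynamic store take precedence over the static dictionary.
    """
    lexicon = {**static_lexicon, **dynamic_lexicon}
    normalized = query.lower()
    if normalized in lexicon:
        return Decision(query, query, False)  # known term, nothing to fix
    candidates = {word: freq for word, freq in lexicon.items()
                  if edit_distance(normalized, word) <= max_distance}
    if not candidates:
        return Decision(query, query, False)  # no plausible correction
    best = max(candidates, key=lambda word: candidates[word])
    # Confidence: the best candidate's share of all nearby frequency mass.
    confidence = candidates[best] / sum(candidates.values())
    if confidence >= min_confidence:
        return Decision(query, best, True)
    return Decision(query, query, False)      # ambiguous, leave as typed


static_words = {"stained": 500, "strained": 300, "sophomore": 1000}
dynamic_words = {"staind": 900}  # e.g., a band name added at runtime
print(decide("staind", static_words, dynamic_words))
print(decide("sophmore", static_words, dynamic_words))
```

In this sketch, “staind” is left untouched because the dynamic store recognizes it, whereas “sophmore” is searched as “sophomore”. Because the Decision retains the original query, a service built this way could offer the user a search on the query as typed.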