Interaction with automated programs, systems, and services has become a routine part of most people's lives, especially with the advent of the Internet. Web surfing or browsing, for instance, may even be the “new” national pastime for a certain segment of the population. In accordance with such systems, applications such as word processing have helped many become more efficient in their respective jobs or in their personal lives, for example when typing a letter or e-mail to a friend. Many automated features have been added to these applications, such as tools for formatting documents in substantially any desired font, color, shape, or form. One tool that has been appreciated and well received by many users is a spell checking application that is invoked by a user from the word processor to check all or portions of a respective document and/or invoked to run in the background to check spelling as users are typing. Generally, in order to perform accurate spell checking, a dictionary of “valid strings” may be employed by the spell checking application. If a spell checker encounters a string not in the dictionary, it may hypothesize that the string is a spelling error and attempt to find the “closest” string in the dictionary for the misspelled string. Most spell checkers provide a list of possible matches to the user, whereby if the intended word is on the list, the user can select the word having the corrected spelling from the list. Other spell checking features may perform automatic corrections, if so configured by the user.
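The dictionary lookup and “closest” string search described above can be illustrated with a minimal sketch. It assumes that closeness is measured by Levenshtein edit distance, which the text does not specify; the names `edit_distance`, `suggest`, `max_distance`, `limit`, and the sample `DICTIONARY` are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2, limit=3):
    """Return [] for a valid string, else the closest dictionary strings."""
    if word in dictionary:
        return []  # string is in the dictionary, so it is not flagged
    scored = [(edit_distance(word, entry), entry) for entry in dictionary]
    candidates = sorted((d, e) for d, e in scored if d <= max_distance)
    return [entry for _, entry in candidates[:limit]]


# Hypothetical dictionary of "valid strings" (assumption for illustration).
DICTIONARY = {"letter", "latter", "better", "friend", "and", "processors"}

print(suggest("leter", DICTIONARY))  # ['letter', 'better', 'latter']
print(suggest("and", DICTIONARY))    # []
```

In this sketch, a string that is itself in the dictionary is never flagged, so the checker can only propose corrections for strings it does not recognize.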
Spell checking for word processing, however, covers only some of the areas in which users may be assisted when entering information into a file or document. For example, with all the potential web sites and services available, users often navigate between sites by explicitly typing in all or portions of the site name or by performing searches on words or phrases that appear in the title and the body of a web page. As many have come to find out, if the site information or the search query is entered incorrectly, the cost in time to re-navigate can become quite high. Language processors employed in search engines or other applications often process user queries and may attempt to distinguish actual user commands from incorrectly entered information. As can be appreciated, however, the type of information that may be entered for a query to a search engine may be quite different in structure or form from that typically employed in a word processing application. Thus, tools that check words on a somewhat individual and isolated basis in a word processor application may have little or no utility when applied to information generated from general query data.
Browser or other search queries for information present a unique problem for spell checking applications, since the queries often consist of words that may not be found in a standard spell-checking dictionary, such as artist, product, or company names. Another problem is that a word in a query may have been entered incorrectly yet still be a correctly spelled word (for example, “and processors” instead of “amd processors”). Thus, the manner in which people enter text into a type-in line, such as an input box to a search engine, is often very different from typing for word processing. Both what is entered and the types of errors people make with respect to query input are also quite different in nature. Furthermore, web data and search queries are very dynamic in nature and contain a large number of proper nouns; new products, people, institutions, locations, and events become popular every day. As such, a standard dictionary, while suitable for spell checking in the context of word processing, may not be appropriate for type-in-line and search-query spell checking.
A dictionary (i.e., lexicon) is an important component of any spell checker since the information contained therein provides the foundation to determine incorrect spellings. However, for many applications where spell checking is desired (e.g., text input provided to input boxes), a standard dictionary is not optimal for the problem. For instance, to spell check text input to the input box of a search engine, a dictionary should include strings such as “hanging chad” and “Apolo Anton Ohno” in order to check more recent events or information that may be of interest. As can be appreciated, these and a plurality of other such strings would not appear in a standard dictionary. One possible approach is to utilize substring matching techniques on a log of what users are typing into a particular location, such as a search engine or language processor. Unfortunately, a problem with this approach is that the query logs will generally also contain a large number of input errors, so the substring matches returned may not be relevant to a user's desired search.
Additionally, both the dictionary utilized for spell checking and the context of the search are always changing. These dynamic behaviors cannot be accounted for utilizing traditional dictionary and search query processing. For example, if there is currently a popular band called Limp Bizkit, a search for “bizkit pictures” is likely to refer to this band and not to be a misspelling of “biscuit.” If suddenly the band becomes unpopular, and there is a top-selling book on pictures of biscuits, “bizkit pictures” is then more likely a misspelling of “biscuit pictures.” Likewise, given a current state of politics, “govenor anld” probably refers to “governor arnold” if a person named Arnold is currently a popular California governor. Thus, the context of the search query impacts the spell checking significantly.
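The time dependence in the “bizkit pictures” example can be illustrated with a small sketch. It is not a method described in this text: it assumes that popularity is approximated by raw counts in a query log and that candidates are logged queries within a fixed Levenshtein distance of the input. The function names, the `max_distance` threshold, and all counts are hypothetical. Because it relies on raw log frequency, it remains exposed to the input errors that such logs contain.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_query(query, log_counts, max_distance=3):
    """Return the most frequent logged query near `query`, or `query` itself."""
    best, best_count = query, log_counts.get(query, 0)
    for logged, count in log_counts.items():
        # Only replace the current choice with a strictly more frequent
        # logged query that is within the allowed edit distance.
        if count > best_count and edit_distance(query, logged) <= max_distance:
            best, best_count = logged, count
    return best


# Hypothetical log snapshots (made-up counts, for illustration only).
log_band_popular = Counter({"bizkit pictures": 900, "biscuit pictures": 40})
log_book_popular = Counter({"bizkit pictures": 15, "biscuit pictures": 700})

print(best_query("bizkit pictures", log_band_popular))  # bizkit pictures
print(best_query("bizkit pictures", log_book_popular))  # biscuit pictures
```

The same input string is left unchanged under the first snapshot and corrected under the second, which is the dependence on current context that the paragraph above describes.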