Interaction with automated programs, systems, and services has become a routine part of most people's lives, especially with the advent of the Internet. Web surfing or browsing, for instance, may even be the “new” national pastime for a certain segment of the population. Alongside such systems, applications such as word processors have helped many people become more efficient in their jobs and in their personal lives, for example when typing a letter or e-mail to a friend. Many automated features have been added to these applications, such as tools for formatting documents in substantially any desired font, color, shape, or form. One tool that has been well received by many users is a spell checking application that is invoked by a user from the word processor to check all or portions of a respective document and/or invoked to run in the background to check spelling as users are typing. Generally, in order to perform accurate spell checking, a dictionary of “valid strings” may be employed by the spell checking application. If a spell checker encounters a string not in the dictionary, it may hypothesize that the string is a spelling error and attempt to find the “closest” string in the dictionary for the misspelled string. Most spell checkers provide a list of possible matches to the user, whereby if the intended word is on the list, the user can select the word having the corrected spelling from the list. Other spell checking features may perform automatic corrections, if so configured by the user.
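The dictionary lookup described above can be illustrated with a minimal Python sketch. It assumes that “closest” means smallest Levenshtein edit distance, which is only one common choice; the function names `edit_distance` and `suggest` and the toy `DICTIONARY` are hypothetical and do not come from any particular spell checker.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: set, max_suggestions: int = 3) -> list:
    """Return [] for a valid string, else the nearest dictionary entries."""
    if word in dictionary:
        return []  # in-dictionary strings are never flagged
    # Rank by distance; break ties alphabetically so the order is stable.
    ranked = sorted(dictionary, key=lambda entry: (edit_distance(word, entry), entry))
    return ranked[:max_suggestions]


# Hypothetical toy dictionary of "valid strings".
DICTIONARY = {"letter", "latter", "better", "friend", "and"}

print(suggest("leter", DICTIONARY))  # candidate corrections offered to the user
print(suggest("and", DICTIONARY))    # empty list: the word is in the dictionary
```

In this sketch a string is flagged only when it is absent from the dictionary, so a valid word typed in place of another valid word passes unchecked.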
Spell checking for word processing, however, covers only part of the range of situations in which users may benefit from assistance when entering information into a file or document. For example, with all the potential web sites and services available, users often navigate between sites by explicitly typing in all or portions of the site name or by performing searches on words or phrases that appear in the title and the body of a web page. As many have come to find out, if the site information or the search query is entered incorrectly, the cost in time to re-navigate can become quite high. Language processors employed in search engines or other applications often process user queries and may attempt to distinguish actual user commands from incorrectly entered information. As can be appreciated, however, the type of information that may be entered for a query to a search engine may be quite different in structure or form from that typically employed in a word processing application. Thus, tools that check words on a somewhat individual and isolated basis in a word processor application may have little or no utility when applied to information generated from general query data.
For example, browser or other search queries for information present a unique problem for spell checking applications, since the queries often consist of words that may not be found in a standard spell-checking dictionary, such as artist, product, or company names. Another problem is that an in-dictionary word may have been entered incorrectly in place of another intended word (for example, “and processors” instead of “amd processors”). Input queries often relate to current events and to new or little-used proper nouns (e.g., new product names, people, institutions, and locations that enter the spotlight on a daily basis), such as “hanging chad” and “Apolo Anton Ohno”. Additionally, the phrases used as input queries are often very different from those encountered by word processors. Accordingly, both the form and content of input queries present unique problems that standard spell checkers (i.e., those utilizing a dictionary) are not configured to handle.
One possible approach is to utilize substring matching techniques on a log of what users are typing into a particular location, such as a search engine or language processor. Unfortunately, a problem with this approach is that the query logs will generally also contain a large number of input errors and return substring matches that are not relevant to a user's desired search.
Another approach uses query logs containing user queries to a data collection to generate a query-based spell checker or query error model to facilitate spell checking of input queries to the data collection. The disclosed method utilizes statistical occurrence data extracted from search query logs to generate possible alternative spellings for the input search query strings directed to the data collection. The method accounts for substrings that may not be found in a lexicon, but are still acceptable as a search query of interest.
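As a rough illustration of how occurrence statistics from a query log might drive such suggestions, the following Python sketch proposes a similar logged query that occurs markedly more often than the input. This is not the disclosed method, whose details are not given here; the function `suggest_from_log`, the thresholds `min_similarity` and `min_ratio`, and the toy `QUERY_LOG` are all assumptions made for illustration. String similarity is measured with the standard library's `difflib.SequenceMatcher`.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical query log; a real log would hold far more entries.
QUERY_LOG = [
    "amd processors", "amd processors", "amd processors",
    "and processors",
    "hanging chad", "hanging chad", "hangin chad",
]


def suggest_from_log(query, log, min_similarity=0.85, min_ratio=2.0):
    """Return a similar, clearly more frequent logged query, or None."""
    counts = Counter(log)
    seen = counts.get(query, 0)  # how often the input itself was logged
    best, best_count = None, 0
    for candidate, count in counts.items():
        if candidate == query:
            continue
        similarity = SequenceMatcher(None, query, candidate).ratio()
        # Require both a close spelling and a clear frequency advantage,
        # because the log itself contains mistyped queries.
        if (similarity >= min_similarity
                and count >= min_ratio * max(seen, 1)
                and count > best_count):
            best, best_count = candidate, count
    return best


print(suggest_from_log("and processors", QUERY_LOG))  # amd processors
print(suggest_from_log("amd processors", QUERY_LOG))  # None
print(suggest_from_log("and processors", []))         # None
```

Because the candidates come from the log instead of a lexicon, strings such as “amd” can be suggested even though a standard dictionary would not contain them. With an empty log the sketch has nothing to suggest.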
Although such a query-log-based spell checker is a useful tool in reducing input query errors to the data collection, it relies upon a large amount of query log data to suggest alternative, more appropriate, input queries to the data collection. Unfortunately, the required query logs are unavailable prior to the release of a new online service directed to a new data collection. As a result, the new online service would lack the query-log-based spell checker until sufficient query log data has been generated by the users of the service.
Embodiments of the present invention provide solutions to these and other problems, and offer other advantages over the prior art.