Text entries, such as those in documents generated using a word processing application, can contain many different types of errors, including spelling errors. Spelling errors that result in invalid words can generally be handled by a lexicon-based spell checker. Such misspellings may occur due to a typing error or because the user does not know the correct spelling of the word.
Lexicon-based spell checkers compare the words in the text entry to a lexicon of words and identify the words in the text entry that are not found in the lexicon. One or more replacement words are often suggested for the misspelled word. For example, in the text entry “fly frm Boston”, the spell checker would identify “frm” as being misspelled.
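The lexicon lookup described above can be illustrated with a minimal sketch. The function name `find_unknown_words`, the tiny `LEXICON` set, and the whitespace tokenization with case folding are assumptions made for illustration only; they are not taken from any particular spell checking application.

```python
def find_unknown_words(text, lexicon):
    """Return the words in `text` that are not found in `lexicon`.

    Assumption: words are separated by whitespace and compared
    case-insensitively. A real spell checker would also handle
    punctuation, hyphenation, and so on.
    """
    return [word for word in text.split() if word.lower() not in lexicon]


# Hypothetical toy lexicon; a practical lexicon holds many thousands of words.
LEXICON = {"fly", "from", "form", "boston"}

flagged = find_unknown_words("fly frm Boston", LEXICON)
print(flagged)  # ['frm']
```

Only “frm” is reported, because it is the only word in the entry that is absent from the lexicon.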
Other types of misspellings result in valid words that are generally not detectable using traditional spell checking applications. For instance, an unintended valid word may be entered by a user of the word processing application as a result of a typing error or because the user does not know the correct spelling of the intended word. For example, in the text entry “fly form Boston”, the word “form” is a valid word that would not be flagged by conventional spell checking applications, even though the word is a misspelling of the intended word “from”. The correction of these types of misspellings generally requires an analysis of the context in which the word is used.
Traditional spell checking applications generally base the suggested replacement words for identified invalid words on an edit distance. The edit distance represents the amount of change, such as the number of character insertions, deletions, or substitutions, that is required to transform the typed invalid word into a valid alternative word. The word in the lexicon having the shortest edit distance from the typed invalid word is the first replacement word that is suggested to the user. For example, in the phrase “fly frm Boston”, most spell checking applications would suggest “form” as the replacement word before the correct word “from” is suggested, because the context of the word is not taken into account when making the suggestion. In order to suggest the most appropriate replacement word for the misspelling, an analysis of the context in which the misspelling is found must be made.
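The ranking of suggestions by edit distance can be sketched as follows. The text does not specify which edit distance is used; this sketch assumes the common Levenshtein distance (insertions, deletions, and substitutions, each costing one). The names `edit_distance` and `rank_suggestions`, the toy lexicon, and the alphabetical tie-break are likewise illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (assumed metric)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_suggestions(word, lexicon):
    """Order lexicon words by edit distance from `word`.

    Assumption: ties are broken alphabetically, since nothing
    about the context is available to this function.
    """
    return sorted(lexicon, key=lambda cand: (edit_distance(word, cand), cand))


# Hypothetical toy lexicon.
LEXICON = ["fly", "form", "from", "boston"]

print(edit_distance("frm", "form"))      # 1
print(edit_distance("frm", "from"))      # 1
print(rank_suggestions("frm", LEXICON))  # ['form', 'from', 'fly', 'boston']
```

Under these assumptions, “form” and “from” are each one insertion away from “frm”, so the edit distance alone cannot separate them. The order in which they are suggested depends on an arbitrary tie-break rather than on the surrounding words “fly” and “Boston”.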
Accordingly, a need exists for improved spell checking methods and systems that are capable of analyzing the context in which the words are used to provide better suggestions for misspelled words and improved detection of valid words that are used improperly.
Embodiments of the present invention provide solutions to these and other problems, and offer other advantages over the prior art.