The present invention relates to automated spell checking. In particular, the present invention relates to suggesting contractions as alternatives to misspelled words.
The role of a spell checker is to identify spelling mistakes and to offer suggestions for correcting them. Typically, spell checking is performed by comparing an input string to the entries in a lexicon. If an entry matching the input string is found, the spell checking system assumes that the word is correctly spelled. If no matching entry is found, the spell checker returns a list of words from the lexicon that the user may have been trying to spell. These words are selected by comparing the input string to each word in the lexicon and identifying the words that are closest to the input string.
The distance between the input string and a word in the lexicon is measured differently by different spell checkers, but in general it is based on the number of deletions, additions, permutations, or substitutions that must be performed on the word in the lexicon to form the misspelled word. Words that are farther than some threshold distance from the input string are not suggested to the user as alternatives. It is unlikely that the user has made that many spelling mistakes in a single word, and suggesting words that differ substantially from the input string can cause users to lose confidence in the system.
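As a rough illustration of the distance-based suggestion described above, the following Python sketch uses the optimal string alignment distance (a restricted Damerau-Levenshtein distance), which counts insertions, deletions, substitutions, and transpositions of adjacent letters. The choice of metric, the function names `osa_distance` and `suggest`, and the default threshold of 2 are illustrative assumptions; actual spell checkers measure distance in different ways.

```python
def osa_distance(source: str, target: str) -> int:
    """Optimal string alignment distance: the minimum number of insertions,
    deletions, substitutions, and adjacent transpositions turning source into target."""
    m, n = len(source), len(target)
    # d[i][j] holds the cost of turning source[:i] into target[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]


def suggest(word: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Return lexicon entries within max_distance of word, closest first.
    An exact match means the word is treated as correctly spelled."""
    if word in lexicon:
        return []
    # Distance is measured from the lexicon entry to the input string.
    scored = sorted((osa_distance(entry, word), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]
```

For the input “lorange” and a lexicon containing “orange”, “range”, and “arbre”, this sketch suggests “orange” and “range” but never “l'orange”, because the contraction is not itself a lexicon entry.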
While such spell checkers have worked well in many languages, they have produced a large number of poor suggestions for French contractions that are missing an apostrophe.
In French, it is common to form a contraction by shortening a word such as “le” into an elided form such as “l” and joining the elided form to the following word with an apostrophe. For example, instead of writing “le arbre” (the tree), the contraction “l'arbre” is used.
One common mistake is to omit the apostrophe when writing a contraction. For example, the user may type “lorange” when the correct form is “l'orange”.
Although it is possible to store every contraction as a lexical entry, doing so would make the lexicon very large. For example, the lexicon would need an entry for both “orange” and “l'orange”. To avoid this, systems have attempted to identify dynamically when an apostrophe may be missing from a contraction. However, these systems have typically used simple rules that identify a first letter or set of letters as belonging to a set of elided words and then simply suggest placing an apostrophe between the identified letters and the remaining portion of the word.
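The simple rule-based approach described above might look roughly like the following Python sketch. The function name, the illustrative subset of elided forms in `ELIDED_FORMS`, and the check that the remainder is itself a lexicon word are assumptions made for this example. Because the rule looks only at spelling, it has no way of rejecting ungrammatical results.

```python
# Hypothetical, non-exhaustive subset of French elided forms.
ELIDED_FORMS = ("qu", "l", "d", "j", "m", "n", "s", "t", "c")


def naive_contraction_suggestions(word: str, lexicon: set[str]) -> list[str]:
    """Naive rule: if the word begins with an elided form and the rest of the
    word is in the lexicon, suggest inserting an apostrophe after the elided form.
    No grammatical check is made on the elided form or the following word."""
    if word in lexicon:
        return []  # already correctly spelled
    suggestions = []
    for prefix in ELIDED_FORMS:
        remainder = word[len(prefix):]
        if word.startswith(prefix) and remainder in lexicon:
            suggestions.append(f"{prefix}'{remainder}")
    return suggestions
```

With a lexicon containing “orange” and “maison”, this rule correctly proposes “l'orange” for “lorange”, but it also proposes “l'maison” for “lmaison” (“le” and “la” elide only before a vowel or mute h) and “j'orange” for “jorange” (the subject pronoun “je” cannot precede a noun), illustrating how such rules produce grammatically incorrect contractions.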
These simple rules have resulted in many meaningless suggestions being provided to the user, because many of the contractions formed by the rules are grammatically incorrect.
Thus, a spell checker is needed that, when suggesting contractions as possible alternatives to words entered by the user, provides grammatically correct suggestions without requiring each individual contraction to be stored in the lexicon.