A number of techniques exist for automatically detecting and correcting spelling errors. Suppose that a spell checking algorithm is given a word, G, and attempts to find one or more words from a list of candidate words (such as validly spelled words) that are within a given edit distance of G. The edit distance between two words is the smallest number of operations that transform the candidate word into the given word, where each operation removes one letter (deletion), adds one letter (insertion), replaces one letter with another letter (replacement), or swaps two adjacent letters (transposition).
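A minimal Python sketch of the search described above follows. It assumes the four operations are counted with the standard dynamic-programming recurrence for the "optimal string alignment" variant of Damerau-Levenshtein distance. That variant is an assumption: the unrestricted Damerau-Levenshtein distance can be smaller for some pairs (2 rather than 3 for "ca" and "abc"). The names `osa_distance` and `candidates_within` are hypothetical, and the candidate search is a plain brute-force scan over the whole list.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting deletion, insertion, replacement and adjacent
    transposition (assumed: optimal string alignment variant)."""
    m, n = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i letters of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j letters of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # replacement (free if letters match)
            )
            # Transposition of the two adjacent letters ending here.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def candidates_within(given: str, words: list[str], max_distance: int) -> list[str]:
    """Hypothetical brute-force search: keep every candidate word whose edit
    distance to the given word G is at most max_distance."""
    return [w for w in words if osa_distance(w, given) <= max_distance]


print(candidates_within("teh", ["the", "ten", "tea", "cat"], 1))
# prints ['the', 'ten', 'tea']
```

Each call fills a table of (len(a)+1)·(len(b)+1) cells, so scanning a list of D words costs time proportional to D times the square of the word length.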
Two words are said to have a distance (or “edit distance”) of zero between them if they are identical. Two words are said to be a distance of one apart if one can get from one word to the other by: (1) transposing one pair of adjacent characters; (2) replacing a single character with any other character; (3) deleting any one character; or (4) inserting an arbitrary character at any position in the original word. Likewise, two words are a distance of two apart if two moves of the types described above are required to get from the first word to the second. More generally, two words are a distance N apart if N is the minimum number of such moves required to get from the first word to the second.
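The four distance-one moves can be enumerated directly. The Python sketch below, with the hypothetical name `distance_one_neighbors`, generates every string reachable by exactly one move, assuming (for illustration only) that words are drawn from the lowercase letters a to z. A move that changes nothing, such as replacing a letter with itself, yields distance zero, so the original word is removed from the result.

```python
import string


def distance_one_neighbors(word: str, alphabet: str = string.ascii_lowercase) -> set[str]:
    """Return every string exactly one move away from word.
    The default alphabet (a-z) is an illustrative assumption."""
    # Every way to cut the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # (1) transpose the first two letters of the right part
    transposes = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    # (2) replace the first letter of the right part
    replaces = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    # (3) delete the first letter of the right part
    deletes = {left + right[1:] for left, right in splits if right}
    # (4) insert a letter between the left and right parts
    inserts = {left + c + right for left, right in splits for c in alphabet}
    neighbors = transposes | replaces | deletes | inserts
    # A no-op move (e.g. swapping two identical letters) leaves the word at distance zero.
    neighbors.discard(word)
    return neighbors


print("teh" in distance_one_neighbors("the"))  # prints True
```

For a word of length n over a 26-letter alphabet, the four moves produce at most n + (n - 1) + 26n + 26(n + 1) = 54n + 25 strings before duplicates are removed. Applying the function again to each result reaches every string within distance two, and intersecting either set with a dictionary of valid words yields the candidates.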
U.S. Pat. No. 6,616,704 B1, assigned to the assignee of the present invention and entitled “Two Step Method for Correcting Spelling of a Word or Phrase in a Document,” discloses a method for correcting the spelling of a word or phrase in a document. The disclosed method proceeds in two steps: first, an initial approximate method eliminates most candidate words from consideration (without computing the exact edit distance between the word whose spelling is to be corrected and any candidate word); then a “slow method” computes the exact edit distance between the word whose spelling is to be corrected and each of the few remaining candidate words. For a dictionary of size D and a maximum word length W, the disclosed two step method is said to run in O(D) time if the number of exact edit distance calculations is small, and in O(D·W²) time otherwise.
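The approximate first step of the cited patent is not described here, so the Python sketch below substitutes a different, well-known cheap filter to illustrate the general filter-then-verify structure; it is not the patented method. The filter is a letter-count ("bag") lower bound on the edit distance: each deletion, insertion or replacement changes the letter-count difference between two words by at most one, and a transposition does not change it at all, so any word whose bound exceeds the threshold can be discarded without an exact computation. The names `bag_lower_bound` and `two_step_candidates` are hypothetical, and the exact step reuses the optimal-string-alignment recurrence as an assumption.

```python
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Exact edit distance (assumed optimal string alignment variant)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def bag_lower_bound(a: str, b: str) -> int:
    """Cheap lower bound on the edit distance, computed from letter counts only."""
    ca, cb = Counter(a), Counter(b)
    # Counter subtraction keeps only positive counts: letters a has in excess of b, and vice versa.
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


def two_step_candidates(given: str, dictionary: list[str], max_distance: int) -> list[str]:
    # Step 1 (approximate): discard words whose lower bound already exceeds the threshold.
    survivors = [w for w in dictionary if bag_lower_bound(w, given) <= max_distance]
    # Step 2 (exact): run the quadratic dynamic program only on the survivors.
    return [w for w in survivors if osa_distance(w, given) <= max_distance]


print(two_step_candidates("teh", ["the", "ten", "tea", "cat", "tether"], 1))
# prints ['the', 'ten', 'tea']
```

Here "cat" and "tether" are rejected by the letter-count filter alone. Because the exact step fills a table with roughly W² cells for words of length up to W, running it on every dictionary word rather than on the survivors alone costs time proportional to D·W², which matches the quadratic-in-W worst case cited for the two step method.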
While such existing techniques for real-time spelling correction of a term against a dictionary of valid words provide an effective mechanism for detecting and correcting spelling errors, they suffer from a number of limitations which, if overcome, could further improve the efficiency, utility and reliability of spell checking functions. More particularly, a number of existing techniques generate an excessive number of false positives. In addition, for the detection of certain errors, existing techniques are said to run in time on the order of the dictionary size, O(D), or on the order of the logarithm of the dictionary size, O(log D).
A need therefore exists for improved techniques for real-time spelling correction of a term against a dictionary of valid words.