1. Field of the Invention
The present invention generally relates to a very fast method for correcting the spelling of a word or phrase in a document. The method has application to any technique which searches documents.
2. Background Description
Suppose that we are given a word, G, and we wish to find one or more other words from a list of candidate words that are within a given edit distance from G. Here, the edit distance between two words is the smallest number of operations which transform the candidate word into the given word (each operation consisting of removing one letter, adding one letter, replacing one letter with another letter, or transposing two letters).
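The definition above can be illustrated with a short dynamic-programming sketch. This is not taken from the cited references. It assumes that "transposing two letters" means swapping two adjacent letters, and that no letter takes part in more than one operation (the restricted variant often called optimal string alignment). In rare cases this variant can report a larger value than an unrestricted count would. The function name `edit_distance` is hypothetical.

```python
def edit_distance(candidate: str, given: str) -> int:
    """Number of single-letter removals, additions, replacements, and
    adjacent transpositions needed to turn `candidate` into `given`.

    Assumption: restricted (optimal string alignment) variant, in which
    each letter takes part in at most one operation.
    """
    m, n = len(candidate), len(given)
    # d[i][j] holds the distance between candidate[:i] and given[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # remove all i letters
    for j in range(n + 1):
        d[0][j] = j  # add all j letters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if candidate[i - 1] == given[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove one letter
                d[i][j - 1] + 1,         # add one letter
                d[i - 1][j - 1] + cost,  # replace one letter (or match)
            )
            # Swap of two adjacent letters.
            if (i > 1 and j > 1
                    and candidate[i - 1] == given[j - 2]
                    and candidate[i - 2] == given[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]
```

The table has (length of candidate + 1) by (length of given + 1) entries, each filled in constant time, so the running time is proportional to the product of the two word lengths.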
In “The String-to-String Correction Problem”, JACM, 21(1), pp. 168-173 (1974), R. A. Wagner and M. J. Fischer showed that the edit distance between two words G and C could be computed in time proportional to the length of G times the length of C. Subsequently, in “Algorithms for Approximate String Matching”, Information and Control, 64, pp. 100-118 (1985), E. Ukkonen improved the running time of the algorithm. This latter result is called the “slow method” in the following description.
According to the invention, there is provided a method which proceeds in two steps. The first step applies a very fast method, comparing G to each candidate word, which eliminates most candidate words from consideration without computing the exact edit distance between G and any candidate word. The second step applies the “slow method”, which computes the exact edit distance between G and each of the few remaining candidate words. The second step is the slow exact method well-known in the art as described by Ukkonen.
The invention consists of (1) a new fast approximate method and (2) combining this fast approximate method with the slow method. The combination results in a method that is almost as fast as the fast approximate method and as exact as the slow method.
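The two-step structure can be sketched as follows. The fast method itself is not specified in this passage, so the filter here is only a stand-in chosen for illustration: a letter-count lower bound, which can never exceed the true edit distance and therefore never discards a valid candidate. The exact step is likewise a plain dynamic program with an adjacent-transposition case, not Ukkonen's improved algorithm. All function names (`lower_bound`, `exact_distance`, `correct`) are hypothetical.

```python
from collections import Counter


def lower_bound(given: str, candidate: str) -> int:
    """Cheap lower bound on the edit distance (illustrative filter only).

    Each operation changes the surplus of letters on either side by at
    most one, so the larger surplus is a valid lower bound.
    """
    g, c = Counter(given), Counter(candidate)
    missing = sum((g - c).values())  # letters `given` has in excess
    extra = sum((c - g).values())    # letters `candidate` has in excess
    return max(missing, extra)


def exact_distance(a: str, b: str) -> int:
    """Exact edit distance with adjacent transpositions (restricted
    variant), keeping only the last three rows of the table."""
    prev2, prev = None, list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def correct(given: str, candidates: list[str], max_distance: int) -> list[str]:
    """Step 1: discard candidates whose lower bound is already too large.
    Step 2: run the exact computation only on the survivors."""
    survivors = [c for c in candidates
                 if lower_bound(given, c) <= max_distance]
    return [c for c in survivors
            if exact_distance(c, given) <= max_distance]
```

Because the filter only ever underestimates the distance, the combined result is the same as running the exact computation on every candidate. The saving comes from how few candidates reach the second step.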