1. Field of the Invention
This invention relates to text processing and, more particularly, to methods for automatically prompting an operator with the correct spelling of a misspelled word.
2. Description of the Prior Art
In implementing a practical automatic spelling aid system, the practicality of the system, in terms of both cost and efficiency of operation, is determined by the ultimate number of words that must be examined by a high resolution match algorithm which adjusts for aliased characters and dropped or added characters/syllables.
Procedures have been evolved in the prior art for reducing the number of candidate words that must be examined relative to a target misspelled word to find the best matched candidate or candidates. One technique involves only looking at those words which match the misspelled word in its first character and are not greater or less in length than it by more than two characters. This approach is based on the supposition that the most reliable character in a misspelled word is always the first character and that normal misspelling would not yield more than a two-character addition or deletion.
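By way of illustration only, the first technique can be sketched as follows. The function name `first_char_length_filter`, the parameter `max_len_diff`, and the sample word list are assumptions introduced for this sketch and do not appear in the prior art described.

```python
def first_char_length_filter(target, dictionary, max_len_diff=2):
    """Keep words that share the target's first character and whose
    length differs from the target's by at most max_len_diff."""
    target = target.lower()
    if not target:
        return []
    return [
        word for word in dictionary
        if word
        and word[0].lower() == target[0]
        and abs(len(word) - len(target)) <= max_len_diff
    ]


# Hypothetical sample dictionary, for illustration only.
words = ["separate", "desperate", "september", "sep", "spare", "operate"]
print(first_char_length_filter("seperate", words))  # ['separate', 'september']

# A wrong first character removes the correct word from consideration.
print(first_char_length_filter("fysics", ["physics"]))  # []
```

The second call shows the weakness of relying on the first character: once that character is wrong, the correct word is never retrieved.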
A second technique for reducing the number of candidates that must be examined relative to a target misspelled word to determine the best matched candidate uses a vector fetch approach, which assigns to each word in the dictionary a magnitude value based on the confusability of the characters in the word; each character's weight in the magnitude computation is clustered close to the weights of those characters that it could be aliased as. Only those words in the magnitude "range" of the misspelled word are retrieved.
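The magnitude idea can be illustrated with a toy sketch. The confusable-character groups, the weight formula, and the tolerance below are all assumptions chosen for illustration; they are not the weights or the retrieval mechanism of any actual prior art apparatus.

```python
# Hypothetical groups of characters that might be aliased as one another.
CONFUSABLE_GROUPS = ["aeiouy", "ckqsxz", "bpdt", "mn", "fvw", "gj", "lr", "h"]

# Characters in the same group receive weights that lie close together,
# while different groups are spaced far apart (assumed scheme).
CHAR_WEIGHT = {
    ch: 10.0 * (group_index + 1) + 0.1 * position
    for group_index, group in enumerate(CONFUSABLE_GROUPS)
    for position, ch in enumerate(group)
}


def magnitude(word):
    """Sum of the per-character weights; unknown characters count as 0."""
    return sum(CHAR_WEIGHT.get(ch, 0.0) for ch in word.lower())


def magnitude_range_fetch(target, dictionary, tolerance=12.0):
    """Retrieve only words whose magnitude lies within tolerance of the target's."""
    target_magnitude = magnitude(target)
    return [
        word for word in dictionary
        if abs(magnitude(word) - target_magnitude) <= tolerance
    ]


print(magnitude_range_fetch("seperate", ["separate", "cat"]))  # ['separate']
```

Because "e" and "a" sit in the same assumed group, substituting one for the other barely changes the magnitude, so the correct word remains in range.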
A third technique for reducing the number of candidates that must be examined relative to a target misspelled word to determine the set of best matched candidates is to examine all words of equal length to the misspelled word or within plus or minus two character positions regardless of first character.
However, because the dictionary may be quite large (i.e., many times over 50,000 words), even a discriminant that precludes 99% of the dictionary from review still leaves a large set of words that must be examined to determine the best match candidates relative to a misspelled word. The first technique, although effective, leads to non-recoverable errors when the first character is in error and normally does not have a discrimination potential greater than 90%. The second technique has a higher average discrimination potential using the Cluster Storage Apparatus disclosed in U.S. Pat. No. 3,969,698, but still yields more than 1% of the dictionary for final review. The combination of the first and second techniques with the double storing of words that have highly ambiguous or silent first characters (e.g., "philosophy" under "P" and under "F", "knot" under "K" and under "N") yields a discrimination potential of roughly 99%. This, however, as mentioned, still leaves for large dictionaries more words than can be conveniently handled in real time when discriminating the best candidate matches against a target misspelled word. Further discrimination using an independent criterion not used above is required to reduce the word list to a size that can be conveniently processed in real time to determine the best candidate match(es) against the target misspelled word. This problem is accentuated by the fact that, after 99% of the words have been discriminated, the remaining one percent tend to be more homogeneous in content and therefore less amenable to cursory methods of examination and further culling.
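The double storing of words with ambiguous or silent first characters can be pictured as an index keyed by first character. The following is a minimal sketch; the table `ALTERNATE_FIRST` and the function `build_index` are hypothetical names, and the two prefix rules simply mirror the "philosophy" and "knot" examples given above.

```python
from collections import defaultdict

# Hypothetical table: leading spelling -> additional letter to file the word under.
ALTERNATE_FIRST = {"ph": "f", "kn": "n"}


def build_index(dictionary):
    """Index words by first character, storing a word a second time
    when its leading characters are ambiguous or silent."""
    index = defaultdict(set)
    for word in dictionary:
        word = word.lower()
        if not word:
            continue
        index[word[0]].add(word)
        for prefix, alternate in ALTERNATE_FIRST.items():
            if word.startswith(prefix):
                index[alternate].add(word)
    return index


index = build_index(["philosophy", "knot", "fact"])
print(sorted(index["f"]))  # ['fact', 'philosophy']
print(sorted(index["n"]))  # ['knot']
```

With such an index, a misspelling beginning with "f" still reaches "philosophy", at the cost of storing that word twice.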
Further candidate word discrimination can be achieved as taught in application Ser. No. 6/108,000, filed Dec. 28, 1979, entitled "Alpha Content Match Prescan Method For Automatic Spelling Error Correction" by D. Glickman, et al., by inventorying, without regard to position, the respective characters in the misspelled word and in each of the dictionary candidate words. A candidate word is dismissed from additional processing if there is not a predetermined percentage match between its character content and that of the misspelled word. This process can be performed upon the set of words resulting from use of said Cluster Storage Apparatus and yields a further factor of 10 reduction in candidate words. Although the candidate word reduction achieved is salutary, the increment in the real time computation requirement is not absolutely minimized.
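A position-independent character inventory of this kind might be sketched as follows. The normalization by the longer word's length and the 0.75 threshold are assumptions made for this sketch; the text above specifies only that a predetermined percentage match is required.

```python
from collections import Counter


def alpha_content_match(target, candidate):
    """Fraction of characters the two words share, ignoring position.
    Normalizing by the longer word's length is an assumed convention."""
    longest = max(len(target), len(candidate))
    if longest == 0:
        return 0.0
    target_counts = Counter(target.lower())
    candidate_counts = Counter(candidate.lower())
    # Counter intersection keeps the minimum count of each shared character.
    shared = sum((target_counts & candidate_counts).values())
    return shared / longest


def prescan(target, candidates, threshold=0.75):
    """Dismiss candidates whose character content match falls below threshold."""
    return [
        word for word in candidates
        if alpha_content_match(target, word) >= threshold
    ]


print(prescan("seperate", ["separate", "september", "sequence"]))
# ['separate', 'september']
```

Here "sequence" shares only four of eight characters with "seperate" and is dismissed, while the two closer candidates pass on to the more expensive high resolution match.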