1. Technical Field
The invention disclosed broadly relates to data processing and more particularly relates to linguistic applications in data processing.
2. Background Art
Text processing and word processing systems have been developed for both stand-alone and distributed processing applications. The terms text processing and word processing will be used interchangeably herein to refer to data processing systems used primarily for the creation, editing, communication, and/or printing of alphanumeric character strings composing written text. A particular distributed processing system for word processing is disclosed in the copending U.S. patent application Ser. No. 781,862 filed Sept. 30, 1985, now U.S. Pat. No. 4,731,735, entitled "Multilingual Processing for Screen Image Build and Command Decode in a Word Processor, with Full Command, Message and Help Support," by K. W. Borgendale, et al. The figures and specification of the Borgendale, et al. patent application are incorporated herein by reference, as an example of a host system within which the subject invention herein can be applied.
Previous work has described procedures for reducing the number of candidate words that must be examined for a given misspelled word in order to find a list of the best-matching candidate words. One technique examines only those words that differ in length from the misspelled word by fewer than two characters and that share its initial character. Another technique uses a vector fetch approach, which assigns each word in the dictionary a magnitude value based on the confusability of the characters in the word; only those words whose magnitudes fall within a specified range of the misspelled word's magnitude are retrieved. These techniques have been supplemented by double indexing words with ambiguous or silent first letters (e.g., phonograph under "P" and "F," knight under "K" and "N") to improve their performance in standard office environments.
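The length-and-initial-letter filter and the double indexing of ambiguous first letters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a description of any particular prior system: the `ALTERNATE_INITIALS` table and the function names `index_keys`, `build_index`, and `candidates` are hypothetical. The vector fetch (magnitude) technique is not shown, because the confusability values it relies on are not given here.

```python
from collections import defaultdict

# HYPOTHETICAL table of ambiguous or silent first letters: a word beginning
# with one of these prefixes is also filed under the alternate initial letter
# (e.g. "phonograph" under "p" and "f", "knight" under "k" and "n").
ALTERNATE_INITIALS = {"ph": "f", "kn": "n", "wr": "r", "ps": "s"}


def index_keys(word):
    """Return the set of initial letters under which `word` is filed."""
    w = word.lower()
    keys = {w[0]}
    for prefix, alt in ALTERNATE_INITIALS.items():
        if w.startswith(prefix):
            keys.add(alt)
    return keys


def build_index(dictionary):
    """Double-index each dictionary word under all of its initial letters."""
    index = defaultdict(list)
    for word in dictionary:
        for key in index_keys(word):
            index[key].append(word)
    return index


def candidates(misspelling, index):
    """Return words that share an index letter with the misspelling and
    differ from it in length by fewer than two characters."""
    seen = set()
    result = []
    for key in sorted(index_keys(misspelling)):
        for word in index.get(key, []):
            if word not in seen and abs(len(word) - len(misspelling)) < 2:
                seen.add(word)
                result.append(word)
    return result


index = build_index(["phonograph", "photograph", "graph", "knight", "night"])
print(candidates("fonograph", index))  # ['phonograph', 'photograph']
print(candidates("nyght", index))      # ['knight', 'night']
```

Without the second index entry under "f", the misspelling "fonograph" would never be compared with "phonograph", since the two words do not share a first letter.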
Independent of these spelling aid techniques, statistical methods for determining similarities between strings have been developed and even implemented as integrated circuits. Methods such as the SOUNDEX system have been used to cluster names with similar phonetic characteristics, providing candidate file entries that must then be screened manually for relevance.
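As an illustration of the phonetic clustering mentioned above, the following Python sketch implements the commonly published American Soundex rules (first letter kept, remaining consonants mapped to digits, adjacent duplicates collapsed, result padded to four characters) and groups names by their codes. The function names `soundex` and `cluster` are hypothetical, and the sketch is not presented as the implementation used by any particular system referred to here.

```python
from collections import defaultdict

# Letter-to-digit groups of the commonly published American Soundex rules.
# Vowels, y, h and w carry no code.
CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    for ch in letters
}


def soundex(name):
    """Return the four-character Soundex code of `name`, e.g. 'R163'."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only if it differs from the previous coded letter.
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # A vowel resets `prev`, so a repeated digit after it is kept;
        # h and w are transparent and leave `prev` unchanged.
        if ch not in "hw":
            prev = code
    return result.ljust(4, "0")


def cluster(names):
    """HYPOTHETICAL helper: group names that share a Soundex code."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    return dict(groups)


print(cluster(["Robert", "Rupert", "Rubin", "Ashcraft", "Ashcroft"]))
# {'R163': ['Robert', 'Rupert'], 'R150': ['Rubin'], 'A261': ['Ashcraft', 'Ashcroft']}
```

Each group is only a candidate set: the code captures rough consonant sound, not meaning or word structure, so the entries in a group still have to be screened for relevance.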
Although these methods provide sets of candidate words, they do not integrate the morphological and phonetic components of language; consequently, the candidates they produce may be irrelevant or ranked in an implausible order.