Computer system errors generally fall into three principal categories: central processing errors, peripheral device errors, and programming errors. One type of peripheral device error occurs when an item in a string of data is misread or incompletely read by a character recognition device. For example, character recognition engines are frequently employed to read long strings of alphabetical information, such as printed addresses, and such an engine can occasionally misread or omit a character contained in the address string. This invention relates to error correction in character recognition systems and, in particular, to an apparatus and technique for resolving such an error by matching the misread character string to similar character strings contained in a lexicon of valid character strings.
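The general idea of matching a misread string against a lexicon of valid strings can be illustrated with a minimal software sketch. The similarity measure used here (Levenshtein edit distance), the function names, and the sample street names are assumptions chosen for illustration only; the text above does not specify how similarity is measured, and this sketch is not the apparatus of the invention.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_entries(misread: str, lexicon: list[str]) -> tuple[int, list[str]]:
    """Return the smallest distance and every lexicon entry at that distance."""
    scored = [(edit_distance(misread, entry), entry) for entry in lexicon]
    best = min(d for d, _ in scored)  # raises ValueError on an empty lexicon
    return best, [entry for d, entry in scored if d == best]


# Hypothetical lexicon of street names (illustrative values only).
STREETS = ["ORANGE BLOSSOM TRAIL", "ORANGE AVENUE", "BLOSSOM LANE"]

# One misread character (G read as Q) and one omitted character (an S).
print(closest_entries("ORANQE BLOSOM TRAIL", STREETS))
# prints (2, ['ORANGE BLOSSOM TRAIL'])
```

This exhaustive comparison visits every lexicon entry, so its cost grows with lexicon size; it shows the matching problem itself rather than a fast or storage-efficient way of solving it.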
Existing hardware and software are unsuitable for resolving the type of error described above quickly and with a minimal amount of storage space. Known techniques compromise between search speed and the amount of operative data available. In addition, certain of these known techniques are unsuitable for resolving the type of random error made by a character recognition engine because they assume that the error has certain predefined characteristics.
One existing hardware device is the Fuzzy Set Comparator (FSC) chip manufactured by MicroDevices of 7725 N. Orange Blossom Trail, Orlando, Fla. The FSC chip compares up to eight patterns stored in a random access memory against one input and selects the closest match of the eight. All eight patterns are processed simultaneously, and the closeness of each match is not reported. Expansion to 256 stored patterns is possible by adding memory hardware. A reference lexicon, such as a ZIP code lexicon, may contain hundreds or thousands of street names. The FSC chip is thus unsuitable for resolving problems such as the misread of a character in a character string described in paragraph 1, because the lexicon size is severely limited by the hardware limit of 256 entries. Furthermore, the random access memory must be reloaded and rewritten whenever a new lexicon is referenced. This type of device is therefore useful only with a single, relatively small lexicon.
Other correction devices have similar hardware limitations or assume that the errors made have certain likely characteristics. For example, spelling correction algorithms rely on known valid relationships between characters. Other correction hardware, such as the Fast Data Finder manufactured by TRW of Cleveland, Ohio, contains a series of state machine logic elements that select partial matches according to predefined criteria. Similarly, neural networks are general-purpose classifiers that precondition the input data so that important properties of the data are represented within the network. In addition, neural network structures require vast amounts of computation and storage to process data and perform the necessary comparisons. Because these types of error correction systems rely on known or predefined properties of the input, they do not address the random errors, independent of position or surrounding characters, that a character recognition engine is likely to make.
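The kind of error described above, a misread or omitted character whose position and value do not depend on the surrounding characters, can be modeled with a short sketch. The function name, the uniform choice of position, the even split between omission and misread, and the uppercase alphabet are all assumptions made for illustration; actual recognition engines may exhibit different error statistics.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def corrupt(text: str, rng: random.Random) -> str:
    """Omit or misread one character at a uniformly chosen position."""
    pos = rng.randrange(len(text))  # position chosen without regard to content
    if rng.random() < 0.5:
        # Omission: the character at pos is dropped.
        return text[:pos] + text[pos + 1:]
    # Misread: the character is replaced by a different letter that is
    # chosen without reference to the neighboring characters.
    wrong = rng.choice([c for c in ALPHABET if c != text[pos]])
    return text[:pos] + wrong + text[pos + 1:]


rng = random.Random(0)  # seeded only so that repeated runs give the same samples
samples = [corrupt("MAIN STREET", rng) for _ in range(5)]
print(samples)
```

Because the corrupted character can be any letter at any position, a corrector that depends on likely letter pairs or on predefined partial-match criteria has no reliable regularity to exploit in errors of this kind.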