The present invention relates to electrical communications, and particularly concerns systems for the machine recognition of patterns such as lexical characters.
Machines for the recognition of characters and other patterns are subject to two types of errors. Reject errors are those for which the machine is incapable of placing the input character into any of the possible classes. A rejected character is usually represented by a special code or symbol, such as "at". Substitution errors are those for which the machine places the character in an incorrect class, such as the improper recognition of a "B" as an "8". Within a recognition unit of a given size and complexity, the relative number of substitutions may be minimized by rejecting all input characters which are not recognized with a high degree of confidence. Since this approach increases the relative number of reject errors, it is said to have a high reject/substitution ratio. Conversely, a low reject/substitution ratio may be achieved by allowing non-confident guesses as to the identity of the input character. The designer of a recognition system may choose any reject/substitution ratio between these extremes as a parameter of his system. Once chosen, however, it is immutable except by redesign of the system.
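The trade-off described above can be illustrated with a minimal sketch, assuming the recognition unit yields a confidence score for each candidate class and rejects any character whose best score falls below a threshold. The names `classify`, `error_counts`, `SAMPLES`, the scores, and the reject symbol `"?"` are hypothetical and are not taken from any particular machine.

```python
from typing import Dict, List, Tuple

REJECT = "?"  # hypothetical reject symbol, chosen only for this illustration

def classify(scores: Dict[str, float], threshold: float) -> str:
    """Return the best-scoring class, or REJECT if its score is below threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else REJECT

def error_counts(samples: List[Tuple[str, Dict[str, float]]],
                 threshold: float) -> Tuple[int, int]:
    """Count (rejects, substitutions) over labelled samples at a given threshold."""
    rejects = substitutions = 0
    for true_class, scores in samples:
        result = classify(scores, threshold)
        if result == REJECT:
            rejects += 1
        elif result != true_class:
            substitutions += 1
    return rejects, substitutions

# Invented scores: (true character, confidence per candidate class).
SAMPLES = [
    ("B", {"B": 0.95, "8": 0.05}),
    ("B", {"B": 0.40, "8": 0.60}),  # a "B" that resembles an "8"
    ("8", {"8": 0.70, "B": 0.30}),
    ("3", {"3": 0.55, "8": 0.45}),
]

for t in (0.9, 0.5):
    print(t, error_counts(SAMPLES, t))
```

With these invented scores, the strict threshold of 0.9 gives three rejects and no substitutions, while the lenient threshold of 0.5 gives no rejects and one substitution (the "B" read as an "8"). A machine with a fixed ratio corresponds to a threshold that cannot be changed after design.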
The problem with the choice of a fixed reject/substitution ratio is that no single ratio is optimum for all applications of a recognition system. In reading monetary-amount and account-number character fields, for instance, it is usually desirable to minimize substitutions at the expense of a higher reject rate. The amount of redundancy in such fields is usually low, and the consequences of mistaken recognition are usually more serious. In reading connected text and non-critical information, however, the expense of manually correcting reject errors may be reduced by allowing a higher relative substitution rate. The redundancy of normal English text, for example, is sufficiently large that occasional incorrect characters have little effect on intelligibility. Moreover, present-day context-recognition devices are capable of automatically correcting many substitution errors, especially when they are presented with some indication as to the possible identity of a character.
This problem has been addressed in the past. In the IBM 1287 Optical Reader, provision is made for selectively rescanning critical character fields and comparing the identifications with each other. If they differ, the character is rejected. In all other fields, only one identification is produced. The difficulties with this approach are that the same recognition logic is used for all recognition attempts, and that the rescanning operation imposes a considerable time penalty.
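The rescan-and-compare scheme may be sketched as follows. This is an illustration of the general idea only, not the logic of the IBM 1287; the function `read_field`, the callable `scan_once`, the helper `make_scanner`, and the reject symbol `"?"` are hypothetical, and each scan is assumed to return one identified character per position.

```python
from typing import Callable, Iterable

def read_field(scan_once: Callable[[], str], critical: bool,
               reject: str = "?") -> str:
    """Read a field once; for critical fields, scan again and reject disagreements."""
    first = scan_once()
    if not critical:
        return first          # non-critical field: a single identification
    second = scan_once()      # critical field: a second full scan (the time penalty)
    # Keep a character only where both identifications agree.
    return "".join(a if a == b else reject for a, b in zip(first, second))

def make_scanner(results: Iterable[str]) -> Callable[[], str]:
    """Hypothetical stand-in for the scanner: replays canned identifications."""
    it = iter(results)
    return lambda: next(it)

# The two scans disagree on the second character ("B" versus "8").
print(read_field(make_scanner(["1B34", "1834"]), critical=True))
print(read_field(make_scanner(["1B34", "1834"]), critical=False))
```

In this sketch a critical field costs two scans, and both scans pass through the same recognition logic, which mirrors the two difficulties noted above.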