This invention pertains to symbol or character recognition systems for use in entering data into computer data processing systems.
In present-day technology, the information contained in documents often must be stored in a form suitable for processing by high-speed electronic data processing systems. One method of storing document information requires that the information, typically in the form of characters or symbols of a particular font, be read from the document, recognized and thereafter stored in a machine-readable code for further automatic processing. Reading and recognition of the document information must be carried out with a high degree of accuracy, regardless of the quality of the document characters or the presence of multiple characters (e.g., underlined, accented or otherwise overstruck characters).
To this end, a wide variety of systems have been proposed to perform these functions. In many of these systems, each document region containing an unknown character, after suitable alignment, is scanned or read by a scanning system to form an analog signal corresponding to the scanned document region. This analog signal is transformed into a digital signal, each of whose bits corresponds to an incremental time span of the analog signal and, therefore, to an associated incremental document area. Each bit of the digital signal is assigned a first state if its associated incremental document area contains a character segment (usually black) and a second state if its associated document area contains background (usually white). The digital signal is read into a shift register or other storage element to form a digital matrix representation of the document region. Recognition of the unknown character is carried out by comparing the digital matrix representation with other representations indicative of reference characters corresponding to the characters usable on the document. The unknown character is recognized as the reference character whose comparison yields the minimum discrepancy or difference.
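The generic scheme described above (threshold the scanned signal into a binary matrix, then pick the reference character with the minimum discrepancy) can be sketched in software as follows. This is a minimal illustration, not any particular prior-art system: the 0.5 threshold, the use of a position-by-position mismatch count (Hamming distance) as the discrepancy measure, and the 3x3 reference characters are all assumptions chosen for clarity.

```python
from typing import Dict, List

# 1 = incremental area holds a character segment (black), 0 = background (white)
Matrix = List[List[int]]


def binarize(samples: List[float], width: int, threshold: float = 0.5) -> Matrix:
    """Quantize row-major analog scan samples into a binary matrix.

    Each sample stands for one incremental document area. The 0.5 threshold
    is an assumed value; samples at or above it are treated as black.
    """
    if width <= 0 or len(samples) % width != 0:
        raise ValueError("sample count must be a positive multiple of the row width")
    bits = [1 if s >= threshold else 0 for s in samples]
    return [bits[i:i + width] for i in range(0, len(bits), width)]


def discrepancy(a: Matrix, b: Matrix) -> int:
    """Number of positions whose states differ (a Hamming distance)."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def recognize(unknown: Matrix, references: Dict[str, Matrix]) -> str:
    """Label of the reference matrix with the minimum discrepancy."""
    return min(references, key=lambda label: discrepancy(unknown, references[label]))


if __name__ == "__main__":
    # Hypothetical 3x3 reference characters, for illustration only.
    references = {
        "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
        "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    }
    scan = [0.1, 0.9, 0.2, 0.0, 0.8, 0.1, 0.3, 0.7, 0.0]
    print(recognize(binarize(scan, width=3), references))  # prints "I"
```

A plain mismatch count treats every position as equally informative; the probabilistic weighting discussed next is one way to relax that.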
U.S. Pat. No. 3,233,219 discloses one system of this type in which each reference character is defined by a matrix representation, each of whose positions is assigned the probabilities that its associated incremental area contains a character segment or background. The unknown character is transformed into a digital matrix representation, as described above, such that the character is situated in horizontal registration in the matrix. The matrix positions are interrogated, and their determined states are used in conjunction with the reference character matrices to recognize which reference character is most probably represented by the unknown character. Recognition is carried out by using the determined state of each matrix position to simultaneously trigger, for every reference character, the probability value for that state at the corresponding matrix position. The probability values associated with the matrix positions of the reference characters are represented by a resistance matrix having resistance values inverse to the logarithms of their corresponding probabilities. When the determined state of a particular matrix position triggers its corresponding probabilities, the corresponding resistances develop respective currents related to these probabilities for the reference characters. Each current is added to the currents previously developed for the other matrix positions of the same reference character. After all matrix positions have been interrogated, a resultant sum current is obtained for each reference character. The reference character having the smallest sum current is the character most likely represented by the unknown character.
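The resistance-matrix scheme is an analog way of summing log-probabilities. A software analogue, offered only as a sketch under stated assumptions, replaces each resistor's current with the cost -log p of the observed state at that position, so that the smallest total cost corresponds to the smallest sum current. The per-position probability tables, the assumption that positions are independent, and the `EPS` floor guarding against log(0) are illustrative choices, not details taken from the patent.

```python
import math
from typing import Dict, List

Matrix = List[List[int]]        # 1 = character segment, 0 = background
ProbMatrix = List[List[float]]  # P(position holds a character segment)

EPS = 1e-6  # assumed floor so a probability of 0 never reaches log(0)


def position_cost(state: int, p_black: float) -> float:
    """-log of the probability that this reference position shows `state`.

    Plays the role of one resistor's current: the less likely the observed
    state, the larger its contribution to the sum.
    """
    p = p_black if state == 1 else 1.0 - p_black
    return -math.log(max(p, EPS))


def score(unknown: Matrix, ref: ProbMatrix) -> float:
    """Sum of position costs over the whole matrix (the 'sum current')."""
    return sum(
        position_cost(s, p)
        for row_u, row_r in zip(unknown, ref)
        for s, p in zip(row_u, row_r)
    )


def recognize(unknown: Matrix, refs: Dict[str, ProbMatrix]) -> str:
    """Reference with the smallest total cost, i.e. the highest joint
    probability under the assumed independence of matrix positions."""
    return min(refs, key=lambda label: score(unknown, refs[label]))
```

Working in -log space turns a product of probabilities into a sum, which is why simple current summation suffices in the hardware version.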
U.S. Pat. No. 3,233,219 further contemplates a procedure to compensate for vertical misregistration of the unknown character in its matrix representation: the character is shifted vertically through the matrix one position at a time, and the above procedure of position interrogation and sum current formation is repeated at each position. A sum current is thus obtained for each reference character at each vertical position. The sum currents for each reference character are stored, and the lowest sum current for that reference character is taken as its representative value. The reference character whose representative sum current is lowest is then determined as having the highest probability of being the unknown character.
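The shift-and-minimize procedure for vertical misregistration can be sketched as below. It reuses the -log probability cost from the previous illustration; the symmetric shift window of `max_shift` rows and the filling of vacated rows with background are assumptions made for this sketch.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[int]]        # 1 = character segment, 0 = background
ProbMatrix = List[List[float]]  # P(position holds a character segment)


def score(unknown: Matrix, ref: ProbMatrix, eps: float = 1e-6) -> float:
    """Sum of -log P(observed state) over all positions (the 'sum current')."""
    total = 0.0
    for row_u, row_r in zip(unknown, ref):
        for state, p_black in zip(row_u, row_r):
            p = p_black if state == 1 else 1.0 - p_black
            total += -math.log(max(p, eps))
    return total


def shift_vertical(m: Matrix, k: int) -> Matrix:
    """Move the pattern down k rows (up if k < 0); vacated rows become background."""
    height, width = len(m), len(m[0])
    out = []
    for r in range(height):
        src = r - k
        out.append(list(m[src]) if 0 <= src < height else [0] * width)
    return out


def recognize_with_shifts(
    unknown: Matrix, refs: Dict[str, ProbMatrix], max_shift: int = 1
) -> Tuple[str, Dict[str, float]]:
    """Keep each reference's lowest score over all tried vertical offsets,
    then return the reference whose representative score is lowest."""
    best = {
        label: min(
            score(shift_vertical(unknown, k), ref)
            for k in range(-max_shift, max_shift + 1)
        )
        for label, ref in refs.items()
    }
    return min(best, key=best.get), best
```

For example, a dash scanned one row too high is still matched to a centered dash reference, because the offset k = 1 realigns it before scoring.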
Finally, U.S. Pat. No. 3,233,219 contemplates reducing the number of reference character matrix positions whose probabilities need be considered in the recognition procedure. The patent suggests limiting the matrix positions to only those which, over a large sampling of a particular reference character, are found always to be in the same state. Thus, only the probabilities of reference character matrix positions found substantially always to contain a character segment, or substantially always to contain background, are considered in this form of the recognition system. This reduces the number of resistances in the resistance matrix. A further reduction in the resistance matrix is also suggested by eliminating the resistances indicative of the high-probability states of the matrix positions. In this form, the only resistances that contribute to the sum current for each reference character are those indicative of the low-probability state of a matrix position.
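Both reductions can be illustrated in a short sketch: keep only positions whose state is consistent across a set of sample matrices of a reference character, and then charge a cost only when the unknown character shows the low-probability state at such a position. The `min_agreement` parameter (standing in for "substantially always"), the Laplace smoothing used to estimate the low-probability state's probability from the samples, and the function names are all assumptions for this illustration; the smoothing in particular is a swapped-in technique, not something the patent specifies.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[int]]  # 1 = character segment, 0 = background
# position -> (usual state, estimated probability of the unusual state)
Model = Dict[Tuple[int, int], Tuple[int, float]]


def build_reduced_model(samples: List[Matrix], min_agreement: float = 1.0) -> Model:
    """Keep only positions whose majority state occurs in at least
    `min_agreement` of the samples (an assumed reading of 'substantially always')."""
    n = len(samples)
    height, width = len(samples[0]), len(samples[0][0])
    model: Model = {}
    for r in range(height):
        for c in range(width):
            blacks = sum(s[r][c] for s in samples)
            usual = 1 if blacks * 2 >= n else 0
            agree = blacks if usual == 1 else n - blacks
            if agree / n >= min_agreement:
                # Laplace-smoothed probability of the unusual state (assumption),
                # so a state never seen in the samples still has a finite cost.
                model[(r, c)] = (usual, (n - agree + 1) / (n + 2))
    return model


def reduced_score(unknown: Matrix, model: Model) -> float:
    """Sum -log P(unusual state) only where the unknown shows the unusual state;
    matches contribute nothing, mirroring the removal of high-probability terms."""
    return sum(
        -math.log(p_unusual)
        for (r, c), (usual, p_unusual) in model.items()
        if unknown[r][c] != usual
    )


def recognize(unknown: Matrix, models: Dict[str, Model]) -> str:
    """Reference character with the lowest reduced score."""
    return min(models, key=lambda label: reduced_score(unknown, models[label]))
```

Positions that vary across samples (such as a stray mark in one sample) simply drop out of the model, so noise at those positions in the unknown character has no effect on the score.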
It is an object of the present invention to provide an improved character recognition system and method.
It is a further object of the present invention to provide a character recognition system of improved accuracy that is insensitive to variations in character quality and to the presence of multiple characters.