1. Field of the Invention
The present invention generally relates to intelligent character recognition (ICR) systems, and more specifically to an apparatus and method for fusing the results of multiple ICR systems to reduce the system's error rate.
2. Description of the Related Art
There are many instances in which printed text (hardcopy) must be converted to computer readable text (softcopy). For example, tax, census and insurance data are usually provided on standardized forms. Traditionally, a key punch operator would input the data from the forms into the computer so that the data could be processed and compiled into a database. Occasionally, entire documents must be entered into a computer. Manual entry of this kind is very slow and tedious work that leads to operator error.
In theory, ICR systems can be used to automatically recognize printed text (machine or handwritten) and convert it into computer readable text, e.g., an ASCII character format that is compatible with word processors such as Microsoft Word® or WordPerfect®. However, in practice the problem of recognizing and accurately discriminating printed text is very difficult. The result is ICR systems with high error rates that still require key punch operators to input a significant portion of the data.
To improve error rates and reduce the amount of data the key punch operator must enter, ICR systems are custom designed for specific problems such as processing standardized forms and reading mailing addresses. These types of problems are much easier because the physical location and classification of the printed text are known. For example, a car insurance form may have designated boxes for a person's name, address and occupation and the car's make and model. The ICR system knows approximately where to look for the text and can use different specially designed databases for each of the classes of printed text. The databases use a limited dictionary of words or numbers, which increases the confidence with which a word or "character string" is selected. As a result, both the ICR system's error rate and the necessary user interaction are reduced.
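The role of a limited, field-specific dictionary can be illustrated with a minimal sketch. The source does not state how the databases are consulted, so the matching method here (the standard library's `difflib.get_close_matches`), the field names, the dictionary contents and the `cutoff` value are all assumptions chosen for illustration only.

```python
from difflib import get_close_matches

# Hypothetical per-field dictionaries for a car insurance form.
FIELD_DICTIONARIES = {
    "make": ["FORD", "HONDA", "TOYOTA"],
    "occupation": ["ENGINEER", "NURSE", "TEACHER"],
}

def snap_to_dictionary(raw_string, field, cutoff=0.6):
    """Return the closest dictionary entry for the field, or None.

    Restricting the search to the small vocabulary of a known field is
    what lets a misrecognized string still resolve to a valid entry.
    """
    matches = get_close_matches(
        raw_string, FIELD_DICTIONARIES[field], n=1, cutoff=cutoff
    )
    return matches[0] if matches else None

# A string in which the letter "O" was misread as the digit "0".
print(snap_to_dictionary("H0NDA", "make"))
```

Because the "make" field admits only a few words, a single misread character does not prevent a match, whereas the same string checked against an unrestricted vocabulary would have many more near neighbors.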
As used in ICR systems, the term "confidence value" reflects a subjective belief that a character or string matches its counterpart in the original document, and is typically assigned a value between 0 and 1. The confidence value is not a rigorous mathematical indicator, but in general a high confidence value will tend to indicate a lower probability of error and a low confidence value will tend to indicate a higher probability of error.
A number of companies including Mitek Systems, Inc., Nestor, Inc., Bell & Howell, Inc., Calera, Inc., Matra, Inc., and AEG, Inc. produce proprietary ICR systems that recognize hand- or machine-printed numeric, upper case alpha, upper/lower case alpha, and lower case alpha characters, as well as punctuation marks. An optical scanner is used to digitally scan a document to create a digital image. The printed text in the document is represented as pixel values in the digital image. The ICR system segments the pixel values first into strings of image components (words) and then into the individual image components (characters). Once segmented, the ICR system recognizes each individual image component as multiple candidate characters having associated confidence values. The ICR system then regenerates candidate character strings from the candidate characters and assigns each string a confidence value. Commercial-off-the-shelf ICR systems output the string with the highest confidence value. If the confidence value exceeds a threshold, the string is accepted and sent to the database. Otherwise, the string is rejected and the key punch operator is prompted to visually identify the string from the digital image and enter it into the computer.
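The final stages of this flow (candidate characters, candidate strings, threshold test) can be sketched as follows. This is an illustrative sketch only: the commercial systems are proprietary, so the rule used here to score a string (the product of its character confidence values), the function names and the threshold of 0.8 are assumptions, not details taken from any of the systems named above.

```python
from itertools import product
from math import prod

def best_string(char_candidates):
    """Return the (string, confidence) pair with the highest confidence.

    char_candidates holds one list per character position; each list
    contains (character, confidence) pairs for that position.
    Assumption: string confidence is the product of character confidences.
    """
    best = ("", 0.0)
    for combo in product(*char_candidates):
        text = "".join(ch for ch, _ in combo)
        conf = prod(c for _, c in combo)
        if conf > best[1]:
            best = (text, conf)
    return best

def accept_or_reject(char_candidates, threshold=0.8):
    """Accept the top string if its confidence exceeds the threshold;
    otherwise flag it for entry by the key punch operator."""
    text, conf = best_string(char_candidates)
    decision = "accept" if conf > threshold else "reject"
    return decision, text, conf

# Hypothetical recognizer output for a three-character word.
candidates = [
    [("C", 0.90), ("G", 0.10)],
    [("A", 0.95), ("4", 0.05)],
    [("T", 0.99), ("I", 0.01)],
]
print(accept_or_reject(candidates))
```

Enumerating every combination is practical only for short strings with a few candidates per position; it is used here because it makes the selection of the highest-confidence string explicit.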
To reduce the error rate, we believe that many of these systems use an internal fusion process to combine the outputs of multiple different ICR systems. Each ICR system can then be restricted to detecting a limited subset of the complete ASCII character set, which improves its accuracy. Alternatively, multiple ICR systems that detect the same subset but are provided by different vendors, and hence operate off of different recognition kernels, can be combined to improve performance.
Because these systems are proprietary, we do not know the details of the individual ICR systems or the specific fusion algorithms. However, we believe that the existing ICR systems use one of four known approaches: 1) a voter system, 2) probability aggregation, 3) belief combination or 4) fuzzy logic. In a voter system, all of the ICR systems are treated as equal and the string that occurs in a majority/plurality of the outputs is selected. In a probability aggregation approach, the relative weight given to each ICR system is based on its average performance. In belief combination, each ICR system provides a range of "possibility" (belief) assignments as opposed to a fixed probability assignment. In fuzzy logic, words such as "somewhat" or "a lot" are used to describe the ICR systems' relative strengths. The fuzzy network then converts these words into numeric values to make a decision. Although these fusion methods generally improve the system's error rate, further improvement is needed to make ICR systems commercially feasible.
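The first two approaches can be contrasted in a short sketch. Since the actual fusion algorithms are not known, this is only one plausible reading of each: the function names, the weights and the rule of summing the weights of the systems that agree are assumptions, and tie-breaking is not addressed.

```python
from collections import Counter, defaultdict

def plurality_vote(outputs):
    """Voter system: every ICR system counts equally; the string that
    appears most often wins. Returns (string, number of votes)."""
    winner, votes = Counter(outputs).most_common(1)[0]
    return winner, votes

def weighted_aggregate(outputs, weights):
    """Probability aggregation (assumed form): each system's vote is
    scaled by a weight reflecting its average performance. Returns the
    winning string and its share of the total weight."""
    scores = defaultdict(float)
    for text, weight in zip(outputs, weights):
        scores[text] += weight
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / sum(weights)

# Hypothetical outputs of three ICR systems for the same image.
outputs = ["SMITH", "SMITH", "SMYTH"]
weights = [0.2, 0.2, 0.9]  # assumed average-performance weights

print(plurality_vote(outputs))
print(weighted_aggregate(outputs, weights))
```

With these illustrative numbers the two methods disagree: the vote selects the string produced by two systems, while the weighted scheme selects the string produced by the single system with the best assumed track record.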