1. Technical Field
The invention disclosed broadly relates to data processing systems and more particularly relates to character recognition of images of text.
2. Related Patents and Patent Applications
This patent application is related to the copending U.S. patent application, Ser. No. 07/870,129, filed Apr. 15, 1992, now U.S. Pat. No. 5,251,273, entitled "Data Processing System and Method for Sequentially Repairing Character Recognition Errors for Scanned Images of Document Forms," by T. S. Betts, et al., the application being assigned to the IBM Corporation and incorporated herein by reference.
This patent application is also related to the copending U.S. patent application, Ser. No. 07/870,507, filed Apr. 17, 1992, now U.S. Pat. No. 5,305,396, entitled "Data Processing System and Method for Selecting Customized Character Recognition Processes and Coded Data Repair Processes for Scanned Images of Document Forms," by T. S. Betts, et al., the application being assigned to the IBM Corporation and incorporated herein by reference.
This patent application is also related to the copending U.S. patent application, Ser. No. 07/305,828, filed Feb. 2, 1989, now U.S. Pat. No. 5,140,650, entitled "A Computer Implemented Method for Automatic Extraction of Data From Printed Forms," by R. G. Casey, et al., the application being assigned to the IBM Corporation and incorporated herein by reference.
3. Background Art
The character recognition of images of text is a technology which has been extensively developed in the data processing area. There are many commercially available character recognition computer programs and devices which take an image of alphanumeric text and convert it to a string of alphanumeric coded data characters. Each commercially available character recognition product is usually characterized by its manufacturer as having certain strengths for which it is more appropriately suited. Some character recognition programs are excellent at converting images of machine impact printing into coded data character strings, but fail at converting dot matrix characters. Other character recognition programs are designed for converting handprinted characters, either constrained within the outline of a rectangular box or unconstrained, and these particular character recognition programs fail at other types of character forms. Categories such as machine impact printing, dot matrix printing, constrained handprinting, unconstrained handprinting and the like are considered variations of character forms.
Another category of variation in text, for which some character recognition programs are more accurate than others, is the category of field types. Field types are, for example, an all numeric field, an all alphabetic field consisting entirely of uppercase letters, an all lowercase alphabetic field, a mixed alphabetic field of some capital letters and some lowercase letters, and the like. Some character recognition programs are much stronger at accurately converting numeric fields than they are at converting alphabetic or mixed fields of characters.
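For illustration only, the two categories of variation described above, character forms and field types, can be sketched as simple enumerations. The names `CharacterForm`, `FieldType` and `classify_field_type` are hypothetical and do not appear in the original disclosure. The classifier operates on an already coded character string and is an assumption made for the example; in practice the field type of an image field would more likely be known in advance, for instance from a form definition.

```python
from enum import Enum


class CharacterForm(Enum):
    """Character forms named in the background discussion (illustrative)."""
    MACHINE_IMPACT = "machine impact printing"
    DOT_MATRIX = "dot matrix printing"
    CONSTRAINED_HANDPRINT = "constrained handprinting"
    UNCONSTRAINED_HANDPRINT = "unconstrained handprinting"


class FieldType(Enum):
    """Field types named in the background discussion (illustrative)."""
    NUMERIC = "all numeric"
    UPPER_ALPHA = "all uppercase alphabetic"
    LOWER_ALPHA = "all lowercase alphabetic"
    MIXED_ALPHA = "mixed-case alphabetic"
    OTHER = "other"  # assumption: catch-all for anything not listed above


def classify_field_type(text: str) -> FieldType:
    """Classify a coded character string into one of the field types.

    Whitespace is ignored; only ASCII letters and digits are considered.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return FieldType.OTHER
    if all(c.isascii() and c.isdigit() for c in chars):
        return FieldType.NUMERIC
    if all(c.isascii() and c.isalpha() for c in chars):
        if all(c.isupper() for c in chars):
            return FieldType.UPPER_ALPHA
        if all(c.islower() for c in chars):
            return FieldType.LOWER_ALPHA
        return FieldType.MIXED_ALPHA
    return FieldType.OTHER
```

Under these assumptions, a field such as a postal code would be classified as an all numeric field, while a surname written with an initial capital would be classified as a mixed alphabetic field.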
The reason for the limited ability of character recognition programs to perform well on a wide variety of character forms and field types is that character recognition programs are typically based on a single character recognition algorithm or a limited number of such algorithms, such as pattern recognition, neural net, character feature, or other character recognition algorithms.
This limitation in the diversity of character forms and field types for which existing character recognition programs are useful creates a problem when a variety of text forms is to be analyzed. What is needed is a method to overcome the weaknesses of single character recognition products, so as to enhance the overall performance of a system which must analyze a wide variety of character forms and field types.