1. Field of the Invention
This invention relates to optical character recognition. More particularly, this invention relates to methods and apparatus for recognizing characters printed by hand or machine with high accuracy and speed at relatively low cost.
2. Discussion of the Prior Art
It is conservatively estimated that the data input segment of the data processing industry is a $25 billion per year business. A character recognition device saving half this cost would accordingly save more than $10 billion a year. Consequently, there has been very substantial interest in automatic recognition of characters for many years. Recent years have seen substantial success in recognition of machine-printed characters, e.g., typed or computer printed characters, and useful equipment is now commercially available. However, a vast amount of data is still collected by hand, e.g., on census forms, tax forms, hand printed envelopes and the like. It would be highly advantageous if similar equipment could be developed for recognizing hand printed characters.
Relevant work done previously by the inventors and co-workers is reported in a number of papers, as follows: "Self-Organizing Neural Network Character Recognition on a Massively Parallel Computer", Wilson et al, Proceedings of International Joint Conference on Neural Networks, II, pp. 325-329, Jun. 18, 1990; "Analysis of a Biologically Motivated Neural Network for Character Recognition", Garris et al, in Proceedings: Analysis of Neural Network Applications, ACM Press, George Mason University, May 1991; "Methods for Enhancing Neural Network Handwritten Character Recognition", Garris et al, International Joint Conference on Neural Networks, Volume I, IEEE, July 1991; "Massively Parallel Implementation of Character Recognition Systems", Garris et al, report NISTIR 4750 published by the U.S. Department of Commerce (1992); and "Training Feed Forward Neural Networks Using Conjugate Gradients", Grother et al, report NISTIR 4776 published by the U.S. Department of Commerce (1992).
Each of these reports relates to the use of digital processing systems for recognizing hand printed characters using Gabor functions. Gabor functions were proposed by Dennis Gabor in 1946. The great utility of Gabor functions in recognition of characters and related image processing functions is discussed by Daugman in "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", J. Opt. Soc. Am. A., Volume 2, No. 7, July 1985 and in "Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression", IEEE Transactions on Acoustics, Speech and Signal Processing, Volume 36, No. 7, July, 1988.
It is of particular interest to compare the two-dimensional receptive-field profiles in the upper third of FIG. 3 of the 1985 Daugman paper, which represent measurements of the relative sensitivity of optical cells in cat eyes, with the representations of selected two-dimensional Gabor functions shown in the middle third of FIG. 3. As shown in the lower third of FIG. 3, the Gabor filters to a considerable degree provide a mathematical model of vision as optimized by evolution. Hence, as Daugman showed, it is reasonable to expect Gabor functions to be useful filters for extracting maximum information from an image to be analyzed by machine.
Mathematically, Gabor functions encode both spatial and frequency information. For example, the capital letter H includes two vertical strokes. Intuitively, the parallel relation of these strokes is useful in recognizing an H. The parallel relation of the two strokes is in essence frequency information, as is their spacing; the orientation of the strokes is spatial information. It is apparent from the work of Daugman and the earlier work of the present inventors and co-workers referred to above that Gabor functions are more useful in character recognition and other forms of image analysis than the simpler filters previously used for character recognition.
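As an illustration of the H example, the following is a minimal sketch, assuming a real-valued Gabor function (a Gaussian envelope multiplied by a cosine carrier) and a hypothetical 16 × 16 binary image of an H whose vertical strokes are 7 pixels apart. A Gabor function whose carrier varies horizontally, with a wavelength equal to the stroke spacing and a phase of π that places the carrier's positive peaks on both strokes, correlates more strongly with the H than the same function rotated to vary vertically. The helper names and the parameter values (`wavelength`, `theta`, `sigma`, `phase`) are illustrative assumptions, not those used in the papers cited.

```python
import math

def gabor(size, wavelength, theta, sigma, phase=0.0):
    """Real 2-D Gabor function sampled on a size x size grid centred on the grid."""
    c = (size - 1) / 2.0
    kernel = []
    for y in range(size):
        row = []
        for x in range(size):
            dx, dy = x - c, y - c
            # Coordinate along the carrier's direction of variation (theta = 0: horizontal).
            u = dx * math.cos(theta) + dy * math.sin(theta)
            envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * u / wavelength + phase))
        kernel.append(row)
    return kernel

def correlate(image, function):
    """Sum over all pixels of pixel value times the local value of the function."""
    return sum(p * f for i_row, f_row in zip(image, function) for p, f in zip(i_row, f_row))

def letter_h(size=16):
    """Hypothetical binary H: two vertical strokes 7 pixels apart joined by a crossbar."""
    img = [[0] * size for _ in range(size)]
    for y in range(3, 13):
        img[y][4] = img[y][11] = 1
    for x in range(4, 12):
        img[8][x] = 1
    return img

h = letter_h()
vertical = correlate(h, gabor(16, wavelength=7.0, theta=0.0, sigma=4.0, phase=math.pi))
horizontal = correlate(h, gabor(16, wavelength=7.0, theta=math.pi / 2, sigma=4.0, phase=math.pi))
print(round(vertical, 2), round(horizontal, 2))
```

The response of the horizontally varying function is dominated by the two parallel strokes, i.e., by their orientation and spacing together, which is the combination of spatial and frequency information described above.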
A further set of functions useful in image recognition, and specifically in character recognition, may be derived using Karhunen Loeve ("K-L") transforms. See Grother, "Karhunen Loeve Feature Extraction for Neural Handwritten Character Recognition", SPIE, Vol. 1709 (April 1992). Functions derived from K-L transforms are similar to Gabor functions in that each encodes both frequency and spatial information in an image using a limited set of functions. That is, when either Gabor or K-L derived functions are correlated with an image, the resulting value is proportional to the similarity of the frequency and spatial content of the image and of the function. Both Gabor and K-L derived functions are characterized for the purpose of this application as maximum uncertainty--minimum variance (MUMV) functions. Both K-L and Gabor MUMV functions are "robust" in the sense of being capable of yielding useful results despite minor variations in the shape of images to be recognized. For example, some writers connect the horizontal bar at the top of the character "5" to the body of the character, and some do not. This aspect of robustness is referred to as "maximum uncertainty". Further, both Gabor and K-L derived functions are also robust in the sense of relying on statistical features present in the entire class of characters. This aspect of robustness is referred to as "minimum variance".
An important distinction between image recognition processes employing Gabor functions and those employing K-L derived functions lies in the method by which the functions are derived. Gabor functions are derived a priori, that is, by mathematical calculation of selected members of the infinite family of functions defined by Gabor's equations. See the May 1991 Proceedings: Analysis of Neural Network Applications paper of Garris et al, supra. K-L derived functions, by comparison, can be obtained only by correlating a large sample of actual images of, e.g., the characters to be recognized. See the 1992 Grother SPIE paper, supra. Thus the steps performed to generate a set of Gabor functions are very different from those required to generate a set of K-L functions, although thereafter the two sets of functions may be employed in the same way.
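Unlike Gabor functions, K-L derived functions come from data rather than from a formula. A minimal sketch, assuming each sample image is flattened into a list of pixel values: compute the mean image and the sample covariance matrix (the correlation of the sample images pixel by pixel), then take its leading eigenvectors as the K-L functions, found here by power iteration with deflation. The function name `kl_basis`, the iteration count, and the toy 2 × 2 samples are illustrative assumptions; the procedure in the Grother SPIE paper may differ in detail.

```python
import math
import random

def kl_basis(samples, k, iters=200):
    """Leading k K-L functions (covariance eigenvectors) of flattened sample images."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    centered = [[s[i] - mean[i] for i in range(d)] for s in samples]
    # Sample covariance: pixel-by-pixel correlation across the whole sample.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(0)
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Deflation: remove components along eigenvectors already found.
            for b in basis:
                dot = sum(wi * bi for wi, bi in zip(w, b))
                w = [wi - dot * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return mean, basis  # no variance left in the remaining directions
            v = [x / norm for x in w]
        basis.append(v)
    return mean, basis

# Toy sample: 2 x 2 images (flattened row by row), half "top row" and half "bottom row".
samples = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
mean, basis = kl_basis(samples, k=1)
print(mean, [round(x, 3) for x in basis[0]])
```

The resulting function contrasts the top row with the bottom row, the only way in which this toy sample varies; once computed, it can be correlated with an image exactly as a sampled Gabor function would be.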
Reference herein to MUMV functions should be understood to include both Gabor and K-L derived functions, except where the context (e.g., by reference to the method of their generation) clearly indicates the contrary.
Fourier transforms have been employed for image recognition, including character recognition. Fourier transform processing of an image provides a series of coefficients representing the power and relative phase of various frequency components present in the image. Fourier coefficients thus derived encode both frequency and spatial information. However, a complete set of Fourier coefficients may typically include some 1024 coefficients; this large amount of data requires prohibitive amounts of digital processing to yield useful image recognition. Accordingly, the set of coefficients is usually truncated at 32 or 64; consequently much useful information--usually the phase information--is lost, resulting in significant loss of accuracy and utility.
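To make the coefficient count and the role of phase concrete: a two-dimensional discrete Fourier transform (DFT) of an M × N image yields M·N complex coefficients (for example, 1024 for a 32 × 32 image), each carrying a magnitude (power) and a phase. The sketch below uses a direct, deliberately unoptimized DFT on a hypothetical 8 × 8 image containing one vertical stroke. It shows that a circular shift of the image leaves every magnitude unchanged and alters only phases, so a representation that keeps magnitudes and discards phase can no longer tell where the stroke lies.

```python
import cmath

def dft2(image):
    """Direct 2-D DFT (O((MN)^2), for illustration only); returns an M x N grid of complex coefficients."""
    m, n = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / m + v * x / n))
                 for y in range(m) for x in range(n))
             for v in range(n)]
            for u in range(m)]

def circular_shift(image, dx):
    """Shift every row dx pixels to the right, wrapping around at the edge."""
    return [row[-dx:] + row[:-dx] for row in image]

# Hypothetical 8 x 8 image containing one vertical stroke in column 2.
image = [[1 if x == 2 else 0 for x in range(8)] for _ in range(8)]
original = dft2(image)
shifted = dft2(circular_shift(image, 3))

magnitudes_a = [abs(c) for row in original for c in row]
magnitudes_b = [abs(c) for row in shifted for c in row]
print(len(magnitudes_a))  # 64 coefficients for an 8 x 8 image
print(max(abs(a - b) for a, b in zip(magnitudes_a, magnitudes_b)))  # close to 0: magnitudes unchanged
```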
For example, U.S. Pat. No. 4,989,257 to Horowitz discusses application of Fourier transforms for character recognition or similar image processing problems. Horowitz discloses dividing an image of a character into "eight rings and 24 slices" and carrying out "various Fourier transforms, autocorrelations, movement calculations, and sorting operations on the resulting data." See the Abstract. It is self-evident that such a process would require an immense amount of computer time to recognize a single character.
Other functions previously employed for extracting features from images, and for related purposes such as character recognition, include Walsh functions. Walsh functions are essentially black-and-white "checkerboard" patterns which may be correlated with an image to be characterized. Walsh functions detect "spectral" (i.e., frequency) information and may locate a single "topological feature." See U.S. Pat. No. 4,590,608 to Chen et al. Because Walsh functions are binary, that is, black or white, their use is unduly sensitive to lateral and vertical displacement of the image.
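A sketch of Walsh functions and of the displacement sensitivity noted above, assuming the natural (Sylvester-Hadamard) ordering rather than the sequency ordering often used in practice: each two-dimensional Walsh function is the outer product of two rows of a Hadamard matrix, giving a ±1 checkerboard or stripe pattern. For a hypothetical 8 × 8 image of vertical stripes that exactly matches one Walsh function, displacing the image by a single pixel reverses the sign of the coefficient. The helper names are illustrative.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).
    Its rows are the Walsh functions of length n in natural (Hadamard) order."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def walsh_2d(n, i, j):
    """Two-dimensional Walsh function: row i varies down the image, row j across it."""
    h = hadamard(n)
    return [[h[i][y] * h[j][x] for x in range(n)] for y in range(n)]

def correlate(image, pattern):
    """Sum over all pixels of pixel value times the +1/-1 pattern value."""
    return sum(p * w for i_row, w_row in zip(image, pattern) for p, w in zip(i_row, w_row))

stripes = walsh_2d(8, 0, 1)                               # +1/-1 alternating columns
image = [[(w + 1) // 2 for w in row] for row in stripes]  # black (1) where the pattern is +1
moved = [[0] + row[:-1] for row in image]                 # same image displaced one pixel right
print(correlate(image, stripes), correlate(moved, stripes))  # 32 -32
```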
Other patents which may be of interest include U.S. Pat. No. 4,082,431 to Ward, III, which shows Fourier transformation of images using optical holograms and incoherent light; U.S. Pat. No. 3,879,605 to Carl et al, showing a Walsh transform computer implementing Kronecker-matrix transformations; U.S. Pat. No. 5,047,968 to Carrington et al, showing a system for using Fourier transforms to correct distortion in images; and U.S. Pat. No. 5,050,220 to Marsh et al, disclosing an optical fingerprint correlator employing digital Fourier transform techniques to measure the correlation between an unknown print and a sample or known print for characterization purposes.
U.S. Pat. No. 4,854,669 to Birnbach et al discloses a spatial filter with a selectable modulation transfer function that employs Fourier transform techniques to remove unwanted portions of an image. As indicated above, accurate image recognition by Fourier-transform techniques requires that both the frequency and the phase information in the image be preserved. Doing so optically requires an apparatus manufactured to high tolerance, employing a coherent light source (e.g., a laser) and a number of costly optical elements. Such systems, as exemplified by Birnbach, are too complex and costly for practical use.
It will be appreciated from review of the above documents that optical techniques have been used to correlate Fourier transforms with characters or other elements of images to be recognized, while digital computers have similarly been used to perform Walsh and Fourier transform filtering. However, the art does not teach employment of MUMV functions, such as Gabor or K-L derived functions, other than in digital systems.
The process of correlating an image of a character to be recognized with a MUMV function to yield a correlation coefficient, e.g., for input to a neural network, as described in the papers of the inventors and co-workers described above, has always (to the knowledge of the present inventors) been carried out generally according to the following steps. A character to be recognized is identified, e.g., by locating it within a particular box on a form to be converted from hand printed hard copy to computer data. The character is digitized by generating an array of bits each responsive to the density of the corresponding pixel in the image. The value of each pixel of the image is then multiplied by the corresponding local value of a MUMV function, that is, by the corresponding value in a second matrix wherein each element of the matrix represents the local value of the MUMV function. The results of all these multiplications (which may be 10,000 or more multiplications) are summed to yield a value for the correlation of the image of the character to be recognized with the MUMV function. The same calculation is repeated for each function in the selected set of MUMV functions.
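The correlation procedure just described can be sketched as follows, assuming each MUMV function has already been sampled into a matrix with the same dimensions as the digitized image (for example, a set of sampled Gabor or K-L derived functions). The density threshold used to produce the bits and the function names are hypothetical.

```python
def digitize(densities, threshold=0.5):
    """Array of bits, one per pixel: 1 where the pixel's density reaches the (assumed) threshold."""
    return [[1 if d >= threshold else 0 for d in row] for row in densities]

def mumv_coefficients(image, functions):
    """One correlation coefficient per MUMV function: the sum over all pixels of
    pixel value times the local value of the function at that pixel."""
    coefficients = []
    for function in functions:
        total = 0.0
        for image_row, function_row in zip(image, function):
            for pixel, local_value in zip(image_row, function_row):
                total += pixel * local_value  # one multiplication per pixel
        coefficients.append(total)
    return coefficients

bits = digitize([[0.9, 0.1], [0.2, 0.7]])  # [[1, 0], [0, 1]]
coefficients = mumv_coefficients(bits, [[[1, 0], [0, 1]], [[0.5, -1], [2, 3]]])
print(coefficients)  # [2.0, 3.5]
```

The nested loop makes the cost explicit: every coefficient requires one multiplication for every pixel of the image.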
The result is a series of coefficients, each representing the correlation of the character with one of the set of MUMV functions. These coefficients are then supplied as inputs to a neural network. In the neural network the coefficients are weighted in accordance with previously calculated data and summed, yielding a set of output values identifying the character. These steps, and the step of calculating the weights to be applied to the coefficients, are described in the papers of applicants and co-workers and of Daugman referred to above. Where Gabor functions are to be employed as the MUMV functions, the steps of selecting the Gabor functions to be employed from the infinite set of possible Gabor functions and of calculating them may be carried out as described in the Garris et al Proceedings: Analysis of Neural Network Applications paper, supra; if K-L transforms are to be used to calculate K-L derived functions, this may be done as described in the Grother SPIE paper, supra. Each of the papers referred to above is accordingly incorporated herein by reference.
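A minimal sketch of the weighted-summation step, assuming a single layer of hypothetical, previously trained weights and biases; the networks in the cited papers may include hidden layers and nonlinear activation functions, which are omitted here. Each output is a weighted sum of the coefficients, and the character is taken to be the label of the largest output.

```python
def classify(coefficients, weights, biases, labels):
    """Weighted sum of the coefficients for each output class; the largest output names the character."""
    outputs = [bias + sum(w * c for w, c in zip(row, coefficients))
               for row, bias in zip(weights, biases)]
    best = max(range(len(outputs)), key=lambda k: outputs[k])
    return labels[best], outputs

# Hypothetical weights for two output classes and two input coefficients.
label, outputs = classify([2.0, 3.5], weights=[[1.0, 0.0], [0.0, 1.0]],
                          biases=[0.0, 0.0], labels=["A", "B"])
print(label, outputs)  # B [2.0, 3.5]
```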
It will be appreciated from the above that the step of correlating an image of a character to be recognized with each of a set of MUMV functions has heretofore always required a vast number of multiplications, followed by summation of the results, to yield the corresponding coefficients. The coefficients thus obtained, each corresponding to the correlation of a single character with one of the set of MUMV functions, are then supplied to a neural network for weighted summation. Calculations in such quantity are best carried out on very high speed, massively parallel computing systems. While very useful results have been obtained, as shown in the papers of the inventors and co-workers referred to above, such equipment is expensive and likely to remain so for the near term. Moreover, even using state-of-the-art massively parallel computers, the analysis of hand printed characters still requires substantial processing time.
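The scale of this computation can be expressed as a simple count. For a digitized image of P pixels correlated with a set of K MUMV functions, the procedure requires one multiplication per pixel per function, and P − 1 additions to sum each set of products. Taking the figure of 10,000 multiplications per correlation given above and, purely as an assumption for illustration, a set of K = 32 functions:

```latex
\[
N_{\mathrm{mult}} = K\,P, \qquad N_{\mathrm{add}} = K\,(P - 1)
\]
\[
P = 10^{4},\; K = 32 \;\Longrightarrow\; N_{\mathrm{mult}} = 3.2 \times 10^{5} \text{ multiplications per character}
\]
```

This count is incurred before the neural network's weighted summation, and it recurs for every character on every form.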
It would accordingly be desirable to provide a method and apparatus for character recognition that realizes the advantages of image processing using Gabor or K-L derived MUMV functions but can be implemented without costly high speed parallel processing computer equipment.