1. Field of the Invention
This invention relates to optical character recognition. More particularly, this invention relates to methods and apparatus for recognizing characters printed by hand or machine with high accuracy and speed at relatively low cost.
2. Discussion of the Prior Art
It is conservatively estimated that the data input segment of the data processing industry is a $25 billion per year business. A character recognition device saving half this cost would accordingly save more than $10 billion a year. Consequently, there has been for many years very substantial interest in automatic recognition of characters. Recent years have seen substantial success in recognition of machine-printed characters, e.g., typed or computer printed characters, and useful equipment is now commercially available. However, a vast amount of data is still collected by hand, e.g., on census forms, tax forms, hand printed envelopes and the like. It would be highly advantageous if similar equipment could be developed for recognizing hand printed characters.
Relevant work done previously by the inventors and co-workers is reported in a number of papers, as follows: "Self-Organizing Neural Network Character Recognition on a Massively Parallel Computer", Wilson et al, Proceedings of International Joint Conference on Neural Networks, II, pp. 325-329, Jun. 18, 1990; "Analysis of a Biologically Motivated Neural Network for Character Recognition", Garris et al, in Proceedings: Analysis of Neural Network Applications, ACM Press, George Mason University, May 1991; "Methods for Enhancing Neural Network Handwritten Character Recognition", Garris et al, International Joint Conference on Neural Networks, Volume I, IEEE, July 1991; "Massively Parallel Implementation of Character Recognition Systems", Garris et al, report NISTIR 4750 published by the U.S. Department of Commerce (1992); and "Training Feed Forward Neural Networks Using Conjugate Gradients", Grother et al, report NISTIR 4776 published by the U.S. Department of Commerce (1992).
Each of these reports relates to use of digital processing systems for recognizing hand printed characters using Gabor functions. Gabor functions were proposed by Dennis Gabor in 1946. The great utility of Gabor functions in recognition of characters and related image processing functions is discussed by Daugman in "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters", J. Opt. Soc. Am. A., Volume 2, No. 7, July, 1985 and in "Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression", IEEE Transactions on Acoustics, Speech and Signal Processing, Volume 36, No. 7, July, 1988.
A further set of functions useful in image and specifically character recognition may be derived using Karhunen Loeve ("K-L") transforms. See Grother, "Karhunen Loeve Feature Extraction for Neural Handwritten Character Recognition", SPIE, Vol. 1709 (April 1992). Functions derived from K-L transforms are similar to Gabor functions in that each encodes both frequency and spatial information in an image employing a limited set of functions. That is, when either Gabor or K-L derived functions are correlated with an image, the resulting value is proportional to the similarity of the frequency and spatial content of the image and of the function.
An important distinction between processes for image recognition employing Gabor functions and K-L derived functions lies in the method whereby the functions are derived. Gabor functions are derived a priori, that is, by mathematical calculation based on one or more of an infinite set of equations proposed by Gabor. See the May 1991 Proceedings: Analysis of Neural Network Applications paper of Garris et al, supra. K-L transforms, by comparison, are derived only through correlation of a large sample of actual images of, e.g., the characters to be recognized. See the 1992 Grother SPIE paper, supra. Thus the steps performed to generate a set of Gabor functions are very different from those required to generate a set of K-L functions, although thereafter the functions may be employed similarly.
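By way of illustration only, the a priori derivation of a Gabor function and its correlation with an image might be sketched as follows. The function names, the cosine-phase form with a circular Gaussian envelope, and the parameter values are assumptions chosen for the sketch and are not taken from the cited papers. A K-L derived function, which requires a sample of actual images, is not shown.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size real (cosine-phase) Gabor function as nested lists.

    Hypothetical parameterization: a Gaussian envelope of width sigma
    multiplied by a cosine of the given wavelength along direction theta.
    """
    half = (size - 1) / 2.0
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Coordinate along the direction in which the sinusoid varies.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel

def correlate(image, function):
    """Sum of pixel-by-pixel products of two equally sized arrays."""
    return sum(p * f
               for img_row, fn_row in zip(image, function)
               for p, f in zip(img_row, fn_row))

# The kernel is computed from the equation alone; no sample images are needed.
kernel = gabor_kernel(size=5, wavelength=4.0, theta=0.0, sigma=2.0)

vertical_bar = [[0, 0, 1, 0, 0] for _ in range(5)]
horizontal_bar = [[1, 1, 1, 1, 1] if r == 2 else [0, 0, 0, 0, 0] for r in range(5)]

vertical_score = correlate(vertical_bar, kernel)
horizontal_score = correlate(horizontal_bar, kernel)
```

With theta set to zero the function varies horizontally, so in this sketch a vertical stroke correlates more strongly with it than a horizontal stroke does.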
A further class of functions useful in character recognition and relevant to the invention disclosed and claimed herein is referred to as positive and negative correlative functions ("PNCFs"), which encode both pattern and relevance information. These terms are generally understood in the art as follows. The "pattern" of a particular function refers to a fundamental shape useful in recognizing a particular class of characters. Such a pattern function may be implemented mathematically as a two dimensional array, each element corresponding to a pixel of an image of the pattern, and having a "1" or "0" value depending on whether that pixel is within the pattern. Thus, for example, an image of a "pattern" function useful in recognizing the character "5" includes a shape recognizable to the eye as generally corresponding to the shape of a "5". The idea is that substantially all handwritten or printed "5" characters will coincide to a considerable degree with this pattern function. Thus, a complete set of pattern functions for use in recognizing the ten numerals would include at least ten patterns corresponding to the ten digits.
In most cases such a set of patterns would include additional pattern functions reflecting general similarities and differences in the shapes of characters. For example, some individuals connect the top of the two vertical strokes of the number "4", while others write them more nearly parallel. Accordingly, two pattern functions generally corresponding to these two ways of writing a "4" might typically be found in a complete set of pattern functions for recognizing numbers.
A second class of functions useful in recognizing characters is referred to as relevance functions. In this case each pixel of a two dimensional array representing a relevance function includes a number representing a gray scale value corresponding to the probability of that particular pixel being present in an image of a character to be recognized. The values of the pixels may be determined by summing the corresponding pixels of images of a large number of test characters. Thus, for example, the pixels corresponding to the most commonly present pixels of characters--e.g., those found in the pattern function for the number "5"--would have relatively high values, corresponding to the pattern information. Pixels outside those highly-valued pixels would have a zero value in the pattern function, but in the relevance function the values of the less relevant pixels gradually decrease. Accordingly, the value of each bit of a pattern function is binary, that is, either a particular pixel is or is not present in the pattern function corresponding to the image of the character. By comparison, in a relevance function each pixel is assigned a value representative of the probability of that pixel being present in the image of the corresponding character.
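The distinction between the binary pattern function and the graded relevance function may be illustrated by the following minimal sketch. It assumes binary sample images of equal size. Dividing the pixel sums by the number of samples, and obtaining the pattern by thresholding the relevance values at 0.5, are assumptions made for the sketch; the text above specifies only the summation.

```python
def relevance_function(samples):
    """Sum corresponding pixels over all samples, then divide by the sample
    count so that each value is the fraction of samples containing the pixel."""
    count = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [[sum(s[r][c] for s in samples) / count for c in range(cols)]
            for r in range(rows)]

def pattern_function(relevance, threshold=0.5):
    """Binary array: 1 where a pixel is usually present, else 0.
    The threshold is a hypothetical choice, not taken from the text."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in relevance]

# Four toy 3 x 3 "samples" of a vertical stroke with small variations.
samples = [
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
]

relevance = relevance_function(samples)
pattern = pattern_function(relevance)
```

In this toy case the stray corner pixels receive a small nonzero relevance value but are zero in the pattern function.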
Such pattern and relevance functions are known to be useful in identifying characters, by mathematical correlation of the functions with appropriately-scaled images of the characters to be recognized. See commonly assigned Ser. No. 07/701,484, filed May 16, 1991, now abandoned, which is incorporated by reference herein.
The pattern and relevance information can also be encoded in the negative sense, i.e., the pattern information can be inversely represented to indicate that certain pixels should not be present in an image of the corresponding character, such as the pixels in the outer corners of the image. Similarly, the relevance functions can be inverted, so that the presence of certain pixels in an image counterindicates the identification of the corresponding character. Such negative functions are compared to inverted images of the characters to be recognized.
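As a rough illustration of the negative sense, the sketch below assumes binary images and takes the negative pattern to be the simple complement of the positive pattern. That choice, and the function names, are assumptions for the sketch; the referenced application may derive negative functions differently.

```python
def invert(array, max_value=1):
    """Complement every element: present pixels become absent and vice versa."""
    return [[max_value - value for value in row] for row in array]

def correlate(image, function):
    """Sum of pixel-by-pixel products of two equally sized arrays."""
    return sum(p * f
               for img_row, fn_row in zip(image, function)
               for p, f in zip(img_row, fn_row))

positive_pattern = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
# Assumed negative pattern: 1 wherever a pixel should NOT be present.
negative_pattern = invert(positive_pattern)

clean = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
smudged = [[1, 1, 0], [0, 1, 0], [0, 1, 0]]  # stray pixel in a corner

# Negative functions are compared to inverted images.
clean_score = correlate(invert(clean), negative_pattern)
smudged_score = correlate(invert(smudged), negative_pattern)
```

The stray corner pixel lowers the negative correlation, which is the sense in which its presence counterindicates the character.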
Previously, comparison of images of characters with such pattern and relevance functions for character recognition has been performed mathematically, by calculating the correlation of each character to be recognized with each of the pattern and relevance functions, such that the maximum correlation calculated identifies the character. See, e.g., U.S. Pat. No. 4,998,286 to Tsujiuchi. See also U.S. Pat. No. 3,182,290 to Rabinow, recognizing that processing time can be saved by treating characters of similar shape (e.g., C's, O's, G's and Q's) similarly insofar as possible, then separately analyzing their distinguishing features; and U.S. Pat. No. 4,783,830 to Johnson et al, showing a pattern-recognizing content addressable memory system for a network of processors.
The present invention relates to a system for carrying out pattern recognition based on comparison of an image of a character to be recognized with positive pattern and positive relevance information, and for comparing an inverse of the image of the character to be recognized with negative pattern and relevance information. As mentioned, the functions employed are referred to herein as positive and negative correlative functions (PNCFs). A set of PNCFs will typically include four such functions, namely positive and negative pattern functions and positive and negative relevance functions, as indicated. Ser. No. 07/701,484 referred to above provides a thorough treatment of mathematical methods of deriving and using such PNCFs. Such methods are also fully discussed in Wilson, "FAUST: A Vision-Based Neural Network Multi-Map Pattern Recognition Architecture", report NISTIR 4805 published by the U.S. Department of Commerce. The present invention is directed to a simpler method of determining and employing such PNCFs.
To complete the discussion of the prior art, Fourier transforms have been employed for image recognition, including character recognition. Fourier transform processing of an image provides a series of coefficients representing the power and relative phase of various frequency components present in the image. Fourier coefficients thus derived encode both frequency and spatial information. However, a complete set of Fourier coefficients may typically include some 1024 coefficients; this large amount of data requires prohibitive amounts of digital processing to yield useful image recognition. Accordingly, the set of coefficients is usually truncated at 32 or 64; consequently much useful information--usually the phase information--is lost, resulting in significant loss of accuracy and utility.
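The cost of discarding phase information can be illustrated with a one-dimensional discrete Fourier transform. The sketch below is an illustration only and does not represent any of the cited systems. It shows that a signal and a displaced copy have identical coefficient magnitudes, so the two can be told apart only by phase. The eight-sample signals are hypothetical.

```python
import cmath

def dft(signal):
    """Direct discrete Fourier transform; returns one complex coefficient per sample."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

signal = [0, 1, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 1, 0, 0, 0]  # same shape, displaced by two samples

coeffs_a = dft(signal)
coeffs_b = dft(shifted)

# Magnitude (power) of each frequency component.
magnitudes_a = [abs(c) for c in coeffs_a]
magnitudes_b = [abs(c) for c in coeffs_b]

# Relative phase of the first non-constant component.
phase_a = cmath.phase(coeffs_a[1])
phase_b = cmath.phase(coeffs_b[1])
```

Here the two magnitude lists agree to within rounding error while the phases differ, so a representation that keeps only magnitudes cannot locate the feature.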
For example, U.S. Pat. No. 4,989,257 to Horowitz discusses application of Fourier transforms for character recognition or similar image processing problems. Horowitz discloses dividing an image of a character into "eight rings and 24 slices" and carrying out "various Fourier transforms, autocorrelations, movement calculations, and sorting operations on the resulting data." See the Abstract. It is self-evident that such a process would require an immense amount of computer time to recognize a single character.
Other functions which have previously been employed for extraction of features from images and related purposes such as character recognition include Walsh functions. Walsh functions are essentially black-and-white "checkerboard" patterns which may be correlated with an image to be characterized. Walsh functions detect "spectral", i.e., frequency information, and may locate a single "topological feature." See U.S. Pat. No. 4,590,608 to Chen et al. The binary--that is, black or white--nature of the Walsh transforms renders their use unduly sensitive to lateral and vertical displacement of the image.
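The displacement sensitivity noted above may be seen in a small sketch. It assumes a single checkerboard function with values of +1 and -1, which is one member of the two-dimensional Walsh family, and a hypothetical 4 x 4 test image. The wrap-around shift is used only to keep the example short.

```python
def walsh_checkerboard(size):
    """Checkerboard of +1 / -1 values (highest-sequency 2-D Walsh function)."""
    return [[1 if (r + c) % 2 == 0 else -1 for c in range(size)]
            for r in range(size)]

def correlate(image, function):
    """Sum of pixel-by-pixel products of two equally sized arrays."""
    return sum(p * f
               for img_row, fn_row in zip(image, function)
               for p, f in zip(img_row, fn_row))

def shift_right(image):
    """Displace the image laterally by one pixel, wrapping at the edge."""
    return [row[-1:] + row[:-1] for row in image]

walsh = walsh_checkerboard(4)
# Test image whose "on" pixels coincide with the +1 squares.
image = [[1 if (r + c) % 2 == 0 else 0 for c in range(4)] for r in range(4)]

aligned_score = correlate(image, walsh)
displaced_score = correlate(shift_right(image), walsh)
```

A displacement of a single pixel reverses the sign of the correlation in this example, whereas a smoothly varying function would change only gradually.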
Other patents which may be of interest include U.S. Pat. No. 4,082,431 to Ward, III, showing carrying out Fourier transforms of images using optical holograms and incoherent light; U.S. Pat. No. 3,879,605 to Carl et al, showing a Walsh transform computer implementing Kronecker-matrix transformations; U.S. Pat. No. 5,047,968 to Carrington et al, showing a system for using Fourier transforms to correct distortion in images; and U.S. Pat. No. 5,050,220 to Marsh et al, disclosing an optical fingerprint correlator employing digital Fourier transform techniques to measure the correlation between an unknown print and a sample or known print for characterization purposes.
U.S. Pat. No. 4,854,669 to Birnbach et al discloses a spatial filter with selectable modulation transfer function to remove unwanted portions of an image employing Fourier transform techniques. As indicated above, accurate image recognition employing Fourier-transform techniques requires that frequency and phase information in the image be preserved. To do so optically requires an apparatus manufactured to high tolerance, employing a coherent light source (e.g., a laser) and a number of costly optical elements. Such systems, as exemplified by Birnbach, are too complex and costly for practical use.
It will be appreciated from review of the above documents that optical techniques have been used to correlate Fourier transforms with characters or other elements of images to be recognized, while digital computers have similarly been used to perform Walsh and Fourier transform filtering. The art does not teach employment of PNCFs, other than in digital systems as described in copending Ser. No. 07/701,484, incorporated herein by reference.
The process of correlating an image of a character to be recognized with a set of PNCFs to yield a set of correlation coefficients, e.g. for input to a neural network, as described in Ser. No. 07/701,484, has always (to the knowledge of the present inventor) been carried out generally according to the following steps. A character to be recognized is identified, e.g., by locating it within a particular box on a form to be converted from hand-printed hard copy to computer data. Each character is digitized by generating an array of bits each responsive to the density of the corresponding pixel in the image. The value of each pixel of the image is then multiplied by the corresponding local value of each of a number of sets of PNCFs, that is, by the corresponding value in a second matrix wherein each element of the matrix represents the local value of one of the PNCFs. The results of all these multiplications (which may be 1,000 or more multiplications per PNCF) are summed to yield a value for the correlation of the image of the character to be recognized with the PNCF. A similar set of calculations is carried out with respect to each of the complete set of PNCFs selected.
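The multiply-and-sum steps just described might be sketched as follows. The 32 x 32 array size is an assumption chosen to agree with the figure of 1,000 or more multiplications per PNCF, and the constant-valued arrays are placeholders rather than real PNCFs.

```python
def correlation_coefficient(image, pncf):
    """Multiply each pixel by the corresponding local PNCF value and sum."""
    total = 0.0
    for img_row, fn_row in zip(image, pncf):
        for pixel, local_value in zip(img_row, fn_row):
            total += pixel * local_value
    return total

def correlation_coefficients(image, pncfs):
    """One coefficient per PNCF in the selected set."""
    return [correlation_coefficient(image, pncf) for pncf in pncfs]

SIZE = 32  # assumed image size: 32 * 32 = 1024 multiplications per PNCF
image = [[1] * SIZE for _ in range(SIZE)]

# Placeholder PNCFs with constant local values, for illustration only.
pncfs = [
    [[0.5] * SIZE for _ in range(SIZE)],
    [[0.25] * SIZE for _ in range(SIZE)],
]

coefficients = correlation_coefficients(image, pncfs)
multiplications_per_pncf = SIZE * SIZE
```

The work grows as the number of pixels times the number of PNCFs, which is the computational burden discussed in this section.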
The result is a series of coefficients, each representing the correlation of the image of the character with one of the PNCFs. As indicated above, at least one set of four PNCFs is normally generated for each character. Variations in individual styles of writing individual characters--for example, some individuals put a slash through "zero" characters, some put a bar in "seven" characters, and so on--may necessitate two or more sets of PNCFs for each character to be recognized. Hence identification of any character requires the generation of a substantial number of coefficients, each corresponding to one of the PNCFs. The coefficients are then supplied as inputs to a neural network. In the neural network the coefficients are weighted in accordance with previously calculated data and summed, yielding a set of output values identifying the character. These steps, and the step of calculating the weights to be applied to the coefficients, are described in the papers of applicant and co-workers referred to above.
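The weighting and summing step may be sketched, under simplifying assumptions, as a single layer of weighted sums followed by selection of the largest output. The weights, labels, and coefficient values below are hypothetical; the networks described in the cited papers are trained on real data and may be structured differently.

```python
def network_outputs(coefficients, weights):
    """One weighted sum of the input coefficients per output node."""
    return [sum(w * c for w, c in zip(node_weights, coefficients))
            for node_weights in weights]

def classify(coefficients, weights, labels):
    """Pick the label whose output value is largest (an assumed decision rule)."""
    outputs = network_outputs(coefficients, weights)
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs

# Hypothetical example: two coefficients, two candidate characters.
coefficients = [6.0, 1.0]
weights = [
    [1.0, 0.0],  # output node for "1"
    [0.0, 1.0],  # output node for "7"
]
labels = ["1", "7"]

character, outputs = classify(coefficients, weights, labels)
```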
It will be appreciated from the above that the step of correlating an image of a character to be recognized with each PNCF of a large number of sets of PNCFs has heretofore always necessitated a vast number of multiplications and summation of the results to yield the corresponding coefficients. The coefficients thus obtained, each corresponding to the correlation of a single character with one of the PNCFs, are then supplied to a neural network for weighted summation. Such enormous quantities of calculations are optimally carried out on very high speed massively parallel computing systems. While very useful results have been obtained, as shown in the papers of the inventors and co-workers referred to above, such equipment is expensive and likely to remain so for the near term. Moreover, even using state-of-the-art massively parallel computers the analysis of hand printed characters still requires substantial processing time.
It would accordingly be desirable to provide a method and apparatus for character recognition realizing the advantages of image processing using PNCFs that could be implemented without costly high speed parallel processing computer equipment.