The present invention relates generally to image analysis of whole words, phrases or numbers using a Fourier transformation and pattern recognition.
Automation of business processes depends on a machine's ability to recognize an input and act according to preprogrammed instructions. Without an input, automation is impossible. Unfortunately, most inputs are not in a form compatible with automation. An example is the postal service. The average person does not address a letter in a bar-code label format; people address their mail with words written on one side of the envelope or package. No two persons' handwriting is the same, and no one writes exactly the same way each time. It is not difficult for the human mind to recognize most handwriting, but a computer system of comparable ability has yet to be achieved.
Presently there is no reading machine which can take a video picture of printed text and identify each word in the picture. The reason for this is the problem of pattern recognition. No two printed words are exactly the same, and simply matching the picture of a word to a standard template or model is not sufficient. A one-to-one correlation is not possible because of the many different printing fonts in use, the discontinuity within individual characters, the differences in shading between characters, the background noise, and the variation in spacing between characters within words. A pattern recognition system has to identify an input which is merely similar to a template. An approximation to what is being searched for is possible in some cases, but recognizing the thousands of different words spelled in all the different character fonts or styles of handwriting is presently impossible.
Surprisingly, the brain recognizes words each time something is read, identifying letters, words, or entire phrases in a single glance without ever having seen that exact image before. Even with all our supercomputer technology, it is still not feasible to build a true reading machine that works as well as the human visual system. To build such a machine, the computer must model its inputs and categorize them into approximate words or phrases just as the human brain does. How the computer models the inputs will determine whether or not it can be used as a reading machine.
References for background information and prior studies in this field include:
1. Kabrisky, Matthew, Lecture materials for a course in Pattern Recognition, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, Ohio, January 1984.
2. Goble, Larry G., Filtered 2-Dimensional Discrete Fourier and Walsh Transform Correlation with Recognition Errors and Similarity Judgements, Dissertation, Ann Arbor, Michigan: University of Michigan, 1975.
3. Bush, Capt Larry F., The Design of an Optimum Alphanumeric Symbol Set for Cockpit Displays, MS thesis, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, Ohio, December 1977 (Defense Technical Information Center (DTIC) No. ADA053-447).
4. Tinker, Miles A., Bases for Effective Reading, Minneapolis, Minn.: University of Minnesota Press, 1966.
5. Simmons, Robin A., Machine Segmentation of Unformatted Characters, MS thesis, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, Ohio, December 1981 (DTIC No. ADA115-556).
6. Rodoy, Charles H., Pattern Recognition by Fourier Series Transformations, MS thesis, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, Ohio, March 1967 (DTIC No. AD651-801).
7. Tallman, Oliver H., The Classification of Visual Images by Spatial Filtering, PhD dissertation, School of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB, Ohio, June 1969 (DTIC No. AD858-866).
One approach to solving the reading machine problem is to model how the human visual system categorizes images of words. One view of the Gestalt Theory explains human recognition in terms of correlation between the spatially filtered two-dimensional Fourier transforms (2DFTs) of images (reference 1). The human visual system takes information from the retina, transmits it along the optic nerve to area 17 of the cerebral cortex, and then maps it into area 18 of the cerebral cortex, where recognition occurs. The mapping function into area 18 is what the Gestalt Theory explains mathematically by the 2DFT. By taking an image, computing its 2DFT, and filtering out everything except the lowest three harmonic terms, the resultant information can be used to categorize images, words or individual letters into similar known objects. This is what gives the letter `b` its `b-ness`. Even though no two b's are the same, the filtered 2DFTs of two different b's correlate more highly with each other than either `b` does with any other letter.
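The filtering and correlation steps described above can be sketched in Python with NumPy. This is an illustration only: the function names, the centered square low-pass mask, and the normalized-correlation measure are assumptions made for clarity, not the exact procedure disclosed here.

```python
import numpy as np

def filtered_2dft(image, harmonics=3):
    """Compute the 2DFT of an image and keep only the lowest
    `harmonics` frequency terms about DC (a low-pass filter)."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # DC term moved to center
    rows, cols = spectrum.shape
    mask = np.zeros_like(spectrum)
    r0, c0 = rows // 2, cols // 2
    mask[r0 - harmonics:r0 + harmonics + 1,
         c0 - harmonics:c0 + harmonics + 1] = 1
    return spectrum * mask                          # higher harmonics removed

def correlation(a, b):
    """Normalized correlation between two filtered spectra (1.0 = identical)."""
    a, b = a.ravel(), b.ravel()
    return np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```

Under this model, two images of the letter `b` would yield a higher `correlation` of their filtered spectra than a `b` paired with any other letter.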
In 1975, Goble's dissertation concluded that a model of human visual perception could be validated through a Euclidean distance matrix computed from the filtered 2DFTs. The distance matrix derived from the 2DFTs of individual letters showed that small Euclidean distances between letters corresponded to a high rate of recognition errors, while larger distances corresponded to a low rate. His dissertation gave credibility to the Gestalt Theory (reference 2).
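The distance-matrix idea can be sketched as follows (a hypothetical Python/NumPy illustration; it assumes each letter has already been reduced to a flat feature vector, such as the components of its filtered 2DFT):

```python
import numpy as np

def distance_matrix(features):
    """Pairwise Euclidean distances between letter feature vectors.

    `features` maps each letter to a flat feature vector.  Under the
    model described in the text, small distances predict frequent
    recognition confusions between letters; large distances predict few.
    """
    letters = sorted(features)
    vecs = np.array([features[c] for c in letters])
    # d[i, j] = ||vecs[i] - vecs[j]||, computed via broadcasting
    diffs = vecs[:, None, :] - vecs[None, :, :]
    return letters, np.linalg.norm(diffs, axis=2)
```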
In 1977, Bush's thesis compared psychophysical test results with the Euclidean distance matrix of single letters. Using the 26 letters of the alphabet, each letter was expressed in a 10×14 dot-matrix configuration. The filtered 2DFT of each letter was taken, and the 25 real and 24 imaginary components of each 2DFT were used to form 49 energy-normalized Fourier components. These Fourier components then defined the location of each letter in a 49-dimensional orthogonal space (reference 3).
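The 49-component construction can be sketched as follows. This Python/NumPy fragment is an illustration only: the choice of which 25 low-order coefficients to keep is an assumption, since the thesis itself defines the exact set. The imaginary part of the DC term of a real image is identically zero, which is why 25 real parts plus 24 imaginary parts yield 49 usable components.

```python
import numpy as np

def fourier_features(letter_image):
    """Form a 49-component, energy-normalized feature vector from the
    low-order 2DFT terms of a letter image: 25 real parts plus 24
    imaginary parts (the DC imaginary part is always zero and dropped)."""
    spectrum = np.fft.fft2(letter_image)
    coeffs = spectrum[:5, :5].ravel()       # 25 low-order coefficients (assumed set)
    parts = np.concatenate([coeffs.real,    # 25 real components
                            coeffs.imag[1:]])  # 24 imaginary (DC imag == 0, dropped)
    return parts / (np.linalg.norm(parts) + 1e-12)  # energy normalization

# A 10×14 dot-matrix letter becomes a point in 49-dimensional space.
letter = np.zeros((14, 10))
letter[1:13, 2:4] = 1.0                     # crude vertical stroke, e.g. part of a "b"
vec = fourier_features(letter)
```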