1. Field of the Invention
This invention relates in general to pattern recognition, and relates more particularly to methods and systems for increasing the accuracy and resolution of such pattern recognition.
2. Prior Art
A wide variety of pattern recognition systems are known in the art. Each such system generates data depicting a pattern to be recognized, and performs certain operations on this pattern in order to compare it to known patterns and thereby "recognize" the input pattern. A basic flow chart depicting a prior art pattern recognition system is shown in FIG. 1. A digitizer 12 converts an input pattern 11 which is to be recognized into digital data for storage in system memory 13. If input pattern 11 is basically a black and white figure, the stored digital data is typically binary in nature. Digitizers are well known in the art and are typically used in such devices as facsimile machines, electronic duplicating machines and optical character recognition systems of the prior art. Memory 13 can comprise any suitable memory device, including random access memories of well-known design.
Segmentation means 14 serves to divide the image data stored in memory 13 into individual characters. Such segmentation is known in the prior art, and is described, for example, in Digital Picture Processing, Second Edition, Volume 2, Rosenfeld and Kak, Academic Press, 1982, specifically, Chapter 10 entitled "Segmentation".
Feature extraction means 15 serves to transform each piece of data (i.e., each character) received from segmentation means 14 into a standard predefined form for use by classification means 16, which in turn identifies each character as one of a known set of characters.
Classification means 16 can be any one of a number of prior art identification means typically used in pattern recognition systems, including, more specifically, optical character recognition systems. One such suitable classification means is described in U.S. Pat. No. 4,259,661, issued Mar. 31, 1981 to Todd, entitled "Apparatus and Method for Recognizing a Pattern". Classification means 16 is also described in "Syntactic Pattern Recognition and Applications," K. S. Fu, Prentice Hall, Inc., 1982, specifically, Section 1.6, and Appendices A and B.
Postprocessing means 17 can be any one of a number of prior art postprocessing means typically used in pattern recognition systems, such as described in "n-Gram Statistics For Natural Language Understanding And Text Processing," Suen, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, No. 2, pp. 164-172, April 1979.
Output means 18 serves to provide data output (typically ASCII, or the like) to external circuitry (not shown).
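For illustration only, the flow of FIG. 1 can be sketched as a chain of stages. The sketch below is a minimal example and not an implementation of any cited reference: the function name `recognize`, the fixed-width segmentation, the flattened-pixel features, and the template lookup table are all hypothetical stand-ins chosen to keep the example short.

```python
def recognize(input_pattern, digitize, segment, extract_features, classify, postprocess):
    """Chain the stages of FIG. 1; each stage is supplied as a callable."""
    image = digitize(input_pattern)                    # digitizer 12 -> memory 13
    pieces = segment(image)                            # segmentation means 14
    features = [extract_features(p) for p in pieces]   # feature extraction means 15
    labels = [classify(f) for f in features]           # classification means 16
    return postprocess(labels)                         # postprocessing means 17 -> output means 18


# Hypothetical toy stages (assumptions, not taken from the prior art cited above).
def toy_digitize(rows):
    # Binary data: 1 for a black pixel ('#'), 0 for a white pixel.
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def toy_segment(image, width=2):
    # Assumes every character occupies exactly `width` columns.
    total = len(image[0])
    return [[row[x:x + width] for row in image] for x in range(0, total, width)]


def toy_features(piece):
    # "Standard predefined form": the pixels flattened row by row into a tuple.
    return tuple(pixel for row in piece for pixel in row)


TEMPLATES = {(1, 0, 1, 1): "A", (1, 1, 0, 1): "B"}  # hypothetical known set


def toy_classify(feature):
    # Exact template match; unknown shapes are reported as '?'.
    return TEMPLATES.get(feature, "?")


def toy_postprocess(labels):
    return "".join(labels)


result = recognize(["#.##", "##.#"], toy_digitize, toy_segment,
                   toy_features, toy_classify, toy_postprocess)
```

With the two-row input shown, the two 2-column pieces match the two hypothetical templates, so `result` is the string "AB".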
One of the difficulties encountered in prior art pattern recognition systems is distinguishing adjacent characters which touch each other, or which are separate but overlap each other vertically on the material on which they are printed or written. In the latter case, character separation techniques which scan vertically for black and white areas, and which rely on an absence of black areas for a predetermined number of vertical scans as an indication of a space between characters, cannot detect the vertically overlapped characters as separate characters because there is no absence of black between them vertically.
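The limitation of the vertical scanning approach can be seen in a short sketch. The code below is a minimal illustration, assuming a binary image stored as rows of 0 (white) and 1 (black); the function name `split_by_blank_columns` and the parameter `min_gap` (standing in for the "predetermined number" of blank vertical scans) are hypothetical.

```python
def split_by_blank_columns(image, min_gap=1):
    """Return (start, end) column ranges, end exclusive, for runs of columns
    separated by at least `min_gap` consecutive all-white columns."""
    width = len(image[0]) if image else 0
    # A vertical scan "sees black" if any pixel in that column is 1.
    has_black = [any(row[x] for row in image) for x in range(width)]

    segments = []
    start = None   # first column of the run currently being collected
    gap = 0        # consecutive white columns seen since the last black column
    for x, black in enumerate(has_black):
        if black:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # Run ends just after the last black column.
                segments.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        segments.append((start, width - gap))
    return segments


# Two characters with a white column between them: separated correctly.
separated = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Two characters that do not touch (row 1 is white) but share column 2,
# i.e. they overlap vertically: every vertical scan sees black.
overlapped = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

separated_result = split_by_blank_columns(separated)
overlapped_result = split_by_blank_columns(overlapped)
```

Here `separated_result` is `[(0, 2), (3, 5)]`, while `overlapped_result` is `[(0, 5)]`: the two disjoint shapes are reported as a single segment because no vertical scan is free of black.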
Additional difficulties encountered in prior art pattern recognition systems lie in recognizing "noise" regions in the data and in rejecting data representing pictures rather than text.