A wide variety of pattern recognition systems, including character recognition systems, are known in the art. Each such system receives data, often from an optical device, depicting a pattern to be recognized, and performs certain operations on this pattern so that it can be compared to known patterns and thereby "recognized". A flow chart depicting the operation of a typical pattern recognition system is shown in FIG. 1. Input pattern 11 is the pattern to be recognized. Digitizer 12 converts input pattern 11 to a series of bytes for storage in system memory 13. These bytes are typically binary in nature, reflecting the fact that input pattern 11 is basically a black and white figure. Digitizers are well known in the art and are typically used in such devices as facsimile machines, electronic duplicating machines (as opposed to optical photocopy machines) and optical character recognition systems of the prior art. Memory 13 can comprise any suitable memory device, including random access memories of well-known design. Segmentation 14 serves to divide the image data stored in memory 13 into individual characters. Such segmentation is known in the prior art, and is described, for example, in Digital Picture Processing, Second Edition, Volume 2, Azriel Rosenfeld and Avinash C. Kak, Academic Press, 1982, specifically Chapter 10, entitled "Segmentation".
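By way of illustration only, the digitizing and segmentation steps described above can be sketched as follows. The sketch assumes a simple threshold for digitizing and a blank-column scan for segmentation; neither is asserted to be the technique of the cited reference, and the names digitize and segment_by_blank_columns, as well as the threshold value, are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 0 = white, 1 = black


def digitize(gray: List[List[int]], threshold: int = 128) -> Bitmap:
    """Binarize grayscale samples (assumed 0..255): darker than threshold -> 1."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_by_blank_columns(bitmap: Bitmap) -> List[Tuple[int, int]]:
    """Return (start, end) column spans, end exclusive, of consecutive inked columns.

    Assumption: adjacent characters are separated by at least one all-white column.
    """
    if not bitmap:
        return []
    width = len(bitmap[0])
    # A column is "inked" if any row has a black pixel in that column.
    inked = [any(row[x] for row in bitmap) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a new character begins
        elif not has_ink and start is not None:
            spans.append((start, x))       # the character ended at the blank column
            start = None
    if start is not None:
        spans.append((start, width))       # character touching the right edge
    return spans
```

A scan of this kind cannot separate characters that touch, since no blank column lies between them; the whole joined group is returned as a single span.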
Feature extraction 15 serves to transform each piece of data (i.e., ideally each character) received from segmentation 14 into a standard predefined form for use by identification means 16, which in turn identifies each character as corresponding to one of a known set of characters. Output means 17 serves to provide data output (typically ASCII, or the like) to external circuitry (not shown), as desired.
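As a minimal sketch of the feature extraction, identification, and output steps just described, the following assumes that the "standard predefined form" is a fixed-size grid obtained by nearest-neighbor sampling, and that identification selects the known character whose template differs in the fewest grid cells. Both choices, along with the names extract_features, identify, and the templates table, are assumptions made for illustration and are not drawn from the cited references.

```python
from typing import Dict, List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 0 = white, 1 = black


def extract_features(glyph: Bitmap, size: int = 4) -> Tuple[int, ...]:
    """Resample a non-empty glyph to a size x size grid, flattened row by row."""
    height, width = len(glyph), len(glyph[0])
    return tuple(
        glyph[(r * height) // size][(c * width) // size]
        for r in range(size)
        for c in range(size)
    )


def identify(features: Tuple[int, ...], templates: Dict[str, Tuple[int, ...]]) -> str:
    """Return the known character whose template has the fewest mismatching cells."""
    def mismatches(ch: str) -> int:
        return sum(a != b for a, b in zip(features, templates[ch]))
    return min(templates, key=mismatches)


def output_code(character: str) -> int:
    """Provide the recognized character as its ASCII code."""
    return ord(character)
```

For example, with hypothetical 2 x 2 templates {"I": (1, 0, 1, 0), "-": (1, 1, 0, 0)}, a glyph whose left half is black yields the features (1, 0, 1, 0) at size 2 and is identified as "I".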
Identification means 16 can be any one of a number of prior art identification means typically used in pattern recognition systems, including, more specifically, optical character recognition systems. One such identification means suitable for use in accordance with the teachings of this invention is described in U.S. Pat. No. 4,259,661, issued Mar. 31, 1981 to Todd, entitled "Apparatus and Method for Recognizing a Pattern". Identification means 16 is also described in Syntactic Pattern Recognition and Applications, K. S. Fu, Prentice Hall, Inc., 1982, specifically, Section 1.6, and Appendices A and B.
Inasmuch as this invention pertains to a method for use within segmentation means 14 of an optical character recognition system, this patent application, including the description of prior art herein, will focus on segmentation means 14. It is to be understood, however, that the teachings of this invention are equally applicable to pattern recognition systems for recognizing patterns other than characters, or indeed to systems used to recognize any information capable of being represented mathematically.
Segmentation means 14 serves to separate individual patterns or characters from each other in order that each may be recognized individually. In prior art optical character recognition systems wherein the text being read is known to have a fixed pitch (i.e., the lateral distance between centers of adjacent characters is a constant), if adjacent characters happen to be joined, the joined characters can be separated with some degree of accuracy using rather rigid techniques based upon the constant width of each character. Such prior art systems have the disadvantage that the two or more joined characters often are not separated at the precise point of joining, so that a portion of one character is cut off and attached to the adjacent character. This makes recognition of either of those two characters very difficult, if not impossible.
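The rigid fixed-pitch technique, and its disadvantage, can be illustrated by the following sketch. It assumes that a joined group of characters is given as a column span and is cut at multiples of the known pitch measured from the left edge of the span; the name split_fixed_pitch and this particular cutting rule are hypothetical and are not taken from any specific prior art system.

```python
from typing import List, Tuple


def split_fixed_pitch(start: int, end: int, pitch: int) -> List[Tuple[int, int]]:
    """Split the column span [start, end) into cells one pitch wide.

    The number of characters is estimated from the span width, and every cut
    is placed at a multiple of the pitch, regardless of where the characters
    actually join.
    """
    width = end - start
    count = max(1, round(width / pitch))            # estimated number of characters
    cuts = [start + i * pitch for i in range(count)] + [end]
    return list(zip(cuts[:-1], cuts[1:]))


# Two joined characters occupying columns 0..20 with a pitch of 10 are cut at
# column 10. If the characters actually join at column 12, columns 10 and 11
# of the first character are attached to the second.
cells = split_fixed_pitch(0, 21, 10)
```

The cut position depends only on the pitch, so any deviation between the nominal cell boundary and the true point of joining transfers part of one character to its neighbor.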