The present invention relates to the automatic processing of characters or other images, and more specifically concerns apparatus and methods for locating an individual image and separating it from surrounding images so that the individual image can be processed as a single entity.
Many would agree that the actual recognition of a character or other image is the easy part. The hard part is to turn smudged, skewed, run-together, misaligned patches of ink on a document into well-defined, precisely located rectangles, each of which can be said to contain one character to be analyzed. In the words of Charles F. Kettering, "A problem well stated is a problem half solved." So too, a well-isolated character is already partly recognized.
A large number of approaches have been put forth to locate and separate individual characters on a document. Some are ingenious, and some work quite well. Most of the previous methods involve the determination of a white (i.e., background color) path between every pair of characters: a blank vertical strip, a serpentine boundary, and so forth. This approach is usually called "segmentation". Another group of methods, called "blocking", attempts to isolate image blocks which are easily separable but may contain multiple characters. The blocks are then reduced to individual characters by other means, such as by division into equal-width increments according to a greatest common factor of their varying sizes. All of the previous methods, however, have their drawbacks. Most are quite sensitive to the nature of the particular characters or patterns to be recognized; a good segmentation algorithm for one font may not work for another. Blocking algorithms can be defeated by proportionally spaced characters, whose varying widths yield no useful common increment.
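By way of illustration only, the two prior-art approaches described above may be sketched as follows. The sketch assumes a binary image held as rows of 0 (background) and 1 (ink); the function names segment_by_blank_columns, common_pitch, and split_block are hypothetical and do not come from any particular prior system. Segmentation is shown in its simplest form, the blank vertical strip, and blocking is shown as division by an exact greatest common factor of the block widths.

```python
from functools import reduce
from math import gcd
from typing import List, Tuple

Span = Tuple[int, int]  # half-open column range (start, end)


def segment_by_blank_columns(image: List[List[int]]) -> List[Span]:
    """Segmentation sketch: split wherever a fully blank column occurs.

    `image` is assumed to be a list of equal-length rows of 0/1 pixels.
    """
    if not image:
        return []
    width = len(image[0])
    spans: List[Span] = []
    start = None
    for x in range(width):
        has_ink = any(row[x] for row in image)
        if has_ink and start is None:
            start = x                    # a run of inked columns begins
        elif not has_ink and start is not None:
            spans.append((start, x))     # a blank column ends the run
            start = None
    if start is not None:
        spans.append((start, width))     # run reaches the right edge
    return spans


def common_pitch(spans: List[Span]) -> int:
    """Blocking sketch: greatest common factor of the block widths."""
    return reduce(gcd, (end - start for start, end in spans), 0)


def split_block(span: Span, pitch: int) -> List[Span]:
    """Divide one block into equal-width cells of `pitch` columns."""
    start, end = span
    count = (end - start) // pitch if pitch > 0 else 0
    if count < 1:
        return [span]                    # nothing sensible to divide
    return [(start + i * pitch, start + (i + 1) * pitch) for i in range(count)]
```

For a single-row image [[1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1]], the first function yields three blocks of widths 2, 4, and 2; their common factor is 2, so the middle block is divided into two cells. With touching characters no blank column exists, and with blocks of widths such as 3, 5, and 4 the common factor collapses to 1, which illustrates the drawbacks noted above.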