Different methodologies are used to perform optical character recognition (OCR) on handwritten text and on machine-printed text. To maximize OCR accuracy, it is advisable to separate handwritten text from machine-printed text before each is processed by an OCR system designed for that text type.
U. Pal and B. B. Chaudhuri, in an article entitled “Automatic Separation of Machine-Printed and Hand-Written Text Lines,” in Proceedings of the Fifth International Conference on Document Analysis and Recognition, 1999, pages 645–648, disclose a method of separating machine-printed and handwritten text in both Bangla (Bangla script) and Devnagari (Hindi script) based on the distinctive structural and statistical features of machine-printed and handwritten text lines. The present invention is not based on structural and statistical features of entire lines of machine-printed and handwritten text.
Sean Violante et al., in an article entitled “A COMPUTATIONALLY EFFICIENT TECHNIQUE FOR DISCRIMINATING BETWEEN HAND-WRITTEN AND PRINTED TEXT,” in IEE Colloquium on Document Image Processing and Multimedia Environments, 1995, pages 17/1–17/7, disclose a method of distinguishing handwritten from machine-printed addresses on mail by determining region count, edge straightness, horizontal profile, and the dimensions of the address box and then using a neural network to classify the letter as having either a handwritten or a machine-printed address. The present invention does not use all of the features Violante et al. use to determine whether text is handwritten or machine-printed.
K. Kuhnke et al., in an article entitled “A System for Machine-Written and Hand-Written Character Distinction,” in Proceedings of the Third International Conference on Document Analysis and Recognition, 1995, pages 811–814, disclose a method of distinguishing handwritten text from machine-printed text by preprocessing the image using a bounding box and contour extraction and then extracting features from the image (i.e., straightness of vertical lines, straightness of horizontal lines, and symmetry relative to the center of gravity of the character in question). The features extracted by Kuhnke et al. are not used in the present invention.
Kuo-Chin Fan et al., in an article entitled “CLASSIFICATION OF MACHINE-PRINTED AND HANDWRITTEN TEXTS USING CHARACTER BLOCK LAYOUT VARIANCE,” in Pattern Recognition, 1998, Vol. 31, No. 9, pages 1275–1284, disclose a method of distinguishing handwritten text from machine-printed text by dividing text blocks into horizontal or vertical directions, obtaining the base blocks from a text block image using a reduced X-Y cut algorithm, determining character block layout variance, and classifying the text according to the variance. The variance determined by Fan et al. is not used in the present invention.
U.S. Pat. No. 4,910,787, entitled “DISCRIMINATOR BETWEEN HANDWRITTEN AND MACHINE-PRINTED CHARACTERS,” discloses a device for and method of distinguishing between handwritten and machine-printed text by determining the total number of horizontal, vertical, and slanted strokes in the text, determining the ratio of slanted strokes to the determined total, and declaring the text handwritten if the ratio is above 0.2 and machine-printed if the ratio is below 0.2. The present invention does not distinguish between handwritten and machine-printed text based on a ratio of slanted strokes in the text to a total of horizontal, vertical, and slanted strokes in the text. U.S. Pat. No. 4,910,787 is hereby incorporated by reference into the specification of the present invention.
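For illustration only, the decision rule summarized above for U.S. Pat. No. 4,910,787 may be sketched as follows. The sketch assumes the stroke counts have already been obtained by some earlier stage, which is not shown. The names StrokeCounts and classify_by_slant_ratio are hypothetical and do not appear in the cited patent. Because the summary above does not state how a ratio of exactly 0.2 is handled, the sketch reports that case as undetermined.

```python
from dataclasses import dataclass

# Threshold taken from the summary above; all names here are hypothetical.
SLANT_RATIO_THRESHOLD = 0.2


@dataclass(frozen=True)
class StrokeCounts:
    """Stroke totals for one text region (assumed to be computed elsewhere)."""
    horizontal: int
    vertical: int
    slanted: int


def classify_by_slant_ratio(counts: StrokeCounts,
                            threshold: float = SLANT_RATIO_THRESHOLD) -> str:
    """Label text by the ratio of slanted strokes to all counted strokes."""
    total = counts.horizontal + counts.vertical + counts.slanted
    if total == 0:
        raise ValueError("no strokes were counted")
    ratio = counts.slanted / total
    if ratio > threshold:
        return "handwritten"
    if ratio < threshold:
        return "machine-printed"
    # The summary above leaves a ratio equal to the threshold unspecified.
    return "undetermined"


if __name__ == "__main__":
    print(classify_by_slant_ratio(StrokeCounts(10, 10, 10)))
    print(classify_by_slant_ratio(StrokeCounts(45, 45, 10)))
```

In this sketch, a region with equal numbers of horizontal, vertical, and slanted strokes has a slanted-stroke ratio of one third and is labeled handwritten, while a region in which one stroke in ten is slanted is labeled machine-printed.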
U.S. Pat. No. 5,442,715, entitled “METHOD AND APPARATUS FOR CURSIVE SCRIPT RECOGNITION,” discloses a device for and method of recognizing cursive script by segmenting words into individual characters, scanning the individual characters using a window, and determining whether or not a character within the window is in a cursive script using a neural network. The present invention does not use a scanning window or a neural network to distinguish between handwritten and machine-printed text. U.S. Pat. No. 5,442,715 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 6,259,812, entitled “KEY CHARACTER EXTRACTION AND LEXICON REDUCTION CURSIVE TEXT RECOGNITION,” discloses a device for and method of recognizing cursive text by calculating character and geometric confidence levels to identify “key characters.” The present invention does not calculate character and geometric confidence levels to identify “key characters.” U.S. Pat. No. 6,259,812 is hereby incorporated by reference into the specification of the present invention.
U.S. Pat. No. 6,259,814, entitled “IMAGE RECOGNITION THROUGH LOCALIZED INTERPRETATION,” discloses a device for and method of recognizing machine-printed and handwritten character images by creating a look-up table with examples of machine-printed and handwritten characters and comparing an unknown character to the look-up table to determine its type. The present invention does not use a look-up table filled with examples of machine-printed and handwritten characters. U.S. Pat. No. 6,259,814 is hereby incorporated by reference into the specification of the present invention.
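For illustration only, the general idea of comparing an unknown character to a table of labeled examples may be sketched as follows. The sketch assumes each character is represented as a flattened binary bitmap of fixed length and that the comparison is a nearest-match search by Hamming distance. Both are assumptions made for this example and are not taken from U.S. Pat. No. 6,259,814. The names hamming_distance and classify_with_table are hypothetical.

```python
from typing import List, Sequence, Tuple

# One table entry: (flattened binary bitmap, label). This layout is an assumption.
Example = Tuple[Sequence[int], str]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length bitmaps differ."""
    if len(a) != len(b):
        raise ValueError("bitmaps must have the same length")
    return sum(x != y for x, y in zip(a, b))


def classify_with_table(unknown: Sequence[int], table: List[Example]) -> str:
    """Return the label of the table entry closest to the unknown bitmap."""
    if not table:
        raise ValueError("look-up table is empty")
    # On a tie, min() keeps the earliest entry in the table.
    _, label = min(table, key=lambda entry: hamming_distance(entry[0], unknown))
    return label


if __name__ == "__main__":
    table = [
        ((1, 1, 1, 0), "machine-printed"),
        ((0, 1, 0, 1), "handwritten"),
    ]
    print(classify_with_table((1, 1, 0, 0), table))
```

A practical table would hold many examples per type and would likely use a richer representation than a raw bitmap. The sketch shows only the compare-and-label step.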