The present invention relates to techniques that obtain information about the dominant typeface of text in an image.
Chow, U.S. Pat. No. 3,634,822, describes character recognition techniques that identify characters in each of three different fonts. As shown and described in relation to FIG. 1, each character is scanned to obtain a binary word representation of the character. This representation is applied to three tables storing probability representations for each known character in the three fonts. Character comparison functions for each character in each font are produced and applied to three accumulators, shown in FIG. 2D, to provide three font comparison functions for the unknown character. From these functions the font is determined without identifying the character. From the results of font identification, font frequency functions are derived for modifying the character comparison functions, which are then compared to identify the unknown character.
Brickman et al., U.S. Pat. No. 4,499,499, describe techniques for identifying and compacting text data to be transmitted over communication lines. As shown and described in relation to FIG. 5, character matching is performed to identify characters in order to identify the font of the input data. The matching includes a preliminary screening, based on ascender, descender, width, and height, and a template match that produces a total correlation value, enhanced by sensitive feature data. As described beginning at col. 17 line 34, font match statistics indicate the frequency with which prestored fonts contribute to character matching. These statistics are used to determine if the font is known or which fonts are still candidates. If enough characters in a given word match a given font, it is presumed that the words on a page are printed in that font.
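The font match statistics attributed to Brickman et al. can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the input format (one font name per matched character of a word), and both threshold values are assumptions introduced here for illustration only.

```python
from collections import Counter


def font_match_statistics(matched_fonts):
    """Return the share of character matches contributed by each font.

    matched_fonts: iterable of font names, one per matched character
    (hypothetical input format, not taken from the reference).
    """
    counts = Counter(matched_fonts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {font: n / total for font, n in counts.items()}


def candidate_fonts(stats, min_share=0.05):
    """Fonts that still contribute enough matches to remain candidates.

    min_share is an assumed cutoff, not a value from the reference.
    """
    return sorted(font for font, share in stats.items() if share >= min_share)


def presumed_font(stats, decide_share=0.8):
    """Return the font to presume for the page, or None if undecided.

    decide_share is an assumed stand-in for "enough characters in a word".
    """
    if not stats:
        return None
    font, share = max(stats.items(), key=lambda item: item[1])
    return font if share >= decide_share else None


# Example word of ten characters: nine matched in one font, one in another.
word_matches = ["courier"] * 9 + ["elite"]
stats = font_match_statistics(word_matches)
```

With the example word, `candidate_fonts(stats)` keeps both fonts as candidates, while `presumed_font(stats)` settles on the font that contributed nine of the ten matches.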
Brickman, N. F. and Rosenbaum, W. S., "Word Autocorrelation Redundancy Match (WARM) Technology," IBM Journal of Research and Development, Vol. 26, No. 6, November 1982, pp. 681-686, apparently describing the same techniques as Brickman et al., U.S. Pat. No. 4,499,499, indicates at page 685 that font detection statistics have consistently shown very peaked response characteristics, demonstrating rapid and accurate font discrimination. Typical results give 90 percent character match in only the single correct font and less than 0.1 percent correct character match in an incorrect font.
Grabowski et al., U.S. Pat. No. 4,468,809, describe multiple font optical character recognition (OCR) techniques. As shown and described in relation to FIGS. 1, 8, 14, 50A, 50B, and 51, the operator can individually designate the font to be recognized in each of two fields, and templates for comparison are provided in response. In a remittance processing (RPS) mode described beginning at col. 13 line 43, the OCR technique automatically reads any one of four fonts depending on which of three special symbols is encountered in the first 25 millimeters of the document, reading the fourth font if no special symbol is detected.
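The symbol-driven font selection described for the RPS mode can be sketched as follows. The symbol characters, font labels, and input format are hypothetical placeholders, since the passage does not identify the actual symbols or fonts. Only the 25 millimeter window and the fallback to a fourth font come from the description above.

```python
# Hypothetical symbol-to-font mapping; the reference's actual symbols and
# fonts are not given in the passage above.
SYMBOL_TO_FONT = {"#": "font_A", "@": "font_B", "$": "font_C"}
DEFAULT_FONT = "font_D"   # read when no special symbol is detected
SCAN_WINDOW_MM = 25.0     # leading portion of the document that is examined


def select_font(detections):
    """Pick the font to read from symbols found near the document start.

    detections: iterable of (position_mm, symbol) pairs, where position_mm
    is the distance from the start of the document (assumed input format).
    """
    # Examine detections in reading order so the earliest symbol wins.
    for position_mm, symbol in sorted(detections):
        if position_mm > SCAN_WINDOW_MM:
            break  # everything after this lies outside the scan window
        if symbol in SYMBOL_TO_FONT:
            return SYMBOL_TO_FONT[symbol]
    return DEFAULT_FONT
```

A special symbol inside the window selects its associated font. A symbol beyond 25 millimeters, or no symbol at all, falls through to the default font.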
Suzuki et al., U.S. Pat. No. 4,933,979, describe techniques for reading data from a form sheet. As shown and described in relation to FIG. 4B, a multi-font mode can be designated in which it is possible to automatically discriminate the writing style if it is one of six predetermined kinds that are frequently used. A dictionary can be selected depending on the discriminated writing style, and character recognition can be carried out based on the automatically selected kind of dictionary.
Umeda et al., EP-A-288 266, describe techniques for discriminating between handwritten and machine-printed characters. As described at page 2 lines 7-15, similarity of machine-printed characters is determined from similarity of overall shape, while similarity of handwritten characters is determined on the basis of features such as horizontal, vertical, or slanted lines. For effective recognition, two optical character readers (OCRs) are provided for the respective recognition techniques. Errors result from processing handwritten characters with the OCR for machine-printed characters, but these errors could be avoided if it were known before recognition whether the characters are machine-printed or handwritten. As shown and described in relation to FIG. 2, the discrimination is made by detecting whether the occurrence ratio of the slanted stroke component to the total of horizontal, vertical, and slanted stroke components is above an experimentally determined level.
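The slanted-stroke ratio test can be written out as a short Python sketch. The function names and the threshold value of 0.2 are assumptions, since the reference is said to determine the level experimentally. The sketch also assumes that a ratio above the level indicates handwriting, which the passage suggests but does not state outright.

```python
def slanted_ratio(horizontal, vertical, slanted):
    """Ratio of slanted stroke occurrences to all stroke occurrences."""
    total = horizontal + vertical + slanted
    if total == 0:
        return 0.0  # no strokes detected; treat the ratio as zero
    return slanted / total


def is_handwritten(horizontal, vertical, slanted, threshold=0.2):
    """Classify as handwritten when the slanted ratio exceeds the threshold.

    threshold=0.2 is an assumed placeholder for the experimentally
    determined level mentioned in the reference.
    """
    return slanted_ratio(horizontal, vertical, slanted) > threshold
```

For example, stroke counts of 45 horizontal, 45 vertical, and 10 slanted give a ratio of 0.1, which falls below the assumed level, while counts of 30, 30, and 40 give 0.4, which exceeds it.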