Information in the form of language symbols (i.e., characters) or other symbolic notation that is visually represented to a human in an image on a marking medium, such as a computer display screen or paper, is capable of manipulation for its semantic content by a processor included in a computer system when the information is accessible to the processor in an encoded form, such as when each of the language symbols is available to the processor as a respective character code selected from a predetermined set of character codes (e.g., ASCII code) that represent the symbols to the processor. An image is typically represented in a computer system as a two-dimensional array of image data, with each item of data in the array providing a value indicating the color (typically black or white) of a respective location of the image. An image represented in this manner is frequently referred to as a bitmapped or binary image. Each location in a binary image is conventionally referred to as a picture element, or pixel. Sources of bitmapped images include scanning a paper form of a document with an optical scanner and receiving image data via facsimile transmission of a paper document. When manipulation of the semantic content of the characters in an image by a processor is desirable, a process variously called "recognition," "character recognition," or "optical character recognition" must be performed on the image in order to produce, from the images of characters, a sequence of character codes that is capable of being manipulated by the processor.
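The distinction between the two representations can be illustrated with a minimal sketch. The 5x5 array, the convention that 1 denotes black and 0 denotes white, and the helper names below are assumptions chosen for illustration only; they are not taken from any particular system.

```python
# Hypothetical 5x5 binary image of a glyph.
# Assumed convention: 1 = black (foreground) pixel, 0 = white (background) pixel.
IMAGE = [
    [0, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]


def pixel(image, row, col):
    """Return the value stored for the pixel at (row, col)."""
    return image[row][col]


def foreground_count(image):
    """Count the black pixels in the binary image."""
    return sum(sum(row) for row in image)


def render(image):
    """Draw the array as text, '#' for black and '.' for white."""
    return "\n".join("".join("#" if v else "." for v in row) for row in image)


def to_character_codes(text):
    """Encoded form of text: one ASCII character code per symbol."""
    return list(text.encode("ascii"))


if __name__ == "__main__":
    print(render(IMAGE))
    print(foreground_count(IMAGE))
    print(to_character_codes("OCR"))
```

The array carries only pixel colors, so nothing in it tells a processor which character is depicted; the list of character codes, by contrast, can be searched, sorted or edited directly. Recognition is the step that gets from the first form to the second.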
Character recognition systems typically include a process in which the appearance of an isolated, input character image, or "glyph," is analyzed and, in a decision making process, classified as a distinct character in a predetermined set of characters. The term "glyph" refers to an image that represents a realized instance of a character. The classification analysis typically includes comparing characteristics of the isolated input glyph (e.g., its pixel content or other characteristics) to units of reference information about characters in the character set, each of which defines characteristics of the "ideal" visual representation of a character in its particular size, font and style, as it would appear in an image if there were no noise or distortion introduced by the image creation process. The unit of reference information for each character, typically called a "character template," "template" or "prototype," includes identification information, referred to as a "character label," that uniquely identifies the character as one of the characters in the character set. The character label may also include such information as the character's font, point size and style. A character label is output as the identification of the input glyph when the classification analysis determines that a sufficient match between the glyph and the reference information indicating the character label has been made.
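The classification step described above can be sketched as follows, assuming bitmapped templates of the same size as the input glyph. The `CharacterTemplate` class, the pixel-agreement score and the 0.85 threshold are hypothetical choices made for this illustration; actual recognition systems may compare other characteristics and use other decision rules.

```python
from dataclasses import dataclass
from typing import List, Optional

Bitmap = List[List[int]]  # assumed convention: 1 = black, 0 = white


@dataclass
class CharacterTemplate:
    label: str      # character label identifying the character
    bitmap: Bitmap  # "ideal" noise-free image of the character


def match_score(glyph: Bitmap, template: Bitmap) -> float:
    """Fraction of pixel positions where glyph and template agree.

    Assumes both bitmaps have identical dimensions.
    """
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += int(g == t)
    return agree / total


def classify(glyph: Bitmap, templates: List[CharacterTemplate],
             threshold: float = 0.85) -> Optional[str]:
    """Return the label of the best-matching template, or None when
    no template matches the glyph sufficiently well."""
    best = max(templates, key=lambda t: match_score(glyph, t.bitmap))
    if match_score(glyph, best.bitmap) >= threshold:
        return best.label
    return None


# Two hypothetical 3x3 templates and a glyph of "I" with one noisy pixel.
TEMPLATES = [
    CharacterTemplate("I", [[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    CharacterTemplate("L", [[1, 0, 0], [1, 0, 0], [1, 1, 1]]),
]
NOISY_I = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

if __name__ == "__main__":
    print(classify(NOISY_I, TEMPLATES))
```

Here the noisy glyph agrees with the "I" template at 8 of 9 pixels and with the "L" template at 4 of 9, so the label "I" is output at the assumed threshold; raising the threshold above 8/9 would cause the glyph to be rejected instead.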
The representation of the reference information that comprises a character template may be referred to as its model. Character template models are broadly identifiable as being either bitmapped, or binary, images of characters, or lists of high level "features" of binary character images. "Features" are measurements of a character image that are derived from the binary image and are typically much fewer in number than the number of pixels in the character image. Examples of features include a character's height and width, and the number of closed loops in the character. Within the category of binary character template models, at least two different types of models have been defined. One model, which may be called the "segmentation-based" model, describes a character template as fitting entirely within a rectangular region, referred to as a "bounding box," and describes the combining of adjacent character templates as being "disjoint," that is, requiring nonoverlapping bounding boxes. U.S. Pat. No. 5,321,773 discloses another binary character template model that is based on the sidebearing model of letterform shape description and positioning used in the field of digital typography. The sidebearing model, described in more detail below in the discussion accompanying FIG. 1, permits the rectangular bounding boxes of combined templates to overlap as long as the foreground (typically black) pixels of one template are not shared with, or common with, the foreground pixels of an adjacent template; this is described as requiring the templates to have substantially "disjoint supports."
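The difference between the two binary template models can be illustrated with a minimal sketch. Each template is represented here by its support, taken to be the set of (row, column) coordinates of its foreground pixels; the slanted three-pixel stroke and all function names are hypothetical. The sketch tests strict disjointness, which is a simplification of the "substantially" disjoint requirement stated above.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def place(support: Set[Pixel], dx: int) -> Set[Pixel]:
    """Shift a template's foreground pixels dx columns to the right."""
    return {(r, c + dx) for r, c in support}


def bounding_box(support: Set[Pixel]) -> Tuple[int, int, int, int]:
    """Smallest rectangle (top, left, bottom, right) enclosing the support."""
    rows = [r for r, _ in support]
    cols = [c for _, c in support]
    return min(rows), min(cols), max(rows), max(cols)


def boxes_overlap(a: Set[Pixel], b: Set[Pixel]) -> bool:
    """True when the two bounding boxes share at least one pixel location."""
    a_top, a_left, a_bottom, a_right = bounding_box(a)
    b_top, b_left, b_bottom, b_right = bounding_box(b)
    return (a_top <= b_bottom and b_top <= a_bottom
            and a_left <= b_right and b_left <= a_right)


def supports_disjoint(a: Set[Pixel], b: Set[Pixel]) -> bool:
    """True when no foreground pixel is common to both templates."""
    return a.isdisjoint(b)


# Hypothetical slanted stroke "/" occupying a 3x3 bounding box.
SLASH = {(0, 2), (1, 1), (2, 0)}

if __name__ == "__main__":
    neighbor = place(SLASH, 2)  # second stroke, two columns to the right
    print(boxes_overlap(SLASH, neighbor))      # bounding boxes share column 2
    print(supports_disjoint(SLASH, neighbor))  # but no black pixel is shared
```

With the second stroke placed two columns to the right, the bounding boxes overlap in one column while the foreground pixels remain distinct. Such a placement would be excluded under the segmentation-based model but permitted under the sidebearing model; shifting by three columns instead separates the boxes entirely and satisfies both.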