The present invention relates to extraction of character information for character recognition, and is particularly directed to a method of extracting relevant character information from fields of gray scale image data obtained from scanning a bank document such as a check.
In the banking industry, a check may contain machine-printed or handwritten data in the "courtesy" amount field of the check. As part of a continuing trend to automate banking operations, efforts have been made to machine read the courtesy amount of the check. The machine-printed or handwritten courtesy amount must be extracted before it is subjected to character recognition in a subsequent operation.
To extract the relevant character information from the courtesy amount field of the check, the check is typically moved past a scanning device to obtain a digitized image of the check. More specifically, as the check moves past the scanning device, the scanning device generates successive scan lines of pixels to produce a matrix of pixels associated with the check. Each pixel has a particular gray level associated therewith. For example, a pixel may have any one of 256 gray levels associated therewith, ranging from completely black (level zero) to completely white (level 255).
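By way of illustration only, the assembly of successive scan lines into a matrix of gray scale pixels may be sketched as follows. The function name `assemble_image` and the choice to store each scan line as one row of the matrix are assumptions made for this sketch; the orientation of the scan lines relative to the check is not specified above.

```python
from typing import List

BLACK = 0    # completely black, per the gray scale described above
WHITE = 255  # completely white


def assemble_image(scan_lines: List[List[int]]) -> List[List[int]]:
    """Stack successive scan lines into a matrix of gray scale pixels.

    Hypothetical helper: each scan line is assumed to be a list of
    integer gray levels, and all scan lines are assumed equal in length.
    """
    if not scan_lines:
        return []
    width = len(scan_lines[0])
    image = []
    for line in scan_lines:
        if len(line) != width:
            raise ValueError("all scan lines must have the same length")
        for gray in line:
            if not BLACK <= gray <= WHITE:
                raise ValueError(f"gray level out of range: {gray}")
        image.append(list(line))  # copy so the caller's data is not aliased
    return image
```

Each entry `image[i][j]` then holds the gray level of one pixel, with lower values being darker.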
A number of character extraction schemes are known. One character extraction scheme includes a binarization method which attempts to de-emphasize unimportant pixels (noise, for example). It is important to use a quality extraction scheme when a high performance optical character recognition (OCR) system is used to further process the check. If a character in the courtesy amount field is not extracted properly, the character recognition engine which later acts on the extracted character information may perform poorly.
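As a minimal sketch of what a binarization method may look like, the following example applies a single fixed threshold to a matrix of gray levels. The fixed global threshold, the default value of 128, and the function name `binarize` are assumptions chosen for illustration; the known schemes referred to above are not described in enough detail to say which thresholding rule they use.

```python
from typing import List


def binarize(image: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map each gray level (0 = black, 255 = white) to a binary pixel.

    Pixels darker than the threshold are treated as character pixels (1);
    all other pixels are treated as background (0). The threshold value
    is an illustrative assumption, not a recommended setting.
    """
    return [[1 if gray < threshold else 0 for gray in row] for row in image]


# A small field in which dark strokes sit on a light background.
field = [
    [250, 30, 245],
    [240, 25, 200],
]
binary = binarize(field)  # [[0, 1, 0], [0, 1, 0]]
```

A fixed threshold of this kind illustrates why extraction quality matters: a light stroke whose gray level falls above the threshold would be discarded as background, and dark noise below the threshold would be retained as if it were part of a character.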