(1) Field of the Invention. This invention relates to a process for finding and segmenting the pixel data associated with a character from a matrix of pixel data in a multifont optical image system, so that the pixel data associated with that character can be singled out for use in subsequent operations, such as character recognition.
(2) Description of Related Art. In recent years, there has been a trend to generate images of documents and to use those images, where possible, in processing the information about the documents. For example, documents such as checks and deposit slips may be imaged by moving them past a scanner which scans each document and produces a matrix of "pixel" data about it. A pixel, or pel, is defined as a picture element which corresponds to a small area of the document being scanned. For example, there may be about 600 or 900 pixels in each scan line or column generated by the scanner. As the document is moved past the scanner during imaging, the scanner generates successive scan lines of pixels to produce a matrix of pixels for each document.
The matrix of pixels from the scanner is processed, by thresholding, for example, to reduce each pixel to a binary "1" or a binary "0", with a binary 1 representing the presence of data and a binary 0 representing the absence of data. By this technique, a matrix of binary pixels corresponding to the image of the document is obtained for each document. The matrix of pixels associated with a document may be stored in a RAM or displayed on a CRT, where it may be viewed by an operator performing data completion in a financial environment, for example.
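The thresholding step described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in this description: the scanner is taken to deliver 8-bit gray levels (0 for black, 255 for white), dark pixels are taken to be data, and a fixed cutoff of 128 is used. The function name threshold and the sample values are hypothetical.

```python
def threshold(gray, cutoff=128):
    """Reduce gray-level pixels to binary values.

    Assumption: pixels darker than `cutoff` are data (binary 1);
    all other pixels are background (binary 0).
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray]


# Hypothetical 3 x 4 patch of 8-bit gray levels (0 = black, 255 = white).
gray_patch = [
    [250, 30, 40, 245],
    [248, 25, 240, 251],
    [252, 20, 35, 249],
]

binary_patch = threshold(gray_patch)
for row in binary_patch:
    print(row)
```

A fixed cutoff is the simplest possible choice; a practical system might instead derive the cutoff from the gray levels of each document.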
The matrix of pixels associated with a document contains image data about that document, as previously mentioned. When the documents being processed are financial documents, such as checks, there are certain fields on the documents which are read by machines. The fields to be read contain character data printed in certain fonts, such as E13B and CMC7. With a resolution of about 200 pixels per inch at the scan line, for example, it is possible to machine read the characters in the fields by applying optical character recognition techniques to the matrix of pixels.
One problem in working with a matrix of pixels is that it is generally difficult to find the fields containing the characters to be read, especially when the fields may be located in different places or areas on the documents from which the image data was obtained. Another problem is that, after the field containing the characters is found, it is necessary to segment the matrix of pixels in that particular field in order to separate the pixels associated with one character from those associated with the remaining characters in the field. As the pixels associated with each character are segmented from the associated field, they may be subjected to character recognition techniques. Such techniques may include, for example, back propagation neural networks or other networks which may be used for character recognition.
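To make the segmentation problem concrete, the following sketch shows a generic baseline, not the process of this invention: a field of binary pixels is split wherever a column contains no data. It assumes the field has already been located, is stored as a list of rows, and has characters separated by at least one empty column. The names segment_columns and extract are hypothetical.

```python
def segment_columns(field):
    """Return (start, end) column ranges, end exclusive, one per run of
    adjacent columns that contain at least one binary 1."""
    if not field:
        return []
    width = len(field[0])
    # A column "has data" if any row holds a 1 in that column.
    has_data = [any(row[col] for row in field) for col in range(width)]
    spans = []
    start = None
    for col, data in enumerate(has_data):
        if data and start is None:
            start = col                  # a run of data columns begins
        elif not data and start is not None:
            spans.append((start, col))   # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, width))     # run reaches the right edge
    return spans


def extract(field, span):
    """Cut the pixels for one column range out of the field."""
    start, end = span
    return [row[start:end] for row in field]


# Hypothetical 3 x 7 binary field holding two small marks.
field = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1],
]

spans = segment_columns(field)
first_mark = extract(field, spans[0])
```

This baseline fails when adjacent characters touch or when noise bridges the gap between them, which is one reason segmentation is treated here as a problem in its own right.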