1. Field of the Invention
The present invention generally relates to optical character recognition (OCR) and more particularly to a fast method and apparatus for converting a gray image to a binary image (binarization).
2. Background Description
In the present application, binarization is discussed in the context of mail sorting, but the technique is not limited thereto. A number of techniques may be combined to derive intelligence from an image. Binarization is one of them, and it may be used by itself or in conjunction with other techniques.
In systems of the prior art, an optical character recognition (OCR) camera produces an image of the address on each piece of mail. OCR cameras produce gray scale images. Such images include address information of possibly varying intensity. Artifacts may also be present in the image due to such factors as smudges, glare, uneven illumination, paper texture or other matter not representative of address information.
In the systems of the prior art, the binarization process is typically performed in special purpose hardware. For instance, a piece of mail is effectively segmented into columns, where each column is imaged in succession. Generally, a limited number of columns are kept in memory on a FIFO (First In First Out) basis. This vertical slice of data provides the information which is used to make binarization decisions for individual pixels. The resulting binarized data is then streamed out to interface hardware (such as frame grabbers or custom hardware) that provides access to the binarized image for the OCR function. While these hardware binarization systems are fast, they are not easily modifiable.
Existing software binarization systems can store an entire image in memory at the same time, but each pixel is typically accessed individually. Therefore, comparisons of tiles and adjacent pixels are time consuming because of access delays. High speed is an important aspect of present day systems.
There are systems in other fields which use binarization, as well. For instance, many document copiers use binarization techniques. Some copiers use histograms to determine black/white thresholds. Histograms are generated by sampling the pixels of either subareas or the entire image and collecting the frequency of occurrence of each gray level. Various methods are then applied to the histogram to determine where to place the black/white threshold.
One well known method of using a histogram of pixel values to determine a black/white threshold is by N. Ohtsu, “Method of Determining Threshold Value from Tone Distribution,” Article 145, National Conference of Information Group of the Electronic Communication Society (1977). The Ohtsu method assumes that there are two populations of gray levels which correspond to the background (for example, an envelope or piece of paper) and the foreground (the text). This knowledge is then used to find the best threshold to distinguish between them. The threshold is then used to binarize the subject area. This method is not good for mail sorting and address recognition because the basic assumption is often wrong. There may be several different populations of gray values in an image due to different ink colors used in different textual areas, stamps, graphical areas, and others. Even if subareas are examined separately, issues such as security (anti-fraud) backgrounds, illumination irregularities, and others can cause a simple threshold to be inadequate.
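The two-population assumption can be made concrete with a short sketch. The following is a minimal, textbook-style formulation of histogram thresholding by maximizing between-class variance, not a transcription of the cited article; the function name `otsu_threshold` and the convention that levels at or below the threshold form the dark class are assumptions made for illustration.

```python
def otsu_threshold(hist):
    """Return the gray level t that maximizes between-class variance.

    hist[i] is the number of pixels with gray level i. Levels <= t are
    treated as one population (dark), levels > t as the other (light).
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # pixel count of the dark class so far
    sum0 = 0  # intensity sum of the dark class so far
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so no split exists at this level
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# A clean two-population histogram: dark ink near 40, light paper near 200.
hist = [0] * 256
hist[40] = 50
hist[200] = 50
threshold = otsu_threshold(hist)
```

Because the routine always returns exactly one split point, an area with three or more gray populations (for example two ink colors on a patterned background) is still forced into two classes, which is the weakness noted above.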
One improvement on the Ohtsu method that is known in the art is to take small tiles and then give thresholds for each tile. Several techniques include “tiling.” Tiling is the division of an image into a number of smaller rectangles and using data within each tile. One prior technique creates small tiles, e.g. 16 pixels×16 pixels, samples the pixels within the tile, and generates statistics from which a binarization threshold is calculated. In addition, the statistics can be used to determine that certain tiles contain no information and therefore do not need to be binarized at all, which thereby reduces processing time. Making a decision to not binarize a tile carries with it the potential of losing information if the decision is not correct. Therefore, the binarization decisions from surrounding tiles can be taken into account before finalizing the decision.
This method, although it does not share Ohtsu's assumption of two populations, still suffers from the problem of having to divide the tile into two populations by the very nature of picking a black/white threshold. Any tiled threshold approach also has the problem of possible discontinuities generated at the junctions of tiles, due to the potentially different thresholds used by each tile.
The prior techniques do not emphasize ways of deriving further forms of processing from the tile information. The histograms may be used for threshold determination, but they are not used in the prior art in conjunction with edge detection. Also, the prior art does not recognize the need for real time processing in a high-speed system.
Another binarization method which is well known in the prior art is edge detection. This method consists of Laplacian edge enhancement, in conjunction with a thresholding of the resulting image. This method solves the problems of Ohtsu and tile based approaches since there are no population assumptions or tile edges. However, it introduces several problems of its own. Any noise in the gray image tends to be transferred to the binary image, and mail piece images tend to have noise due to things such as envelope texture and camera digitization issues. Also, since by definition the method is detecting edges, thick objects in the gray image will show up in the binary image as outlined objects.
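The outlining effect can be illustrated with a small sketch. The code below applies a plain 4-neighbor Laplacian followed by a fixed threshold; the kernel, the threshold value of 64, and the function name `laplacian_binarize` are assumptions chosen for illustration and are not taken from any particular prior art system.

```python
def laplacian_binarize(img, thresh=64):
    """Mark a pixel black (1) where the 4-neighbor Laplacian exceeds thresh.

    img is a list of rows of gray values (0 = black, 255 = white).
    The one-pixel image border is left white (0) for simplicity.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Positive when the pixel is darker than its immediate neighbors.
            lap = (img[y - 1][x] + img[y + 1][x] +
                   img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
            if lap > thresh:
                out[y][x] = 1
    return out


# A thick dark object: a 5x5 black block on a 9x9 white field.
img = [[255] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        img[y][x] = 0
binary = laplacian_binarize(img)
```

In this example the rim of the block is marked black, while pixels inside the block, whose immediate neighbors are all equally dark, produce a Laplacian of zero and stay white. The block therefore appears in the binary image as an outline rather than a filled shape.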
As described above, the prior art provides fast methods of binarization (hardware systems) and easily modifiable methods (software systems). However, there is no present method that is fast, easily modifiable and of high quality. In the field of mail sorting, there is a growing need to speed the binarization process in order to meet the ever increasing demands of the postal system. It is also important to produce high quality results to avoid mis-sorting of mail.
It is therefore a general object of the present invention to provide a method and apparatus in which gray images of a mail piece or the like are binarized using a general purpose processor at a rate on the order of magnitude of at least 30,000 images an hour.
It is a more specific object of the present invention to provide a method and apparatus for binarizing images from an optical character recognition (OCR) camera utilizing a combination of tile based need for binarization determination, tile based background threshold determination, and an edge detection algorithm that provides good results for typical sized characters on mail piece images.
It is a further object of the present invention to utilize a method and apparatus of the type described, further utilizing an edge detection method which may be embodied in software that is sufficiently fast to provide real time processing in the context of high volume processing.
According to the invention, there are provided a method and apparatus for image binarization suitable for subsequent OCR/ICR processing, and a method for binarizing suitable for embodying in software and providing real time processing in a high volume, high speed application. An image from an OCR camera is resolved into tiles. The tiles are small enough to provide detailed processing of the image, yet large enough that the necessary information can be derived from each tile. In a preferred form, the tiles are each 16 pixels×16 pixels. Operation is as follows:
A. The image is tiled. The method and apparatus collect for each tile:
1. variance of intensity
2. 32-level histogram containing frequency of occurrence of intensity values divided by 8. Note that in an 8-bit gray image there are 256 intensity levels, which when divided by 8 (8 value wide bins) produces a 32-level histogram.
B. Using statistics obtained in A(1), an initial decision is made on which tiles to binarize. The final decision on which tiles to binarize is made by examining each tile's neighborhood. If an area is background only, it does not need to be binarized. Consequently, the speed of the process is improved.
C. Using the statistic from A(2), determine a background threshold for each tile (the intensity above which background is indicated). Note that black is 0 and white is 255.
D. Binarize each of the pixels within the tiles indicated by step B. Use the background thresholds determined in step C, and apply a 5 by 5 morphological transform that combines the following attributes:
1. averaging to reduce noise, utilizing pixels one unit distant east, west, north and south (right, left, above, below);
2. performing Laplacian derived edge detection using pixels 2 units distant east, west, north and south; this edge detection is immune to common line scan camera even/odd channel irregularities; using a distance of 2 pixels allows the edge routine to “reach” into the center of normal size characters and thereby reduce and/or eliminate the tendency to “outline” the characters;
3. improving speed by using 9 pixels at a time out of the 25 pixels in a 5×5 array, and using weights so that multiply or divide operations can be accomplished by shifts; and
4. modifying the transform output by using the background threshold to reject black pixels in areas that are determined to be background areas. In this manner, noise is eliminated.
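The four attributes of step D can be sketched as follows. This is only an illustration under stated assumptions: the text above names the nine pixels used (the center plus the four pixels at distance 1 and the four at distance 2) and says that weights are chosen so that shifts suffice, but it does not give the weights or thresholds. The weights (4 for the center, 1 for each near neighbor), the `edge_thresh` value, and the function name `binarize_pixel` are therefore hypothetical.

```python
def binarize_pixel(img, y, x, background_thresh, edge_thresh=16):
    """Return 1 (black) or 0 (white) for pixel (y, x) using 9 of the 25
    pixels in the surrounding 5x5 window. All weights are assumed values.
    """
    c = img[y][x]
    # Attribute 1: noise-reducing average over the center and the four
    # pixels one unit away. The weights sum to 8, so dividing is a shift.
    near = img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
    smooth = ((c << 2) + near) >> 3
    # Attribute 2: Laplacian-style comparison against the four pixels two
    # units away, which can reach across a stroke a few pixels wide.
    far = img[y - 2][x] + img[y + 2][x] + img[y][x - 2] + img[y][x + 2]
    edge = (far >> 2) - smooth  # positive when the center is darker
    # Attribute 4: reject would-be black pixels that are as bright as the
    # tile's background.
    if smooth >= background_thresh:
        return 0
    return 1 if edge > edge_thresh else 0


# A vertical stroke three pixels wide (columns 3 to 5) on a white field.
img = [[255] * 9 for _ in range(9)]
for row in img:
    row[3] = row[4] = row[5] = 0
stroke_row = [binarize_pixel(img, 4, x, background_thresh=184)
              for x in range(2, 7)]
```

In this example the middle column of the stroke is marked black because the pixels two units to its east and west are white, so the stroke comes out filled. A comparison limited to immediate neighbors would see only dark pixels around that column and leave it white.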
Other sampling patterns are possible. More samples allow better statistics. Fewer samples require less processing. The pattern chosen is a compromise between conflicting goals, as well as using knowledge of the likely sizes, shapes, and orientations of the data in the image.
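Steps A through C can be sketched in the same spirit. For simplicity the code below samples every pixel of each tile rather than a sparser pattern. The variance cutoff `var_thresh`, the rule that a tile is kept if it or any of its eight neighbors is busy, and the rule that places the background threshold `margin_bins` bins below the most populated histogram bin are all assumptions made for illustration; the text above does not specify them.

```python
TILE = 16  # tile edge in pixels, per the preferred form


def tile_stats(img, ty, tx, tile=TILE):
    """Step A: variance and 32-bin histogram for the tile at (ty, tx)."""
    hist = [0] * 32
    n = total = total_sq = 0
    for y in range(ty * tile, (ty + 1) * tile):
        for x in range(tx * tile, (tx + 1) * tile):
            v = img[y][x]
            hist[v >> 3] += 1  # divide by 8: 256 gray levels into 32 bins
            n += 1
            total += v
            total_sq += v * v
    mean = total / n
    return total_sq / n - mean * mean, hist


def tiles_to_binarize(variances, var_thresh=100.0):
    """Step B (assumed rule): keep a tile if it or any neighbor is busy."""
    rows, cols = len(variances), len(variances[0])
    busy = [[v > var_thresh for v in row] for row in variances]
    return [[any(busy[rr][cc]
                 for rr in range(max(0, r - 1), min(rows, r + 2))
                 for cc in range(max(0, c - 1), min(cols, c + 2)))
             for c in range(cols)]
            for r in range(rows)]


def background_threshold(hist, margin_bins=2):
    """Step C (assumed rule): the most populated bin is taken as background,
    and the threshold sits margin_bins bins below it."""
    mode_bin = max(range(32), key=lambda b: hist[b])
    return max(0, mode_bin - margin_bins) * 8


# One row of four tiles: uniform paper at 200, with dark ink in tile 0 only.
img = [[200] * 64 for _ in range(16)]
for y in range(4, 12):
    for x in range(4, 12):
        img[y][x] = 20
stats = [tile_stats(img, 0, tx) for tx in range(4)]
keep = tiles_to_binarize([[variance for variance, _ in stats]])
thresh0 = background_threshold(stats[0][1])
```

Here only the tile containing ink and its immediate neighbor are selected for binarization, and the two remaining background-only tiles are skipped, which is where the speed gain described in step B comes from.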