The invention relates to computing systems and, more particularly, to methods and apparatus for recognizing a background in a multicolor image.
Text recognition techniques, such as optical character recognition (OCR), can identify text characters or objects in an image (the “original image”) stored as a pixelmap in a computer and convert the text into corresponding ASCII characters. An OCR program can differentiate between text objects and non-text objects (such as the background) in an image based on intensity differences between the text objects and the background. Such differentiation can be accomplished when the text characters and the background are two distinct colors.
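By way of illustration only, the intensity-based differentiation described above can be sketched as a simple threshold on a grayscale pixelmap. The function name `split_text_and_background`, the midpoint threshold, and the assumption that the background is the majority of pixels are all hypothetical choices made for this sketch; they are not taken from any particular OCR program.

```python
def split_text_and_background(gray, threshold=None):
    """Return a mask that is True where a pixel is classified as text.

    gray is a list of rows of intensities in the range 0-255.
    """
    pixels = [p for row in gray for p in row]
    if threshold is None:
        # Assumption: the midpoint between the darkest and lightest
        # intensity separates the two colors of a two-color image.
        threshold = (min(pixels) + max(pixels)) / 2

    dark_count = sum(1 for p in pixels if p < threshold)
    light_count = len(pixels) - dark_count

    # Assumption: the background covers more pixels than the text,
    # so the minority intensity class is treated as text.
    text_is_dark = dark_count <= light_count

    return [[(p < threshold) == text_is_dark for p in row] for row in gray]


# Dark text on a light background.
page = [
    [255, 255, 255, 255],
    [255,   0,   0, 255],
    [255, 255, 255, 255],
]
mask = split_text_and_background(page)
```

A sketch of this kind relies on exactly two intensity classes being present, which is the two-color condition noted above.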
However, the task of recognizing text in a multicolor image is more difficult. For example, an image may include text characters, background, and non-text objects, such as graphical objects, having different colors. Furthermore, different blocks of text in the image may have different combinations of colors. For example, one text block may have red text against a white background and another text block may have yellow text against a black background.
In addition to text recognition problems, multicolor images present a further problem when the original image is to be reproduced. Conventional OCR programs extract text from a pixelmap, and the remaining information is typically represented as a colored rectangle. Thus, a cyan page with black text would conventionally be reproduced as a cyan rectangle with black text rendered on top of the rectangle. This is because extracting the text may produce rendered text whose alignment does not exactly match the original pixelmap. Accordingly, to ensure that no gaps appear in the final rendered image, the reproduction of a pixelmap after OCR is typically limited to simple background rectangles. When operating on a multicolor image, conventional OCR programs typically reproduce the text over a colored rectangle without regard for gradients or patterns found in the background portion of the original image.
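By way of illustration only, the conventional reproduction described above can be sketched as filling a page-sized rectangle with a single color derived from the non-text pixels. The function name `flat_background` and the use of the mean color as the fill are hypothetical choices made for this sketch; the description above states only that a colored rectangle is used, not how its color is selected.

```python
def flat_background(pixels, text_mask):
    """Return a page-sized pixelmap filled with one background color.

    pixels is a list of rows of (r, g, b) tuples; text_mask has the
    same shape and is True where a pixel belongs to extracted text.
    """
    totals = [0, 0, 0]
    count = 0
    for row, mask_row in zip(pixels, text_mask):
        for rgb, is_text in zip(row, mask_row):
            if not is_text:
                for channel in range(3):
                    totals[channel] += rgb[channel]
                count += 1
    if count == 0:
        raise ValueError("no background pixels to sample")

    # Assumption: the rectangle is filled with the mean background color.
    fill = tuple(total // count for total in totals)
    return [[fill for _ in row] for row in pixels]


# A cyan page with one black text pixel yields a solid cyan rectangle.
cyan_page = [[(0, 255, 255), (0, 0, 0), (0, 255, 255)]]
cyan_rect = flat_background(cyan_page, [[False, True, False]])

# A gray gradient background collapses to a single flat color.
gradient = [[(0, 0, 0), (100, 100, 100), (200, 200, 200)]]
gradient_rect = flat_background(gradient, [[False, False, False]])
```

In the second case, every pixel of the result has the same value, which illustrates how a gradient or pattern in the original background is lost under this kind of reproduction.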