Documents scanned at high resolutions require very large amounts of storage space. Furthermore, large volumes of image data require substantially more time and bandwidth to move around or to transmit over networks. Instead of being stored in raw scanned RGB format, the data is typically subjected to some form of data compression in order to reduce its volume and thereby avoid the high cost of storing it. “Lossless” compression methods such as Lempel-Ziv (LZ) do not perform particularly well on scanned (noisy) pixel data. While “lossy” methods such as JPEG work fairly well on continuous-tone pixel maps, they do not work particularly well on the parts of the page containing text and line art. To optimize image data compression, techniques that can recognize the type of data being compressed are needed.
One approach to satisfying the compression needs of differing types of data has been to use an encoder pipeline utilizing a Mixed Raster Content (MRC) format to describe the image. The image, a composite image having text intermingled with color or gray scale information, is segmented into two or more planes, generally referred to as the upper and lower planes, and a selector plane is generated to indicate, for each pixel, which of the image planes contains the actual image data that should be used to reconstruct the final output image. Segmenting the image into planes in this manner can improve compression because the data can be arranged such that the planes are smoother and more compressible than the original image. Segmentation also allows different compression methods to be applied to the different planes. Thus, the compression technique most appropriate for the type of data in each plane can be applied.
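The role of the selector plane can be illustrated with a minimal sketch. It is not the segmentation method of any particular MRC encoder: the fixed luminance threshold, the fill values, the convention that a selector value of 1 chooses the upper plane, and the function names `segment` and `reconstruct` are all assumptions made for illustration, and the image is reduced to 8-bit gray scale held in nested lists.

```python
def segment(image, threshold=128, upper_fill=0, lower_fill=255):
    """Split a gray-scale image into selector, upper, and lower planes.

    Assumption: pixels darker than `threshold` are treated as text or line
    art and routed to the upper plane; all others go to the lower plane.
    Unused positions in each plane are filled with a constant so that the
    plane stays smooth and therefore easier to compress.
    """
    selector, upper, lower = [], [], []
    for row in image:
        sel_row = [1 if p < threshold else 0 for p in row]
        selector.append(sel_row)
        upper.append([p if s else upper_fill for p, s in zip(row, sel_row)])
        lower.append([lower_fill if s else p for p, s in zip(row, sel_row)])
    return selector, upper, lower


def reconstruct(selector, upper, lower):
    """Rebuild the output image by picking, per pixel, the plane the selector names."""
    return [
        [u if s else l for s, u, l in zip(sel_row, up_row, lo_row)]
        for sel_row, up_row, lo_row in zip(selector, upper, lower)
    ]


# Toy 2x3 page: bright background with a few dark "text" pixels.
page = [[250, 10, 245],
        [12, 248, 8]]
sel, up, lo = segment(page)
restored = reconstruct(sel, up, lo)
```

In this toy case the lower plane contains only bright values and the upper plane only dark ones, so each plane is smoother than the original page, and each could be handed to a different compressor before the selector recombines them.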
A Statistics (Stats) Module responsible for collecting essential statistics about the image content is desirable. It would monitor the incoming pixel data stream and accumulate the distribution of pixel values and their color information. The Stats Module would operate in a luminance-chrominance color space domain, such as Lab or YCC. For this reason, it would be desirable to locate the Stats Module after a Scanner Color Conversion unit, which transforms the scanner data from RGB into the Lab color space.
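As a rough sketch of the kind of bookkeeping such a module might perform, the following accumulates a luminance histogram and running chrominance sums over a stream of pixels. The class name `StatsModule`, its fields, and the assumption that each pixel arrives as an 8-bit encoded (L, a, b) triple are hypothetical and chosen only for illustration; the actual statistics gathered and their precision are not specified here.

```python
class StatsModule:
    """Accumulates simple statistics over a stream of (L, a, b) pixels.

    Assumption: each channel is already encoded as an integer in
    [0, levels - 1], as it might be after a scanner color conversion stage.
    """

    def __init__(self, levels=256):
        self.lum_hist = [0] * levels   # distribution of luminance values
        self.chroma_sum = [0, 0]       # running sums of the a and b channels
        self.count = 0                 # number of pixels seen so far
        self.lum_min = levels - 1      # darkest luminance observed
        self.lum_max = 0               # brightest luminance observed

    def update(self, pixel):
        """Fold one pixel into the running statistics."""
        lum, a, b = pixel
        self.lum_hist[lum] += 1
        self.chroma_sum[0] += a
        self.chroma_sum[1] += b
        self.count += 1
        self.lum_min = min(self.lum_min, lum)
        self.lum_max = max(self.lum_max, lum)

    def mean_chroma(self):
        """Return the average (a, b) over all pixels seen, or None if empty."""
        if self.count == 0:
            return None
        return (self.chroma_sum[0] / self.count,
                self.chroma_sum[1] / self.count)


# Feed a short pixel stream through the module.
stats = StatsModule()
for px in [(200, 128, 128), (200, 130, 126), (20, 128, 128)]:
    stats.update(px)
```

Working in a luminance-chrominance space keeps the two kinds of statistic separate: the histogram depends only on the L channel, while the chrominance sums summarize color independently of brightness.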