Mixed Raster Content (“MRC”) compression has been widely used to achieve smaller file sizes and good text quality. Typically, the scanned image is segmented into text components and image components, and these components are compressed using different techniques. For example, the text components are compressed using lossless compression, whereas the image components are compressed using lossy compression. By recording appropriate color information, the colors of the text can be retained even though the text is stored in a binary layer. Only the MRC format allows text to be stored as binary together with its color.
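The layer split described above can be illustrated with a minimal sketch. It is not the segmentation or the codecs an actual MFP would use: it assumes a grayscale page held as nested lists, a fixed threshold as the segmentation step, `zlib` as a stand-in for a bi-level codec such as JBIG2 or CCITT G4, and a simple 2x subsampling as a stand-in for JPEG. All function names are hypothetical.

```python
import zlib

def segment(page, threshold=128):
    """Binary mask: 1 where a pixel is dark enough to be treated as text (assumed rule)."""
    return [[1 if px < threshold else 0 for px in row] for row in page]

def pack_mask(mask):
    """Losslessly compress the binary text layer (zlib stands in for JBIG2 / G4)."""
    return zlib.compress(bytes(bit for row in mask for bit in row))

def unpack_mask(blob, width):
    """Inverse of pack_mask: restore the mask rows from the compressed bytes."""
    flat = zlib.decompress(blob)
    return [list(flat[i:i + width]) for i in range(0, len(flat), width)]

def text_color(page, mask):
    """Average value under the mask: the color recorded alongside the binary layer."""
    vals = [px for prow, mrow in zip(page, mask) for px, m in zip(prow, mrow) if m]
    return round(sum(vals) / len(vals)) if vals else 0

def lossy_background(page, mask, fill=255):
    """Blank out the lifted text, then keep every second pixel (stand-in for JPEG)."""
    cleaned = [[fill if m else px for px, m in zip(prow, mrow)]
               for prow, mrow in zip(page, mask)]
    return [row[::2] for row in cleaned[::2]]

def reconstruct(mask, color, background):
    """Upsample the background and paint the text color wherever the mask is set."""
    return [[color if m else background[y // 2][x // 2] for x, m in enumerate(row)]
            for y, row in enumerate(mask)]

# A 4x4 page: a dark 2x2 "glyph" on a light background.
page = [
    [250, 250, 250, 250],
    [250,  20,  30, 250],
    [250,  25,  25, 250],
    [250, 250, 250, 250],
]
mask = segment(page)
color = text_color(page, mask)            # 25
background = lossy_background(page, mask)
restored_mask = unpack_mask(pack_mask(mask), width=4)
restored = reconstruct(restored_mask, color, background)
```

In this sketch the glyph's shape survives the round trip exactly because it travels in the losslessly compressed mask, while only the background passes through the resolution-reducing path.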
Oftentimes, text is missed during the segmentation process, and the missed text remains in the image layer, where it is highly compressed. This affects the quality of the text. Color text and text on color backgrounds are particularly challenging. Even when a user selects text mode scanning on a multifunction peripheral device (“MFP”), there is the possibility of losing text into the image layer. End users face this issue quite often, and there are currently no options available at the MFP to address this shortcoming.
If a user wants to scan a text document with good text quality, retained text colors, and an optimum file size, the current option is to use an N-Layer MRC format. However, a major concern with this MRC format is segmentation error. If the segmentation fails to lift the text, the missed text is compressed with lossy compression within the image layer, degrading its quality. Indeed, certain text in MRC compressed files appears blurred and of low quality as a result of this incorrect segmentation. That is, the text is identified as an image and is highly compressed using JPEG (or another similar lossy compression method), with the corresponding resolution scaled down. It will be appreciated that this compression affects the quality of the text such that, among other issues, optical character recognition (“OCR”) based analysis of the file will not be reliable.
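The effect of a segmentation miss can be sketched in a few lines. The example models only the resolution reduction mentioned above, using 2x2 block averaging as an assumed stand-in. It does not model JPEG quantization, and the helper names are hypothetical. A one-pixel-wide stroke that stays in the image layer comes back wider and with reduced contrast.

```python
def downscale_2x(img):
    """Average each 2x2 block (assumed stand-in for image-layer downscaling)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]

def upscale_2x(img):
    """Nearest-neighbor upscale back to the original size."""
    return [[img[y // 2][x // 2] for x in range(2 * len(img[0]))]
            for y in range(2 * len(img))]

def contrast(img):
    """Difference between the lightest and darkest pixel."""
    flat = [px for row in img for px in row]
    return max(flat) - min(flat)

# A thin vertical stroke (value 60) on a tinted background (value 200),
# i.e. text that a segmenter has failed to lift into the text layer.
page = [[200, 60, 200, 200] for _ in range(4)]
degraded = upscale_2x(downscale_2x(page))

print(contrast(page))      # 140
print(contrast(degraded))  # 70
print(degraded[0])         # [130, 130, 200, 200]
```

The stroke is smeared across two pixels and its contrast against the background is halved, which is the kind of degradation that makes downstream OCR less reliable.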
Accordingly, there is a need for a reliable option that provides the user with high text quality while optimizing file size.