1. Technical Field
Embodiments of the present application generally relate to document compression and, in particular, to a method and apparatus for compressing a document using pixel variation information.
2. Description of the Related Art
The rapid proliferation of multimedia content (i.e., user interactive controls and application generated controls that create an exciting and interesting multimedia experience) throughout the Internet has been driven by numerous technological innovations. Users spend a significant amount of time conducting various activities related to multimedia content (e.g., surfing educational websites, viewing detailed product demonstrations, accessing digital libraries and/or the like). These users often generate and/or view multimedia content on various devices (e.g., a mobile phone, a scanner, an electronic book reader, a Personal Digital Assistant (PDA), a hand-held gaming device and/or the like).
Various types of multimedia content, such as image and text data, may be stored in a document, such as a Portable Document Format (PDF) file. PDF is an open standard for document exchange created by Adobe Systems of San Jose, Calif. Often, the document becomes too large for efficient data transmission to another storage area. Such a document may be compressed into a document image that is smaller in size and thus easier to transfer as a file. One well-known compression process is Mixed Raster Content (MRC) based document compression, in which the document is decomposed into three layers: a foreground layer, a background layer and a mask layer. The mask layer (also referred to herein as simply a mask) is a binary image in which each pixel value dictates whether the color of a corresponding pixel in the compressed document will be retrieved from the foreground layer or the background layer. MRC compression is typically implemented in a scanner (e.g., a document and/or image scanner), which is a device that creates an electronic version of a paper document.
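By way of illustration only, the role of the mask layer described above may be sketched as follows. The function name `mrc_compose`, the nested-list image representation and the convention that a mask value of 1 selects the foreground layer are assumptions made for this sketch; the description above does not fix a polarity, and an actual MRC codec would also compress each layer separately.

```python
def mrc_compose(mask, foreground, background):
    """Rebuild an image from three MRC layers of identical dimensions.

    Assumption for this sketch: mask value 1 selects the foreground
    pixel and mask value 0 selects the background pixel.
    """
    return [
        [fg if m else bg for m, fg, bg in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]


if __name__ == "__main__":
    black, white = (0, 0, 0), (255, 255, 255)
    mask = [[1, 0], [0, 1]]
    foreground = [[black, black], [black, black]]   # e.g., text color
    background = [[white, white], [white, white]]   # e.g., page color
    for row in mrc_compose(mask, foreground, background):
        print(row)
```

Because every output pixel is copied from exactly one of the two color layers, any error in the mask carries over directly into the reconstructed document.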
MRC document compression is used to achieve higher compression ratios when scanning a document while retaining textual data clarity. However, one of the main challenges in implementing MRC document compression is creating an appropriate and accurate mask. Once such a mask is available, creating the foreground and the background layers is a relatively simple task. Conventional methods of mask creation apply a binarization process (e.g., Niblack binarization) on a grayscale image. MRC document compression based on such methods sometimes does not produce an accurate mask when certain conditions are present, such as light colored text on a light background, dark colored text on a dark background, reverse text (e.g., light text on a dark background), inclusion of image regions in the mask and noise.
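For context, a conventional local binarization of the kind mentioned above may be sketched as follows, using the commonly cited Niblack rule in which a pixel is compared against a threshold equal to the local mean plus k times the local standard deviation. The function name `niblack_mask`, the window size, the value k = -0.2 and the convention that pixels darker than the threshold are marked 1 are all assumptions made for this sketch and are not taken from any particular scanner implementation.

```python
import math


def niblack_mask(gray, window=3, k=-0.2):
    """Binarize a grayscale image (nested lists, values 0-255).

    Threshold per pixel: local_mean + k * local_std, computed over a
    window clipped at the image borders. Pixels darker than the
    threshold are marked 1 (assumed here to mean "text").
    """
    height, width = len(gray), len(gray[0])
    radius = window // 2
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            threshold = mean + k * math.sqrt(variance)
            mask[y][x] = 1 if gray[y][x] < threshold else 0
    return mask


if __name__ == "__main__":
    dark_on_light = [[200, 200, 200], [200, 20, 200], [200, 200, 200]]
    light_on_dark = [[20, 20, 20], [20, 200, 20], [20, 20, 20]]
    print(niblack_mask(dark_on_light))  # the dark center pixel is marked
    print(niblack_mask(light_on_dark))  # the surrounding pixels are marked
```

In this toy input, the dark pixel on a light background is marked as expected, whereas for the reverse case the rule marks the dark surroundings rather than the light pixel, which is consistent with the reverse text difficulty noted above.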
Therefore, there is a need in the art for a method and apparatus for compressing a document using pixel variation information to create an accurate mask.