Optical character recognition (OCR) systems transform images or other representations of paper documents, for example files in the Portable Document Format (PDF), into computer-readable, editable, and searchable electronic files. A typical OCR system consists of an imaging device that produces an image of a document and software running on a computer that processes the image. As a rule, this software includes an OCR program that recognizes symbols, letters, characters, digits, and other units and saves them in a computer-editable, encoded format.
However, apart from text, a document image may contain pictures, which lose quality if saved together with the text using traditional methods. If lossless methods are used to save the pictures, the resulting file typically becomes unacceptably large. To resolve this dilemma, a multilayer compression method known as Mixed Raster Content (MRC) is sometimes used. The MRC method splits the image into three layers so that one algorithm compresses the background, another compresses the chromatic units, and a third may be used to compress the monochrome mask. In most cases this method yields files of acceptable size. However, a user may sometimes need certain important elements, including, among others, pictures and photos, to be saved without any noticeable loss in quality.
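The three-layer split that MRC relies on can be illustrated with a minimal sketch. It assumes a grayscale image held as rows of 0-255 integers, a fixed brightness threshold for building the mask, and zlib as a stand-in codec for every layer. The function names (`split_mrc`, `merge_mrc`, `compress_layer`) are hypothetical, and the "chromatic units" are treated here as the foreground layer. A real MRC encoder would typically pair a bi-level codec for the mask with lossy image codecs for the other two layers.

```python
import zlib


def split_mrc(image, threshold=128):
    """Split a grayscale image into a monochrome mask, a foreground and a background."""
    # Mask: 1 marks dark (text-like) pixels, 0 marks everything else.
    mask = [[1 if px < threshold else 0 for px in row] for row in image]

    fg_vals = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if m]
    bg_vals = [px for row, mrow in zip(image, mask) for px, m in zip(row, mrow) if not m]

    # Holes in each layer are filled with that layer's mean value so the
    # layer stays smooth and compresses well (an assumption of this sketch).
    fg_fill = sum(fg_vals) // len(fg_vals) if fg_vals else 0
    bg_fill = sum(bg_vals) // len(bg_vals) if bg_vals else 255

    foreground = [[px if m else fg_fill for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    background = [[bg_fill if m else px for px, m in zip(row, mrow)]
                  for row, mrow in zip(image, mask)]
    return mask, foreground, background


def merge_mrc(mask, foreground, background):
    """Recompose the image: take the foreground where the mask is set, else the background."""
    return [[f if m else b for m, f, b in zip(mrow, frow, brow)]
            for mrow, frow, brow in zip(mask, foreground, background)]


def compress_layer(layer):
    """Compress one layer; zlib is only a placeholder for a layer-specific codec."""
    return zlib.compress(bytes(px for row in layer for px in row))


# Tiny example: a light page with a few dark "text" pixels.
image = [
    [255, 255, 20, 255],
    [250, 10, 15, 252],
    [255, 255, 255, 30],
]
mask, foreground, background = split_mrc(image)
restored = merge_mrc(mask, foreground, background)
layer_sizes = [len(compress_layer(layer)) for layer in (mask, foreground, background)]
```

Because this sketch compresses every layer losslessly, `restored` equals `image`. In practice the background and foreground layers would be compressed lossily, which is where both the size savings and the quality loss in pictures come from.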