1. Field of the Invention
The invention relates to methods and apparatus for compressing and decompressing digitally stored image information in cases where part of the image is invariant or standard, such as an image of a printed form, and thus does not contribute to the image information content.
2. Related Art
The digital processing of the information contained in documents generally involves the acquisition of the information by some reading device, the transformation of the acquired information into a machine-readable code, the storing of the coded information for later, and possibly repeated, processing, the actual processing of the information, and finally the output of the results of the processing. This output may take visual form, as on a display unit or in print, or be purely electronic.
Generally, the acquisition of the information by a reading device needs to be performed at a reasonably high resolution to avoid information loss, and it produces a high volume of scan data which requires a large memory capacity for storage. As a typical example, a page of A4 size scanned at 100 pels/cm requires about 700 kbytes of storage space. To alleviate this problem, document scanning systems are generally provided with some form of data compression capability to reduce the amount of storage required.
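The order of magnitude of this storage figure can be checked with a short calculation. The sketch below assumes a bilevel image stored at 1 bit per pel and the nominal A4 dimensions of 21.0 cm by 29.7 cm; the bit depth is an assumption, since the text states only the page size and the resolution. The function name raw_scan_bytes is illustrative.

```python
# Rough storage estimate for an uncompressed scan of an A4 page.
# Assumption: bilevel image, 1 bit per pel (not stated in the text).
A4_WIDTH_CM = 21.0
A4_HEIGHT_CM = 29.7
PELS_PER_CM = 100
BITS_PER_PEL = 1


def raw_scan_bytes(width_cm, height_cm, pels_per_cm, bits_per_pel):
    """Return the number of bytes needed to store the raw scan."""
    width_pels = round(width_cm * pels_per_cm)
    height_pels = round(height_cm * pels_per_cm)
    total_bits = width_pels * height_pels * bits_per_pel
    return (total_bits + 7) // 8  # round up to whole bytes


size_bytes = raw_scan_bytes(A4_WIDTH_CM, A4_HEIGHT_CM, PELS_PER_CM, BITS_PER_PEL)
print(size_bytes, "bytes, about", round(size_bytes / 1024), "KiB")
```

Under these assumptions the raw scan occupies 2100 x 2970 pels, or 779,625 bytes, which is of the same order as the figure quoted above.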
EP-A-0 411 231 discloses a compression/decompression scheme for scanned paper forms which achieves high compression ratios by removing template information common to all forms of the same type. The result of the compression of a form using this method is a compressed image consisting of the filled-in information only.
When reconstructing the form from its compressed form, the template data is superimposed on the image with the filled-in data to form the image of the original form. Such a method ensures that the image, when encoded using conventional methods such as run-end or run-length encoding for storage or transmission, will take up less space, because the information content of the compressed form is reduced. This particular compression method has become known as "Form Drop-Out".
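The principle can be illustrated with a deliberately simplified sketch. It assumes bilevel images held as lists of rows (1 for a black pel, 0 for white), perfect registration between the scanned form and the template, and a plain pel-by-pel subtraction. It is not the method of EP-A-0 411 231, which is not detailed here; the names drop_out, reconstruct and run_lengths are hypothetical.

```python
def drop_out(filled, template):
    """Keep only the black pels of the scan that are not black in the template."""
    return [[f & (1 - t) for f, t in zip(frow, trow)]
            for frow, trow in zip(filled, template)]


def reconstruct(fill_in, template):
    """Superimpose the template on the fill-in image."""
    return [[f | t for f, t in zip(frow, trow)]
            for frow, trow in zip(fill_in, template)]


def run_lengths(row):
    """Lengths of alternating runs in a row, starting with a white run."""
    runs, current, count = [], 0, 0
    for pel in row:
        if pel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pel, 1
    runs.append(count)
    return runs


template = [
    [1, 1, 1, 1, 1, 1, 1, 1],  # ruled line printed on the form
    [0, 0, 0, 0, 0, 0, 0, 0],  # blank field
    [1, 0, 1, 0, 0, 0, 0, 0],  # printed label
]
filled = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1, 0, 0, 0],  # handwriting entered in the blank field
    [1, 0, 1, 0, 0, 0, 0, 0],
]

fill_in = drop_out(filled, template)
restored = reconstruct(fill_in, template)

runs_before = sum(len(run_lengths(row)) for row in filled)
runs_after = sum(len(run_lengths(row)) for row in fill_in)
print(restored == filled, runs_before, runs_after)
```

In this toy case the fill-in image contains fewer runs than the scanned form, so a run-length code of it is shorter, and superimposing the template restores the scanned form exactly.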
One problem with the prior art form drop-out method is that it cannot extract changes made by erasing parts of the template. For example, when signing a standard legal contract, there are situations in which some of the template text must be replaced by manual fill-ins written on areas covered by some sort of "white-out" material, such as a sticker or correction fluid. The conventional method does not reconstruct the form correctly in these areas. Moreover, if information is added in such areas, the subtraction process will fail to extract it.
The reason for this is that the conventional form drop-out method does not account for changes which may occur to the template data itself. The main assumption upon which the conventional method is based is that filled-in information can only be added to a form, whereas in cases such as "white-out", template information is also removed from the form.
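A small example makes the failure concrete. The sketch below again assumes a plain pel-by-pel subtraction on a single bilevel row with perfect registration, which is a simplification of the conventional method; the function names are hypothetical.

```python
def drop_out_row(scanned, template):
    """Pel-by-pel subtraction: keep black pels that are not black in the template."""
    return [s & (1 - t) for s, t in zip(scanned, template)]


def reconstruct_row(fill_in, template):
    """Superimpose the template on the fill-in row."""
    return [f | t for f, t in zip(fill_in, template)]


# Template row: printed text occupies the first four pels.
template = [1, 1, 1, 1, 0, 0, 0, 0]

# Scanned row: the printed text was covered with white-out,
# and two pels of handwriting were added on top of the covered area.
scanned = [0, 1, 1, 0, 0, 0, 0, 0]

fill_in = drop_out_row(scanned, template)
restored = reconstruct_row(fill_in, template)

print(fill_in)              # the handwriting on the covered area is lost
print(restored == scanned)  # the erased template text reappears
```

Because the subtraction can only add to the template, the handwriting written over the covered area is discarded as if it were template data, and the reconstruction restores the printed text that was erased.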
The conventional form drop-out method also performs poorly when filled-in data appears in areas where the template data is dense, e.g., areas where the background consists of some dense spatial pattern of dots or lines. In such areas, all filled-in data is in close proximity to template data, and form drop-out, which is essentially a localized process, usually removes the filled-in information entirely.
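This effect can be reproduced with a toy model. The sketch assumes that the localized process removes every scanned pel lying within a small tolerance radius of a template pel, for instance to absorb slight misregistration. That neighbourhood rule is an assumption made for illustration only; the text states merely that the process is localized. The names dilate and local_drop_out are hypothetical.

```python
def dilate(row, radius):
    """Mark every pel that lies within `radius` pels of a black pel."""
    n = len(row)
    return [1 if any(row[max(0, i - radius):min(n, i + radius + 1)]) else 0
            for i in range(n)]


def local_drop_out(scanned, template, radius=1):
    """Remove scanned pels that fall inside the dilated template mask."""
    mask = dilate(template, radius)
    return [s & (1 - m) for s, m in zip(scanned, mask)]


def superimpose(a, b):
    """Combine two bilevel rows, as when filling in a printed form."""
    return [x | y for x, y in zip(a, b)]


stroke = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]           # filled-in data
sparse_template = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # isolated printed marks
dense_template = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # dense dot pattern

recovered_sparse = local_drop_out(superimpose(stroke, sparse_template),
                                  sparse_template)
recovered_dense = local_drop_out(superimpose(stroke, dense_template),
                                 dense_template)

print(recovered_sparse == stroke)  # stroke survives on the sparse background
print(recovered_dense)             # stroke is removed on the dense background
```

With the sparse template the stroke is recovered intact, whereas with the dense pattern every pel lies next to a template pel, so the whole stroke is removed.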