The present disclosure relates to data recovery and, more specifically, to a system and method for optimizing data recovery for a partially destroyed document.
Printed or paper documents can be physically destroyed by fire, cutting, tearing, and other physical insults. Such insults may destroy individual pages or whole sections of a printed document. Often only part of the printed document is destroyed, because the physical insult is localized to several pages or sections of the document. Once pages or regions within pages are destroyed, the information they contained is difficult to recover accurately. Existing solutions rely on storing redundant information within the printed document, typically but not necessarily in digital form, using encoding schemes that allow a degree of data recovery depending critically on the extent of the damage. In such encoding schemes, the ability to recover data is proportional to the amount of redundancy built into the printed document.
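By way of illustration only, the relationship between redundancy and recoverability can be sketched with a single XOR parity block computed over equal-length page payloads. XOR parity is an assumption chosen here for brevity and is not one of the encoding schemes named in this disclosure; the function names `make_parity` and `recover_page` and the sample payloads are likewise hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(pages: List[bytes]) -> bytes:
    """Build one parity block over equal-length page payloads."""
    return reduce(xor_bytes, pages)


def recover_page(pages: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild a single destroyed page (marked None) from the parity block."""
    missing = [i for i, p in enumerate(pages) if p is None]
    if len(missing) > 1:
        # One parity block carries enough redundancy for one lost page only.
        raise ValueError("one parity block can restore only one lost page")
    restored = list(pages)
    if missing:
        survivors = [p for p in pages if p is not None]
        # XOR of the parity with every surviving page yields the lost page.
        restored[missing[0]] = reduce(xor_bytes, survivors, parity)
    return restored


# Hypothetical 8-byte payloads standing in for encoded page content.
pages = [b"page-one", b"page-two", b"page-3.."]
parity = make_parity(pages)
damaged = [pages[0], None, pages[2]]
restored = recover_page(damaged, parity)
```

In this sketch one parity block restores exactly one destroyed page; tolerating the loss of additional pages would require additional redundant blocks, which is the dependence on built-in redundancy noted above.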
There are several existing schemes for encoding document recovery information within a printed document to enable future recovery of the document. For instance, one may use Xerox™ DataGlyphs™ or some other two-dimensional digital coding scheme; there are also well-known steganographic techniques for concealing some representation of the document within itself. A highly desirable feature of such encoding techniques is the ability to encode document recovery information without extending the length of the original document. Another desirable feature is the ability to blend the encoded document recovery information into the original document without noticeably disturbing it. Hence, existing encoding schemes are typically more or less transparent to the user of the document.
Data recovery encoding schemes may also vary in the fidelity with which they recover the document; some schemes may recover only a rough representation of the original to assist forensic inspection of the document, while other schemes may recover high-fidelity reproductions of the original. One way in which a scheme of the latter kind encodes document recovery information is by optically rasterizing the pages of the document to produce a digital image of each page and then encoding the rasterized images in compressed form within the document. The scheme might use lossy image compression to achieve high compression ratios, with the degree of lossiness dictated by the fidelity requirement imposed on the recovered page image.
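By way of illustration only, the manner in which a fidelity requirement may dictate the degree of lossiness can be sketched as follows. The sketch assumes 8-bit grayscale samples, uniform quantization as a stand-in for a real lossy image codec, and peak signal-to-noise ratio (PSNR) as the fidelity measure; none of these choices is specified by this disclosure, and the names `quantize`, `psnr`, and `coarsest_step` are hypothetical.

```python
import math


def quantize(pixels, step):
    """Lossy step: snap each 8-bit gray value to the centre of its bin."""
    return [min(255, (p // step) * step + step // 2) for p in pixels]


def psnr(original, degraded):
    """Peak signal-to-noise ratio in dB for 8-bit samples."""
    mse = sum((a - b) ** 2 for a, b in zip(original, degraded)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)


def coarsest_step(pixels, min_psnr_db, steps=(64, 32, 16, 8, 4, 2, 1)):
    """Return the largest quantization step whose PSNR meets the target."""
    for step in steps:  # tried from most lossy to least lossy
        if psnr(pixels, quantize(pixels, step)) >= min_psnr_db:
            return step
    return 1  # a step of 1 leaves the samples unchanged


# Hypothetical test image: a gray ramp covering every 8-bit value once.
ramp = list(range(256))
step = coarsest_step(ramp, min_psnr_db=30.0)
```

A coarser step leaves fewer distinct gray levels to encode and so permits a higher compression ratio, while a stricter fidelity requirement forces a finer step and therefore more encoded data per page.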