The present invention is directed to systems and methods for creating documents containing encoded data and human-readable data, and more particularly, to devices and methods for encoding and decoding documents containing machine-readable text overlaid with human-readable content and graphics.
Modern electronic document processing systems generally include input scanners for electronically capturing the general appearance (i.e., the human readable information content and the basic graphical layout) of human readable hardcopy documents; programmed computers for enabling users to create, edit and otherwise manipulate electronic documents; and printers for producing hardcopy, human readable renderings of electronic documents. These systems typically have convenient access to mass memory for the storage and retrieval of electronic document files. Moreover, they often are networked by local area networks (LANs), switched data links, and the like for facilitating the interchange of digital electronic documents and for providing multi-user access to shared system resources, such as high speed electronic printers and electronic file servers.
The technical details pertaining to the interchangeability of electronic documents are beyond the scope of this invention, but it should be understood that there is not yet a "universal interchange standard" for losslessly interchanging "structured electronic documents" (i.e., documents conforming to predefined rules governing their constituent elements, the characteristics of those elements, and the interrelationships among their elements). Plain text ASCII encoding is becoming a de facto interchange standard, but it is of limited utility for representing structured electronic documents. Other encoding formats provide fuller structural representations of electronic documents, but they usually are relatively system specific. For example, some of the more basic document description languages (DDLs) employ embedded control codes for supplementing ASCII encodings with variables defining the logical structure (i.e., the sections, paragraphs, sentences, figures, figure captions, etc.) of electronic documents, thereby permitting such documents to be formatted in accordance with selected formatting variables, such as selected font styles, font sizes, line and paragraph spacings, margins, indentations, header and footer locations, and columns. Graphical DDL encodings provide more sophisticated and complete representations of electronic document structures because they encode both the logical structure and the layout structure of such documents. Page description language (PDL) encodings are related to graphical DDL encodings, but they are designed so that they can be readily decomposed or interpreted to define the detailed layout of the printed page in a raster scan format.
Accordingly, it will be appreciated that the transportability of electronic documents from one document processing system to another depends upon the ability of the receiving or "target" system to interpret, either directly or through the use of a format converter, the encoding format in which the document is provided by the originating or "source" system. To simplify this disclosure, source/target encoding format compatibility will be assumed, but it should be clearly understood that this is a simplifying assumption.
It is undisputed that a digital representation of a document may be easily converted into a human-readable version of the document and vice versa. It is therefore logical that a digital representation of a document may be easily transported from one computer system to another. In the past, others have proposed printing digital data, including electronic document files, on a recording medium, such as plain paper, so that optical readers can be employed for uploading the data into electronic document processing systems. See, for example, commonly assigned U.S. Pat. No. 5,486,686, entitled "Hardcopy Lossless Data Storage and Communication for Electronic Document Processing Systems" to Zdybel et al. In that system, machine-readable codes are printed at various locations on a hardcopy document and human-readable content is printed at separate locations on the same hardcopy document. The integration of the machine-readable digital representations of electronic documents with the human-readable hardcopy renderings may be employed to enhance the detail with which the structure and content of the electronic document can be recovered, to enable recipients of scanned-in versions of such documents to identify and process annotations that were added to the hardcopies after they were printed, and/or to alert the recipients of the scanned-in documents to alterations that may have been made to the human-readable content of the hardcopy renderings. However, that approach has the drawback that the amount of machine-readable information that may be stored on the page is limited, since the areas for storing machine-readable information may not overlap with the human-readable information. That approach also fails to disclose the capability to identify minor differences (i.e., those that may exist on later generations of a document that has been repeatedly copied, or when an inexact duplicate of a document has been created).
Consequently, it would be a significant improvement if the ordinary hardcopy output of electronic document processing systems could be employed as a system for storing more complete versions of the original document in digital format. It would also be a significant improvement if the hardcopy output of electronic document processing systems could be used to identify copies of an original document and for ensuring the integrity of copies of an original document. Thus, there is a need to overcome these and other problems of the prior art and to provide an efficient method for storing the original or "reference" image of a document with the current version of the document.
In accordance with the present invention, a method is provided for producing a composite machine-readable and human-readable document in which the machine-readable code provides information for comparing the human-readable content to reliable references defining nominal position and content elements of the human-readable portion. The method comprises the step of generating a background image on a substrate, wherein the background image comprises coded glyph marks based on grayscale image data values. Next, the background image is overlaid with a second image such that some of the visible glyph marks may be decoded and the second image may be viewed. Finally, an image of the document may be captured and the machine-readable glyph marks detected and decoded to retrieve unambiguously referenced spatial position and content information, enabling comparison of aspects of the captured image with a predetermined nominal reference.
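By way of illustration only, and not as a description of any particular embodiment, the three steps summarized above may be sketched in Python as follows. The sketch makes several simplifying assumptions that do not come from this disclosure: each glyph is a 5x5 binary cell holding a forward or backward diagonal stroke, the grayscale modulation of the glyph marks is ignored, and the captured image is taken to be perfectly registered with the printed image. All names (CELL, glyph, render_background, overlay, decode) are hypothetical.

```python
CELL = 5  # glyph cell size in pixels (assumed for illustration)


def glyph(bit):
    """Return a CELL x CELL pattern: '/' stroke for 1, '\\' stroke for 0."""
    return [[1 if c == (CELL - 1 - r if bit else r) else 0
             for c in range(CELL)] for r in range(CELL)]


def render_background(bits, cols):
    """Step 1: tile the glyph marks into a background image, row by row."""
    if len(bits) % cols:
        raise ValueError("bit count must fill whole rows")
    rows = len(bits) // cols
    img = [[0] * (cols * CELL) for _ in range(rows * CELL)]
    for i, b in enumerate(bits):
        gr, gc = divmod(i, cols)
        for r, pattern_row in enumerate(glyph(b)):
            for c, px in enumerate(pattern_row):
                img[gr * CELL + r][gc * CELL + c] = px
    return img


def overlay(background, foreground):
    """Step 2: print the second image over the glyphs (ink is 1 in both)."""
    return [[b | f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, foreground)]


def decode(image, cols):
    """Step 3: read each cell; None marks a glyph hidden by the overlay."""
    out = []
    for gr in range(len(image) // CELL):
        for gc in range(cols):
            fwd = sum(image[gr * CELL + r][gc * CELL + CELL - 1 - r]
                      for r in range(CELL))
            back = sum(image[gr * CELL + r][gc * CELL + r]
                       for r in range(CELL))
            out.append(None if fwd == back else int(fwd > back))
    return out


# Usage: encode 8 bits in a 2 x 4 glyph grid and cover one cell with ink.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
background = render_background(bits, cols=4)
foreground = [[0] * (4 * CELL) for _ in range(2 * CELL)]
for r in range(CELL):
    for c in range(CELL, 2 * CELL):  # solid block over grid cell (0, 1)
        foreground[r][c] = 1
composite = overlay(background, foreground)
decoded = decode(composite, cols=4)
# Compare the visible glyphs against the nominal reference (the original bits).
mismatches = [i for i, (d, b) in enumerate(zip(decoded, bits))
              if d is not None and d != b]
```

In this sketch the index of each decoded cell stands in for the spatial position information, and the list of mismatches stands in for the comparison with the nominal reference. A practical decoder would additionally need to locate and register the glyph lattice in the captured image and would typically rely on error-correcting codes to tolerate obscured or damaged marks.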