Modern electronic document processing systems generally include input scanners for electronically capturing the general appearance (i.e., the human readable information content and the basic graphical layout) of human readable hardcopy documents; programmed computers for enabling users to create, edit and otherwise manipulate electronic documents; and printers for producing hardcopy, human readable renderings of electronic documents. These systems typically have convenient access to mass memory for the storage and retrieval of electronic document files. Moreover, they often are networked by local area networks (LANs), switched data links, and the like for facilitating the interchange of digital electronic documents and for providing multi-user access to shared system resources, such as high speed electronic printers and electronic file servers.
The technical details pertaining to the interchangeability of electronic documents are beyond the scope of this invention, but it should be understood that there is not yet a "universal interchange standard" for losslessly interchanging "structured electronic documents" (i.e., documents conforming to predefined rules governing their constituent elements, the characteristics of those elements, and the interrelationships among their elements). Plain text ASCII encoding is becoming a de facto interchange standard, but it is of limited utility for representing structured electronic documents. Other encoding formats provide fuller structural representations of electronic documents, but they usually are relatively system specific. For example, some of the more basic document description languages (DDLs) employ embedded control codes for supplementing ASCII encodings with variables defining the logical structure (i.e., the sections, paragraphs, sentences, figures, figure captions, etc.) of electronic documents, thereby permitting such documents to be formatted in accordance with selected formatting variables, such as selected font styles, font sizes, line and paragraph spacings, margins, indentations, header and footer locations, and columns. Graphical DDL encodings provide more sophisticated and complete representations of electronic document structures because they encode both the logical structure and the layout structure of such documents. Page description language (PDL) encodings are related to graphical DDL encodings, but they are designed so that they can be readily decomposed or interpreted to define the detailed layout of the printed page in a raster scan format.
Accordingly, it will be appreciated that the transportability of electronic documents from one document processing system to another depends upon the ability of the receiving or "target" system to interpret, either directly or through the use of a format converter, the encoding format in which the document is provided by the originating or "source" system. To simplify this disclosure, source/target encoding format compatibility will be assumed, but it should be clearly understood that this is a simplifying assumption.
Others previously have proposed printing digital data, including electronic document files, on a recording medium, such as plain paper, so that optical readers can be employed for uploading the data into electronic document processing systems. See, for example, Brass et al U.S. Pat. No. 4,754,127, which issued Jun. 28, 1988 on "Method and Apparatus for Transforming Digitally Encoded Data into Printed Data Strips," and Brass et al U.S. Pat. No. 4,782,221, which issued Nov. 1, 1988 on "Printed Data Strip Including Bit-Encoded Information and Scanner Control." In view of the additional insights provided by the user documentation for "The Laser Archivist," Cauzin Systems, Inc., 1987, it is believed that the so-called "data strips" this prior work has provided are printed as physically distinct entities. Accordingly, the user can use a standard "cut and paste" process for attaching such data strips, if desired, to the human readable renderings of the files to which they pertain. In this system, the scanner used to read the printed data strips is not a general-purpose document scanner, but rather, a special-purpose hand-held computer peripheral optimized for reading said data strips, as specified in Brass et al., U.S. Pat. No. 4,692,603, "Optical reader for printed bit-encoded data and method of reading same," which issued Sep. 8, 1987. Thus this system could not be said to close the loop between common document production and reprographic equipment, as the present invention intends. Drexler U.S. Pat. No. 4,665,004, which issued May 12, 1987 on "Method for Dual Image Recording of Medical Data," also is interesting because it proposes using a specialized optical recording system and recording medium for optically recording the raw digital data for a computer generated pictorial image in a form that permits the raw data (including digitized versions of any optional written or oral annotations) to be physically secured to the human readable, hardcopy rendering of the image. 
However, that approach has the drawback of requiring the use of different recording mechanisms for producing the machine readable digital data representation and the human readable rendering. Moreover, the digital data is not recorded in a form that permits it to be readily copied using ordinary office equipment.
A commonly assigned J. J. Daniele United States patent which issued Mar. 1, 1988 as U.S. Pat. No. 4,728,984 on "Data Handling and Archiving System" is believed to be especially noteworthy because it relates to the use of an electronic printer for recording digital data on plain paper, together with the use of an input scanner for scanning digital data that has been recorded on such a recording medium to upload the data into the internal computer of the printer. The Daniele '984 patent discusses several subjects which are meaningful to the present invention, including the redundant recording of digital information, the archival storage and distribution of digital data recorded on plain paper, the compression that can be achieved by digitally recording text and graphics, and the data security that can be achieved by encrypting digitally recorded text and graphics. Moreover, it discloses a typical printer and a typical input scanner in substantial detail. Therefore, the '984 patent hereby is incorporated by reference.
Paper documents still are a primary medium for written communications and for record keeping. They can be replicated easily by photocopying, they can be distributed and filed in original or photocopied form, and facsimiles of them can be transmitted to remote locations over the public switched telephone network. Paper and other hardcopy documents are so pervasive that they are not only a common output product of electronic document processing systems, but also an important source of input data for such systems.
In recognition of the fundamental role human readable hardcopy documents play in modern society, input scanners have been developed for uploading them into electronic document processing systems. These scanners typically convert the appearance of the hardcopy into a raster formatted, digital data stream, thereby providing a bit mapped representation of the hardcopy appearance. However, bit maps require relatively large amounts of memory and are difficult to edit and manipulate, so substantial effort and expense have been devoted to the development of recognition processes for converting bit mapped document appearances into corresponding symbolic encodings. Unfortunately, recognition processes generally are inferential and of limited scope, so they have difficulty correlating unusual bit map patterns with corresponding encodings and they are prone to making inference errors even when they determine that a correlation exists.
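By way of illustration only, the memory burden of a bit mapped representation can be estimated with a short calculation. The sketch below assumes an 8.5 inch by 11 inch page scanned at 300 spots per inch with one bit per pixel, and compares it with a symbolic encoding of roughly 3,000 one-byte characters; the page size, the resolution, the character count and the function names are assumptions chosen for the example and are not taken from any particular scanner.

```python
def bitmap_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size, in bytes, of a raster image of one page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


def symbolic_bytes(char_count, bytes_per_char=1):
    """Size, in bytes, of a plain character encoding of the same page."""
    return char_count * bytes_per_char


# Assumed figures: letter-size page, 300 spots per inch, binary image,
# and about 3,000 characters of text on the page.
page_bitmap = bitmap_bytes(8.5, 11, 300)
page_text = symbolic_bytes(3000)

print(page_bitmap)               # 1051875 bytes, roughly 1 megabyte
print(page_text)                 # 3000 bytes
print(page_bitmap // page_text)  # 350
```

Under these assumptions the uncompressed bit map is about 350 times larger than the character encoding, which is one reason recognition processes that recover symbolic encodings have attracted so much effort.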
Turning for a moment to the conventional hardcopy output of electronic document processing systems, it will be evident that a hardcopy rendering of an electronic document often is only a partial representation of the content of the corresponding electronic document file. The appearance of a hardcopy rendering is governed by the structure and content of the electronic document to which it pertains, but the digital data encodings which define the structure and content of the electronic document are not explicitly embodied by the rendering. So-called "intelligent" input scanners (scanners equipped with substantial image-processing software) having sufficient knowledge of the structural encoding rules theoretically can recover the structural encodings for at least some types of electronic documents from hardcopy renderings of them, but the practical results frequently do not conform to the theoretical expectations, especially if the hardcopy is distorted (such as by a photocopying or facsimile process), damaged or altered prior to being input scanned.
Furthermore, some types of electronic document data are virtually impossible to infer from a hardcopy rendering. For example, electronic spreadsheets conventionally include computational algorithms for defining the computations which are required to compute the spreadsheet, but these algorithms generally are not explicitly set forth in the hardcopy rendering of the computed spreadsheet. Likewise, electronic hypertext documents and multimedia documents ordinarily contain pointers which link them to related electronic documents, but the links provided by those pointers usually are not embodied in the hardcopy renderings of such documents. Still another example is provided by computer generated synthetic graphical images where the control points for the graphical objects that form the image and the data defining the curves which fit those control points normally can only be approximated from a hardcopy rendering of such an image. As still another example, it will be understood that prints generated by computer aided design (CAD) systems typically are approximate representations of the high precision data of the underlying electronic file, which often contains three dimensional information. As a general rule, the mathematical models and the related data from which such a system generates such prints are not fully recoverable from a hardcopy rendering representing any single view. As a further example, it is to be understood that the color values for objects (such as the cyan, magenta, yellow and black values for printed four-color images) also are difficult to ascertain with any substantial certainty from a hardcopy color rendering, and would be impossible to recover from a black and white copy of that color document hardcopy.
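The spreadsheet example can be made concrete with a minimal sketch. The miniature spreadsheet model below is hypothetical and is offered solely for illustration: each cell holds either a constant or a formula, and the render function (an assumed name) returns only the computed values, as a hardcopy rendering would. Two spreadsheets with different formulas yield identical renderings, so the formulas cannot be inferred from the rendering alone.

```python
def render(sheet):
    """Return the 'printed' view of a sheet: computed values only."""
    values = {}

    def value_of(name):
        # Evaluate a cell on first use; formulas look up other cells.
        if name not in values:
            cell = sheet[name]
            values[name] = cell(value_of) if callable(cell) else cell
        return values[name]

    return {name: value_of(name) for name in sheet}


# Hypothetical sheets: same constants, different formula in cell A3.
sheet_sum = {"A1": 2, "A2": 2, "A3": lambda v: v("A1") + v("A2")}
sheet_product = {"A1": 2, "A2": 2, "A3": lambda v: v("A1") * v("A2")}

print(render(sheet_sum))      # {'A1': 2, 'A2': 2, 'A3': 4}
print(render(sheet_product))  # {'A1': 2, 'A2': 2, 'A3': 4}
print(render(sheet_sum) == render(sheet_product))  # True
```

Because both renderings show the value 4 in cell A3, a reader of the hardcopy has no basis for deciding whether the underlying computation was a sum or a product.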
There are times when documents are printed in black and white as a result of the limited capabilities of the available printer, even though the original electronic source document might have been intended to provide a full color, a functional color, or a highlight color representation. Indeed, even some of the more fundamental attributes of electronic documents, such as their file names, author, creation date, etc., are seldom found in the hardcopy renderings of such documents.
Consequently, it will be evident that it would be a significant improvement if the ordinary hardcopy output of electronic document processing systems could be employed as an essentially lossless medium for storing all or part of the structure and content of electronic documents and for transferring that data from the printer of one electronic document processing system to the input scanner of the same or another document processing system. Hardcopy documents of that type would not only continue to function as a convenient medium for distributing and storing human readable renderings of electronic documents, but also would provide a convenient alternative to the digital mass memories which customarily are used for storing electronic documents and to the digital data links and removable digital recording media which normally are employed for transferring electronic documents from one location to another. Furthermore, the integration of machine readable digital representations of electronic documents with human readable renderings of them would permit various combinations of human and computer information processing steps to be employed for processing information more easily and quickly.