1. Field
This disclosure relates to a system and method for inserting and using metadata within a portable document format document.
2. Description of the Related Art
A multifunction peripheral (MFP) is an integrated document processing device that provides at least two document processing functions, such as print, copy, scan, and fax. In a document processing function, an input document (electronic or physical) is used to automatically produce a new output document (electronic or physical).
Documents may be physically or logically divided into pages. A physical document is paper or other physical media bearing information that is readable, unaided, by the typical human eye. An electronic document is any electronic media content (other than a computer program or a system file) that is intended to be used either in electronic form or as printed output. An electronic document may consist of a single data file or of an associated collection of data files that together form a unitary whole. Electronic documents are referred to hereinafter simply as documents; physical documents are referred to as such where the context requires.
In printing, the MFP automatically produces a physical document from an electronic document. In copying, the MFP automatically produces a physical document from a physical document. In scanning, the MFP automatically produces an electronic document from a physical document. In faxing, the MFP automatically transmits, via fax, an electronic document derived either from an input physical document that the MFP has also scanned or from an input electronic document that the MFP has converted to a fax format.
MFPs are often incorporated into the networks of corporations or other organizations, which also include various other workstations, servers, and peripherals. An MFP may provide remote document processing services to external or networked devices.
Portable document format (PDF) writers typically enable the conversion of documents in other file formats (e.g., Microsoft® Word documents, JPEG images, and other formats) into the portable document format. The benefit of the portable document format is that a PDF document is entirely (or nearly entirely) self-contained: it includes all of the text, formatting, images, tables, and any other content necessary for viewing and, if desired, printing the document. As a result, PDF writers are used to convert files so that they can be viewed and output on virtually any computer, device, printer, or MFP.
The PDF standard was originally defined by Adobe Systems, Inc. and, more recently, by the International Organization for Standardization (ISO). PDF/A is a particular version of the PDF standard intended for use in document archival systems. It enforces a more rigid document structure so that PDF documents conforming to this standard do not reference any fonts external to the document itself (e.g., system fonts). Document archival systems may need to be accessed tens (or hundreds) of years after documents are stored in them. It is therefore advisable to avoid, to the extent possible, any dependence on elements outside the document format itself that could cause compatibility issues in the future.
Unfortunately, the rigidity of the file format itself has necessitated the use of external, associated files to control the automatic archiving of such documents. To quickly scan, digitize, and archive many thousands of documents, data derived from those documents may be used. For example, in a large batch of medical records, a patient name (or patient ID) may be used to associate a series of documents that all pertain to a particular patient. The name or ID may be derived from data in a physical document or in a preexisting PDF document, but it may be stored in any number of ways within a PDF/A-compliant PDF document, depending on how the resulting PDF document is generated. Thus, it can be difficult for a computer system to locate the name or ID in order to accurately dispose of the PDF document during an archival (or retrieval) process.
To remedy this problem, the prior art has relied upon generating external documents, each associated with a PDF document, to store characteristics, data, keywords, search terms, instructions, and other similar data related to the PDF document being archived. While this approach is useful in archiving, it undermines the self-contained nature of PDF/A-compliant documents. Moreover, it may be difficult to ensure that the external documents remain associated with the appropriate (or any) PDF document, and the external documents may be inadvertently lost or destroyed over time as computer systems evolve. This is not desirable for long-term document archival systems.
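The fragility of the external-file approach can be illustrated with a minimal Python sketch. The sidecar naming convention (`<name>.pdf.json`), the JSON format, and the helper names `write_sidecar` and `read_sidecar` are hypothetical and chosen only for illustration; the sketch does not depict any particular prior-art system. It shows that the metadata is linked to the PDF document only by file name, so renaming or moving the PDF document without its companion file leaves the metadata orphaned.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_sidecar(pdf_path: Path, metadata: dict) -> Path:
    """Store archival metadata in a separate JSON file next to the PDF.

    Hypothetical convention: <name>.pdf is paired with <name>.pdf.json.
    The pairing exists only in the file names, not inside the PDF itself.
    """
    sidecar = pdf_path.with_name(pdf_path.name + ".json")
    sidecar.write_text(json.dumps(metadata), encoding="utf-8")
    return sidecar


def read_sidecar(pdf_path: Path) -> Optional[dict]:
    """Look up metadata for a PDF; return None if no matching sidecar exists."""
    sidecar = pdf_path.with_name(pdf_path.name + ".json")
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "scan_0001.pdf"
        pdf.write_bytes(b"%PDF-1.4\n")  # placeholder bytes, not a complete PDF
        write_sidecar(pdf, {"patient_id": "12345"})
        print(read_sidecar(pdf))  # metadata found via the file-name convention

        # Renaming the PDF alone silently breaks the association.
        moved = pdf.rename(Path(tmp) / "archive_0001.pdf")
        print(read_sidecar(moved))  # None: the metadata is now orphaned
```

Storing the same data inside the PDF document itself would avoid this dependence on a naming convention that must be preserved across decades of system changes.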
Throughout this description, elements appearing in figures are assigned three-digit reference designators, where the most significant digit is the number of the figure in which the element is introduced, and the two least significant digits are specific to the element. An element that is not described in conjunction with a figure may be presumed to have the same characteristics and function as a previously described element having the same reference designator.