1. Field of the Invention
This invention relates in general to printing systems, and more particularly to a method, data structure and apparatus for identifying resources prior to printing.
2. Description of Related Art
Print systems include presentation architectures, which are provided for representing documents in a data format that is independent of the methods utilized to capture or create those documents. One exemplary presentation system, which will be described herein, is the AFP™ (Advanced Function Presentation) system developed by International Business Machines Corporation. However, those skilled in the art will recognize that the present invention is not meant to be limited to the AFP™ system, but rather the AFP™ system is presented herein as merely one example of a presentation system applicable to the principles of the present invention.
According to the AFP™ system, documents may contain combinations of text, image, graphics, and/or bar code objects in device and resolution independent formats. Documents may also contain and/or reference fonts, overlays, and other resource objects, which are required at presentation time to present the data properly. Additionally, documents may also contain resource objects, such as a document index and tagging elements supporting the search and navigation of document data for a variety of application purposes. In general, a presentation architecture for presenting documents in printed format employs a presentation data stream. To increase flexibility, this stream can be further divided into a device-independent application data stream and a device-dependent printer data stream. A data stream is a continuous ordered stream of data elements and objects that conform to a given formal definition. Application programs can generate data streams destined for a presentation device, archive library, or another application program. The Mixed Object Document Content Architecture (MO:DCA)™ developed by International Business Machines Corporation of Armonk, N.Y. defines a data stream, which may be utilized by applications to describe documents and object envelopes for document interchange and document exchange with other applications and application services. Interchange is the predictable interpretation of shared information in an environment where the characteristics of each process need not be known to all other processes. Exchange is the predictable interpretation of shared information by a family of system processes in an environment where the characteristics of each process must be known to all other processes.
A mixed object document is a collection of data objects that comprise the document's content and the resources and formatting specifications that dictate the processing functions to be performed on that content. The term “Mixed” in the Mixed Object Document Content Architecture (MO:DCA)™ refers to both the mixture of data objects and the mixture of document constructs that comprise the document's components. A Mixed Object Document Content Architecture (MO:DCA)™ document can contain a mixture of resource object types, each of which has unique processing requirements. The Mixed Object Document Content Architecture (MO:DCA)™ is designed to integrate the different data object types into documents that can be interchanged as a single data stream and provides the data stream structures needed to carry the data objects. The MO:DCA™ data stream also provides syntactic and semantic rules governing the use of objects to ensure different applications process objects in a consistent manner.
In its most complex form a Mixed Object Document Content Architecture (MO:DCA)™ document contains data and resource objects along with data structures which define the document's layout and composition features. This form is called a Mixed Object Document Content Architecture (MO:DCA)™ presentation document. Within such a data stream the Mixed Object Document Content Architecture (MO:DCA)™ components are defined with a syntax that consists of self-describing structures called structured fields. Structured fields are the main Mixed Object Document Content Architecture (MO:DCA)™ structures and are utilized to encode Mixed Object Document Content Architecture (MO:DCA)™ commands. A structured field starts with an introducer that uniquely identifies the command, provides a total length for the command, and specifies additional control information such as whether padding bytes are present. The introducer is then followed by data bytes. Data may be encoded within the structured field utilizing fixed parameters, repeating groups, keywords, and triplets. Fixed parameters have a meaning only in the context of the structure that includes them. Repeating groups are utilized to specify groupings of parameters that can appear multiple times. Keywords are self-identifying parameters that consist of a one-byte unique keyword identifier followed by a one-byte keyword value. Triplets are self-identifying parameters that contain a length field, a unique triplet identifier, and data bytes. Keywords and triplets have the same semantics wherever they are utilized. Together these structures define a syntax for Mixed Object Document Content Architecture (MO:DCA)™ data streams which provides for orderly parsing and flexible extensibility.
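By way of illustration only, the structured field and triplet syntax described above may be sketched as follows. The byte layout used here (a two-byte big-endian total length, a three-byte identifier, one flag byte and two reserved bytes in the introducer, and a triplet length byte that counts itself and the identifier) is an assumption made for the sketch and should be checked against the architecture reference; the names StructuredField, parse_structured_field, parse_triplets and build_structured_field are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Assumed introducer layout: 2-byte length + 3-byte identifier
# + 1 flag byte + 2 reserved bytes. Illustrative only.
INTRODUCER_LEN = 8


@dataclass
class StructuredField:
    identifier: bytes  # identifies the command
    flags: int         # control information (e.g. padding present)
    data: bytes        # data bytes that follow the introducer


def parse_structured_field(buf: bytes, offset: int = 0) -> Tuple[StructuredField, int]:
    """Parse one structured field; return it and the offset of the next one."""
    if len(buf) - offset < INTRODUCER_LEN:
        raise ValueError("truncated introducer")
    (length,) = struct.unpack_from(">H", buf, offset)  # total length of the field
    if length < INTRODUCER_LEN or offset + length > len(buf):
        raise ValueError("bad structured field length")
    identifier = buf[offset + 2:offset + 5]
    flags = buf[offset + 5]
    data = buf[offset + INTRODUCER_LEN:offset + length]
    return StructuredField(identifier, flags, data), offset + length


def parse_triplets(data: bytes) -> List[Tuple[int, bytes]]:
    """Split data bytes into (triplet identifier, triplet data) pairs."""
    triplets = []
    pos = 0
    while pos < len(data):
        tlen = data[pos]  # assumed to count the length and identifier bytes
        if tlen < 2 or pos + tlen > len(data):
            raise ValueError("bad triplet length")
        triplets.append((data[pos + 1], data[pos + 2:pos + tlen]))
        pos += tlen
    return triplets


def build_structured_field(identifier: bytes, flags: int, data: bytes) -> bytes:
    """Helper for the sketch: encode a field using the assumed layout."""
    length = INTRODUCER_LEN + len(data)
    return struct.pack(">H", length) + identifier + bytes([flags]) + b"\x00\x00" + data
```

Because each field carries its own total length and each triplet carries its own length, a parser of this kind can skip identifiers it does not recognize, which is one way to read the "orderly parsing and flexible extensibility" noted above.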
The document is the highest level within the Mixed Object Document Content Architecture (MO:DCA)™ data stream document component hierarchy. Documents may be constructed of pages, and the pages, which are at the intermediate level, may be made up of data objects. Data objects are at the lowest level and can be bar code objects, graphics objects, image objects and presentation text.
Multiple documents may be collected into a print file. A print file may optionally contain, at its beginning, an “inline” resource group that contains resource objects required for print. Alternatively, the resource objects may be stored in a resource library that is accessible to the print server, or they may be resident in the printer.
A Mixed Object Document Content Architecture (MO:DCA)™ document in its presentation form is a document which has been formatted and is intended for presentation, usually on a printer or a display device. A data stream containing a presentation document should produce the same document content in the same format on different printers or display devices, dependent on the capabilities of each of the printers or display devices. A presentation document can reference resources that are to be included as part of the document to be presented, which are not present within the document as transmitted within the MO:DCA™ data stream.
Pages within the Mixed Object Document Content Architecture (MO:DCA)™ are the level within the document component hierarchy which is utilized to print or display a document's content. Each page has associated environment information that specifies page size and that identifies resources required by the page. This information is carried in a MO:DCA™ structure called an Active Environment Group (AEG). Data objects contained within each page envelope in the data stream are presented when the page is presented. Each data object has associated environment information that directs the placement and orientation of the data on the page, and that identifies resources required by the object. This information is carried in a MO:DCA™ structure called an Object Environment Group (OEG).
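The document, page and data object hierarchy, together with the environment groups carried at the page and object levels, may be pictured with the following minimal sketch. The class and field names are hypothetical and chosen only to mirror the description above; they do not reflect the actual encoding of an Active Environment Group or Object Environment Group.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ObjectEnvironmentGroup:
    # Placement and orientation of the object's data on the page,
    # plus the resources the object requires (hypothetical fields).
    x: int
    y: int
    orientation_degrees: int
    required_resources: List[str] = field(default_factory=list)


@dataclass
class DataObject:
    kind: str  # e.g. "text", "image", "graphics", "bar code"
    oeg: ObjectEnvironmentGroup


@dataclass
class ActiveEnvironmentGroup:
    # Page size and the resources the page requires (hypothetical fields).
    page_width: int
    page_height: int
    required_resources: List[str] = field(default_factory=list)


@dataclass
class Page:
    aeg: ActiveEnvironmentGroup
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class Document:
    pages: List[Page] = field(default_factory=list)


def resources_for_page(page: Page) -> Set[str]:
    """Collect every resource named by the page's AEG and its objects' OEGs."""
    names = set(page.aeg.required_resources)
    for obj in page.objects:
        names.update(obj.oeg.required_resources)
    return names
```

In this picture the resources required by a page become known only when that page's environment groups are read, since the information is attached to the page and its objects rather than to the document as a whole.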
Objects in the data stream are bounded by delimiters that identify the object type, such as graphics, image or text. In general, data objects consist of data to be presented and the directives required to present it. The content of each type of data object is defined by an object architecture that specifies presentation functions, which may be utilized within its coordinate space. All data objects function as equals within the Mixed Object Document Content Architecture (MO:DCA)™ data stream environment. Data objects are carried as separate entities in the Mixed Object Document Content Architecture (MO:DCA)™ data stream.
Resource objects are named objects or named collections of objects that can be referenced from within the document. In general, referenced resources can reside in an inline resource group that precedes the document in the MO:DCA™ data stream or in an external resource library and can be referenced multiple times. Resource objects may need to be utilized in numerous places within a document or within several documents.
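As a rough sketch, a print server resolving a named resource might consult the places a resource can reside (resident in the printer, in the inline resource group, or in an external resource library) as shown below. The search order, the function name locate_resource and the use of in-memory dictionaries are assumptions made purely for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def locate_resource(
    name: str,
    inline_group: Dict[str, bytes],
    library: Dict[str, bytes],
    printer_resident: Set[str],
) -> Tuple[str, Optional[bytes]]:
    """Return where a named resource was found and, if it must be sent, its data.

    Assumed order: printer-resident first (nothing to download),
    then the inline resource group, then the external library.
    """
    if name in printer_resident:
        return ("printer", None)
    if name in inline_group:
        return ("inline", inline_group[name])
    if name in library:
        return ("library", library[name])
    raise KeyError(f"resource {name!r} not found")
```

Because a resource is referenced by name, the same object can be located once and reused wherever the document, or several documents, refer to it.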
An object container within the Mixed Object Document Content Architecture (MO:DCA)™ is an envelope for object data that is not necessarily defined by an International Business Machines Corporation presentation architecture and that might not define all required presentation parameters. The container consists of a mandatory Begin/End structured field pair, an optional Object Environment Group (OEG) and mandatory Object Container Data (OCD) structured fields. If an object is to be carried in Mixed Object Document Content Architecture (MO:DCA)™ resource groups and interchanged, it must, at a minimum, be enveloped by a Begin/End pair. The Object Classification triplet on the Begin structured field must specify the registered object identifier (OID) for the object data format, and the data must be partitioned into OCD structured fields.
A printer data stream within a presentation architecture is a device-dependent continuous ordered stream of data elements and objects conforming to a given format, which are destined for a presentation device. The Intelligent Printer Data Stream (IPDS)™ architecture developed by International Business Machines Corporation and disclosed within U.S. Pat. No. 4,651,278, which is incorporated herein by reference, defines the data stream utilized by print server programs and device drivers to manage all-points-addressable page printing on a full spectrum of devices from low-end workstation and local area network-attached printers to high-speed, high-volume page printers for production jobs, Print On Demand environments, shared printing, and mailroom applications. The same object content architectures carried in a MO:DCA™ data stream are carried in an IPDS™ data stream to be interpreted and presented by microcode executing in printer hardware. The IPDS™ architecture defines bi-directional command protocols for query, resource management, and error recovery. The IPDS™ architecture also provides interfaces for document finishing operations provided by pre-processing and post-processing devices attached to IPDS™ printers.
The IPDS™ architecture incorporates several important features. As noted above, since the IPDS™ architecture supports the same objects as those carried by the MO:DCA™ data stream, the IPDS™ architecture enables the output of multiple diverse applications to be merged at print time so that an integrated mixed-data page, including text, images, graphics, and bar code objects, results. The IPDS™ architecture transfers all data and commands through self-identifying structured fields that describe the presentation of the page and provide for dynamic management of resources, such as overlays, page segments and fonts as well as the comprehensive handling of exception conditions. Furthermore, the IPDS™ architecture provides an extensive acknowledgement protocol at the data stream level, which enables page synchronization of the host (e.g., print server) and printer processes, the exchange of query-reply information, and the return to the host of detailed exception information.
As can be seen, print data streams reference resources, such as images and fonts, that are required for presentation. Further, these resources are typically referenced in the print stream at the point that they are required. For example, in an AFP (MO:DCA)™ print file, each page identifies the resources that are required to print the page in the Active Environment Group (AEG) that is part of the page object as discussed above. However, in PostScript™ and PDF files, resources may be identified anywhere in the file.
Such resource identification places a burden on the print system in that it requires ‘real-time’ downloading and processing of the resource. For example, assume an image I1 is required on page P1. If I1 is identified as part of P1, then, assuming I1 is not present in the printer, I1 must be downloaded with the P1 page data. The download time therefore takes up part of the P1 print window. This works without a print-underrun as long as the print window is large enough to accommodate the I1 transmission time.
The largest print resources are normally raster images. When the images are monochrome (1 bit per image point), current technologies are capable of processing pages along with their images at reasonably high speeds. However, when the images are color (24 bits per image point or even 32 bits per image point), at high resolution (e.g., 600 dpi), the image transmission time can no longer be tolerated in the print window, and print underruns result, i.e., the printer is ready to print a page, but lacks at least one resource that is to be printed. Some printers are incapable of stopping the paper path on-the-fly, which results in blank pages. This means that not only is paper wasted, but more importantly blank pages appear randomly inserted in the job output.
For example, an 8×10 inch CMYK (Cyan, Magenta, Yellow and BlacK) color image, at 600 dots per inch (dpi), JPEG compressed with a compression ratio of 10:1, still contains about 10 MB (megabytes) of data. If the typical attachment bandwidth is 2.5 MB/sec between the printing system and the server containing the image, 4 seconds are required just to download the image from the server to the printing system. While page and resource buffering in the printer can save some of this time, a download time of this magnitude is clearly incompatible with a print window of 0.5 seconds/page (for a 120 ppm printer).
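The arithmetic behind this example can be reproduced with the short sketch below. It assumes 8 bits per color component (4 bytes per CMYK image point, matching the 32 bits per image point mentioned earlier) and 1 MB = 1,000,000 bytes; the function names are hypothetical.

```python
def raw_image_bytes(width_in: int, height_in: int, dpi: int, bytes_per_point: int) -> int:
    """Uncompressed size of a raster image in bytes."""
    return (width_in * dpi) * (height_in * dpi) * bytes_per_point


def download_seconds(size_mb: float, bandwidth_mb_per_sec: float) -> float:
    """Time to move the image from the server to the printing system."""
    return size_mb / bandwidth_mb_per_sec


def print_window_seconds(pages_per_minute: float) -> float:
    """Time available per page at the rated print speed."""
    return 60.0 / pages_per_minute


def causes_underrun(download_s: float, window_s: float) -> bool:
    """True when the download cannot fit in a single page's print window."""
    return download_s > window_s


raw = raw_image_bytes(8, 10, 600, 4)     # 4800 x 6000 points x 4 bytes = 115,200,000
compressed_mb = raw / 10 / 1_000_000     # 10:1 compression, about 11.5 MB
download = download_seconds(10, 2.5)     # using the rounded 10 MB figure: 4.0 s
window = print_window_seconds(120)       # 0.5 s per page
print(compressed_mb, download, window, causes_underrun(download, window))
```

Under these assumptions the download takes eight print windows, so the transfer would have to start well before the page that uses the image reaches the printer.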
One solution to this problem involves the use of a printer structure called a collator. The complete print file is first downloaded to the printer and then processed by the raster image processor into print-ready format. Once in this format, any number of copies of the file are generated out of the collator without requiring any transmission between print server and printer. The problem with this solution is that it does not allow real-time printing. Moreover, a huge time delay is incurred before the first copy of the first page is printed. In addition, it requires a very large amount of (expensive) disk space in the printer for the collator function.
U.S. Pat. No. 5,469,533, issued Nov. 21, 1995 to Stephen V. Dennis and assigned to Microsoft Corporation, discloses a printer system that includes a resource assembler that examines a complete document in whatever format the document is generated. Then, the resource assembler determines all of the resources that are required to print the entire document. However, as the document becomes large, identifying all of the resources prior to beginning printing is too time consuming to be practical. Further, additional processing is needed to analyze the generated document and to identify all resources for the entire document.
It can be seen that there is a need for a method, data structure and apparatus that enables early identification of resources to improve print speed and efficiency.
It can also be seen that there is a need for a method, data structure and apparatus that avoids print underruns by ensuring that complex resources are downloaded to the printer before the first page is initiated.
It can also be seen that there is a need for a method, data structure and apparatus that provides a structure at the beginning of a document description or at the beginning of a group of pages that identifies any complex resources required by the pages that follow.