This invention relates generally to computer systems having printers and display devices and, more particularly, to printing final form print documents containing various print objects on a printer.
Computer systems can generate output information in several ways, including video output and “hard copy” or printed output. Although more and more output consists of evanescent video screens, a large amount of data is still printed on paper and other permanent media. Therefore, there is a need for efficiently describing printed data and then printing a hard copy page from the print description. Since many modern printers are “intelligent” and capable of storing commands and data, the print description is preferably arranged to minimize the amount of information transferred to the printer over a data transfer path, such as a network.
There are several prior art systems which have been designed to accomplish the aforementioned objectives. Generally, the print data stream is encoded by means of a page description language which describes the format of each page. There are several conventional page description languages. One of these is called POSTSCRIPT®, which is a print document description language developed by the Adobe Corporation, San Jose, Calif. A POSTSCRIPT® encoded document includes a file containing page description commands or “operators.” The POSTSCRIPT® operators describe how each page in the document is composed. A single POSTSCRIPT® file can generate a multi-page document because each page is composed according to the operators in the file. Therefore, POSTSCRIPT® produces a very compact encoded document structure.
One problem with documents described using POSTSCRIPT® is that it is not possible to examine a POSTSCRIPT® encoded document and ascertain where a particular page begins and ends without starting at the beginning of the document and calculating where each page break occurs. Therefore, documents encoded in the POSTSCRIPT® language are difficult to view on a display screen. Further, if an error occurs during printing, it may be difficult to restart the printer at any place except the beginning of the document.
In order to overcome some of the difficulties with POSTSCRIPT®, another page description language, called ENCAPSULATED POSTSCRIPT (EPS), was developed by the Adobe Corporation. An EPS file is a self-contained version of a POSTSCRIPT® file that describes a single-page graphic. In particular, drawings or artwork are normally encapsulated in self-contained EPS objects that do not refer to any resources other than those that are part of the EPS object. Each EPS object contains a header portion called a “prolog” followed by normal POSTSCRIPT® operators. The prolog contains resource and formatting information and describes the POSTSCRIPT® data in the object.
A problem with EPS objects is that each object contains prolog information. If the same EPS object, or EPS objects created by the same application, are included on each page of a multi-page document to be printed, then the same prolog information is in each EPS object and will be downloaded to the printer for each page. The result is that a large amount of redundant data will be sent to the printer.
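The cost of the repeated prolog can be illustrated with a small calculation. The following Python sketch is only an illustration under stated assumptions: the function name `bytes_sent`, the representation of an EPS object as a (prolog, body) pair, and the byte sizes are all hypothetical and are not taken from any EPS specification or printing system.

```python
from typing import Iterable, Tuple

def bytes_sent(pages: Iterable[Tuple[bytes, bytes]], send_prolog_once: bool) -> int:
    """Count the bytes transferred to the printer for a sequence of pages.

    Each page is modeled (hypothetically) as a (prolog, body) pair.
    When send_prolog_once is False, every prolog is sent with its page.
    When it is True, an identical prolog is sent only the first time.
    """
    sent = 0
    already_sent = set()
    for prolog, body in pages:
        if not send_prolog_once or prolog not in already_sent:
            sent += len(prolog)
            already_sent.add(prolog)
        sent += len(body)
    return sent

# Ten pages, each carrying the same 2000-byte prolog and a 500-byte body
# (sizes invented for illustration).
pages = [(b"P" * 2000, b"B" * 500)] * 10

redundant = bytes_sent(pages, send_prolog_once=False)
deduplicated = bytes_sent(pages, send_prolog_once=True)
print(redundant, deduplicated)
```

With these invented sizes, the redundant transfer is 25000 bytes while sending the shared prolog once requires 7000 bytes, which is the kind of saving that motivates managing prolog information separately from the page data.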
Another formatting language is called the Portable Document Format (PDF) language, which was also developed by the Adobe Corporation. PDF is a file format which describes a group of pages to be viewed and uses graphic operators which are similar, but not equivalent, to POSTSCRIPT® operators. However, a PDF file describes individual pages, each of which can be viewed and printed separately. A PDF file is constructed with a header, a body, a cross reference table and a trailer. Page objects containing information for each page are located inside the body, in arbitrary order, and resource objects containing resource information are also located inside the body. The trailer portion of the file contains a pointer to the cross reference table, and the cross reference table indexes the various page and resource objects. Since the cross reference table and the trailer are written at the end of the file, the file can be generated in a single pass. In addition, viewing the pages in any order is straightforward. Specifically, in order to view a page, the trailer in the PDF file is first located to obtain the pointer to the cross reference table. Once the cross reference table is located, the index to a particular page can be obtained.
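The lookup sequence described above, in which the trailer is located first and then followed to the cross reference table, can be sketched in Python. This is a minimal illustration, not a PDF parser: the helper name `find_xref_offset` and the hand-built toy file are assumptions made for the example, and the sketch relies only on the `startxref` keyword that precedes the table's byte offset near the end of a PDF file.

```python
import re

def find_xref_offset(pdf_bytes: bytes) -> int:
    """Return the byte offset of the cross reference table.

    Only the tail of the file is searched, because the pointer to the
    table is stored at the end of the file.
    """
    tail = pdf_bytes[-1024:]
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref entry found near end of file")
    # Use the last entry in case the keyword appears more than once.
    return int(matches[-1])

# Build a tiny, hand-made file with the header/body/table/trailer layout.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_position = len(body)
xref = b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
trailer = (
    b"trailer\n<< /Size 2 /Root 1 0 R >>\nstartxref\n"
    + str(xref_position).encode("ascii")
    + b"\n%%EOF\n"
)
toy_pdf = body + xref + trailer

offset = find_xref_offset(toy_pdf)
print(toy_pdf[offset:offset + 4])
```

Because the pointer is read from the end of the file, a consumer needs access to the complete file before it can resolve any page.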
While viewing a PDF file is straightforward, printing with this format is not optimal because the trailer must be located before printing of the file can start. Since the file trailer is at the end of the file, the entire file must be loaded before printing can start. If the file is large, a substantial amount of memory is required. In addition, each page may also contain resource files, such as fonts and bitmaps, which may be referred to in the file at various locations. Therefore, if the same resources are included on each page of a multi-page PDF document to be printed, then the same resource information will be downloaded to the printer for each PDF page. The result is similar to the EPS situation in that a large amount of redundant data will be sent to the printer.
Another page description language is known as MO:DCA™ (Mixed Object Document Content Architecture), described in detail in I.B.M. Mixed Object Document Content Architecture Reference number SC31-6802. This language has the characteristic that page information is stored in the order that it is printed so that file processing can begin as soon as the information for the first page is located. During file construction, common resources, such as fonts, are removed from the print data and stored in a separate resource database. A reference is placed in the file to refer to the stored resource.
The MO:DCA file format is designed to be used with a printing system known as the “Advanced Function Presentation” (AFP) printing system developed by, and available from, International Business Machines Corporation, Armonk, N.Y. This printing system has an intelligent print server which receives the print data and uses the references in the data stream to retrieve the stored resources from the resource database. The resources are then downloaded to the printer ahead of the data. At the printer the resources are combined with the print data and sent to a rasterizer for printing.
The MO:DCA file format has the advantage that pages can easily be located in the data stream because the page information is stored in the order that it will be printed. Also, each page contains an “Active Environment Group” definition that specifies the resources that are required to print the page. In addition, if an error occurs during printing, it is possible to restart the printing process from the last page printed rather than from the beginning of the file. However, the AFP system only provides these advantages with the “native” MO:DCA file format and “native” objects contained therein. It cannot provide the same level of recovery with other, “non-native” file formats, such as the EPS and PDF objects discussed above. In order to print these objects it is necessary to regenerate the objects in MO:DCA format.
Therefore, it would be desirable to modify the AFP system to handle non-native print objects which are originally formatted in various different formats, such as EPS and PDF formats. It would also be desirable to manage the resources in such objects so that the redundant downloading of resources is avoided.
In accordance with the principles of the invention, the print system is extended to handle non-native print objects, such as EPS and PDF print objects, without requiring the non-native objects to be reformatted, by selectively decomposing the EPS and PDF print objects and placing resource information in the resource database while leaving the print data, or a reference to the print data, in the page description data stream. Appropriate references are placed in the page description data stream so that the resource information can be retrieved and downloaded to the printer when the print data is being processed at the print server. When the print data is sent to the printer, the downloaded resource information is recombined with the print data and the combination is sent to a conventional EPS or PDF rasterizing program which converts the objects to printable data.
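The decompose, reference, and recombine flow summarized above can be sketched in Python. This is a hedged illustration only: the names `PageEntry`, `decompose`, and `print_stream` are hypothetical, the resource database is modeled as an in-memory dictionary, and the use of a SHA-256 digest as the reference key is an assumption made for the example rather than the mechanism used by the AFP system or described in this document.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class PageEntry:
    """One entry in the page description data stream (hypothetical model)."""
    resource_ref: str   # reference to resource information in the database
    print_data: bytes   # print data left in the data stream

def decompose(objects: List[Tuple[bytes, bytes]],
              resource_db: Dict[str, bytes]) -> List[PageEntry]:
    """Split each (resource, print_data) object; store each resource once."""
    stream = []
    for resource, print_data in objects:
        # Assumption: a content digest serves as the reference.
        ref = hashlib.sha256(resource).hexdigest()
        resource_db.setdefault(ref, resource)
        stream.append(PageEntry(ref, print_data))
    return stream

def print_stream(stream: List[PageEntry],
                 resource_db: Dict[str, bytes]) -> Tuple[List[bytes], int]:
    """Download each referenced resource once, then recombine per page."""
    printer_resources: Dict[str, bytes] = {}
    downloads = 0
    rasterizer_inputs = []
    for entry in stream:
        if entry.resource_ref not in printer_resources:
            printer_resources[entry.resource_ref] = resource_db[entry.resource_ref]
            downloads += 1
        # Recombine the resource with the print data for the rasterizer.
        rasterizer_inputs.append(
            printer_resources[entry.resource_ref] + entry.print_data
        )
    return rasterizer_inputs, downloads

# Two objects that share the same resource information.
objects = [(b"%prolog\n", b"page1"), (b"%prolog\n", b"page2")]
resource_db: Dict[str, bytes] = {}
stream = decompose(objects, resource_db)
inputs, downloads = print_stream(stream, resource_db)
print(len(resource_db), downloads, inputs)
```

In this sketch the shared resource is stored once and downloaded once, while each page still reaches the rasterizer as a complete object in its original format.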