This invention relates generally to high-speed, high-volume production printing systems and, more particularly, to using such systems to print documents containing data which has been designed for viewing and other purposes.
Computer systems can generate output information in several ways, including video output and "hard copy" or printed output. Although more and more output consists of evanescent video screens, a large amount of data is still printed on paper and other permanent media. Therefore, there is a need for efficiently describing printed data and then printing a hard copy page from the print description. The printing is often performed by high-speed, high-volume printing systems which receive streams of encoded print data and utilize "intelligent" printers that can store commands and data. Such encoded print streams often include data for many printed pages. For example, a telephone company might print all of its telephone bills for a specified week with a single print stream. Each page in the print stream may be a telephone bill for a particular customer.
There are several well-known prior art encoding systems, each designed to efficiently accomplish different objectives. Generally, a print data stream is encoded by means of a page description language which describes the format of each page. There are several conventional page description languages. One of these is POSTSCRIPT®, a page description language developed by the Adobe Corporation, San Jose, Calif. A POSTSCRIPT-encoded document includes a file containing page description commands, or "operators," which describe how each page in the document is composed. A single POSTSCRIPT file can generate a multi-page document because each page is composed according to the operators in the file.
One problem with documents described using POSTSCRIPT is that it is not possible to examine a POSTSCRIPT-encoded document and ascertain where a particular page begins and ends without starting at the beginning of the document and calculating where each page break occurs. Therefore, documents encoded in the POSTSCRIPT language are difficult to view on a display screen because the pages cannot be displayed in arbitrary order. Further, if an error occurs during printing, it may be difficult to restart the printer at any place except the beginning of the document. This inability to restart and reposition the print stream can be a major problem in a production printing system, where each print run can be thousands of pages.
To overcome some of the difficulties of viewing POSTSCRIPT documents, the Adobe Corporation developed another page description language, the Portable Document Format (PDF). PDF is a file format which describes a group of pages to be viewed and uses graphic operators similar to POSTSCRIPT operators to describe the page format. The PDF format was originally designed for viewing documents, and the Adobe Corporation distributes a product which converts printable POSTSCRIPT files into PDF format so that they can be viewed efficiently.
A PDF file actually defines single pages that can be viewed and printed separately. A PDF file is constructed with a header, a body, a cross reference table and a trailer. Page objects containing information for each page are located inside the body, in arbitrary order, and resource objects containing resource information are also located inside the body. The trailer portion of the file contains a pointer to the cross reference table, and the cross reference table indexes the various page and resource objects. Because the cross reference table is referenced by the trailer, the table and the trailer can be written after the body, once the location of every object is known, so the file can be generated in a single pass. In addition, viewing the pages in any order is straightforward. Specifically, to view a page, the trailer in the PDF file is first located to obtain the pointer to the cross reference table. Once the cross reference table is located, the index to a particular page can be obtained.
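The lookup sequence described above (trailer, then cross reference table, then object) can be illustrated with a short Python sketch. It is a simplified, hypothetical illustration and not a conforming PDF reader: the helper names are invented, the sample file is assembled in memory, and the sketch ignores compressed cross-reference streams and incremental updates. It also looks up an object directly by number, whereas a real viewer would start from the document catalog and walk the page tree to find a given page.

```python
import re


def build_sample_pdf() -> bytes:
    """Assemble a tiny PDF-like file whose xref offsets are computed, not hand-typed."""
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n",
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n",
    ]
    out = bytearray(b"%PDF-1.4\n")                      # header
    offsets = []
    for obj in objects:                                  # body, written in one pass
        offsets.append(len(out))
        out += obj
    xref_offset = len(out)                               # table written after the body
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"                      # entry 0 is always free
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset     # pointer to the table
    return bytes(out)


def find_xref_offset(pdf: bytes) -> int:
    # Step 1: the trailer is at the end of the file, so search backwards.
    pos = pdf.rindex(b"startxref")
    match = re.match(rb"startxref\s+(\d+)", pdf[pos:])
    return int(match.group(1))


def read_xref_table(pdf: bytes) -> dict:
    # Step 2: jump to the cross reference table; map object numbers to byte offsets.
    lines = pdf[find_xref_offset(pdf):].split(b"\n")
    assert lines[0].strip() == b"xref"
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                                 # "n" = in use, "f" = free
            table[first + i] = int(offset)
    return table


def read_object(pdf: bytes, number: int) -> bytes:
    # Step 3: seek straight to the object without scanning what precedes it.
    offset = read_xref_table(pdf)[number]
    end = pdf.index(b"endobj", offset)
    return pdf[offset:end].strip()


print(read_object(build_sample_pdf(), 3))
```

Step 1 is the reason a reader of this structure needs the end of the file before it can locate anything else.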
While viewing a PDF file is straightforward, printing with this format is not optimal because the trailer must be located before printing of the file can start. Since the trailer is at the end of the file, the entire file must be loaded before printing can start, and if the file is large, a substantial amount of memory is required. In addition, each page may also use resource files, such as fonts and bitmaps, which may be referred to at various locations in the file. Therefore, the resources must also be loaded prior to printing.
To reduce the delay caused by having to load the entire file, the Adobe Corporation revised the PDF format into a "linearized" version in which the page data and resources for the first page are located at the beginning of the file, so that the first page can be viewed while the remainder of the file is being loaded.
Although effective for viewing, this file format is not adequate for high-speed production printing, where the printer expects a linear stream of page data with resources either in-line or already stored in the printer. Since the PDF format was not designed for driving a printer, it does not incorporate any of the mechanisms needed to carry on the two-way communication between a printer and a print server that is necessary to manage high-speed production printing operations. These operations include, but are not limited to, effective page-level error recovery, operator repositioning, resource management, external formatting, and error reporting.
In order to print documents in PDF format, many prior art systems transform the PDF format document into a POSTSCRIPT format document and then print the POSTSCRIPT format document. Although the POSTSCRIPT format does provide the linear streaming of print data required for production print streams, it does not provide any of the key mechanisms described above which allow the two-way communication required in high-speed production printing environments.
Another page description language is known as MO:DCA™ (Mixed Object Document Content Architecture), described in detail in the I.B.M. Mixed Object Document Content Architecture Reference, number SC31-6802. In this language, page information is stored in the order in which it is printed, so that file processing can begin as soon as the information for the first page is located. During file construction, common resources, such as fonts, are removed from the print data and stored in a separate resource database, and a reference to each stored resource is placed in the print file.
The MO:DCA file format is designed to be used with a printing system known as the "Advanced Function Presentation" (AFP) printing system, developed by and available from International Business Machines Corporation, Armonk, N.Y. This printing system has an intelligent print server which receives the print data and uses the references in the data stream to retrieve the stored resources from the resource database. The resources are then downloaded to the printer ahead of the data. At the printer, the resources are combined with the print data and sent to a rasterizer for printing.
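The two steps described above, removing common resources into a database at construction time and then resolving the references at the print server, can be sketched in Python. This is a hypothetical model for illustration only: `Page`, `externalize` and `serve` are invented names, and the tuples stand in for real MO:DCA structured fields, which the sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    number: int
    content: str
    # Names of the resources this page needs; the resource data itself lives elsewhere.
    resource_refs: list = field(default_factory=list)


def externalize(pages_with_inline, resource_db):
    """Construction step: move inline resources into resource_db, keep only references."""
    pages = []
    for number, (content, inline) in enumerate(pages_with_inline, start=1):
        for name, data in inline.items():
            resource_db.setdefault(name, data)  # a common resource is stored only once
        pages.append(Page(number, content, sorted(inline)))
    return pages


def serve(pages, resource_db):
    """Print-server step: resolve each reference and emit the resource ahead of its page."""
    stream = []
    for page in pages:
        for name in page.resource_refs:
            stream.append(("resource", name, resource_db[name]))
        stream.append(("page", page.number, page.content))
    return stream


db = {}
pages = externalize(
    [("Bill for customer A", {"F1": b"font", "LOGO": b"bitmap"}),
     ("Bill for customer B", {"F1": b"font"})],
    db,
)
for item in serve(pages, db):
    print(item)
```

The resource database ends up holding one copy of the shared font, but this naive server still resends it ahead of the second page; avoiding that redundancy is the purpose of the mapping table described in the summary of the invention.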
The MO:DCA file format has the advantage that pages can easily be located in the data stream because the page information is stored in the order in which it will be printed. Also, each page contains an "Active Environment Group" definition that specifies the resources required to print the page. In addition, if an error occurs during printing, it is possible to restart the printing process from the last page printed rather than from the beginning of the file. However, the AFP system provides these advantages only with the "native" MO:DCA file format and the "native" objects contained therein. It does not provide the same level of recovery with other, "non-native" file formats, such as the PDF file format discussed above.
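Why a per-page resource list makes restart cheap can be shown with a small Python sketch. It is an assumption-laden illustration, not the AFP implementation: the `environment` key is a hypothetical stand-in for an Active Environment Group, `restart_stream` is an invented name, and the printer's report of the last page printed is simply passed in as a number.

```python
def restart_stream(pages, resource_db, last_printed):
    """Rebuild the print stream beginning with the page after `last_printed`.

    Each page carries its own list of required resources (a hypothetical
    stand-in for an Active Environment Group), so no earlier page has to be
    re-interpreted to reconstruct the state needed for the restart.
    """
    stream = []
    for page in pages:
        if page["number"] <= last_printed:
            continue  # already printed: skipped without interpreting its content
        for name in page["environment"]:
            stream.append(("resource", name, resource_db[name]))
        stream.append(("page", page["number"], page["content"]))
    return stream


resource_db = {"F1": b"font", "LOGO": b"bitmap"}
pages = [
    {"number": 1, "content": "page 1", "environment": ["F1"]},
    {"number": 2, "content": "page 2", "environment": ["F1", "LOGO"]},
    {"number": 3, "content": "page 3", "environment": []},
]
# Suppose the printer reported that page 1 printed and page 2 failed.
for item in restart_stream(pages, resource_db, last_printed=1):
    print(item)
```

In a stream where page state accumulates from the start of the file, as with the POSTSCRIPT operators discussed above, the skipped pages could not simply be ignored in this way.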
Therefore, it would be desirable to convert file formats which are originally designed for viewing, such as the PDF file formats, into a print stream which is efficient for production printing and supports a bi-directional data and command flow between a production printer and a print system.
In accordance with the principles of the invention, a conversion program selectively decomposes viewable data, such as a PDF file, and generates a print-structured, bi-directional stream composed of print data objects, resource objects and command objects. The conversion program uses a mapping table to determine which resources have already been sent to the printer so that redundant resources are not sent to the printer.
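The role of the mapping table can be sketched in Python. The layout of the table is not specified here, so this is a hypothetical model: `ResourceMap`, `convert` and the integer printer-side identifiers are invented for illustration, and the sketch assumes the printer never discards a resource once it has been downloaded.

```python
class ResourceMap:
    """Hypothetical mapping table recording which resources the printer already holds."""

    def __init__(self):
        self._sent = {}      # resource name -> printer-side identifier
        self._next_id = 1

    def ensure(self, name, data, stream):
        """Append a resource object only the first time; return its printer-side id."""
        if name not in self._sent:
            self._sent[name] = self._next_id
            stream.append(("resource", self._next_id, data))
            self._next_id += 1
        return self._sent[name]


def convert(pages, resources):
    """pages: list of (content, [resource names]); resources: name -> bytes."""
    rmap = ResourceMap()
    stream = []
    for number, (content, names) in enumerate(pages, start=1):
        # Resolve each page's resources first, so any new ones precede the page.
        ids = [rmap.ensure(name, resources[name], stream) for name in names]
        stream.append(("page", number, content, ids))
    return stream


resources = {"F1": b"font 1", "F2": b"font 2", "LOGO": b"bitmap"}
pages = [("A", ["F1", "LOGO"]), ("B", ["F1"]), ("C", ["LOGO", "F2"])]
for item in convert(pages, resources):
    print(item)
```

Each resource object appears in the stream once, ahead of the first page that needs it; later pages refer to it only by identifier.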
The print data, resource and command objects are sent from a print server, which controls the print system, to a printer by means of a bi-directional containerized data stream such as the Intelligent Printer Data Stream™.