The present preferred embodiment concerns a method, a printing system and computer programs for generating and processing document data streams. Depending on the field of application, various data formats have emerged for the generation and processing of document data, in which the documents are generated and provided for additional processing at different output apparatuses (printing apparatuses, for example). For example, in the graphics industry the PostScript (PS) and Portable Document Format (PDF) data formats, which are optimized for encoding complex graphical structures, have become established. The PDF format has additionally found broad use in office applications and, in the meantime, has been adopted as ISO Standard 15930 of the International Organization for Standardization. The Printer Command Language (PCL) data format has become established in office environments.
In the field of high-speed output of digital data (for example to print invoices from databases of large-scale mainframes), the Advanced Function Presentation (AFP) data format has been developed, together with its numerous further developments such as Mixed Object Document Content Architecture (MO:DCA) and the Intelligent Printer Data Stream (IPDS) format, which is tied to MO:DCA and optimized for output to printing apparatuses. The MO:DCA format is described, for example, in the specification “MO:DCA-Reference of International Business Machines Corporation (IBM)”, IBM document number SC31-8602-06, 7th ed. (January 2004). What are known as object containers, with which other data can be enclosed, are explained on Pages 15 and 100 through 102 of this document. Object types that are registered in the MO:DCA architecture are listed on Pages 542 and 543. The “PDF Single-page Object” object type is cited there under the ID component 25, for example.
In U.S. Pat. No. 6,538,760 B1 a method is described with which PDF single-page objects (for example) can be generated for an MO:DCA data stream. A PDF data stream is analyzed in a print server, where it is deconstructed into individual pages, and the individual PDF page data are output in object containers to a printing apparatus.
Furthermore, the Personalized Print Markup Language (PPML) data format, which is described in “PPML 2.2”, November 2006, PODi Organization, Rochester, N.Y., USA, was developed to output graphical information. This data format also provides for adopting data of other formats as objects, for example PDF data (see Pages 27, 51 and 52 in this document). A corresponding method with which PDF data can be integrated into a PPML data stream is described in U.S. Pat. No. 7,434,160 B2. Via the “EXTERNAL_DATA_ARRAY” element the PDF data are loaded as a complete file, analyzed and then converted per object into what is known as a PPML template. In this process PDF data are selected per object, converted if necessary and transferred into the PPML template. This is then linked in turn with data of a PPML template and sent to a high-speed printing machine.
A comparison of the aforementioned data streams shows that data streams that are optimized for output to high-speed printing apparatuses (for example AFP or PPML) are structured exclusively per page. The data stream is structured so that one page start command and one page end command are provided for each page, and the information for that page lies between these commands. All information necessary for the page (not only the display objects as such, but also the corresponding control information and supplementary information) is either contained between these two page commands or already contained earlier in the data stream, in preceding pages or in page-independent introductory information. As soon as its page end command has been read, each document page can therefore be completely processed by a processing stage, in particular by a parsing process in which the data of the document data stream are read, analyzed and interpreted according to the language syntax.
The document data stream can thus be handed over immediately to a subsequent process (for example a raster process) in which raster images are generated per page from the encoded page data. It is thereby possible, in particular in the processing of large data streams that contain several tens or hundreds of thousands of pages, to already output the first pages of the data stream to a printing apparatus while the following pages are still running through preceding processing stages such as the parsing process and the raster image processor (RIP).
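The page-wise processing described above can be illustrated with a minimal sketch. The command names `BEGIN_PAGE` and `END_PAGE`, the function names and the sample stream are hypothetical placeholders chosen for illustration; real AFP or PPML streams use their own structured fields and elements. The sketch only shows that a page can be released to the next stage as soon as its end command has been read.

```python
from typing import Iterable, Iterator, List, Optional

# Hypothetical command names (assumption); not actual AFP or PPML syntax.
PAGE_START = "BEGIN_PAGE"
PAGE_END = "END_PAGE"


def iter_pages(commands: Iterable[str]) -> Iterator[List[str]]:
    """Yield the content of each page as soon as its end command is read."""
    current: Optional[List[str]] = None  # None means "not inside a page"
    for command in commands:
        if command == PAGE_START:
            current = []
        elif command == PAGE_END:
            if current is None:
                raise ValueError("page end without page start")
            yield current  # the page is complete and can be processed now
            current = None
        elif current is not None:
            current.append(command)
        # Commands outside a page (page-independent introductory
        # information) are simply skipped in this sketch.


def rasterize(page: List[str]) -> str:
    """Stand-in for a raster process: only summarizes the page."""
    return f"raster({len(page)} objects)"


stream = ["SETUP", PAGE_START, "text A", "image B", PAGE_END,
          PAGE_START, "text C", PAGE_END]

# Each page is rastered as soon as the generator releases it.
rastered = [rasterize(page) for page in iter_pages(stream)]
```

Because `iter_pages` is a generator, the first page is available to the consumer before the remainder of the stream has been read, which mirrors the overlap of parsing, rastering and printing described above.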
By contrast, in other data formats (for example the PDF format) the data are not structured exclusively per page, so that all pages or data of a PDF file must first be read before a parsing process can completely resolve the page structure and pass the data on, for example to a raster image processor for implementation of the raster process. For example, in the PDF format a reference table is provided at the end of the file, and this reference table must be read before the pages of the document can be resolved.
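The dependence on the end of the file can be made concrete with a small sketch. In a conventionally structured PDF file the last lines contain the keyword `startxref` followed by the byte offset of the cross-reference table and the `%%EOF` marker. The function name `find_startxref`, the `tail_size` parameter and the sample bytes below are assumptions for illustration; the sample is a simplified fragment, not a complete valid PDF file.

```python
import re


def find_startxref(pdf_bytes: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset stored after the last 'startxref' keyword.

    Only the tail of the file is searched, which illustrates that the
    end of the file must be available before objects can be located.
    """
    tail = pdf_bytes[-tail_size:]
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        raise ValueError("no startxref keyword found near end of file")
    return int(matches[-1])


# Simplified sample data (assumption): one object, then the reference
# table, the trailer and the pointer back to the table.
body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
xref_offset = len(body)
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \n"
    + b"trailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
)

offset = find_startxref(sample)
```

A reader that receives such a file as a stream cannot evaluate `find_startxref` until the last bytes have arrived, which is the reason the page structure cannot be resolved earlier.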
This creates a high time cost in the process chain from the parsing process of the document to the output at a printing apparatus, in particular in the processing of data streams with a large page count, because the printing apparatus can begin the printing process only after the complete data stream has run through the parsing process.
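The effect on the time until the first page can be printed may be estimated with a simple model. The per-page times and the page count below are purely illustrative assumptions, not measured values, and the function name `time_to_first_page_ms` is hypothetical.

```python
def time_to_first_page_ms(pages: int, parse_ms: int, rip_ms: int,
                          page_wise: bool) -> int:
    """Estimate the delay before the first rastered page is available.

    page_wise=True  models a stream structured per page: only the first
                    page has to be parsed before it can be rastered.
    page_wise=False models a format in which the complete data stream
                    must be parsed before any page can be rastered.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    parsed_pages = 1 if page_wise else pages
    return parsed_pages * parse_ms + rip_ms


# Illustrative assumptions: 100,000 pages, 10 ms parsing and
# 50 ms rastering per page.
streamed = time_to_first_page_ms(100_000, 10, 50, page_wise=True)
whole_file = time_to_first_page_ms(100_000, 10, 50, page_wise=False)
```

Under these assumed figures the first page is available after 60 ms in the page-wise case, but only after 1,000,050 ms (roughly 17 minutes) when the whole stream must be parsed first.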
Prior art FIG. 2 shows, by way of example, a method workflow in a printing system 1 distributed by the applicant, in which a print server 2 of the Océ PRISMA Production series and an Océ VarioStream 9240 color printing apparatus 12 are used.
In this workflow, PostScript files 3 or PDF files 4, for example, are used to generate an AFP print data stream 13. The data are imported into the print server 2 and rastered there pixel by pixel with a raster image program (raster image processor, RIP) 5. The data that are rastered in this way are then converted into the AFP-specific image format IOCA (Image Object Content Architecture), and the corresponding IOCA objects 6 are supplied to an imposition program 7. Incoming PostScript or PDF documents that comprise a plurality of pages are thereby converted into IOCA objects, each of which contains precisely one rastered page of the document. The IOCA objects that are thus generated per individual page are then individually supplied to the imposition program 7, in which the individual pages or IOCA objects are arranged according to a page order (imposition). Details of such an imposition process can be found in WO 00/68877 A, for example. Individual items of page information can also be modified or supplemented with additional data in this step, for example by being subjected to a clipping process. The imposition program 7 then outputs AFP data in which the individual pages are output as AFP IOCA data in a new order and possibly with modified information. This AFP print data stream 13 is converted in a converter 9 into an IPDS print data stream 14 and then transferred from the print server 2 to the printing apparatus 12. The printing apparatus 12 contains a data controller that can interpret print data in one or more languages. An IPDS parsing program 10 interprets the corresponding IPDS data and prepares them for additional processing in the printing apparatus, which then ultimately prints the data on a recording medium, in particular on paper, by means of a character generator 11 in an electrophotographic process.
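The first stages of this workflow (rastering per page, followed by imposition into a new page order) can be sketched as follows. All names (`RasterPage`, `rasterize_document`, `impose`) and the example page order are hypothetical and serve only as illustration; the sketch does not reproduce the IOCA format or the imposition method of WO 00/68877 A.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RasterPage:
    """Stand-in for an IOCA object holding exactly one rastered page."""
    page_number: int
    pixels: str  # placeholder for the rastered image data


def rasterize_document(pages: Sequence[str]) -> List[RasterPage]:
    """Raster every page up front, as in the workflow of FIG. 2."""
    return [RasterPage(number, f"raster<{content}>")
            for number, content in enumerate(pages, start=1)]


def impose(objects: Sequence[RasterPage],
           order: Sequence[int]) -> List[RasterPage]:
    """Rearrange the per-page objects according to a given page order."""
    by_number = {obj.page_number: obj for obj in objects}
    return [by_number[number] for number in order]


document = ["p1", "p2", "p3", "p4"]
objects = rasterize_document(document)

# Arbitrary example order (assumption), chosen only to show reordering.
imposed = impose(objects, order=[4, 1, 2, 3])
output_order = [obj.page_number for obj in imposed]
```

Note that `rasterize_document` must finish for the whole document before `impose` can start, so no page reaches the later stages until every page has been rastered.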
A disadvantage of the method shown in FIG. 2 is that possibly large PDF files must already be rastered by the raster image program 5 at a very early point in the processing chain, which leads to a relatively slow speed of the total process. A particular problem is that no output at the printing apparatus can take place until the entire PDF files have run through the raster process.
The aforementioned documents are herewith incorporated by reference into the present Specification.