The present exemplary embodiment relates to document processing systems. It finds particular application in conjunction with converting files from Adobe portable document format (PDF) to a page description language (PDL) format such as Adobe PostScript. However, it is to be appreciated that the present exemplary embodiment is also amenable to other like applications.
Many different document formats exist for manipulating, processing, and printing documents. These formats range in complexity from a simple text file or HTML file to a document which uses a page description language to describe its content. One type of format, Adobe's portable document format (PDF), has become quite popular for exchanging documents over the Internet, while another type of format, Adobe's PostScript (PS) format, is often used in high-end printers. Many high-end printer models are not capable of processing the PDF data stream directly. Typically, PDF files are converted into PS files by a PDF to PS converter before being processed in the printing system. However, such conversions present a number of difficulties.
A PDF file is structured like a database. The PDF file contains a number of different pages of information, including various objects such as images. Often images are repeated on different pages throughout the file. To make storage more compact, the PDF file indexes the repeated images to allow fast random access to each image within the PDF file. For each repeated object, a complete set of data for the image is stored at the first occurrence of the image and the image is indexed. At each subsequent occurrence of the image a reference to the indexed original image is created.
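The store-once, reference-thereafter arrangement described above can be illustrated with a minimal Python sketch. It is not an actual PDF writer: the function name `build_indexed_document`, the object table, and the use of a content hash to recognize a repeated image are all assumptions made for illustration only.

```python
import hashlib

def build_indexed_document(pages):
    """Store each distinct image once and record per-page references to it.

    pages: list of pages, each a list of image byte strings (hypothetical input).
    Returns (object_table, page_refs).
    """
    object_table = {}   # object id -> image bytes, stored a single time
    id_by_digest = {}   # content digest -> object id (assumed duplicate check)
    page_refs = []      # one list of object ids per page
    for page in pages:
        refs = []
        for image in page:
            digest = hashlib.sha256(image).hexdigest()
            if digest not in id_by_digest:
                # First occurrence: store the complete image data and index it.
                obj_id = len(object_table) + 1
                id_by_digest[digest] = obj_id
                object_table[obj_id] = image
            # Every occurrence, first or later, is recorded as a reference.
            refs.append(id_by_digest[digest])
        page_refs.append(refs)
    return object_table, page_refs

logo = b"\x89LOGO" * 1000
photo = b"\xffPHOTO" * 1000
object_table, page_refs = build_indexed_document([[logo, photo], [logo], [logo]])
print(page_refs)          # [[1, 2], [1], [1]]
print(len(object_table))  # 2
```

In this sketch the logo appears on three pages but its data is stored only once; the later pages carry only the reference to object 1.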
A PostScript file, on the other hand, is treated as a stream of data that is interpreted in a linear fashion. The PostScript format provides no mechanism for random access to objects within the PostScript stream of data. Thus, when a PDF file is converted to a PostScript stream, the PDF to PS converter inserts a complete copy of each indexed object back into the document at every location where the object is referenced.
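The effect of this insertion on the size of the output can be sketched as follows. The sketch is illustrative Python only and does not emit real PostScript; `flatten_to_stream`, the object table, and the byte counts are hypothetical values chosen for the example.

```python
def flatten_to_stream(object_table, page_refs):
    """Emit a linear stream by copying the full object data at every reference."""
    chunks = []
    for refs in page_refs:
        for obj_id in refs:
            # No random access in the stream: each reference becomes a full copy.
            chunks.append(object_table[obj_id])
    return b"".join(chunks)

# Hypothetical indexed document: object 1 is a repeated logo, object 2 a photo.
object_table = {1: b"L" * 5000, 2: b"P" * 6000}
page_refs = [[1, 2], [1], [1]]

stream = flatten_to_stream(object_table, page_refs)
indexed_size = sum(len(data) for data in object_table.values())
print(indexed_size, len(stream))  # 11000 21000
```

Here the logo is stored once in the indexed form but copied three times into the linear stream, so the image data roughly doubles; the growth scales with the number of pages that repeat the image.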
The PDF to PS conversion has a few drawbacks. Often, the tags on form objects are lost after conversion. Inserting the recurring images into the PS file substantially increases the size of the resulting converted PostScript document file. Many systems and printers do not accept a PostScript file that is larger than 2 GB. In addition, each image is processed through the raster image processor (RIP) every time it appears in the PostScript stream. The additional RIP time decreases the throughput of the document processing system, and the repeated images cannot be cached.