1. Field of the Invention
The invention relates to the field of PostScript data stream processing, wherein the PostScript data stream is not compliant with PostScript Document Structuring Conventions (DSC).
2. Statement of the Problem
PostScript is a page description language (PDL) that contains a set of commands used to describe pages in a print job. A principal difference between PostScript and other PDLs is that PostScript is a programming language. This provides power and flexibility in expressing page content, although the pages are not easy to interpret. In order to correctly interpret pages or perform meaningful transformations on PostScript data, a PostScript interpreter is needed. Adobe Configurable PostScript Interpreter (CPSI) is one example of a PostScript interpreter, which processes a PostScript job and produces bitmaps. Adobe Distiller is another example of a PostScript interpreter, which processes a PostScript job and produces a PDF file. However, certain limitations exist within the PostScript language. For example, speed limitations prevent PostScript print jobs from being executed at printer-rated speeds. Also, PostScript print jobs cannot be separated into independent pages, as is required for processing pages in parallel on multiple central processing units (CPUs).
The processing of a PostScript job generally consists of two (typically overlapping) stages: an interpretation stage and an output stage. During the interpretation stage, a PostScript job is parsed and the internal job structure is created. This internal job structure may be a linked list of high-level and low-level graphical objects, a complex state that describes pages in the job, etc. During the output stage, the internal job structure is processed, and the required output is created. In the case of a printing system, pages are rendered and rasterized as, for example, a raw bitmap for printing.
Interpretation is considered a “light stage”, while rendering is considered a “heavy stage”, in terms of the amount of data produced. For example, typical source data for a PostScript page that contains text and graphics is about 100 KB. When rendered at 600 × 600 dpi CMYK, a typical raw bitmap page is about 100 MB, which is 1,000 times larger than the source data. Thus, in order to skip rendering, a technique of “writing to null device” has been used since the inception of the PostScript language.
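The size ratio described above can be checked with a short calculation. The following Python sketch is illustrative only; the page dimensions (8.5 by 11 inches), the 8 bits per color channel, and the function name raw_bitmap_bytes are assumptions that do not appear in the description above.

```python
def raw_bitmap_bytes(width_in, height_in, dpi, channels, bits_per_channel=8):
    """Return the size in bytes of an uncompressed raster page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed: a letter-size page, CMYK (4 channels), 8 bits per channel.
page_bytes = raw_bitmap_bytes(8.5, 11, 600, channels=4)

# Assumed: 100 KB of PostScript source, taken here as 100,000 bytes.
source_bytes = 100 * 1000
ratio = page_bytes / source_bytes

print(page_bytes)  # 134640000 bytes, on the order of 100 MB
print(ratio)       # roughly 1,000 times the source size
```

Under these assumptions the raw bitmap is about 135 MB, which is consistent with the order of magnitude given above.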
With the null device technique, rendering of pages may be skipped by setting a null device and then re-establishing a real device to resume rendering. The null-device approach is typically augmented by redefinition of multiple PostScript operators (e.g. show, image, etc.) to further reduce the interpretation overhead. The pages may then be rendered in parallel. For example, four processors may be configured to receive an entire PostScript job with each processor skipping some pages and processing others. To illustrate, a first processor may process pages 1, 5, 9 . . . , while a second processor processes pages 2, 6, 10 . . . , a third processor processes pages 3, 7, 11 . . . , and a fourth processor processes pages 4, 8, 12 . . . .
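The page distribution in the example above is a simple round-robin assignment. The following Python sketch illustrates that assignment only; the function names and the zero-based processor index are hypothetical, and the sketch does not model the null device or the PostScript interpreter itself.

```python
def pages_for_processor(index, processor_count, total_pages):
    """List the 1-based page numbers that the processor with the given
    zero-based index renders; all other pages are skipped."""
    return list(range(index + 1, total_pages + 1, processor_count))


def should_render(page_number, index, processor_count):
    """Decide, page by page, whether this processor renders the page
    (real device) or skips it (null device)."""
    return (page_number - 1) % processor_count == index


for cpu in range(4):
    print(cpu + 1, pages_for_processor(cpu, 4, 12))
```

With four processors and twelve pages, the first processor is assigned pages 1, 5, and 9, and the fourth processor is assigned pages 4, 8, and 12, matching the example above.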
The advantages of this approach are easily recognizable. For example, assume that it takes a single-CPU system 100 seconds to process the entire job. Then further assume that interpreting is four times faster than rendering, which is fairly reasonable. Based on these assumptions, the interpretation takes 20 seconds, while rendering takes 80 seconds. Each of the four processors then spends the same 20 seconds for interpreting (each processor needs to interpret the entire job), but only 20 seconds for rendering (each processor needs to render only a quarter of the pages). In this case, the entire job is processed in 40 seconds. This achieves a 2.5 times performance gain (100/40=2.5). However, this approach makes the interpreter the bottleneck in parallel processing because the interpreter processing time is constant. In other words, the processing time of the interpreter does not decrease as the number of processors used to render the print job increases. Thus, removing the processing bottleneck associated with the interpreter would increase the overall speed of PostScript processing.
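The arithmetic above can be expressed as a small timing model. The following Python sketch is a simplification that assumes the rendering work divides evenly among the processors and that the interpretation and rendering stages do not overlap; the function names are hypothetical.

```python
def job_time(interpret_s, render_s, processors):
    """Elapsed time when every processor interprets the whole job
    but renders only its own share of the pages."""
    return interpret_s + render_s / processors


def speedup(interpret_s, render_s, processors):
    """Gain relative to a single processor doing all of the work."""
    single = interpret_s + render_s
    return single / job_time(interpret_s, render_s, processors)


print(job_time(20, 80, 4))   # 40.0 seconds
print(speedup(20, 80, 4))    # 2.5
print(speedup(20, 80, 16))   # 4.0
```

In this model the gain can never reach 100/20 = 5, regardless of the number of processors, because the 20 seconds of interpretation are repeated in full on every processor.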
Realizing the issues related to the unstructured nature of PostScript jobs, Adobe published “Adobe Document Structuring Conventions Specification Version 1” (the DSC specification) around 1986. The DSC specification defines a set of tags that allows easy parsing of PostScript resources and rearranging of pages. With DSC, one can successfully split a large set of PostScript jobs into independent pages by parsing for DSC comments and producer-specific patterns. DSC also allows for the combination of multiple PostScript jobs produced by different applications into one PostScript job, thus achieving an even higher level of page independence.
For example, PostScript interpreters receive a single input stream and produce a single sequential rasterized output to be rendered into graphics on a display or printing hardware. The stream contains data for each page in sequence. However, PostScript cannot be parsed using a fixed set of rules, because the language itself is usually redefined by the data streaming into the interpreter, unless the document complies with the PostScript Document Structuring Conventions (DSC). If the PostScript document is DSC compliant, then standardized comments may be used as delimiters to split and reassemble the document in a page-wise manner. A page described in a PostScript print job stream may have zero or more dependencies on data presented earlier in the stream, including data within prior pages. PostScript interpreters populate dictionaries to track how procedures and parameters are defined, and subsequent pages may call and use these definitions. Any definition used by a valid page rendering sequence may therefore be located anywhere inside the previous pages of the PostScript print job stream. In DSC compliant PostScript, by contrast, all the information needed to render each page is located either in the prolog or in the portion of the stream that follows the rendering of the preceding page. However, several applications produce PostScript that is not DSC compliant, and manipulation of individual pages is still desired to improve parallel processing.
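The use of DSC comments as page delimiters, as described above, can be sketched in Python as follows. This is a minimal illustration, not a complete DSC parser: it assumes a DSC compliant stream, treats everything before the first %%Page: comment as shared prolog and setup, and ignores nested documents (%%BeginDocument), deferred (atend) comments, and producer-specific patterns. The function names are hypothetical.

```python
def split_dsc_pages(ps_text):
    """Split a DSC compliant job into (prolog, pages, trailer) by
    scanning for the %%Page: and %%Trailer structuring comments."""
    prolog, pages, trailer = [], [], []
    current = prolog
    for line in ps_text.splitlines(keepends=True):
        if line.startswith("%%Page:"):
            pages.append([])          # a new page description begins
            current = pages[-1]
        elif line.startswith("%%Trailer"):
            current = trailer         # everything after this is trailer
        current.append(line)
    return "".join(prolog), ["".join(p) for p in pages], "".join(trailer)


def standalone_page(prolog, page, trailer):
    """Reassemble one page with the shared prolog and trailer.
    Header comments such as %%Pages: are left unadjusted here."""
    return prolog + page + trailer


sample = (
    "%!PS-Adobe-3.0\n"
    "%%Pages: 2\n"
    "%%EndProlog\n"
    "%%Page: 1 1\n"
    "(one) show showpage\n"
    "%%Page: 2 2\n"
    "(two) show showpage\n"
    "%%Trailer\n"
    "%%EOF\n"
)
prolog, pages, trailer = split_dsc_pages(sample)
print(len(pages))
print(standalone_page(prolog, pages[1], trailer))
```

This kind of splitting is reliable only because a DSC compliant page depends on nothing beyond the prolog and its own description. For a stream that is not DSC compliant, a page may rely on definitions made inside earlier pages, so the same approach could produce pages that fail to render.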