In many printing systems, a page to be printed is described using a page description language (“PDL”), such as Adobe® Portable Document Format (“PDF”). A page description may be supplemented by a print job description, which specifies the desired output of the printing system. A print job description may include options for each page to be printed such as page layout, paper selection, duplexing, and other finishing options.
One function of a printing system is to convert the page description into pixel values according to the relevant options in the job description (“rasterisation”). The pixel values can then be sent to a printer engine for output according to the job description. The rasterisation process is typically performed using a central processing unit (“CPU”).
Many modern CPUs contain multiple cores that can execute instructions in parallel. Printing systems can advantageously use parallel processing, or multi-threading, to improve the performance of the rasterisation process, thereby improving the overall throughput of the printing system. In such multi-threaded systems, tasks must be identified whose execution is largely independent of other tasks. Tasks can be executed by sending a request to the operating system to create a new thread, or by submitting the task to a thread pool to be scheduled for execution when a thread is available.
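The two ways of executing a task mentioned above can be sketched as follows. This is a minimal illustration only: the function `run_task`, the task identifiers and the shared `results` dictionary are hypothetical names, not part of any particular printing system.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_task(task_id, results):
    """Hypothetical independent task: records that it has completed."""
    results[task_id] = f"task {task_id} done"


results = {}

# Option 1: request a dedicated new thread for a single task.
thread = threading.Thread(target=run_task, args=(0, results))
thread.start()
thread.join()  # wait for the task to finish

# Option 2: submit tasks to a thread pool; each task is scheduled
# for execution when one of the pool's threads becomes available.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_task, task_id, results) for task_id in (1, 2, 3)]
    for future in futures:
        future.result()  # re-raises any exception raised inside the task
```

With a pool of two threads and three submitted tasks, the third task waits until a thread is free, which is the scheduling behaviour described above.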
The execution times of the various task types are important to the overall performance of the system. If tasks are too large, there may be too few tasks to occupy the available processor cores, and the system may not perform optimally. If tasks are too small, synchronisation overhead may degrade the performance of the system.
One approach to multi-threaded printing systems is to process each printed page as a separate task. This is a relatively simple approach for PDLs that are page-independent, such as Adobe PDF. Tasks are created for each page and assigned a page sequence number. Tasks can then independently produce pixel values for the assigned page into a page-specific output buffer. The only significant synchronisation that is required is to wait for the pages to finish in sequence.
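The page-parallel approach can be sketched as below. The sketch assumes a hypothetical `rasterise_page` function that stands in for a real rasteriser, and a toy buffer of eight bytes per page; it shows only the task creation, the page-specific buffers and the in-sequence wait.

```python
from concurrent.futures import ThreadPoolExecutor

PIXELS_PER_PAGE = 8  # toy buffer size, assumed for illustration


def rasterise_page(sequence_number, page_description):
    """Hypothetical rasteriser: fills a page-specific buffer with one value."""
    value = len(page_description) % 256
    return bytearray([value] * PIXELS_PER_PAGE)


def print_document(page_descriptions, max_workers=4):
    """Rasterise pages as independent tasks and collect them in page sequence."""
    output = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per page, tagged with its page sequence number.
        futures = [
            pool.submit(rasterise_page, sequence_number, description)
            for sequence_number, description in enumerate(page_descriptions)
        ]
        # The only synchronisation: wait for the pages to finish in sequence.
        for sequence_number, future in enumerate(futures):
            output.append((sequence_number, future.result()))
    return output
```

Because every page in flight holds its own buffer, the memory in use grows with the number of pages processed at the same time.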
Some PDLs, such as ADOBE® POSTSCRIPT®, are not page-independent, i.e. pages must be processed in sequence. Methods exist for pre-processing such documents to produce page-independent chunks. While relatively simple to implement, page-parallel rasterisation does have drawbacks. In particular, a significant drawback is that memory requirements increase linearly with the number of pages being processed in parallel, and adding more memory increases the cost of the system. Furthermore, the memory bandwidth and some CPU caches are shared between all cores, potentially becoming bottlenecks within the system.
To overcome the shortcomings of page-parallel processing, printing systems may use multi-threading to improve the performance within a single page. Within-page-parallelisation has two key advantages. Firstly, the memory requirements are closer to those of single-threaded systems, thereby alleviating the memory-related drawbacks of page-parallel systems. Secondly, by improving the performance within a single page, the time to spool the first page is significantly reduced, thereby reducing the idle time of the printer engine and reducing the total time to print a document.
Most PDLs are designed to be processed so that graphic objects are produced in sequence from the bottom-most object to the top-most object, i.e., in what is commonly referred to as z-order. Furthermore, most PDLs contain a graphics state, which is updated as the sequence of graphic objects is produced. Generally, producing graphic objects out of order is difficult, since a graphic object may depend on previous graphics state commands. Therefore, most within-page-parallel systems process the sequence of objects in the PDL file using a master thread. The master thread processes a page description sequentially to produce tasks that can independently process the graphic objects. The output from these tasks is usually an intermediate representation. The outputs of the tasks are then combined in a final step to produce the pixels for the page.
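The master-thread arrangement can be sketched as follows. The command format (`"set_colour"` and `"draw"` tuples), the `convert_object` task and the dictionary used as an intermediate representation are all hypothetical simplifications chosen for illustration; a real PDL interpreter is considerably more involved.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_object(z_index, shape, state):
    """Worker task: build a hypothetical intermediate representation."""
    return {"z": z_index, "shape": shape, "colour": state["colour"]}


def process_page(commands, max_workers=4):
    """Master loop: walk the page description sequentially, spawning tasks."""
    state = {"colour": "black"}  # graphics state, updated in sequence
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        z_index = 0
        for kind, value in commands:
            if kind == "set_colour":
                # State changes must be applied in order by the master.
                state["colour"] = value
            elif kind == "draw":
                # Snapshot the state so the task is independent of
                # later graphics state commands.
                futures.append(
                    pool.submit(convert_object, z_index, value, dict(state))
                )
                z_index += 1
        intermediates = [future.result() for future in futures]
    # Final step: combine the task outputs in z-order.
    return sorted(intermediates, key=lambda item: item["z"])
```

Note that the loop over `commands` runs on a single thread; only the work inside `convert_object` is available to the other cores.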
While within-page-parallelisation controlled by the master thread provides many benefits over page-parallel processing, a key assumption is that the master thread can generate enough tasks to fully utilise the available cores. Since the master thread is sequential due to the nature of the page description language, the master thread often becomes a bottleneck in the system, and other cores are under-utilised.
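The cost of a sequential master thread can be illustrated with Amdahl's law, a standard model that is introduced here only as an aid and is not part of the systems described. Assume, purely for illustration, that a fraction s of the work for a page is sequential master-thread work and that the remainder parallelises perfectly across n cores. The speed-up S(n) and the core utilisation U(n) are then:

```latex
S(n) = \frac{1}{s + \frac{1 - s}{n}}, \qquad U(n) = \frac{S(n)}{n}

% Illustrative assumption: s = 0.25
S(4)  = \frac{1}{0.25 + 0.75/4}  \approx 2.29, \qquad U(4)  \approx 57\%
S(16) = \frac{1}{0.25 + 0.75/16} \approx 3.37, \qquad U(16) \approx 21\%
\lim_{n \to \infty} S(n) = \frac{1}{s} = 4
```

Under this assumed value of s, moving from 4 to 16 cores raises the speed-up by less than 50% while utilisation falls from roughly 57% to roughly 21%, and no number of cores yields a speed-up above 4.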
Furthermore, the current technological trend is towards CPUs with an increasing number of cores. As new models of CPUs become available with more cores, the bottleneck created by the master thread becomes even more significant, and the under-utilisation of the processor increases. Therefore, a need exists to reduce the amount of sequential processing performed by the master thread, to achieve a higher processor utilisation.