Processing performance can be improved through the use of multi-core and multi-processor architectures in computer systems, including printing systems. Multi-core and multi-processor architectures allow a number of computation threads belonging to one or more processes to execute in parallel across a number of cores or processors, thereby reducing the overall execution time.
One multi-core architecture is a graphics processing unit (GPU). GPUs are commonly used to accelerate the rendering of three-dimensional (3D) graphics for viewing on a display device. However, in recent years, GPU manufacturers have enabled the general-purpose programming of GPUs, commonly known as general-purpose computing on GPUs (GPGPU). GPUs and other types of architectures, such as single-instruction multiple-data (SIMD) units and field programmable gate arrays (FPGAs), are also known as stream processors.
Modern GPUs typically contain hundreds of simple processing cores, also known as processing elements. Such GPUs are therefore suited to applications and algorithms that can be executed with a large degree of parallelism, requiring little or no communication among parallel threads and minimal branching within a thread's instruction stream. Extracting such a large degree of parallelism from applications originally designed for sequential execution is often difficult. However, the potential benefits of GPU processing can be large.
Print data is often pre-processed to form some kind of region-based intermediate representation. The pre-processing may be done to compress the print data and use less memory, to transfer the print data across a network, or to enable printing in real time, for example. Due to the nature of region-based intermediate representations, it is often difficult to process the region-based intermediate representations with many parallel threads, including on GPU architectures. The main difficulty is that such an intermediate representation is designed to be processed in a sequential fashion. Some prior art methods store the intermediate representation (also referred to as a fillmap) as a sequence of rasterised non-overlapping edges, grouped into tiles of the page. A region is bounded by edges and all pixels within a region are rendered using one set of rendering functions from a compositing stack. Thus, the fillmap is also referred to as a region-based representation 540 (seen in FIG. 5C) described by a plurality of edges 541-545, each of which refers to one of the fill compositing sequences 551-554. For each tile, the edges are typically sorted by their start coordinate, first by start y-coordinate and then by start x-coordinate. The start coordinate of an edge is the coordinate of the first pixel in the tile that the edge activates, when pixels are traversed in scan line order and from left to right. For each edge, the x-coordinates of the edge's crossings with successive scan lines are stored in a sequence. To reduce memory usage, each x-coordinate is typically stored as a difference between the x-coordinate and the previous x-coordinate (an encoding method known in the art as “delta encoding”).
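The delta encoding described above may be sketched as follows. This is an illustrative example only, not the encoding of any particular prior art product; the function names are hypothetical, and the first x-coordinate is assumed to be stored as a difference from zero.

```python
# Hypothetical sketch of delta encoding of an edge's x-coordinates:
# each crossing is stored as a difference from the previous crossing.

def delta_encode(x_coords):
    """Encode a sequence of per-scan-line x-coordinates as deltas."""
    deltas = []
    prev = 0
    for x in x_coords:
        deltas.append(x - prev)  # store only the change from the last crossing
        prev = x
    return deltas

def delta_decode(deltas):
    """Recover absolute x-coordinates by accumulating the deltas."""
    xs = []
    x = 0
    for d in deltas:
        x += d
        xs.append(x)
    return xs
```

For example, the crossings `[10, 12, 12, 15]` encode to `[10, 2, 0, 3]`; note that decoding any crossing requires accumulating all preceding deltas, which is why the representation favours sequential processing.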
To render an individual portion of a tile using a region based representation, all edges must be searched sequentially, the delta-encoded x-coordinates decoded, and the x-coordinates sorted along each scan line to identify which edge activates the pixels in the portion being rendered. The process of rendering the portions, repeated for every portion of the tile, requires a large amount of redundant processing (i.e., similar steps are repeated for every portion) and is therefore very inefficient. For this reason, the described region based representation is suitable for processing sequentially, but is not suitable for processing in parallel.
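The redundancy described above can be illustrated with a minimal sketch, assuming each edge is stored as a start scan line plus its delta-encoded crossings (names and the data layout are hypothetical): to identify the edge activating a single pixel, every edge must be visited and its deltas decoded.

```python
# Illustrative sketch: finding the edge that activates pixel (px, py).
# Each edge is (start_y, deltas), where deltas are its delta-encoded
# x-crossings on successive scan lines starting at start_y.

def active_edge_for_pixel(edges, px, py):
    """Return the index of the edge activating pixel (px, py), or None.

    Every edge is visited and its deltas decoded up to scan line py,
    even though only one edge determines the pixel -- redundant work
    that is repeated for every rendered portion of the tile.
    """
    best_index, best_x = None, -1
    for i, (start_y, deltas) in enumerate(edges):
        if start_y > py or py - start_y >= len(deltas):
            continue  # edge does not cross scan line py
        # decode the delta-encoded crossings sequentially up to scan line py
        x = 0
        for d in deltas[: py - start_y + 1]:
            x += d
        # the activating edge has the right-most crossing at or left of px
        if best_x < x <= px:
            best_index, best_x = i, x
    return best_index
```

Rendering a whole portion repeats this sequential search and decode for each of its pixels, which is why the representation parallelises poorly.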
Other known methods render individual tiles of image data in parallel using separate processors. For example, pixels in a rectangular tile are rendered by the parallel processors. However, the rendering functions for different pixels may vary in complexity. A processor executing a simple rendering function may therefore finish earlier and wait idle while other processors execute more complex rendering functions. In this case, some processors are not fully utilised and the rendering performance is not optimal.
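The load imbalance described above can be quantified with a toy calculation (the numbers below are assumed for illustration): when each tile is rendered by its own processor, the overall time is set by the slowest tile.

```python
# Toy illustration of load imbalance in tile-parallel rendering:
# every processor must wait for the slowest tile to finish.

def utilisation(tile_times):
    """Fraction of total processor time spent doing useful work when each
    tile is rendered by a separate processor in parallel."""
    makespan = max(tile_times)              # finish time of the slowest tile
    busy = sum(tile_times)                  # total useful work performed
    return busy / (len(tile_times) * makespan)
```

With four equally complex tiles, `utilisation([4, 4, 4, 4])` is 1.0; if one tile uses a much more complex rendering function, `utilisation([1, 1, 1, 5])` drops to 0.4, reflecting the idle processors described above.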
Another known method renders regions as polygons. Each polygon is rendered using a 3D rendering pipeline such as OpenGL, in which each stage of the pipeline is executed using dedicated hardware on the GPU. A polygon is first divided into triangles, each of which is then rasterised into fragments. A fragment corresponds to a pixel and contains a position (x, y), a colour, one or more depth values and, optionally, the position of a texel (texture pixel). Each fragment can be modified depending on the required light shading effect, and the fragment is then rendered into a pixel colour according to the position, depth and blending function associated with the fragment. The fragments can be rendered into pixels in parallel using the GPU cores. As each fragment has an explicit pixel position (x, y), a thread rendering a fragment knows where to render the corresponding pixel. However, converting the compact intermediate representation into a format suitable for the 3D rendering pipeline is inefficient, as the conversion involves re-rasterising the regions described by the edges in the intermediate representation and storing compositing information for rendering each pixel.
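The fragment model described above may be sketched as follows. This is a hedged, simplified illustration, not the OpenGL pipeline itself: the `Fragment` type, the single-channel colour and the simple "over" blend are all assumptions, and the sequential loop stands in for the parallel GPU cores.

```python
# Simplified sketch of fragment rendering: because every fragment carries an
# explicit (x, y) position, each fragment could be processed by an
# independent thread without knowing about any other fragment.

from dataclasses import dataclass

@dataclass
class Fragment:
    x: int
    y: int
    colour: float   # single-channel colour for brevity
    depth: float
    alpha: float    # blending factor

def render_fragments(fragments, width, height):
    """Render fragments into colour and depth buffers."""
    colour_buf = [[0.0] * width for _ in range(height)]
    depth_buf = [[float("inf")] * width for _ in range(height)]
    for f in fragments:                         # conceptually parallel
        if f.depth < depth_buf[f.y][f.x]:       # depth test
            depth_buf[f.y][f.x] = f.depth
            # simple "over" blend of fragment colour with the buffer
            dst = colour_buf[f.y][f.x]
            colour_buf[f.y][f.x] = f.alpha * f.colour + (1 - f.alpha) * dst
    return colour_buf
```

The inefficiency noted above arises before this stage: producing such fragments from the fillmap requires re-rasterising every region and attaching per-pixel compositing information.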
Other known methods render a non-rectangular region with an associated compositing stack by subdividing the region into smaller rectangular sub-regions. However, such subdivision potentially generates a large number of small regions, and the overheads associated with managing the large number of regions can make the process inefficient.
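One possible subdivision strategy (assumed here for illustration; the prior art methods referred to above may subdivide differently) merges identical horizontal spans on adjacent scan lines into rectangles:

```python
# Hypothetical greedy subdivision of a non-rectangular region, given as
# per-scan-line spans, into axis-aligned rectangles. Adjacent scan lines
# with an identical span are merged into one rectangle.

from collections import defaultdict

def subdivide(runs):
    """runs: list of (y, x0, x1) horizontal spans of the region.
    Returns rectangles as (x0, y_top, x1, y_bottom), inclusive."""
    by_line = defaultdict(set)
    for y, x0, x1 in runs:
        by_line[y].add((x0, x1))
    rects = []
    open_spans = {}   # (x0, x1) -> y_top of a rectangle still growing
    prev_y = None
    for y in sorted(by_line):
        if prev_y is not None and y != prev_y + 1:
            # vertical gap: close all growing rectangles
            for (x0, x1), y_top in open_spans.items():
                rects.append((x0, y_top, x1, prev_y))
            open_spans = {}
        # close rectangles whose span does not continue on this scan line
        for span in list(open_spans):
            if span not in by_line[y]:
                x0, x1 = span
                rects.append((x0, open_spans.pop(span), x1, prev_y))
        # start rectangles for newly appearing spans
        for span in by_line[y]:
            if span not in open_spans:
                open_spans[span] = y
        prev_y = y
    for (x0, x1), y_top in open_spans.items():
        rects.append((x0, y_top, x1, prev_y))
    return rects
```

An L-shaped region subdivides into two rectangles, but a staircase-shaped region whose span changes on every scan line yields one rectangle per scan line, illustrating how irregular regions can produce the large numbers of small sub-regions noted above.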
Thus, a need clearly exists for a more efficient method of rendering an image.