Output systems for digital images include computer displays (“monitors”) and printers. Users of computer software such as word processors, drawing programs, and spreadsheets create documents that place text, graphics, and bitmaps (e.g. photographs) on one or more pages of a document. Other sources of digital page descriptions are the World Wide Web (notably HTML pages), and computer software (e.g. database systems generate reports, 3D software creates polygon meshes of artificial scenes, etc.). When such a digital page must be output to a monitor or printer, software is invoked to describe precisely the location and attributes for each page element. All of the page elements combined form the page description. Each page element, with an arbitrary position, color, and size, is specified sequentially. This implies that a subsequent element may fully or partially overlap an earlier page element. The horizontal and vertical element positions are normally described in a two-dimensional coordinate system with the usual (X,Y), while the drawing order of the elements is often known as the Z order.
However, the simple horizontal scanning video circuits in computer monitors and print engine controllers accept only flat (2D), linear sequences of bits. These bits must show the parts of each page element that are finally visible on the line after any overlaps among elements are resolved. In the (X, Y, Z) scheme, this means that for each Y line on the page, the output video can accept only the bits along one line at a time. One or more bits constitute a picture element, or “pixel”; there is one pixel for each X position.
Because these video output circuits are so simple, the combination of a quite general page element (at the level of the generating software) such as a green-filled rectangle partially obscured by a red circle is far too complex to be handled in real time during actual display or output. The page elements must first be converted into the much simpler scanline raster structure described above, and any Z-order overlaps must be resolved prior to final output.
By far the most common solution to this problem is software known as a rasterizer, also known as a “RIP” (Raster Image Processor), which simply allocates a full page (the “frame buffer”) of raster memory (or in a more limited case a strip or “band”) and then draws (“rasterizes”) each page element into the raster memory. A computer monitor typically displays in real time so the viewer sees the drawing of each element apparently instantaneously, but for a printer the full page description (all the page elements) must arrive before output can start. This is because even the very last page element might draw something on the first line, and once the printing process starts, it normally starts from the top down and cannot reverse. Ink or toner deposited on paper cannot be erased.
The process of rasterizing takes a page element and “scan converts” the element into the horizontal and vertical arrays of pixels that visually correspond to the element. For example, a green rectangle 50 lines high and 75 pixels across will put 3,750 (50×75) green pixels into the raster structure, 75 pixels on each of 50 lines, at the position in the raster corresponding to the page position of the element.
If a subsequent page element, say a red circle, overlaps part or all of the green rectangle, then because the red circle appears later (higher in the Z order) its red pixels typically replace the green pixels exactly where the circle overlaps the rectangle.
The problem of resolving overlaps in a conventional rasterizer is solved by large amounts of computer RAM (in which each pixel is randomly and directly accessible) with direct memory overwrites of pixels that are obscured.
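The overwrite behavior described above can be sketched as follows. This is a minimal illustration in Python, not the code of any particular RIP; the function names, the 100×80 buffer size, and the integer color codes are assumptions chosen for the example.

```python
# Hypothetical sketch: a tiny frame buffer with painter's-order overwrites.
WIDTH, HEIGHT = 100, 80          # assumed buffer size, far smaller than a real page
WHITE, GREEN, RED = 0, 1, 2      # assumed color codes standing in for real pixels

def make_frame_buffer(width, height, fill=WHITE):
    """Allocate one pixel slot per (X, Y) position."""
    return [[fill] * width for _ in range(height)]

def draw_rect(fb, x0, y0, w, h, color):
    """Scan convert a filled rectangle, clipped to the buffer."""
    for y in range(max(y0, 0), min(y0 + h, len(fb))):
        row = fb[y]
        for x in range(max(x0, 0), min(x0 + w, len(row))):
            row[x] = color       # direct overwrite of whatever was there

def draw_circle(fb, cx, cy, r, color):
    """Scan convert a filled circle, clipped to the buffer."""
    for y in range(max(cy - r, 0), min(cy + r + 1, len(fb))):
        row = fb[y]
        for x in range(max(cx - r, 0), min(cx + r + 1, len(row))):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                row[x] = color   # later element replaces earlier pixels

def count(fb, color):
    return sum(row.count(color) for row in fb)

fb = make_frame_buffer(WIDTH, HEIGHT)
draw_rect(fb, 10, 10, 75, 50, GREEN)   # 75 pixels across, 50 lines high
green_before = count(fb, GREEN)        # 3750 green pixels
draw_circle(fb, 85, 60, 20, RED)       # drawn later, so higher in the Z order
green_after = count(fb, GREEN)         # fewer: overlapped pixels are now red
```

Note that the buffer size never changes no matter how many elements are drawn; only CPU time grows with the element count.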
This scheme is entirely general and is well-suited to handling millions or even billions of page elements appearing in random positions. The amount of frame buffer RAM remains constant irrespective of the number of page elements.
Even in the case of poor programming in which an identical page element is mistakenly sent millions of times, there is no memory cost. Each element is drawn (possibly over previous ones) and then the element is discarded. There is only the cost of CPU time.
Once all the page elements have been rasterized, this now greatly simplified raster data is ready to be sent to the output device.
Even for medium resolution monitors (1024×768 is now common), modern computers use graphics “accelerator” chips to speed up this process of rasterization. When dealing with a printed page, however, current quality requirements demand at least 600×600 pixels per inch (also known as DPI or Dots Per Inch), which is 5,100×6,600 pixels for an 8.5″×11″ area on paper. For the best quality, rasterizers for printers process “continuous tone” or “contone” pixels, typically taken to mean 8 bits per primary color. For a CMYK (color) printer then, each pixel is 32 bits, or four 8-bit bytes. A full page raster for US Letter paper is 5100×6600×4=134,640,000 bytes (approximately 128 MB).
Even by today's standards, this is a large amount of computer RAM, particularly on a computer with other concurrent processes and a large operating system (e.g., Microsoft Windows). And, if the quality required doubles in each of X and Y to 1200×1200 DPI, as with newer printers, then the RAM required for the frame buffer quadruples. A full frame buffer for US Letter paper for 32-bit CMYK contone at 1200×1200 DPI requires 538,560,000 bytes (over ½ gigabyte).
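The frame buffer sizes quoted above follow from simple arithmetic, which can be checked with a short sketch. The function name and parameters below are assumptions introduced for illustration only.

```python
def frame_buffer_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Bytes needed for a full-page raster at the given resolution and depth."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# US Letter, 32-bit CMYK contone
letter_600 = frame_buffer_bytes(8.5, 11, 600, 32)    # 134,640,000 bytes
letter_1200 = frame_buffer_bytes(8.5, 11, 1200, 32)  # 538,560,000 bytes
```

Doubling the resolution in both X and Y multiplies the pixel count, and therefore the RAM, by four.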
Modern CPUs operate dramatically faster internally than RAM can. The bridge between the very fast CPU and the slow RAM is formed by the CPU caches, known as L1 (fastest) and L2 (fast) for Level 1 and Level 2 caches. Access to slow RAM is through the much smaller caches. A Pentium 4 CPU for example typically has just 8 KB L1 and 256 KB L2 caches.
The link to the rasterizing problem is this: because the page elements can appear randomly, the rasterizer puts the elements' pixels into the RAM as it receives them, and there is potential for cache “thrashing”. As page elements appear all over the page, a particular region of slow RAM might be brought into the CPU cache repeatedly during the page rasterization, having been displaced repeatedly by different RAM regions. Conceivably the same slow RAM areas could be brought into the cache thousands of times per page. Because of the cache architectures (with cache lines), modern CPUs access memory more efficiently using sequential access than using random access.
So, one big performance problem with conventional rasterizers is that they consume too much slow RAM.
But this problem is just the beginning. The bulk of the frame buffer causes further big problems “downstream” since this huge amount of raster data must be both processed and then delivered eventually to the simple print engine video circuits.
The second big problem, derived from the first, is that in high-quality print systems further image-processing steps are required (color matching, trapping, halftoning, etc.). In typical simple systems, these post-rasterization operations are applied to each pixel in the raster. If some of these operations are neighborhood operations (which examine and adjust each pixel in the context of its immediate neighbors), the CPU cost can be enormous: many times longer than the rasterization process itself.
A third big problem then arises from the bulky “raw” raster. The print engine circuits may be in a device separate from the raster frame buffer. This print architecture is known as “raster” or “host-based”, or incorrectly as “GDI” (taken from the Microsoft Windows Graphics Device Interface), where the user (or client) PC performs the rasterizing and hosts the raster frame buffer.
A raster printer connected to a PC via USB 1 can accept at best about 1 million bytes per second. Taking our 1200×1200×32 Letter example, where the raster frame buffer is over 500 million bytes, that single page would take over 500 seconds (over 8 minutes) just to transmit. The 600×600×32 Letter page would take over 2 minutes to transfer. (Printers connected via Ethernet have widely varying bandwidth, but sometimes perform even worse than USB.) In addition, a laser printer would need an equivalent and expensive amount of RAM to store the entire frame buffer, since once the laser engine starts it cannot stop, and any delay in the arrival of the data after engine start would mean an incomplete print.
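The transmission times above can be reproduced with a short sketch. The function name is hypothetical, and the default rate of 1 million bytes per second is simply the best-case figure stated in the text.

```python
def transfer_seconds(num_bytes, bytes_per_second=1_000_000):
    """Time to send an uncompressed raster over a link of the given speed."""
    return num_bytes / bytes_per_second

# Uncompressed US Letter, 32-bit CMYK frame buffers (sizes from the text)
t_1200 = transfer_seconds(538_560_000)   # about 538.6 seconds
t_600 = transfer_seconds(134_640_000)    # about 134.6 seconds

minutes_1200 = t_1200 / 60               # just under 9 minutes
minutes_600 = t_600 / 60                 # a little over 2 minutes
```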
Since the printer's manufacturer would be unable to sell such a slow and expensive printer, the conventional solution to this problem is compression of the raster. Once the final raster is complete, a post-rasterizing compression step is applied to reduce the number of bytes that must be transmitted.
But, as is well known in the art, there is a fundamental trade-off with compression methods: the greater the compression, the greater the CPU time cost. So, with these very large raster frame buffers, compressing the data effectively often takes more CPU time than rasterizing it in the first place.
Further straining this bloated, inefficient, slow system is an optional quality requirement that each page element be handled differently by class of element. For example, a different halftone screen might be applied to text as compared to the screen for bitmaps (photos).
This implies, for the conventional rasterizer, that even more RAM must be allocated per pixel. The page element type (text, graphics, image, etc.) must be recorded in the frame buffer as well as the color information. The CMYK contone system described earlier would need to expand typically by at least another byte per pixel: 5 bytes per pixel. Another problem immediately presents itself: this odd number is awkward for modern computers. Filling and changing 5 bytes (40 bits) is a mismatch for 32-bit computers. So, the extra page element information (also known as “metadata” or “tag” bits) is typically allocated alongside the RAM in a parallel array of bytes. Now each logical row in the page has two RAM arrays: one of 32-bit CMYK pixels, and another of 8-bit metadata bytes. These two separate arrays now are even more likely to “thrash” the cache.
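The two-array layout described above might look like the following sketch. The class name, the tag values, and the use of Python bytearrays are assumptions for illustration; a real rasterizer would use native arrays, but the shape of the problem is the same: every run written to the page touches two separate memory regions.

```python
# Hypothetical tag values for the page element class of each pixel.
TAG_NONE, TAG_TEXT, TAG_GRAPHICS, TAG_IMAGE = 0, 1, 2, 3

class TaggedRow:
    """One logical page row held as two parallel arrays."""

    def __init__(self, width):
        self.width = width
        self.cmyk = bytearray(4 * width)   # 32-bit CMYK: 4 bytes per pixel
        self.tags = bytearray(width)       # 8-bit metadata: 1 byte per pixel

    def fill_run(self, x0, x1, cmyk, tag):
        """Write pixels [x0, x1) into both arrays, clipped to the row."""
        x0, x1 = max(x0, 0), min(x1, self.width)
        if x1 <= x0:
            return
        n = x1 - x0
        self.cmyk[4 * x0:4 * x1] = bytes(cmyk) * n   # color plane
        self.tags[x0:x1] = bytes([tag]) * n          # separate tag plane

row = TaggedRow(5100)                                # one line at 600 DPI, 8.5 in wide
row.fill_run(100, 200, (0, 255, 255, 0), TAG_GRAPHICS)
bytes_per_pixel = (len(row.cmyk) + len(row.tags)) / row.width   # 5.0
```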
A final, fatal blow to the conventional rasterizer is an even higher quality requirement on color pixel precision. Microsoft has announced 128-bit per pixel color for its next major Windows release (code-named “Longhorn”). If the rasterizer must handle 128-bit pixels for optimum quality (i.e., no loss from the digital source), then the amount of RAM required is quadrupled again. In our 1200×1200 DPI case, the raw raster frame buffer for US Letter at 128-bits/pixel is over two billion bytes.
In the prior art, there are numerous attempts to alleviate the bulk and slowness inherent in simple rasterizers. None solves the original problem without introducing major new problems.
The main ideas in the prior art to reduce the RAM requirement involve either “banding” or “display lists”, or both.
In the “banding” case, instead of allocating the entire frame buffer, typically a strip or band is allocated. If the page is 6000 lines, then a 600 line band would mean only 1/10 the RAM. What's wrong with this? There are two main drawbacks. First, the entire set of page elements must be recorded (into a “display list”) to play back (rasterize) for each band on the page. Since the band buffer holds only a fraction of the full page, no element can be drawn once and discarded: the display list must be complete before the first band is rasterized. Second, elements that span more than one band must be processed more than once.
The display list problem is small for simple pages. A system making the display list for the text on this page wouldn't need much storage for the list. But for complex pages, the display list storage problem is crippling or fatal. A computer program generating polygons for an artificial scene might generate billions of polygons. All of these must be recorded in the display list before the banding rasterizer can begin. While this is an extreme case, every display list/banding scheme must handle the case where it runs out of resources (RAM, disk space, etc.) in which to record the display list. Upon overflow, some of the prior art attempts to “fall back” to raw raster at a lower quality, but the fundamental problem remains: there will be pages that either outright fail to print, or at least fail to print at the normal quality level.
The second performance problem with banding, though less severe, may still impair performance dramatically. Page elements that span multiple bands must be accessed and processed for each band in which they appear. The simplest method is to draw each element, repeatedly and fully, for each band, discarding any lines which are above or below the current banding buffer. Generating then discarding that raster data wastes CPU time. While it is sometimes possible to “clip” the element cleanly at band boundaries, often this is not entirely precise and is inefficient again due to wasted results.
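The wasted work of the simplest banding method can be illustrated with a small sketch. It is not taken from any particular prior-art system: the function name is hypothetical, each page element is reduced to its vertical extent only, and elements are assumed to lie within the page.

```python
def replay_bands(display_list, page_height, band_height):
    """Replay a display list once per band, drawing each intersecting
    element in full and keeping only the lines inside the current band.

    display_list: list of (y0, height) vertical extents, in Z order.
    Returns (lines_generated, lines_kept).
    """
    generated = kept = 0
    for band_top in range(0, page_height, band_height):
        band_bottom = min(band_top + band_height, page_height)
        for y0, height in display_list:
            y1 = y0 + height
            if y1 <= band_top or y0 >= band_bottom:
                continue                  # element does not touch this band
            for y in range(y0, y1):       # element is drawn in full...
                generated += 1
                if band_top <= y < band_bottom:
                    kept += 1             # ...but only in-band lines survive
    return generated, kept

# A 6000-line page in 600-line bands. The first element sits inside one
# band; the second (lines 500 to 799) straddles the first two bands.
display_list = [(100, 50), (500, 300)]
generated, kept = replay_bands(display_list, 6000, 600)
wasted = generated - kept
```

In this example the straddling element is generated twice in full, so 300 of the 650 generated lines are discarded.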
A third method well known in the prior art is the use of sorted “active” object lists with the banding buffer as little as one line. While this can cut down on redundant pixel drawing, it retains the problem of display list overflow, and adds the problem of object sorting.
A fourth method involves a display list that creates per line a sorted list of active runs which are then “flattened” prior to output. This method retains the display list overflow problem, and adds the problem of potentially huge active object and active runs lists.
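The idea of flattening the active runs on one line can be sketched as follows. This is only an illustration of the general approach under stated assumptions, not the algorithm of any specific prior-art system: the function name and the (x0, x1, z, color) run tuple are hypothetical, runs are half-open intervals, and the highest Z value is assumed to win outright.

```python
def flatten_runs(runs):
    """Resolve overlaps among the runs of one scanline.

    runs: list of (x0, x1, z, color) covering pixels [x0, x1).
    Returns non-overlapping (x0, x1, color) runs sorted by x0, where each
    pixel takes the color of the covering run with the highest z.
    """
    # Every run boundary is a place where the visible color may change.
    edges = sorted({x for x0, x1, _, _ in runs for x in (x0, x1)})
    out = []
    for left, right in zip(edges, edges[1:]):
        covering = [r for r in runs if r[0] <= left and right <= r[1]]
        if not covering:
            continue                      # gap: nothing drawn here
        color = max(covering, key=lambda r: r[2])[3]
        if out and out[-1][1] == left and out[-1][2] == color:
            out[-1] = (out[-1][0], right, color)   # extend the previous run
        else:
            out.append((left, right, color))
    return out

# A green run partly covered by a later (higher z) red run.
line_runs = [(10, 85, 0, "green"), (65, 100, 1, "red")]
flat = flatten_runs(line_runs)
```

Here `flat` is `[(10, 65, 'green'), (65, 100, 'red')]`. The sketch rescans every run for every interval, which hints at why very large active run lists make this approach costly.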
Accordingly, there is a need in the art for a system and method to produce the raster needed for output displays and print engines without either allocating huge amounts of slow RAM, or creating potentially huge slow display lists.