The present invention relates in general to graphics processors, and in particular to real-time display post-processing using programmable hardware.
Computer-based image rendering usually begins with a geometric representation of a scene. Various objects are described as collections of “primitives” (usually simple polygons such as triangles, as well as points and lines) that can be placed in a scene. A viewing coordinate system is chosen, and the primitives are transformed into that system. The primitives are then converted to a two-dimensional (2-D) “fragment” array representation, where each fragment has a color and may have other attributes such as a depth coordinate or surface normal. Lighting, textures, fog and various other effects that enhance visual realism may be introduced at the primitive and/or fragment stages. At the end of the rendering process, data for each fragment (generally at least the color value) is stored in an image buffer. The image buffer is read out by a “scanout” process that operates isochronously to deliver pixels to a display device at a prescribed screen refresh rate. Real-time animation requires that the rendering process deliver new images at a rate of around 30 Hz. Typical display devices operate at screen refresh rates of around 60-80 Hz.
To meet these processing rate requirements, many computer systems include a dedicated graphics co-processor that performs rendering operations on data provided by the central processing unit (CPU) and also performs the isochronous scanout operation to drive the display device. Typical graphics processors include a rendering object and a scanout engine that operate asynchronously with each other. The rendering object generates fragment data for a new image in a “back” image buffer while the scanout engine drives the display using a previously rendered image in a “front” image buffer. When rendering of the new image is complete, the “back” and “front” buffers are switched, so that the scanout engine begins displaying the newly rendered image while the rendering object moves on to the next image. Because the screen refresh rate typically exceeds the rendering rate by a factor of two or more, the scanout engine may read the same image two or three times before rendering of the next image is complete.
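The front/back buffer exchange described above can be illustrated with a minimal sketch. The class and function names (`DoubleBuffer`, `simulate`) are hypothetical, and the model assumes that each image finishes rendering exactly on a fixed schedule, which real hardware does not guarantee.

```python
class DoubleBuffer:
    """Two image buffers: the renderer writes 'back', scanout reads 'front'."""

    def __init__(self):
        self.buffers = [None, None]
        self.front = 0  # index of the buffer currently read by scanout

    @property
    def back(self):
        return 1 - self.front

    def render(self, image):
        # The rendering side only ever writes the back buffer.
        self.buffers[self.back] = image

    def flip(self):
        # Exchange the roles of the two buffers once rendering is complete.
        self.front = self.back

    def scanout(self):
        # The scanout side only ever reads the front buffer.
        return self.buffers[self.front]


def simulate(refresh_hz, render_hz, n_refreshes):
    """Return the id of the image shown at each screen refresh."""
    db = DoubleBuffer()
    db.buffers[db.front] = 0  # image 0 is assumed to be already rendered
    shown = []
    next_image = 1
    for tick in range(n_refreshes):
        # Number of new images finished by the time of this refresh.
        completed = (tick * render_hz) // refresh_hz
        while next_image <= completed:
            db.render(next_image)
            db.flip()
            next_image += 1
        shown.append(db.scanout())
    return shown


print(simulate(60, 30, 6))  # each image is read twice at 60 Hz
print(simulate(90, 30, 6))  # each image is read three times at 90 Hz
```

With a 30 Hz rendering rate, the sketch shows each image being scanned out two or three times depending on the refresh rate.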
The rendering object and the scanout engine are usually very different in implementation. The rendering object is generally flexible and programmable. Typical rendering objects include an execution core (or a number of parallel execution cores) with functional units that can be instructed to execute an arbitrary sequence of operations. With suitable programming, the execution core can be made to execute any combination of rendering algorithms to generate a particular image, and the algorithms can be varied as desired.
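As a rough illustration of what "an arbitrary sequence of operations" means, the following toy sketch models an execution core as an interpreter over a small instruction table. The opcode names and the `run_program` function are hypothetical and do not correspond to any actual graphics processor instruction set.

```python
# Hypothetical functional units of a toy per-fragment execution core.
OPS = {
    "mul": lambda v, k: v * k,
    "add": lambda v, k: v + k,
    "clamp": lambda v, lo, hi: max(lo, min(hi, v)),
}


def run_program(program, fragment):
    """Apply each (opcode, *operands) instruction to the fragment value, in order."""
    value = fragment
    for opcode, *operands in program:
        value = OPS[opcode](value, *operands)
    return value


# Two different "algorithms" expressed as data rather than as circuits.
brighten = [("mul", 1.5), ("clamp", 0.0, 1.0)]
fade = [("mul", 0.5), ("add", 0.25)]

print(run_program(brighten, 0.8))
print(run_program(fade, 0.5))
```

Because the program is data, changing the algorithm or reordering its steps requires no change to the core itself.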
The scanout engine, in contrast, usually has limited processing capability and is not programmable. Instead, the scanout engine has a sequence of pipelined special-purpose processing circuits through which the fragment data flows, with the processing circuits performing various operations to transform the fragment data to pixel values. For example, some scanout engines support adding overlays (e.g., cursors or video overlays) that may update at higher rates than the rendered image; color correction (e.g., gamma correction to account for nonlinearity in the display response); or filtering of the fragment data to match the number of pixels on the screen (e.g., for antialiasing). The special-purpose circuits are generally designed to operate with fixed latency, to ensure that pixel data is delivered isochronously to the display device.
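A fixed-function scanout path of this kind can be sketched as a hard-wired chain of stages. The stage order shown (filter, then overlay, then gamma), the box filter, the single-channel pixel values in [0, 1], and all function names are assumptions made for illustration; the text above does not specify them.

```python
def box_downfilter(fragments, factor=2):
    """Average each factor x factor block of fragments into one pixel.

    Assumes the fragment array dimensions are multiples of `factor`.
    """
    height, width = len(fragments), len(fragments[0])
    pixels = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [fragments[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        pixels.append(row)
    return pixels


def apply_overlay(pixels, overlay, x0, y0):
    """Replace pixels with overlay values; None marks a transparent overlay texel."""
    out = [row[:] for row in pixels]
    for dy, overlay_row in enumerate(overlay):
        for dx, value in enumerate(overlay_row):
            y, x = y0 + dy, x0 + dx
            if value is not None and 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = value
    return out


def gamma_correct(pixels, gamma=2.2):
    """Compensate for a display whose response is approximately value ** gamma."""
    return [[value ** (1.0 / gamma) for value in row] for row in pixels]


def scanout(fragments, overlay, overlay_pos, gamma=2.2):
    # The stage order is fixed here, mirroring a hard-wired pipeline.
    pixels = box_downfilter(fragments)
    pixels = apply_overlay(pixels, overlay, *overlay_pos)
    return gamma_correct(pixels, gamma)


fragments = [[0.25] * 4 for _ in range(4)]  # 4x4 fragments for a 2x2 screen
cursor = [[1.0]]                            # 1x1 opaque overlay
print(scanout(fragments, cursor, (0, 0), gamma=2.0))
```

Each stage here does a fixed amount of work per pixel, which is the property that lets a hardware implementation run with fixed latency.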
In some processors, it is possible to enable or disable various scanout-time operations (e.g., overlays can be turned on or off) or to change parameters of the operations (e.g., parameters for gamma correction or the position of an overlay). But because each operation is implemented in a different special-purpose circuit, it is generally not possible to add a new operation to the pipeline, to change the sequence of operations, or to change the algorithm implementing a particular operation without building a different scanout engine. Thus, the ability to reconfigure a scanout engine is very limited. Adding a new feature generally requires changes to the circuit, which can affect chip area and scheduling, adding cost and delay.
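The limited form of reconfiguration described above can be sketched as a set of enable flags and parameters applied to a stage sequence that is itself fixed. `ScanoutConfig`, its fields, and `scanout_pixel` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ScanoutConfig:
    """The only knobs exposed by the toy fixed pipeline below."""
    overlay_enabled: bool = True
    overlay_value: float = 1.0
    gamma: float = 2.2


def scanout_pixel(value, at_overlay, cfg):
    """Process one pixel value in [0, 1] through two hard-wired stages."""
    # Stage 1: overlay. It can be switched off, but not moved or replaced.
    if cfg.overlay_enabled and at_overlay:
        value = cfg.overlay_value
    # Stage 2: gamma correction. The exponent is adjustable, the formula is not.
    return value ** (1.0 / cfg.gamma)


print(scanout_pixel(0.25, True, ScanoutConfig(overlay_enabled=False, gamma=2.0)))
print(scanout_pixel(0.25, True, ScanoutConfig(overlay_enabled=True, gamma=2.0)))
```

Nothing in the configuration can insert a new stage or swap the two existing ones; in hardware, the analogous change would require a different circuit.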
As real-time rendering techniques continue to advance, a more powerful and flexible scanout engine that can add a variety of effects at the display rate has become increasingly desirable. In addition, the range of display devices that can be driven by graphics processors has increased; besides conventional CRT monitors, graphics processing devices can be used to drive LCD monitors, digital micromirror projectors, plasma monitors, and so on. Each type of display device has different requirements for driving its pixels, and it is difficult to accommodate all of these requirements in a single hardware pipeline. Greater flexibility in the fragment-to-pixel conversion process is thus highly desirable.
It would therefore be desirable to provide a graphics processor with the ability to execute arbitrary sequences of operations at display cadence.