1. Field of the Invention
The present invention generally relates to multi-threaded computer architectures and, more specifically, to programmable blending in multi-threaded processing units.
2. Description of the Related Art
A number of important graphics rendering standards known in the art as OpenVG, SVG, Cairo, Skia, JavaFX, Adobe Flash, Adobe PDF, Apple's Quartz 2D, and HTML5 require complex blend modes that are not directly supported by conventional graphics processing units (GPUs). As a consequence, graphics content formatted according to these graphics rendering standards is conventionally rendered by a central processing unit (CPU), which is able to implement the complex blend modes using general processing operations. However, a CPU does not provide the efficient, high-throughput processing of a GPU, so CPU-based rendering of such graphics content may yield comparatively low performance.
Conventional GPUs organize graphics rendering work as a series of graphics objects that are each decomposed into a series of fragments, which are then transmitted to a fragment shader. The fragment shader computes a color for each fragment and generates a corresponding shaded fragment, which typically includes color and opacity information. Each shaded fragment is then transmitted to a color raster operations (CROP) unit, which is configured to blend the shaded fragment with color data for a corresponding pixel stored in a frame buffer. The CROP unit conventionally performs this blend operation using a fixed-function sum-of-two-products circuit that does not directly implement the complex blend modes.
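By way of illustration only, the fixed-function blend described above may be sketched in software as follows. The sketch assumes four-channel (red, green, blue, alpha) colors with premultiplied alpha and floating-point channel values; the function names and the choice of the "source over" operation are hypothetical and are not taken from any particular GPU.

```python
def blend_sum_of_two_products(src, dst, src_factor, dst_factor):
    """Per channel: result = src * src_factor + dst * dst_factor."""
    return tuple(
        s * sf + d * df
        for s, d, sf, df in zip(src, dst, src_factor, dst_factor)
    )


def source_over_premultiplied(src, dst):
    """Illustrative 'source over' blend for premultiplied RGBA colors.

    The source factor is 1 and the destination factor is (1 - source alpha),
    so the whole operation fits the sum-of-two-products form.
    """
    alpha = src[3]
    src_factor = (1.0, 1.0, 1.0, 1.0)
    dst_factor = (1.0 - alpha,) * 4
    return blend_sum_of_two_products(src, dst, src_factor, dst_factor)


# Shaded fragment: half-transparent red (premultiplied).
shaded_fragment = (0.5, 0.0, 0.0, 0.5)
# Pixel color already stored in the frame buffer: opaque blue.
frame_buffer_pixel = (0.0, 0.0, 1.0, 1.0)

blended = source_over_premultiplied(shaded_fragment, frame_buffer_pixel)
```

In this sketch each output channel is one product of the source with a source factor plus one product of the destination with a destination factor, which is the only form the fixed-function circuit is described as computing.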
One approach to performing the complex blend modes required by the graphics rendering standards within a GPU is to program a fragment shader to implement the complex blend modes. While a fragment shader is highly programmable and able to execute the complex blend modes, the latency of reading from and writing to frame buffer memory from the fragment shader is sufficiently long in a conventional GPU that rendering performance is severely degraded and likely falls below that achievable on a contemporaneous CPU.
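As a non-limiting example of a complex blend mode, the sketch below implements the "overlay" mode as it is commonly defined for the PDF and SVG/HTML5 compositing models, together with the read-modify-write sequence a programmable stage would perform against frame buffer memory. The sketch assumes opaque, non-premultiplied colors with floating-point channel values; the frame buffer is modeled as a simple list, and the function names are hypothetical.

```python
def overlay_channel(backdrop, source):
    """Overlay blend for one channel; the branch depends on the backdrop value."""
    if backdrop <= 0.5:
        # Darker backdrop: multiply.
        return 2.0 * source * backdrop
    # Lighter backdrop: screen.
    return 1.0 - 2.0 * (1.0 - source) * (1.0 - backdrop)


def shade_and_blend(frame_buffer, pixel_index, source_rgb):
    """Read the stored pixel, apply the overlay blend, and write the result back."""
    backdrop_rgb = frame_buffer[pixel_index]      # read from frame buffer
    blended = tuple(
        overlay_channel(b, s) for b, s in zip(backdrop_rgb, source_rgb)
    )
    frame_buffer[pixel_index] = blended           # write to frame buffer
    return blended


# One-pixel frame buffer (hypothetical values) and a mid-gray source fragment.
frame_buffer = [(0.25, 0.5, 0.75)]
result = shade_and_blend(frame_buffer, 0, (0.5, 0.5, 0.5))
```

Because the result for each channel is selected by comparing the stored pixel value against a threshold, the computation requires both a read of and a write to the frame buffer for every fragment, which is where the latency noted above is incurred.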
As the foregoing illustrates, what is needed in the art is a technique for efficiently rendering complex blend modes within a GPU.