The disclosed invention relates to graphics display technology, and more specifically to methods for blending colors in a graphics processing unit. Color blending is a process that combines the color channels and alpha channels of multiple sources of image information in order to generate modified image information. One such method was described by Thomas Porter and Tom Duff in “Compositing Digital Images”, Computer Graphics, 18(3), July 1984, 253-259.
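As an illustration of the kind of operator Porter and Duff defined, the following minimal sketch implements their "source over destination" operator. It assumes color channels that are already premultiplied by alpha and stored as floats in [0, 1]; the `RGBA` type and `over` function are hypothetical names chosen for this example, not part of any cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RGBA:
    """Hypothetical pixel type: color channels premultiplied by alpha, all in [0, 1]."""
    r: float
    g: float
    b: float
    a: float


def over(src: RGBA, dst: RGBA) -> RGBA:
    """Porter-Duff 'src over dst' for premultiplied colors:
    C_o = C_s + C_d * (1 - a_s) and a_o = a_s + a_d * (1 - a_s)."""
    k = 1.0 - src.a  # fraction of the destination left visible through the source
    return RGBA(src.r + dst.r * k,
                src.g + dst.g * k,
                src.b + dst.b * k,
                src.a + dst.a * k)


# 50%-opaque red (premultiplied) composited over opaque blue
print(over(RGBA(0.5, 0.0, 0.0, 0.5), RGBA(0.0, 0.0, 1.0, 1.0)))
# -> RGBA(r=0.5, g=0.0, b=0.5, a=1.0)
```

Premultiplied alpha is used here because it keeps the operator to one multiply-add per channel; a non-premultiplied variant would also need to scale the source color by its alpha.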
Blending is required in applications such as computer-generated graphics, image fusion for visualisation and graphical user interfaces, and is usually part of a Graphics Processing Unit (GPU), i.e. a device that renders graphic images for display on computer screens.
Image blending has been used since the early days of motion picture production (U.S. Pat. No. 1,262,954), and blending has been part of computer-based image processing since its origins (U.S. Pat. Nos. 4,384,338, 4,679,040, 4,827,344).
Early blender implementations were based either on analog circuitry that multiplexed images at the output to a screen or on software programs running on standard processors. These approaches are suitable for applications where high-speed software processing resources are available, or where output images need not be generated at high speed, as is the case with photo editing.
Real-time blending requires a hardware blender. Methods that implement blending in hardware have been proposed, as described in the following paragraphs:
One of the first architectures for a blending apparatus was proposed in U.S. Pat. No. 5,592,196. In this apparatus, the instructions that implement the blending functions are held in tables that form a blending mode, which makes the method fast but less flexible than a fully programmable approach.
A hardware implementation of blending targeted explicitly at 3D graphics was disclosed in U.S. Pat. No. 5,754,185. This method did not include any programmability mechanism; instead, the blending mode was defined via control signals.
Another hardware implementation is described in U.S. Pat. No. 5,896,136. That patent describes a unit that implements the blending equations using an alpha channel of lower resolution than the RGB channels.
U.S. Pat. No. 7,397,479 describes a structure that provides programmable combination of pixel characteristics.
Methods for implementing programmable blending were disclosed in patent application US 2006/192788 and in U.S. Pat. No. 7,973,797. In both cases, the blending instructions are provided by a processing unit that loads formula or operation descriptors as a sequence to be executed by the blending hardware.
In the cases referenced above, blending is defined as the process of generating a target pixel fragment value (T) by combining several inputs: a source pixel fragment (S), a destination pixel fragment (D) and the corresponding alpha values (As, Ad) of the source and destination pixels. Depending on the blending mode, a different function (f) is applied to calculate the target.
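The mode-dependent function f can be pictured as a table of per-channel formulas selected by the blending mode. The sketch below is only an illustration under stated assumptions: values are non-premultiplied floats in [0, 1], and the mode names `"alpha"`, `"additive"` and `"multiply"` and the `blend` function are hypothetical, not taken from any of the cited patents.

```python
from typing import Callable, Dict

# Hypothetical per-channel blend function: (S, As, D, Ad) -> T
BlendFn = Callable[[float, float, float, float], float]

# Hypothetical set of predefined blending modes
BLEND_MODES: Dict[str, BlendFn] = {
    # classic alpha blend of a non-premultiplied source over the destination
    "alpha": lambda s, a_s, d, a_d: s * a_s + d * (1.0 - a_s),
    # additive blend, clamped to the representable maximum of 1.0
    "additive": lambda s, a_s, d, a_d: min(1.0, s * a_s + d),
    # multiply blend, which ignores both alpha values
    "multiply": lambda s, a_s, d, a_d: s * d,
}


def blend(mode: str, s: float, a_s: float, d: float, a_d: float) -> float:
    """Compute T = f(S, As, D, Ad), where f is selected by the blending mode."""
    return BLEND_MODES[mode](s, a_s, d, a_d)


print(blend("alpha", 1.0, 0.25, 0.0, 1.0))    # 0.25
print(blend("multiply", 0.5, 1.0, 0.5, 1.0))  # 0.25
```

A fixed table like this is fast to select from but, as the text notes, limits the blender to whatever modes were defined in advance.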
To calculate the target, T = f(S, As, D, Ad), an arithmetic logic unit (ALU) takes these inputs and the blending mode and produces the target value. For many blending modes, computing the formula in a single operation requires complex hardware. To minimize hardware cost, the ALU can instead use simpler operators and feed its outputs back in for a second or further pass until the formula has been fully evaluated.
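The multi-pass evaluation can be sketched by assuming an ALU that offers only one operator, a multiply-add x * y + z, whose result is written to a register and re-enters the ALU on the next pass. The micro-op format, register names (`r0`, `r1`, ...) and `run_alu` function are hypothetical; the point is only that the alpha blend T = S*As + D*(1 - As) needs three passes on such a unit.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical operand: a named input ("S", "As", "D", "Ad"),
# a previous result ("r0", "r1", ...) or a float constant.
Operand = Union[str, float]
MicroOp = Tuple[Operand, Operand, Operand]  # computes x * y + z


def run_alu(program: List[MicroOp], inputs: Dict[str, float]) -> Tuple[float, int]:
    """Evaluate a blend formula with one multiply-add per ALU pass.
    Returns (final result, number of passes through the ALU)."""
    regs: Dict[str, float] = dict(inputs)

    def val(o: Operand) -> float:
        return regs[o] if isinstance(o, str) else float(o)

    for i, (x, y, z) in enumerate(program):
        # each result is stored and can be fed back into the ALU on a later pass
        regs[f"r{i}"] = val(x) * val(y) + val(z)
    return regs[f"r{len(program) - 1}"], len(program)


# T = S*As + D*(1 - As), split into three multiply-add passes
ALPHA_BLEND: List[MicroOp] = [
    ("As", -1.0, 1.0),   # r0 = 1 - As
    ("D", "r0", 0.0),    # r1 = D * (1 - As)
    ("S", "As", "r1"),   # r2 = S * As + r1
]

t, n = run_alu(ALPHA_BLEND, {"S": 1.0, "As": 0.25, "D": 0.0, "Ad": 1.0})
print(t, n)  # 0.25 3
```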
During this iterative process the blender cannot accept new inputs, so complex blending modes reduce the overall throughput of the GPU. One way to achieve higher throughput is to implement the ALU as a pipeline of at least two threads. If the input pixel fragments are supplied as a continuous flow, the pipeline can produce one output per clock cycle.
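The throughput argument can be made concrete with an idealized cycle count. This sketch assumes that each ALU pass takes exactly one clock cycle and that the pipelined ALU has one stage per pass; it does not model the threading details of any particular design, and both function names are hypothetical.

```python
def cycles_iterative(n_fragments: int, passes: int) -> int:
    """Non-pipelined blender: each fragment occupies the ALU for `passes`
    cycles, and no new input is accepted in the meantime."""
    return n_fragments * passes


def cycles_pipelined(n_fragments: int, passes: int) -> int:
    """Idealized pipeline with one stage per pass: after a fill latency of
    `passes - 1` cycles, one result leaves per cycle, provided the
    fragments arrive as a continuous flow."""
    return n_fragments + passes - 1 if n_fragments > 0 else 0


print(cycles_iterative(1000, 3))  # 3000
print(cycles_pipelined(1000, 3))  # 1002
```

The gain holds only while the input stays continuous; every gap in the fragment flow leaves pipeline stages idle.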
The current state of the art in color blending devices, as described above, provides fast and programmable functionality. Many different operations, drawn from a predefined set, can be performed on sequences of pixel fragments, where each pixel is represented as a combination of color (c, usually R, G, B) and alpha (α).
One shortcoming of current implementations is that they are best suited to systems in which the locations of successive pixel fragments are more or less contiguous. In a modern GPU system such as the one shown in FIG. 2, Shader 206 processing and communication with main memory for reading and writing pixel fragments are bottlenecks. As a result, the system cannot generate a steady flow of continuous pixel fragments.
Another limitation is that most current implementations operate on integer or fixed-point representations. This makes it harder to interface with floating-point pixel sources and frame buffers, and it also limits the dynamic range of the color representation of each pixel fragment.
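The dynamic-range limitation can be seen by blending the same high-dynamic-range value in floating point and in 8-bit fixed point. The sketch assumes an 8-bit unsigned normalized format (0 to 255 representing 0.0 to 1.0) that clamps on conversion; the helper names are hypothetical.

```python
def to_unorm8(x: float) -> int:
    """Quantize a value to 8-bit unsigned normalized fixed point,
    clamping anything outside [0, 1]."""
    return round(min(max(x, 0.0), 1.0) * 255)


def from_unorm8(v: int) -> float:
    return v / 255


def blend_unorm8(s: int, a_s: int, d: int) -> int:
    """8-bit fixed-point alpha blend, T = (S*As + D*(255 - As)) / 255,
    approximately rounded to nearest."""
    return (s * a_s + d * (255 - a_s) + 127) // 255


def blend_float(s: float, a_s: float, d: float) -> float:
    """The same alpha blend in floating point, with no clamping."""
    return s * a_s + d * (1.0 - a_s)


# A source four times brighter than "white" at 50% alpha over black
s, a_s, d = 4.0, 0.5, 0.0
print(blend_float(s, a_s, d))  # 2.0
print(from_unorm8(blend_unorm8(to_unorm8(s), to_unorm8(a_s), to_unorm8(d))))
```

The floating-point result keeps the over-range intensity, while the fixed-point path clamps the source to 1.0 before blending and returns roughly 0.5.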
Another limitation of most current solutions is that programmability is restricted to a few predefined operators. Only in one case (U.S. Pat. No. 7,973,797) is the operation guided by two instructions that can be configured by other entities in the GPU. Full programmability requires a more flexible approach, in which any sequence of instructions, including flow control, can be supplied to the blender core as a small program.
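To show what "a small program with flow control" could mean for a blender core, here is a toy interpreter. The instruction set (`const`, `add`, `sub`, `mul`, `mov`, `blt`, `end`), the register names and `run_blender` are all hypothetical and serve only to illustrate a conditional branch inside a blend; the example program keeps the destination when the source is mostly transparent and otherwise performs an alpha blend.

```python
from typing import Dict, List, Tuple

Instr = Tuple  # hypothetical instruction: (opcode, operands...)


def run_blender(program: List[Instr], regs: Dict[str, float]) -> float:
    """Interpret a small blender program and return register "T" on 'end'."""
    regs = dict(regs)
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "const":                    # const dst, value
            regs[args[0]] = float(args[1])
        elif op == "add":                    # add dst, a, b
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "sub":                    # sub dst, a, b
            regs[args[0]] = regs[args[1]] - regs[args[2]]
        elif op == "mul":                    # mul dst, a, b
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif op == "mov":                    # mov dst, src
            regs[args[0]] = regs[args[1]]
        elif op == "blt":                    # blt a, b, target: branch if a < b
            if regs[args[0]] < regs[args[1]]:
                pc = args[2]
        elif op == "end":
            return regs["T"]
        else:
            raise ValueError(f"unknown opcode {op!r}")


# Alpha-tested blend: keep D when As < 0.5, otherwise T = S*As + D*(1 - As)
ALPHA_TEST_BLEND: List[Instr] = [
    ("const", "half", 0.5),      # 0
    ("mov", "T", "D"),           # 1: default result is the destination
    ("blt", "As", "half", 8),    # 2: skip blending for mostly transparent S
    ("const", "one", 1.0),       # 3
    ("sub", "k", "one", "As"),   # 4: k = 1 - As
    ("mul", "k", "D", "k"),      # 5: k = D * (1 - As)
    ("mul", "T", "S", "As"),     # 6: T = S * As
    ("add", "T", "T", "k"),      # 7: T = S * As + D * (1 - As)
    ("end",),                    # 8
]

print(run_blender(ALPHA_TEST_BLEND, {"S": 1.0, "As": 0.25, "D": 0.2}))  # 0.2
print(run_blender(ALPHA_TEST_BLEND, {"S": 1.0, "As": 0.75, "D": 0.2}))
```

Because the branch is data-dependent, the number of instructions executed varies per fragment, which a fixed table of blending modes cannot express.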
All existing implementations support the RGBA color scheme, which is very common in computer graphics: each pixel fragment is represented by three color channels, Red, Green and Blue (RGB), and an Alpha channel (A). However, blending non-RGBA pixel fragments (for example, pixels in the YUVA representation commonly used in video and photography) requires an additional color space conversion step, which consumes time and bandwidth.
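The extra conversion step can be sketched as follows. The example assumes full-range BT.601 (JFIF-style) coefficients with U and V centred on 0.5 and all values normalized to [0, 1]; other YUV variants use different coefficients and offsets. The function names are hypothetical, and the blend assumes an opaque RGBA destination.

```python
from typing import Tuple

Pixel = Tuple[float, ...]  # (R, G, B, A) with non-premultiplied values in [0, 1]


def yuva_to_rgba(y: float, u: float, v: float, a: float) -> Pixel:
    """Full-range BT.601 YUV -> RGB conversion; alpha passes through unchanged."""
    cb, cr = u - 0.5, v - 0.5
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return (r, g, b, a)


def blend_over(src: Pixel, dst: Pixel) -> Pixel:
    """Non-premultiplied alpha blend of src over an opaque dst, per channel;
    the result is opaque because dst is opaque."""
    a = src[3]
    return tuple(s * a + d * (1.0 - a) for s, d in zip(src[:3], dst[:3])) + (1.0,)


# A YUVA video fragment must be converted before an RGBA-only blender can use it.
video = yuva_to_rgba(0.5, 0.5, 0.5, 0.5)         # mid-grey at 50% alpha
print(blend_over(video, (0.0, 0.0, 1.0, 1.0)))   # (0.25, 0.25, 0.75, 1.0)
```

The conversion costs several multiply-adds per fragment before any blending starts, and if the result must be stored back in YUVA, a second conversion in the opposite direction is needed.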