1. Field of the Invention
This invention relates to the field of graphics processing.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
2. Background
Computer systems are often used to generate and display graphics on a display. Display images are made up of thousands of tiny dots, where each dot is one of thousands or millions of colors. These dots are known as picture elements, or “pixels”. The color of each pixel is represented by a number value stored in the computer system.
A three dimensional display image, although displayed using a two dimensional array of pixels, may in fact be created by rendering of a plurality of graphical objects. Examples of graphical objects include points, lines, polygons, and three dimensional solid objects. Points, lines, and polygons represent rendering “primitives” which are the basis for most rendering instructions. More complex structures, such as three dimensional objects, are formed from a combination or mesh of such primitives. To display a particular scene, the visible primitives associated with the scene are drawn individually by determining those pixels that fall within the edges of the primitive, and obtaining the attributes of the primitive that correspond to each of those pixels. The obtained attributes are used to determine the displayed color values of applicable pixels.
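The step of determining which pixels fall within the edges of a primitive can be illustrated with a short sketch. The following is a minimal example, assuming a triangle primitive, pixel centers at half-integer coordinates, and a signed-area edge test; the function names `edge` and `covered_pixels` are hypothetical and are not taken from any particular rendering system.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed-area test: the sign tells which side of edge a->b the point lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(v0, v1, v2, width, height):
    """Return the (x, y) pixels whose centers fall within the triangle's edges."""
    pixels = []
    for y in range(height):
        for x in range(width):
            # Assumption: a pixel is sampled at its center.
            px, py = x + 0.5, y + 0.5
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Accept either winding order: all three tests must agree in sign.
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                pixels.append((x, y))
    return pixels


# Usage: a right triangle drawn into a 4x4 pixel grid.
triangle_pixels = covered_pixels((0, 0), (4, 0), (0, 4), 4, 4)
```

In a full renderer, the attributes of the primitive (color, depth, and so on) would then be obtained for each pixel in the returned list; that step is omitted here.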
Sometimes, a three dimensional display image is formed from overlapping primitives or surfaces. A blending function based on an opacity value associated with each pixel of each primitive is used to blend the colors of overlapping surfaces or layers when the top surface is not completely opaque. The final displayed color of an individual pixel may thus be a blend of colors from multiple surfaces or layers.
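As an illustration of such a blending function, the sketch below assumes the common linear form in which the top surface's opacity weights its color against the color beneath it. The text above does not specify a particular blending equation, so this form and the name `blend_over` are assumptions made for illustration only.

```python
def blend_over(top_rgb, top_opacity, bottom_rgb):
    """Blend a partially opaque top color over a bottom color.

    Assumed form: result = opacity * top + (1 - opacity) * bottom,
    applied independently to each color channel.
    """
    return tuple(
        top_opacity * t + (1.0 - top_opacity) * b
        for t, b in zip(top_rgb, bottom_rgb)
    )


# Usage: a half-opaque red surface over a blue surface.
blended = blend_over((1.0, 0.0, 0.0), 0.5, (0.0, 0.0, 1.0))
```

With an opacity of 1.0 the top color is returned unchanged, which corresponds to the case of a completely opaque top surface where no blending is needed.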
In some cases, graphical data is rendered by executing instructions from an application that is drawing data to a display. During image rendering, three dimensional data is processed into a two dimensional image suitable for display. The three dimensional image data represents attributes such as color, opacity, texture, depth, and perspective information. The draw commands from a program drawing to the display may include, for example, X and Y coordinates for the vertices of the primitive, as well as some attribute parameters for the primitive, and a drawing command. The execution of drawing commands to generate a display image is known as graphics processing.
The prior art has provided two solutions to accomplish graphics processing. One solution is to build special processing hardware to provide high speed graphics processing capability. The other is to provide programmable graphics processing by executing graphics processing software on a general purpose processing platform. Both prior art solutions have drawbacks that limit their flexibility and performance.
Hardware solutions typically provide special purpose hardware that implements hardwired graphics processing algorithms and can provide very fast processing capabilities. However, the design and debugging of hardware is a complex, expensive, and time-consuming process. Hardware solutions are also inflexible. Should a new algorithm become known, the only way to implement it is to build a new hardware product. Thus, the hardware solution lacks the flexibility needed to respond to changing conditions. In addition, hardware solutions generally provide processing capability only for graphics processing tasks. For non-graphics processing tasks, additional processing capabilities are required, adding to the expense of a graphics processing system.
Prior art software solutions provide a programming language that can be executed on a general purpose processing system. Rendering commands from a program drawing to the display are interpreted and executed in software. Software solutions are more flexible than hardware solutions in that new algorithms and techniques can be implemented by writing new software, which is easier than designing and building new hardware. However, existing software solutions also suffer from a number of disadvantages.
One prior art software problem is instruction execution latency. Many graphics algorithms consist of a small number of reduced instruction set computing (RISC) single instruction, multiple data (SIMD) instructions (a few dozen instructions or fewer). The instructions may be independent or dependent. A dependent instruction is an instruction that contains operand dependencies, that is, the source operands of one instruction are the result operands of a prior instruction. For example, a typical quadratic polynomial d=a*x**2+b*x+c might be coded as
e=x*a+b
d=x*e+c
Since the result (e) of the first instruction is a source (e) in the second instruction, the second instruction cannot be executed until completion of the first instruction, creating an operand dependency. If the pipeline latency is, for example, 7 clocks, the second instruction could not begin until 7 clocks had transpired, which is very inefficient (one instruction every 7 clocks, or roughly 15% of peak throughput).
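The stall described above can be reproduced with a small model. The sketch below assumes a deliberately simplified pipeline: in-order, at most one instruction issued per clock, and each result available a fixed number of clocks after its instruction issues. The function name `issue_clocks` and the instruction encoding are hypothetical.

```python
def issue_clocks(instructions, latency):
    """Return the clock at which each instruction can issue.

    instructions: list of (destination, [source operands]).
    Assumed model: in-order, one issue per clock, and a result becomes
    available `latency` clocks after its instruction issues.
    """
    ready_at = {}  # operand name -> clock at which its value is available
    next_free_clock = 0
    clocks = []
    for dest, sources in instructions:
        # Operands never written by an earlier instruction are inputs,
        # assumed available at clock 0.
        start = max([next_free_clock] + [ready_at.get(s, 0) for s in sources])
        clocks.append(start)
        ready_at[dest] = start + latency
        next_free_clock = start + 1
    return clocks


# The two-instruction quadratic: e = x*a + b, then d = x*e + c.
quadratic = [
    ("e", ["x", "a", "b"]),
    ("d", ["x", "e", "c"]),
]
quadratic_clocks = issue_clocks(quadratic, latency=7)
```

Under these assumptions the first instruction issues at clock 0 and the second cannot issue until clock 7, leaving six issue slots unused in between.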
There are several approaches to alleviating this problem. One is the scheduling of independent instructions between the dependent instructions, so that processing continues even when a dependent instruction is waiting for data. This requires the program to be optimized either by a programmer or by an optimizing compiler. Requiring the programmer to optimize during development is a complex task. An optimizing compiler is difficult to write and may not always produce an optimal schedule.
Another problem is that with many short graphics programs there are often not enough independent instructions to schedule. If there are no independent instructions to execute, the processor is idle, reducing efficiency.
Another attempt to optimize software execution of graphics commands is to interleave instructions from multiple vertices and pixels, such as different loop passes. This is even more complex than prescheduling, and leads to inefficiencies when modes change.
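To make the idea of interleaving concrete, the sketch below evaluates the same two-instruction quadratic for several vertices, first one vertex at a time and then with the instructions of different vertices interleaved. It uses a simplified pipeline model (in-order, one issue per clock, fixed result latency) that is an assumption for illustration; the names `issue_clocks`, `program_for_vertex`, and `interleave` are hypothetical.

```python
def issue_clocks(instructions, latency):
    """Clock at which each instruction issues (in-order, one issue per clock)."""
    ready_at = {}
    next_free_clock = 0
    clocks = []
    for dest, sources in instructions:
        start = max([next_free_clock] + [ready_at.get(s, 0) for s in sources])
        clocks.append(start)
        ready_at[dest] = start + latency
        next_free_clock = start + 1
    return clocks


def program_for_vertex(i):
    """Two dependent instructions for vertex i: e = x*a + b, then d = x*e + c."""
    return [
        (f"e{i}", [f"x{i}", "a", "b"]),
        (f"d{i}", [f"x{i}", f"e{i}", "c"]),
    ]


def interleave(programs):
    """Round-robin: the first instruction of every program, then the second."""
    return [instr for step in zip(*programs) for instr in step]


LATENCY = 7
programs = [program_for_vertex(i) for i in range(LATENCY)]

# One vertex after another: every second instruction stalls on its own 'e'.
sequential = [instr for program in programs for instr in program]
sequential_clocks = issue_clocks(sequential, LATENCY)

# Interleaved: by the time vertex 0 needs e0, six other vertices have issued.
interleaved_clocks = issue_clocks(interleave(programs), LATENCY)
```

In this model, interleaving as many vertices as the pipeline is deep removes every stall. The sketch does not capture the cost noted above: the interleaved program must be restructured by hand or by a tool, and the arrangement has to be redone whenever the processing mode changes.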
Another approach is to shorten the pipeline latency, or to use pipeline bypassing (so that operands still in the pipeline can be used by other instructions before being written back to the registers). Both of these solutions require complex hardware to control and route all the operands in the pipeline, and are of benefit only when operations can execute in a single clock cycle. Operations that take multiple clocks, such as floating point multiply and add, are not optimized since the partial results within them cannot be bypassed to other instructions.
Software solutions also suffer from inefficient conditional execution. When a program executes a conditional branch dependent on the results of a previous instruction, the next instruction cannot begin execution until the result is computed, and may wait for as many clocks as the pipeline is deep. There are several approaches to alleviating this problem. Results in the pipeline can be bypassed to the branch control, or the pipeline depth can be shortened, with hardware complexities similar to those described above for operand dependencies. Another approach is branch delay slots, in which several instructions following the branch instruction are executed regardless of the branch outcome, which leads to software complexity in scheduling those instructions, especially in short graphics programs.
Another approach is speculative execution and/or branch prediction, in which instructions after the branch and/or at the branch target are executed whether or not they turn out to be needed. This leads to inefficiencies when the speculative or predicted instructions are not needed.