The present invention pertains to the field of computer processing. More specifically, the present invention pertains to instructions utilized by integrated circuits for processing of data, such as three-dimensional graphics geometry processing.
Computer-generated graphics design generally consists of instructions implemented via a graphics program on a computer system. The instructions are recognized by the computer system's processor and direct the processor to perform the specific calculations and operations needed to produce three-dimensional displays. The set of instructions recognized by the processor constitutes the instruction set of that processor.
Computer-generated graphics design can be envisioned as a pipeline through which data pass, where the data are used to define the image to be produced and displayed. At various points along the pipeline, various calculations and operations are specified by the graphics designer, and the data are modified accordingly.
In the initial stages of the pipeline, the desired image is framed using geometric shapes such as lines and polygons, referred to in the art as “primitives” or “graphics primitives.” The derivation of the vertices for an image and the manipulation of the vertices to provide animation entail performing numerous geometric calculations in order to project the three-dimensional world being designed to a position in the two-dimensional world of the display screen.
Primitives are then assembled into “fragments,” and these fragments are assigned attributes such as color, perspective, and texture. In order to enhance the quality of the image, effects such as lighting, fog, and shading are added, and anti-aliasing and blending functions are used to give the image a smoother and more realistic appearance. In the final stage, the fragments and their associated attributes are combined and stored in the framebuffer as pixels. The pixel values are read from the framebuffer and used to draw images on the computer screen.
The processes of assigning colors, depth, texturing, lighting, and the like (i.e., the processes that create the image) are collectively known as rendering. The specific process of determining pixel values from input geometric primitives is known as rasterization.
The graphics design process is implemented in the prior art utilizing a computer system architecture that includes a geometry engine and a rasterization engine that are coupled in series to form the graphics pipeline through which the data pass. The geometry engine is a processor for executing the initial stages of the graphics design process described above. The rasterization engine is a separate processor for executing the processes above collectively identified as rasterization. Because the geometry engine precedes the rasterization engine in the graphics pipeline, the rate at which the rasterization engine can process data is limited by the rate at which the geometry engine can perform its calculations and forward the results to the rasterization engine. Thus, it is desirable to have a geometry engine capable of performing calculations at speeds that match the speed of the rasterization engine so that the geometry engine does not become a bottleneck in the graphics pipeline.
However, a problem with the prior art is that state-of-the-art rasterization engines are faster than comparable geometry engines, and so the geometry engine has become a limiting component in the graphics pipeline. Consequently, the speed at which the graphics process can be executed is slower than what could be achieved with an improved geometry engine, thus limiting the complexity of scenes which can be rendered.
One prior art solution to the above problem entails designing and implementing complex hardware dedicated to geometry calculations for computer-generated graphics, i.e., dedicated geometry engine hardware such as a dedicated processor. A problem with this prior art solution is that such dedicated hardware can be expensive. Another problem with this solution is that the dedicated hardware can typically only be used on those computer systems specifically designed for that hardware. Moreover, such specialized, dedicated hardware in the form of a dedicated processor typically utilizes an instruction set for which no compilers are available. Hence, programming often must be done entirely at the assembly or machine-language level. Such low-level languages are machine-dependent and therefore require knowledge of the specific processor. As such, dedicated processors offer somewhat narrow and cumbersome solutions to problems such as improved geometry processing.
Another problem with dedicated geometry engine hardware is that explicit synchronization mechanisms must be implemented both in the hardware and in the software that uses it. Synchronization is needed to communicate the start and completion points of the computation performed on the dedicated hardware.
Another prior art solution is to perform geometry calculations using the instruction set of a general purpose processor (instead of the dedicated processor discussed above). A general purpose processor, as the term is used herein, has an instruction set partly or wholly supported by a compiler and is therefore programmable to some degree using high-level languages (i.e., machine-independent languages such as C and Pascal). Such languages are easier to program than the low-level languages of the dedicated processor described above. Although portions of a general purpose instruction set may be unsupported by a compiler, advantages are still achieved through the ease with which assembly code may be linked to compiled code during the programming process. Although a general purpose processor is designed for a variety of applications, its actual use can be narrow. Additionally, to the extent a general purpose processor in a given application supports other tasks in addition to geometry calculations, then synchronization between the geometry calculations and these other tasks is implicitly resolved through processor programming.
A problem with this solution, however, is that many instruction sets are not powerful enough to quickly perform the complex calculations required for computer-generated graphics. In particular, it typically takes several instructions to specify and perform a single operation or function, and in general, the more instructions specified, the longer the operation or function takes. Geometry calculations are thus slowed by the number of instructions required in the prior art. It is therefore desirable to reduce the number of instructions, thereby increasing the speed at which a geometry engine can perform geometry calculations.
Accordingly, what is desired is a system and/or method that can increase the speed at which a processor (and, preferably, a general purpose processor) is able to perform geometry calculations for the graphics design process. What is further desired is a system and/or method that can accomplish the above and can also provide a cost-effective solution that can be implemented in computer systems using various types of processors and processor cores. The present invention provides a novel solution to the foregoing.
These and other advantages of the present invention will become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.
In accordance with the present invention, a system and method of same are provided that can increase the speed at which a processor is able to perform various operations including geometry calculations for a graphics design process. This system and method can accomplish the above and can also be a cost-effective solution that can be implemented in computer systems using various types of processors and processor cores. This system and method can reduce the number of instructions needed to specify and perform a given operation (e.g., geometry) and thereby facilitate an increase in the speed at which a processor operates.
In accordance with a preferred embodiment of the present invention, an application specific extension to a general purpose instruction set architecture is provided that incorporates high performance floating point operations designed to improve the performance of three-dimensional graphics geometry processing on a general purpose processor. Instructions included in the extension can use a variety of data formats including single precision, double precision and paired-single data formats. The paired-single format provides two simultaneous operations on a pair of operands. The instructions included in the extension may also be used in situations unrelated to three-dimensional graphics processing. Additionally, in an alternative embodiment, these instructions may be defined as part of the instruction set architecture itself rather than an extension to such architecture. These instructions may be carried out in hardware, software, or a combination of hardware and software.
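To make the paired-single format concrete, the following Python sketch models a paired-single value as a tuple of two IEEE single-precision numbers and applies one operation to both halves at once. The names `to_f32`, `ps`, `add_ps`, and `mul_ps` are hypothetical; the sketch models only the data layout and the "two operations per instruction" behavior, not any particular hardware encoding.

```python
import struct
from typing import Tuple

# Hypothetical model: a paired-single value is (upper, lower), each a binary32 number.
PairedSingle = Tuple[float, float]


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ps(upper: float, lower: float) -> PairedSingle:
    """Pack two values into a paired-single, rounding each to single precision."""
    return (to_f32(upper), to_f32(lower))


def add_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    """One paired-single add: the two halves are added independently."""
    return ps(a[0] + b[0], a[1] + b[1])


def mul_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    """One paired-single multiply: the two halves are multiplied independently."""
    return ps(a[0] * b[0], a[1] * b[1])


print(add_ps(ps(1.5, 2.0), ps(0.25, -1.0)))  # (1.75, 1.0)
```

Computing each half in binary64 and rounding once to binary32 yields the correctly rounded single-precision sum or product, because binary64 carries more than twice the precision of binary32.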
The extension to the instruction set architecture can reduce the number of instructions needed to perform geometry calculations. As a result, a processor may be capable of performing geometry calculations at speeds approaching the speed of the rasterization engine, so that the processor is less likely to become a bottleneck in the graphics pipeline.
In one embodiment, the extension to the instruction set architecture is implemented as a set of floating point instructions that function with a MIPS-based instruction set architecture. In this embodiment, a processor comprising a floating point unit performs geometry calculations by executing the floating point instructions.
In one embodiment, a vertex in a computer graphics image is represented with world coordinates. The world coordinates are transformed into transformed world coordinates. In one embodiment, the transform includes using a floating point reduction add instruction. In one aspect of this embodiment, the floating point reduction add instruction is an ADDR instruction.
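Transforming a homogeneous vertex (x, y, z, w) by a 4x4 matrix amounts to four dot products. The sketch below assumes reduction-add semantics of `addr(a, b) = (a.upper + a.lower, b.upper + b.lower)`, i.e., each half of the result is the sum of the two halves of one source pair; which source lands in which result position is an assumption. Per matrix row, two paired-single multiplies and one paired-single add leave two partial sums, and two reduction adds then finish all four transformed coordinates. All names are hypothetical.

```python
import struct
from typing import Tuple

PairedSingle = Tuple[float, float]  # hypothetical (upper, lower) binary32 pair


def to_f32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ps(upper: float, lower: float) -> PairedSingle:
    return (to_f32(upper), to_f32(lower))


def add_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    return ps(a[0] + b[0], a[1] + b[1])


def mul_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    return ps(a[0] * b[0], a[1] * b[1])


def addr(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    """Assumed reduction add: sum the halves of a, and separately the halves of b."""
    return ps(a[0] + a[1], b[0] + b[1])


def transform(m, v):
    """Transform homogeneous vertex v = (x, y, z, w) by the row-major 4x4 matrix m."""
    xy = ps(v[0], v[1])
    zw = ps(v[2], v[3])
    partial = []
    for row in m:
        # Halves hold (row[0]*x + row[2]*z, row[1]*y + row[3]*w).
        partial.append(add_ps(mul_ps(ps(row[0], row[1]), xy),
                              mul_ps(ps(row[2], row[3]), zw)))
    # Each reduction add completes two dot products at once.
    x_y = addr(partial[0], partial[1])
    z_w = addr(partial[2], partial[3])
    return x_y + z_w


translate = [[1.0, 0.0, 0.0, 5.0],
             [0.0, 1.0, 0.0, -2.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
print(transform(translate, (1.0, 2.0, 3.0, 1.0)))  # (6.0, 0.0, 3.0, 1.0)
```

Without a reduction add, each dot product would need an extra step to combine the two halves of its partial sum; with it, one instruction finishes two coordinates.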
In one embodiment, a floating point reciprocal instruction processes a plurality of operands. In this embodiment, reduced precision reciprocal values are calculated with values from a plurality of lookup tables configured in parallel. In one aspect of this embodiment, the reduced precision floating point reciprocal instruction is a RECIP1 instruction. In one embodiment, perspective division is performed on the transformed world coordinates using the floating point reciprocal instruction.
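The following sketch shows a table-based reduced-precision reciprocal and its use in perspective division, where x, y, and z are multiplied by 1/w. The table size, its contents (the reciprocal of each sub-interval midpoint), and the names `TABLE_BITS`, `RECIP_TABLE`, `recip_approx`, `recip1_pair`, and `perspective_divide_pair` are assumptions for illustration; the text does not specify how the tables are organized. Two vertices are divided at once to mirror two lookup tables operating in parallel.

```python
import math

# Hypothetical table: 2**TABLE_BITS entries indexed by the leading mantissa bits.
TABLE_BITS = 7
# Entry i holds 1/midpoint of the i-th equal sub-interval of the mantissa range [0.5, 1).
RECIP_TABLE = [1.0 / (0.5 + (i + 0.5) / 2 ** (TABLE_BITS + 1))
               for i in range(2 ** TABLE_BITS)]


def recip_approx(x: float) -> float:
    """Reduced-precision 1/x for finite, nonzero x (relative error below 2**-8)."""
    m, e = math.frexp(abs(x))                    # abs(x) = m * 2**e, 0.5 <= m < 1
    i = int((m - 0.5) * 2 ** (TABLE_BITS + 1))   # leading mantissa bits pick the entry
    return math.copysign(math.ldexp(RECIP_TABLE[i], -e), x)


def recip1_pair(a: float, b: float):
    """Model of two lookup tables consulted in parallel, one per operand."""
    return recip_approx(a), recip_approx(b)


def perspective_divide_pair(v0, v1):
    """Divide x, y, z of two homogeneous vertices by their w via one paired reciprocal."""
    r0, r1 = recip1_pair(v0[3], v1[3])
    return tuple(c * r0 for c in v0[:3]), tuple(c * r1 for c in v1[:3])


print(perspective_divide_pair((2.0, 4.0, 6.0, 2.0), (-3.0, 9.0, 1.5, -3.0)))
```

Replacing three divisions per vertex with one reciprocal and three multiplications is what makes a fast reciprocal valuable here.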
In one embodiment, full precision reciprocal values are calculated using the reduced precision reciprocal values.
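The text does not say how the full-precision values are derived from the reduced-precision ones. One common technique, assumed here, is Newton-Raphson iteration, which roughly doubles the number of correct bits per step. In this sketch `reduced_recip` is a hypothetical stand-in for the table lookup (it simply truncates 1/x to 8 significant bits), and `recip_full` refines it.

```python
import math


def reduced_recip(x: float) -> float:
    """Stand-in for a table-based seed: 1/x truncated to 8 significant bits.

    Relative error is below 2**-7 for finite, nonzero x.
    """
    m, e = math.frexp(abs(1.0 / x))        # 0.5 <= m < 1
    m = math.floor(m * 2 ** 8) / 2 ** 8    # keep only the leading 8 bits
    return math.copysign(math.ldexp(m, e), x)


def recip_full(x: float, steps: int = 2) -> float:
    """Refine the reduced-precision seed with Newton-Raphson: r <- r * (2 - x * r)."""
    r = reduced_recip(x)
    for _ in range(steps):
        # If r = (1 - eps) / x, one step gives r = (1 - eps**2) / x.
        r = r * (2.0 - x * r)
    return r


print(recip_full(3.0))
```

Two steps are used because this stand-in seed is only about 7 bits accurate; after two steps the relative error is below 2**-28, which exceeds single precision. A more accurate hardware seed could reach single precision in fewer steps.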
In another embodiment, a vertex in a computer graphics image is represented with surface normal coordinates. The surface normal coordinates are transformed into transformed surface normal coordinates. In one embodiment, the transform includes using a floating point reduction add instruction. In one aspect of this embodiment, the floating point reduction add instruction is an ADDR instruction.
In one embodiment, a reciprocal square root instruction processes a plurality of operands. In this embodiment, reduced precision reciprocal square root values are calculated with values from a plurality of lookup tables configured in parallel. In one aspect of this embodiment, the reduced precision floating point reciprocal square root instruction is a RSQRT1 instruction. In one embodiment, the transformed surface normal coordinates are normalized using a floating point reciprocal square root instruction.
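A reduced-precision reciprocal square root can be sketched the same way. Because the exponent must be halved, this sketch first makes the exponent even, which moves the mantissa into [0.25, 1), and then indexes one table covering that range. The table layout (192 entries of width 1/256, each holding 1/sqrt of its midpoint) and the names `RSQRT_TABLE`, `rsqrt_approx`, and `rsqrt1_pair` are assumptions; the text does not describe the actual tables.

```python
import math

# Hypothetical table over the reduced mantissa range [0.25, 1), split into
# 192 sub-intervals of width 1/256; entry i holds 1/sqrt(midpoint).
RSQRT_TABLE = [1.0 / math.sqrt(0.25 + (i + 0.5) / 256) for i in range(192)]


def rsqrt_approx(x: float) -> float:
    """Reduced-precision 1/sqrt(x) for finite x > 0 (relative error below 2**-8)."""
    m, e = math.frexp(x)          # x = m * 2**e, 0.5 <= m < 1
    if e % 2:                     # make the exponent even so it halves exactly
        m, e = m / 2, e + 1       # now 0.25 <= m < 0.5
    i = int((m - 0.25) * 256)     # leading mantissa bits pick the entry
    return math.ldexp(RSQRT_TABLE[i], -e // 2)


def rsqrt1_pair(a: float, b: float):
    """Model of two lookup tables consulted in parallel, one per operand."""
    return rsqrt_approx(a), rsqrt_approx(b)


print(rsqrt1_pair(4.0, 16.0))  # approximately (0.5, 0.25)
```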
In one embodiment, full precision reciprocal square root values are calculated using the reduced precision reciprocal square root values.
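Normalizing a transformed surface normal multiplies each component by 1/sqrt(nx² + ny² + nz²). As with the reciprocal, the text does not specify the refinement method; this sketch assumes Newton-Raphson iteration for the reciprocal square root, starting from a hypothetical stand-in seed `reduced_rsqrt` that truncates 1/sqrt(x) to 8 significant bits. `rsqrt_full` and `normalize` are likewise illustrative names.

```python
import math


def reduced_rsqrt(x: float) -> float:
    """Stand-in for a table-based seed: 1/sqrt(x) truncated to 8 significant bits.

    Relative error is below 2**-7 for finite x > 0.
    """
    m, e = math.frexp(1.0 / math.sqrt(x))
    return math.ldexp(math.floor(m * 2 ** 8) / 2 ** 8, e)


def rsqrt_full(x: float, steps: int = 2) -> float:
    """Refine the seed with Newton-Raphson: r <- r * (1.5 - 0.5 * x * r * r)."""
    r = reduced_rsqrt(x)
    for _ in range(steps):
        # If r = (1 - eps) / sqrt(x), one step leaves an error of about 1.5 * eps**2.
        r = r * (1.5 - 0.5 * x * r * r)
    return r


def normalize(n):
    """Scale a nonzero 3-vector to unit length using the refined reciprocal square root."""
    r = rsqrt_full(n[0] * n[0] + n[1] * n[1] + n[2] * n[2])
    return (n[0] * r, n[1] * r, n[2] * r)


print(normalize((3.0, 4.0, 0.0)))  # approximately (0.6, 0.8, 0.0)
```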
In one embodiment, a dot product between the transformed surface normal coordinates and a vector is computed using the floating point reduction add instruction.
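In lighting, the transformed normal N is dotted with a light direction L and, for example in specular terms, with a second vector such as a half-angle vector H. Assuming the same hypothetical reduction-add semantics as above (each result half is the sum of one source pair's halves), the sketch below forms paired partial sums for two 3-component dot products and finishes both with a single reduction add. All names are hypothetical.

```python
import struct
from typing import Tuple

PairedSingle = Tuple[float, float]  # hypothetical (upper, lower) binary32 pair


def to_f32(x: float) -> float:
    """Round a Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ps(upper: float, lower: float) -> PairedSingle:
    return (to_f32(upper), to_f32(lower))


def add_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    return ps(a[0] + b[0], a[1] + b[1])


def mul_ps(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    return ps(a[0] * b[0], a[1] * b[1])


def addr(a: PairedSingle, b: PairedSingle) -> PairedSingle:
    """Assumed reduction add: sum the halves of a, and separately the halves of b."""
    return ps(a[0] + a[1], b[0] + b[1])


def partial_dot(n, v) -> PairedSingle:
    """Paired partial sums (nx*vx + nz*vz, ny*vy) of a 3-component dot product."""
    return add_ps(mul_ps(ps(n[0], n[1]), ps(v[0], v[1])),
                  mul_ps(ps(n[2], 0.0), ps(v[2], 0.0)))


def two_dots(n, a, b) -> PairedSingle:
    """Compute (n.a, n.b), finishing both with one reduction add."""
    return addr(partial_dot(n, a), partial_dot(n, b))


print(two_dots((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (1.0, 0.0, 0.0)))  # (32.0, 1.0)
```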
In one embodiment, a processor including a memory and an execution unit (“EU”) determines reduced precision values from a plurality of operands. A first instruction formatted to operate on the plurality of operands in parallel is stored in the memory. The first instruction is dispatched for execution by the EU. Executing the first instruction in the EU includes: i) accessing in parallel a plurality of lookup tables in the EU to obtain a plurality of first intermediate results, where each lookup table is accessed with a first portion of a corresponding operand; ii) modifying in parallel a second portion of each of the plurality of operands to obtain a plurality of second intermediate results; and iii) arithmetically combining in parallel the plurality of first intermediate results with the plurality of second intermediate results to obtain the plurality of reduced precision values. In one aspect of this embodiment, the reduced precision values are reciprocal values; in another aspect of this embodiment, the reduced precision values are reciprocal square root values.
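The three execution steps can be sketched for the reciprocal case on single-precision bit patterns. The text does not say how the second portion is modified or how the results are combined, so this sketch assumes one plausible arrangement, piecewise-linear interpolation: each table entry holds a base value and a slope for one mantissa segment (step i), the low fraction bits are converted into an offset within that segment (step ii), and the combination is a multiply-subtract (step iii). `FIRST_BITS`, `TABLE`, `recip_reduced`, and `recip_pair` are hypothetical names, and only normal, nonzero inputs are handled.

```python
import math
import struct

FIRST_BITS = 6               # hypothetical width of the "first portion" (table index)
LOW_BITS = 23 - FIRST_BITS   # remaining fraction bits form the "second portion"


def _entry(i: int):
    """Base and slope of the secant of 1/m over mantissa segment i of [1, 2)."""
    lo = 1.0 + i / 2 ** FIRST_BITS
    hi = 1.0 + (i + 1) / 2 ** FIRST_BITS
    return 1.0 / lo, (1.0 / lo - 1.0 / hi) * 2 ** FIRST_BITS


TABLE = [_entry(i) for i in range(2 ** FIRST_BITS)]


def recip_reduced(x: float) -> float:
    """Reduced-precision 1/x for a normal, nonzero single-precision x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    first = frac >> LOW_BITS                 # first portion: table index
    second = frac & ((1 << LOW_BITS) - 1)    # second portion: low fraction bits
    base, slope = TABLE[first]               # (i) first intermediate results
    offset = second / 2 ** 23                # (ii) second portion turned into an offset
    mant_recip = base - slope * offset       # (iii) combine: approx 1/m, m in [1, 2)
    # x = (-1)**sign * m * 2**(exp - 127), so 1/x = (-1)**sign * (1/m) * 2**(127 - exp).
    r = math.ldexp(mant_recip, 127 - exp)
    return -r if sign else r


def recip_pair(a: float, b: float):
    """Both operands processed independently; hardware would do this in parallel."""
    return recip_reduced(a), recip_reduced(b)


print(recip_pair(2.0, -4.0))  # (0.5, -0.25)
```

With a 6-bit index, linear interpolation keeps the relative error below 2**-13, noticeably better than a plain lookup from a table of the same size; this is one reason to combine a table result with the operand's remaining bits rather than use the table alone.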
In one embodiment, a computer program product includes a computer-readable medium having a plurality of instructions stored thereon. A first instruction enables a processor to combine a first operand and second operand from a first data set and place the result in a first position of a destination data set. This instruction also enables the processor to combine a third operand and a fourth operand from a second data set and place the result in a second position of the destination data set. A second instruction enables the processor to determine a plurality of reduced precision values from a second plurality of operands.