The present invention relates to computer graphics systems, and more particularly to a flexible and efficient method and system for performing processing for computer graphics systems.
A conventional computer graphics system can display a graphical image on a display. The graphical image typically includes a plurality of objects. Often, for three-dimensional graphics applications, these objects are three-dimensional objects which are to be displayed on a two-dimensional display. Prior to rendering the objects to the display, data for the objects are processed. For example, each of the objects is typically represented as a plurality of vertices. Each of the vertices is given three-dimensional coordinates in a particular space. The data are typically transformed to a different coordinate system at least once during processing of the data. An object is typically defined by assigning the vertices of the object to coordinates in object space. Coordinates in object space are based on position with respect to a particular single vertex on the object. Thus, each vertex is assigned coordinates in object space using the position of the vertex relative to a particular point on the object. Typically, a vertex can be represented by a four-dimensional vector including three coordinates of the object with respect to a particular vertex of the object and the coordinates of the particular vertex. Thus, each vertex can be considered to be a vector. In order to place the object in the graphical image, or world space, the data for the vertices are transformed from object space to the world space of the graphical image. The data may also be transformed from world space to viewer space, which is defined with respect to a view plane, for example a screen of the display. Lighting for the graphical image is also generally processed prior to rendering the graphical image, typically while the object is in world space coordinates. Lighting for a particular portion of the object depends on the position of the object in world space and the position of one or more light sources in the graphical image. Depending on these factors, lighting for a portion of an object can be determined.
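The object-space to world-space transformation described above can be illustrated with a minimal sketch. The sketch assumes the common homogeneous-coordinate convention, in which a vertex is the column vector (x, y, z, 1) and the transformation is a 4x4 matrix. The names `mat_vec_mul`, `translation` and `object_to_world` are hypothetical and do not appear in the source.

```python
# Sketch only: move one vertex from object space to world space.
# Assumption: homogeneous coordinates (x, y, z, 1) and column vectors.

def mat_vec_mul(matrix, vector):
    """Multiply a 4x4 matrix (given as a list of rows) by a 4-component vector."""
    return [sum(matrix[row][col] * vector[col] for col in range(4))
            for row in range(4)]

def translation(tx, ty, tz):
    """Hypothetical helper: an object-to-world matrix that only translates."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

# Place the object's origin at (10, 0, -5) in world space.
object_to_world = translation(10.0, 0.0, -5.0)

vertex_object = [1.0, 2.0, 3.0, 1.0]   # coordinates relative to the object
vertex_world = mat_vec_mul(object_to_world, vertex_object)
print(vertex_world)  # [11.0, 2.0, -2.0, 1.0]
```

A transformation from world space to viewer space would have the same form with a different matrix.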
Conventional computer graphics systems use different conventional mechanisms in order to process data for objects, including transformations between coordinate systems and determinations of lighting values. Some conventional computer graphics systems simply rely on a general-purpose central processing unit ("CPU") of the computer system to perform the data processing. This allows normal programming languages to be used for instructing the computer system on how to process the data.
Although a conventional general-purpose CPU can be used, one of ordinary skill in the art will readily realize that only the standard mathematical operations and hardware generally available for the conventional general-purpose CPU can be used in processing the data for the objects. For example, the standard floating point unit, which is capable of a multiply or a multiply-add per clock cycle, is used. Because vertices are typically represented as vectors, matrix operations are used to transform vertices from one coordinate system to another. Conventional general-purpose CPUs perform operations one at a time. Transforming each vertex requires a matrix multiplication for the vector representing the vertex. A conventional general-purpose CPU can only execute a single operation, i.e. one multiply or one multiply-add operation, per clock cycle. Consequently, a matrix multiplication may typically take at least sixteen to thirty-two clocks to finish a transformation. Moreover, a general-purpose CPU will also typically require many load operations to move the data from system memory into the CPU's registers. Thus, the transformation may take even longer. The general-purpose CPU will then have to store the data back to memory. Considering that most CPUs may incur a cache miss penalty when fetching from system memory, a matrix multiply could take as long as several hundred cycles. Transformations are also performed multiple times, once for each vertex. Performing the transformations for an object will thus consume many clock cycles and be relatively slow.
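The clock counts above can be checked with a short piece of arithmetic. The sketch below assumes a 4x4 matrix applied to a four-component vertex and counts only arithmetic operations, ignoring loads, stores and cache misses. The function name `count_mat_vec_ops` and the figure of 1,000 vertices are hypothetical.

```python
# Sketch only: count scalar arithmetic for one 4x4 matrix-by-vector product.
# Assumption: one operation issues per clock; memory traffic is ignored.

def count_mat_vec_ops(n=4):
    """Return (multiplies, adds) for an n x n matrix times an n-vector."""
    multiplies = n * n        # every matrix element is multiplied once
    adds = n * (n - 1)        # each row sums n products with n - 1 adds
    return multiplies, adds

multiplies, adds = count_mat_vec_ops()

# Unit that issues one multiply OR one add per clock.
separate_ops = multiplies + adds

# Unit that issues one multiply-add per clock: each row needs
# one multiply followed by three multiply-adds, i.e. four operations.
fused_ops = multiplies

vertex_count = 1000  # hypothetical object size
print(separate_ops, fused_ops)       # 28 16
print(fused_ops * vertex_count)      # 16000 clocks at best, arithmetic only
```

Under these assumptions a single transformation needs between 16 and 28 arithmetic clocks, which is broadly in line with the sixteen to thirty-two clocks stated above before memory access is taken into account.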
Furthermore, the conventional general-purpose CPU is typically not optimized for many operations used in three-dimensional graphics. For example, division operations are generally not critical operations for a conventional general-purpose CPU. However, division operations are frequently used in three-dimensional graphics. Similarly, three-dimensional graphics often performs the same operations, such as a transformation, many times, on thousands of vertices of an object. A conventional general-purpose CPU, on the other hand, is typically optimized for performing different operations on different data. The hardware of a conventional general-purpose CPU is also not optimized for the operations generally performed for three-dimensional graphics. Thus, there is a great deal of hardware in the conventional general-purpose CPU that is not used during processing of graphics data. Furthermore, the data transfers typically used in three-dimensional graphics are not optimized in conventional general-purpose CPUs. Three-dimensional graphics typically reads in data for a vertex, multiplies the data by a transformation matrix, outputs the transformed data, and repeats this process for the remaining vertices to be transformed. This type of movement of data is not optimized in a conventional general-purpose CPU. Thus, a conventional general-purpose CPU is less efficient at performing operations used in three-dimensional graphics. Furthermore, the conventional general-purpose CPU performs tasks other than processing of graphics data. Consequently, the resources of the conventional general-purpose CPU may be consumed with other tasks, slowing processing of data. Thus, although flexible, conventional general-purpose CPUs are inefficient at processing graphics data for a variety of reasons.
A second conventional method for processing graphics data is to use dedicated hardware, such as application specific integrated circuits ("ASICs"). Conventional ASICs are typically built to do a single operation, such as a matrix multiplication for a transformation or a determination of lighting values. Data is provided to the conventional ASIC, operated on, and output. Because a conventional ASIC is optimized for its function, the conventional ASIC does not consume extra space and is fast. For example, an ASIC which performs transformations is programmed with a matrix that is to be used in transforming the data. Data for a vertex is provided to the conventional ASIC, multiplied by the programmed matrix and output. This operation is performed relatively efficiently.
In order to process data for the graphical image, ASICs which have different functions are coupled serially. For example, in one conventional system, a first conventional ASIC may be for performing a first transformation. A second conventional ASIC is for determining lighting values. A third conventional ASIC is for performing a second transformation. A set of conventional ASICs is for performing clipping to ensure that only the appropriate portion of the world for the graphical image is provided to the display. Data for a vertex is provided to the first conventional ASIC and transformed. The transformed data is provided to the second conventional ASIC, where lighting values for the vertex are calculated. The data for the vertex is again transformed by the third conventional ASIC. The data may then be clipped by the set of conventional ASICs. This is repeated for each of the vertices of each object being processed. Consequently, data for a graphical image can be processed, then rendered.
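The serial arrangement described above can be modeled in software as a chain of fixed-function stages, each configured once and then applied to every vertex in turn. The sketch below is only an analogy for the data flow and is not a description of any actual ASIC. The lighting formula (brightness falling off with squared distance from a point light) and the per-vertex cube test used for clipping are placeholder assumptions, and all function names are hypothetical.

```python
# Sketch only: software analogy for serially coupled fixed-function stages.

def mat_vec_mul(matrix, vector):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(4)) for r in range(4)]

def make_transform_stage(matrix):
    """Stage whose matrix is set once, like a programmed transformation ASIC."""
    def stage(vertex):
        return {**vertex, "position": mat_vec_mul(matrix, vertex["position"])}
    return stage

def make_lighting_stage(light_position):
    """Placeholder lighting (assumption): falloff with squared distance."""
    def stage(vertex):
        x, y, z, _ = vertex["position"]
        lx, ly, lz = light_position
        dist_sq = (x - lx) ** 2 + (y - ly) ** 2 + (z - lz) ** 2
        return {**vertex, "intensity": 1.0 / (1.0 + dist_sq)}
    return stage

def make_clip_stage(limit):
    """Simplified clipping (assumption): drop vertices outside a cube."""
    def stage(vertex):
        x, y, z, _ = vertex["position"]
        inside = all(abs(coord) <= limit for coord in (x, y, z))
        return vertex if inside else None
    return stage

def run_pipeline(stages, vertices):
    """Pass each vertex through every stage in order; skip clipped vertices."""
    results = []
    for vertex in vertices:
        for stage in stages:
            vertex = stage(vertex)
            if vertex is None:
                break
        else:
            results.append(vertex)
    return results

IDENTITY = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
SHIFT_X = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

stages = [
    make_transform_stage(SHIFT_X),         # first transformation
    make_lighting_stage((2.0, 0.0, 0.0)),  # lighting values
    make_transform_stage(IDENTITY),        # second transformation
    make_clip_stage(5.0),                  # clipping
]
vertices = [{"position": [0.0, 0.0, 0.0, 1.0]},
            {"position": [10.0, 0.0, 0.0, 1.0]}]
processed = run_pipeline(stages, vertices)
print(processed)
```

In this model, changing what a stage does means building a new stage function, which loosely mirrors the need to provide new hardware when a different function is desired.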
Although conventional ASICs can process graphical data rapidly, one of ordinary skill in the art will readily recognize that the conventional ASICs are not flexible. Because the function of a conventional ASIC is determined by the hardware, the function cannot be altered. Thus, an ASIC which is built for performing transformations may be incapable of determining lighting values or performing other functions. If different functions are desired, new conventional ASICs must be provided. Consequently, a conventional computer graphics system that relies on conventional ASICs to process data is inflexible.
Another conventional method for processing graphics data prior to rendering utilizes a conventional special-purpose CPU. The conventional special-purpose CPU is optimized for performing transformations, determination of lighting values and other tasks used in processing three-dimensional graphics data. The conventional special-purpose CPU is highly customized for processing graphics data. In order to program the conventional special-purpose CPU, proprietary microcode from the manufacturer must be used. Microcode controls the operation of the conventional special-purpose CPU at a very fine level. Using microcode, tasks performed by the conventional special-purpose CPU can be changed. Thus, the conventional special-purpose CPU is relatively flexible and, because the conventional special-purpose CPU is optimized for processing three-dimensional graphics data, relatively efficient.
Although the conventional special-purpose CPU can be used in processing data for a graphical image, one of ordinary skill in the art will readily realize that there are drawbacks to using the conventional special-purpose CPU. The conventional special-purpose CPU is not readily extensible to other systems because the microcode specified by the manufacturer is proprietary. Furthermore, microcode for the conventional special-purpose CPU is generally very specific. For example, when instructing the conventional special-purpose CPU to perform an add operation on two numbers, a programmer would have to provide instructions for obtaining the numbers from specific registers, transferring the data to the adder and selecting a particular function, addition, from the adder. The developer must also specifically account for special cases that cannot be treated using the specialized instructions for the general case. Thus, development of code is made more difficult because the developer must control the operation of the conventional special-purpose CPU at a much finer level and be aware of exactly how the conventional special-purpose CPU functions. Furthermore, although the conventional special-purpose CPU is typically more efficient, the speed at which the conventional special-purpose CPU operates depends upon the implementation of the microcode. Consequently, the speed of the conventional special-purpose CPU may not be significantly improved. Thus, the flexibility, speed and ease of exploiting the conventional special-purpose CPU may be limited.
Accordingly, what is needed is a flexible, efficient system and method for processing graphics data for objects, such as three-dimensional objects. The present invention addresses such a need.
The present invention provides a method and system for processing graphics data in a computer system. The method and system include providing a general-purpose processor and providing a vector co-processor coupled with the general-purpose processor. The general-purpose processor includes an instruction queue for holding a plurality of instructions. The vector co-processor is for processing at least a portion of the graphics data using a portion of the plurality of instructions. The vector co-processor is capable of performing a plurality of mathematical operations in parallel. The plurality of instructions is provided using software written in a general-purpose programming language.
According to the system and method disclosed herein, the present invention provides a system for processing graphics data, particularly three-dimensional graphics, which is efficient and flexible.
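The benefit of performing a plurality of mathematical operations in parallel can be illustrated with a rough model. The sketch below assumes a hypothetical four-lane multiply-add operation, which this section does not specify, and shows that a 4x4 transformation then needs four vector operations rather than sixteen scalar ones. Python executes the lanes one after another, so the code models only the operation count, not actual hardware parallelism. The names `vector_multiply_add` and `transform_with_vector_ops` are hypothetical.

```python
# Sketch only: model a 4x4 transform as four 4-wide multiply-add operations.
# Assumption: a four-lane unit; the source does not state a lane count.

def vector_multiply_add(accumulator, column, scalar):
    """One hypothetical 4-wide operation: every lane does acc + col * scalar."""
    return [acc + col * scalar for acc, col in zip(accumulator, column)]

def transform_with_vector_ops(columns, vertex):
    """Apply a matrix, given as four columns, and count vector operations."""
    result = [0.0, 0.0, 0.0, 0.0]
    vector_ops = 0
    for column, component in zip(columns, vertex):
        result = vector_multiply_add(result, column, component)
        vector_ops += 1
    return result, vector_ops

# Columns of a matrix that translates by (10, 0, -5).
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 0.0, -5.0, 1.0],
]
vertex = [1.0, 2.0, 3.0, 1.0]

transformed, vector_ops = transform_with_vector_ops(columns, vertex)
print(transformed, vector_ops)  # [11.0, 2.0, -2.0, 1.0] 4
```

The loop is ordinary code in a general-purpose language, which is the style of programming the summary above contemplates, in contrast to proprietary microcode.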