The present invention relates generally to graphics processors and, more particularly, to sequencing code segments during the execution of a graphics pipeline procedure.
A general system that implements such a pipeline system is illustrated in Prior Art FIG. 1. In this system, data source 10 generates a stream of expanded vertices defining primitives. These vertices are passed, one at a time, through the graphics pipeline system 12 via vertex memory 13, which serves as intermediate storage. Once the expanded vertices are received from the vertex memory 13 into the graphics pipeline system 12, they are transformed by a transformation module 14 and lit by a lighting module 16, and are then clipped and set up for rendering by a rasterizer 18, which generates rendered primitives that are displayed on display device 20.
During operation, the transform module 14 may be used to perform scaling, rotation, and projection of a set of three-dimensional vertices from their local or model coordinates to the two-dimensional window that will be used to display the rendered object. The lighting module 16 sets the color and appearance of a vertex based on various lighting schemes, light locations, ambient light levels, materials, and so forth. The rasterization module 18 rasterizes, or renders, vertices that have previously been transformed and/or lit. The rasterization module 18 renders the object to a rendering target, which can be a display device or an intermediate hardware or software structure that in turn moves the rendered data to a display device.
When the foregoing modules are called to perform the various associated operations on the vertices, program code is normally executed to carry out the operations. A sequencer, or program counter, sequentially runs the code. Such code normally includes a vast number of "if, . . . then . . ." commands in order to deal with the substantial number of possible "modes" of operation.
For example, a "skinning," lighting, or any other type of operation may be enabled or disabled for a particular vertex. In use, the "if, . . . then . . ." commands identify which operations are enabled so that those operations are executed during processing of the vertex. While this results in the correct execution of the appropriate operations, it does so slowly, since every possible mode must be handled with an "if, . . . then . . ." command.
Further reducing the speed of the graphics-processing pipeline is the fact that only one vertex may be processed at a time in each of the transform and lighting modules 14 and 16. Some prior art systems employ two or more pipelines so that multiple vertices can be processed at once. However, this requires redundant architectures, which increases the cost of such implementations.
In any system that attempts to process data for multiple vertices simultaneously, the question arises of how to buffer the incoming raw data and the outgoing processed data. If buffering is not handled in a properly organized manner, data may be overwritten or lost, and performance may suffer.
A method, apparatus and article of manufacture are provided for sequencing graphics processing in a transform or lighting operation. First, a plurality of mode bits is received, which indicate the status of a plurality of modes of process operations. A plurality of addresses in memory is then identified based on the mode bits. These addresses are accessed in the memory to retrieve code segments, each of which is adapted to carry out the process operations in accordance with the status of the modes. The code segments are subsequently executed within a transform or lighting module to process vertex data.
In one aspect of the present invention, the mode bits may be received from a software driver. Further, the mode bits may be converted into a control vector prior to identifying the addresses in the memory. Such a control vector may include a string of bits, each corresponding to one of the addresses.
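The mode-bit sequencing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed hardware: the mode names, the four-entry code memory, the address assignment, and the rule that the transform segment always runs are all hypothetical. What it shows is that the per-mode decisions are made once, when the control vector is built, rather than by "if, . . . then . . ." tests inside the code executed for every vertex.

```python
from typing import Callable, Dict, List

Vertex = Dict[str, List[str]]
Segment = Callable[[Vertex], None]

# Hypothetical code segments; each one just records that it ran.
def skin(v: Vertex) -> None:
    v["trace"].append("skin")

def transform(v: Vertex) -> None:
    v["trace"].append("transform")

def light(v: Vertex) -> None:
    v["trace"].append("light")

def texgen(v: Vertex) -> None:
    v["trace"].append("texgen")

# Hypothetical code memory: address -> code segment, in execution order.
CODE_MEMORY: List[Segment] = [skin, transform, light, texgen]

# Hypothetical assignment of driver mode bits to code-memory addresses.
MODE_TO_ADDRESS: Dict[str, int] = {"skinning": 0, "lighting": 2, "texgen": 3}
ALWAYS_ON = (1,)  # assumption: the transform segment runs in every mode

def control_vector(modes: Dict[str, bool]) -> List[int]:
    """Convert mode bits into a control vector with one bit per code address."""
    vec = [0] * len(CODE_MEMORY)
    for addr in ALWAYS_ON:
        vec[addr] = 1
    for name, enabled in modes.items():
        if enabled:
            vec[MODE_TO_ADDRESS[name]] = 1
    return vec

def addresses(vec: List[int]) -> List[int]:
    """Identify the addresses whose control bit is set."""
    return [addr for addr, bit in enumerate(vec) if bit]

def execute(plan: List[int], vertex: Vertex) -> None:
    """Fetch and run each selected segment; no mode tests happen here."""
    for addr in plan:
        CODE_MEMORY[addr](vertex)

# Usage: decode the modes once, then reuse the plan for every vertex.
plan = addresses(control_vector({"skinning": False, "lighting": True, "texgen": False}))
v = {"trace": []}
execute(plan, v)
print(plan, v["trace"])  # [1, 2] ['transform', 'light']
```

Because `plan` depends only on the mode bits, it can be reused unchanged for every vertex processed under the same modes.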
In one embodiment of the present invention, a technique may be employed for processing multiple threads of vertex data in a graphics-processing module. Initially, a first code segment is accessed per a first program counter. Next, the first code segment is executed in the graphics processing module.
Since the graphics processing module often requires more than one clock cycle to complete the execution, a second code segment may be accessed per a second program counter. The second code segment may then be executed in the graphics processing module before execution of the first code segment in that module is complete. In use, the graphics processing module requires a predetermined number of cycles to generate an output, and the foregoing steps are repeated once every such predetermined number of cycles.
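A cycle-level sketch helps show why a second program counter pays off. It assumes, hypothetically, a module with a fixed latency of two cycles and that each instruction in a thread depends on the previous one's result, so a single thread can issue only once per latency period. Interleaving program counters fills the otherwise idle issue slots.

```python
from typing import List, Tuple

Event = Tuple[int, int, str]  # (issue cycle, thread id, instruction)

def run_threads(programs: List[List[str]], latency: int = 2) -> List[Event]:
    """Issue instructions from several threads, one program counter per thread.

    Assumes each instruction depends on the previous one in its thread, so a
    thread may issue again only `latency` cycles after its last issue. At most
    one instruction enters the module per cycle; threads are tried round-robin.
    """
    n = len(programs)
    pcs = [0] * n          # one program counter per thread
    ready_at = [0] * n     # cycle at which each thread's last result is available
    events: List[Event] = []
    cycle, turn = 0, 0
    while any(pc < len(prog) for pc, prog in zip(pcs, programs)):
        for k in range(n):
            t = (turn + k) % n
            if pcs[t] < len(programs[t]) and ready_at[t] <= cycle:
                events.append((cycle, t, programs[t][pcs[t]]))
                pcs[t] += 1
                ready_at[t] = cycle + latency
                turn = t + 1   # start the search at the next thread next cycle
                break
        cycle += 1             # an iteration with no issue is an idle (bubble) cycle
    return events

one = run_threads([["mul", "add"]])
two = run_threads([["mul", "add"], ["mul", "add"]])
print([c for c, _, _ in one])  # [0, 2]  (cycle 1 is a bubble)
print(two)  # [(0, 0, 'mul'), (1, 1, 'mul'), (2, 0, 'add'), (3, 1, 'add')]
```

With one thread the module idles every other cycle; with two program counters it accepts an instruction every cycle, while each thread still advances once per latency period.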
In accordance with one aspect of the present embodiment, the graphics-processing module includes a multiplier and/or an adder. Further, the present technique may be capable of processing multiple threads of vertex data in a plurality of graphics processing modules. To accommodate such a situation, a code segment delay may be employed between the operations of the graphics processing modules.
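The code segment delay between chained modules can be pictured as a fixed-length delay line. This sketch assumes a multiplier feeding an adder, with a hypothetical multiplier latency; a thread's adder code segment is released exactly `mul_latency` cycles after its multiplier segment, when the product it needs becomes available. The class and function names are illustrative only.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

class SegmentDelay:
    """Fixed-depth delay line: whatever enters at cycle c leaves at cycle c + depth."""

    def __init__(self, depth: int) -> None:
        self.slots: Deque[Optional[str]] = deque([None] * depth)

    def step(self, segment: Optional[str]) -> Optional[str]:
        self.slots.append(segment)
        return self.slots.popleft()

def schedule(mul_stream: List[Optional[str]],
             mul_latency: int) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Per cycle: (cycle, thread starting on the multiplier, thread starting on the adder)."""
    delay = SegmentDelay(mul_latency)
    padded = mul_stream + [None] * mul_latency   # let the last products drain
    return [(cycle, seg, delay.step(seg)) for cycle, seg in enumerate(padded)]

for row in schedule(["t0", "t1"], mul_latency=2):
    print(row)
# (0, 't0', None)
# (1, 't1', None)
# (2, None, 't0')
# (3, None, 't1')
```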
In one embodiment of the present invention, a technique may be employed for sequencing buffer, or memory space, management between a transform and a lighting module during graphics processing. In use, vertex data is received in a buffer of a first set of buffers. The buffer in which the vertex data is received is selected based on a predetermined sequence, i.e., round robin. Subsequently, an empty buffer of a second set of buffers is identified, also based on a predetermined sequence. The transform module is coupled between the first set of buffers and the second set of buffers. After the empty buffer of the second set of buffers is identified, the vertex data is processed in the transform module and outputted from the transform module to the identified empty buffer of the second set of buffers.
Similarly, an empty buffer of a third set of buffers is identified based on a predetermined sequence. Further, the lighting module is coupled between the second set of buffers and the third set of buffers. After the empty buffer of the third set of buffers is identified, the vertex data is processed in the lighting module. The vertex data is then outputted from the lighting module to the identified empty buffer of the third set of buffers.
As such, in one aspect of the present invention, the number of buffers in the first and second sets of buffers may differ, as may the number of buffers in the second and third sets of buffers.
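The round-robin buffer handoff can be sketched as follows, assuming for illustration only three input buffers, two transform-output buffers, three lighting-output buffers, stand-in transform and lighting functions, and a downstream consumer that drains the lighting output immediately. Each stage first confirms that the next buffer in sequence is empty and only then processes the vertex, so a full buffer stalls the stage instead of being overwritten.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Vertex = Dict[str, Any]

class BufferSet:
    """A set of vertex buffers used in a fixed round-robin sequence."""

    def __init__(self, name: str, count: int) -> None:
        self.name = name
        self.slots: List[Optional[Vertex]] = [None] * count
        self.next = 0  # round-robin pointer

    def next_empty(self) -> Optional[int]:
        """Index of the next buffer in sequence if it is empty, else None (stall)."""
        return self.next if self.slots[self.next] is None else None

    def put(self, vertex: Vertex) -> int:
        idx = self.next_empty()
        if idx is None:
            raise RuntimeError(f"{self.name}: next buffer in sequence is still full")
        self.slots[idx] = vertex
        self.next = (idx + 1) % len(self.slots)
        return idx

    def take(self, idx: int) -> Vertex:
        vertex = self.slots[idx]
        if vertex is None:
            raise RuntimeError(f"{self.name}: buffer {idx} is empty")
        self.slots[idx] = None  # free the buffer for reuse
        return vertex

def stage(src: BufferSet, idx: int, dst: BufferSet, op: Callable[[Vertex], Vertex]) -> int:
    """Identify an empty destination buffer first, then process and write the vertex."""
    if dst.next_empty() is None:
        raise RuntimeError(f"{dst.name}: stall, no empty buffer in sequence")
    return dst.put(op(src.take(idx)))

# Hypothetical stand-ins for the transform and lighting modules.
def transform(v: Vertex) -> Vertex:
    return {**v, "pos": [2.0 * c for c in v["pos"]]}

def light(v: Vertex) -> Vertex:
    return {**v, "color": 0.5}

def process(v: Vertex, inp: BufferSet, mid: BufferSet,
            out: BufferSet) -> Tuple[Tuple[int, int, int], Vertex]:
    a = inp.put(v)                       # first set: round-robin input buffer
    b = stage(inp, a, mid, transform)    # transform module sits between sets 1 and 2
    c = stage(mid, b, out, light)        # lighting module sits between sets 2 and 3
    return (a, b, c), out.take(c)        # assumed downstream consumer drains set 3

inp, mid, out = BufferSet("input", 3), BufferSet("transformed", 2), BufferSet("lit", 3)
for i in range(4):
    print(process({"id": i, "pos": [1.0, 0.0, 0.0]}, inp, mid, out)[0])
# (0, 0, 0)
# (1, 1, 1)
# (2, 0, 2)
# (0, 1, 0)
```

Because the transformed set has two buffers while the others have three, the three round-robin pointers drift apart; each set advances through its own sequence independently, which is why the set sizes need not match.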
These and other advantages of the present invention will become apparent upon reading the following detailed description and studying the various figures of the drawings.