This invention relates generally to video graphics processing and, more particularly, to a method and apparatus for executing a predefined instruction set.
As is known, a conventional computing system includes a central processing unit, a chip set, system memory, a video graphics processor, and a display. The video graphics processor includes a raster engine and a frame buffer. The system memory includes geometric software and texture maps for processing video graphics data. The display may be a cathode ray tube (CRT) display, a liquid crystal display (LCD) or any other type of display. A typical prior art computing system of the type described above is illustrated in FIG. 1. As shown in FIG. 1, the system 100 includes a host 102 coupled to a graphics processor 104 and a display 106. The host 102 comprises the central processing unit, chip set and system memory as described above. The host 102 is responsible for the overall operation of the system 100. In particular, the host 102 provides, on a frame-by-frame basis, video graphics data to the display 106 for display to a user of the system 100. The graphics processor 104, which comprises the raster engine and frame buffer, assists the host 102 in processing the video graphics data.
To process video graphics data, particularly three dimensional (3D) graphics, the central processing unit executes video graphics or geometric software to produce geometric primitives, which are often triangles. A plurality of triangles is used to generate an object for display. Each triangle is defined by a set of vertices, where each vertex is described by a set of attributes. The attributes for each vertex can include spatial coordinates, texture coordinates, color data, specular color data or other data as known in the art. Upon receiving a geometric primitive, the raster engine of the video graphics processor generates pixel data based on the attributes for one or more of the vertices of the primitive. The generation of pixel data may include, for example, texture mapping operations performed based on stored textures and texture coordinate data for each of the vertices of the primitive. The pixel data generated is blended with the current contents of the frame buffer such that the contribution of the primitive being rendered is included in the display frame. Once the raster engine has generated pixel data for an entire frame, or field, the pixel data is retrieved from the frame buffer and provided to the display.
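As an illustration of the data involved, the Python sketch below represents a vertex as a set of attributes and a triangle primitive as three vertices, then derives a per-pixel color by weighting the three vertex colors. The attribute layout and all names are hypothetical, and barycentric interpolation is an assumed, commonly used way of combining vertex attributes; the text above does not specify how the raster engine computes pixel data.

```python
from dataclasses import dataclass
from typing import Tuple

RGBA = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Vertex:
    # Hypothetical attribute layout; real vertex formats vary.
    position: Tuple[float, float, float]          # spatial coordinates (x, y, z)
    texcoord: Tuple[float, float] = (0.0, 0.0)    # texture coordinates (u, v)
    color: RGBA = (1.0, 1.0, 1.0, 1.0)            # diffuse color
    specular: RGBA = (0.0, 0.0, 0.0, 0.0)         # specular color


@dataclass(frozen=True)
class Triangle:
    # A geometric primitive defined by its three vertices.
    v0: Vertex
    v1: Vertex
    v2: Vertex


def interpolate_color(tri: Triangle, w0: float, w1: float, w2: float) -> RGBA:
    """Weight the three vertex colors by barycentric weights (w0 + w1 + w2 == 1).

    This is an assumed interpolation scheme, used here only to show how
    per-pixel data can be derived from per-vertex attributes.
    """
    return tuple(
        w0 * c0 + w1 * c1 + w2 * c2
        for c0, c1, c2 in zip(tri.v0.color, tri.v1.color, tri.v2.color)
    )
```

At a vertex (weights 1, 0, 0) the result is that vertex's own color; at points between vertices the colors are blended in proportion to the weights.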
Recently, Microsoft Corporation promulgated a standard relating to the processing of video graphics, i.e., the so-called DirectX 8.0 Standard. Among other things, the DirectX 8.0 Standard calls for the use of a programmable vertex shader. As its name would imply, a programmable vertex shader (PVS) is essentially a generic processing device that may be programmed using a finite set of instructions. The set of instructions is particularly designed for use in processing graphics primitives, and the instructions are executed by a PVS engine. To this end, the PVS engine is also coupled to a temporary register memory that, per the standard, comprises three read output ports. The three read output ports are provided as inputs to the PVS engine. However, relatively few instructions actually require all three ports in order to be executed by the PVS engine. For example, a so-called multiply-and-add (MADD) instruction is included in the instruction set. The MADD instruction multiplies two input operands and adds the result to a third input operand, all in one clock cycle, i.e., it computes (a×b)+c, where a and b are multiplicands and c is an addend. It is possible, however, that all three input operands of a MADD instruction must be read from the temporary register memory. This relatively infrequent occurrence is accommodated by the DirectX 8.0 Standard through the provision of three read ports to the temporary register memory.
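To make the port requirement concrete, the following Python sketch models a PVS instruction as an opcode plus a tuple of source operands, each tagged with where it is read from, and counts how many reads an instruction needs from temporary register memory. The operand sources other than temporary registers (`INPUT`, `CONST`), the class names, and the helper functions are hypothetical illustrations, not DirectX 8.0 definitions. Operands are treated as scalars for simplicity, although vertex-shader registers are normally four-component vectors.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Source(Enum):
    TEMP = "r"    # temporary register memory
    INPUT = "v"   # vertex input registers (assumed source, for illustration)
    CONST = "c"   # constant registers (assumed source, for illustration)


@dataclass(frozen=True)
class Operand:
    source: Source
    index: int


@dataclass(frozen=True)
class Instruction:
    opcode: str
    dst: int                   # destination temporary register index
    srcs: Tuple[Operand, ...]  # input operands, in order


def madd(a: float, b: float, c: float) -> float:
    """Multiply-and-add semantics: (a x b) + c as a single instruction."""
    return a * b + c


def temp_reads(instr: Instruction) -> int:
    """Count operands read from temporary register memory.

    Each temporary operand is counted as one read; an implementation might
    share a port when two operands name the same register, which this
    sketch does not model.
    """
    return sum(1 for op in instr.srcs if op.source is Source.TEMP)


def needs_more_ports(instr: Instruction, ports: int) -> bool:
    """True when the instruction needs more temporary reads than there are read ports."""
    return temp_reads(instr) > ports
```

A MADD whose three operands all come from temporary registers needs three reads: a three-port memory serves it in one access, while a two-port memory cannot. A MADD with one operand taken from another source fits within two ports.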
Those having ordinary skill in the art recognize the attractiveness of providing only two read ports for the temporary register memory. That is, due to the relatively infrequent occurrence of instructions requiring three input operands from temporary register memory, and due to the efficiencies (both in terms of cost and complexity) that could be realized, it would be advantageous to provide a DirectX 8.0-compliant PVS implementation that requires only two temporary register memory ports. However, in order to provide such an implementation, the relatively infrequent, but nonetheless possible, occurrence of an instruction requiring three input operands from temporary register memory must be accommodated.
One possible solution to this problem is to inspect the code memory, where the currently-implemented instructions are stored, for occurrences of a MADD instruction requiring all of its inputs from the temporary register memory. Upon finding an instruction of this type, substitute instructions could be placed into the code memory in place of the identified MADD instruction. For example, the MADD instruction could be replaced by a multiply instruction followed by an add instruction. One shortcoming of this solution, however, is that the code memory would have to be doubled in length to accommodate the worst-case scenario in which every instruction stored in the code memory is a MADD instruction of this type. Such a solution is therefore prohibitively expensive.
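The substitution approach can be sketched as a rewrite pass over a list of instructions. In this Python sketch, register names beginning with `r` denote temporary registers, and `SCRATCH` is a hypothetical spare temporary register assumed to be free for holding the intermediate product. The instruction format and the small scalar interpreter used to check equivalence are likewise assumptions for illustration, not an actual PVS encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    opcode: str            # "MUL", "ADD", or "MADD"
    dst: str               # destination register name, e.g. "r0"
    srcs: Tuple[str, ...]  # source register names; "r..." = temporary register


# Hypothetical spare temporary register reserved for the rewrite. Writing the
# product to a scratch register (rather than to dst) keeps the rewrite correct
# even when dst is the same register as the addend c.
SCRATCH = "r_scratch"


def is_temp(name: str) -> bool:
    return name.startswith("r")


def rewrite(program: List[Instr]) -> List[Instr]:
    """Replace each MADD that reads all three operands from temporary registers
    with MUL + ADD, so that no instruction needs more than two temporary reads."""
    out: List[Instr] = []
    for ins in program:
        if ins.opcode == "MADD" and all(is_temp(s) for s in ins.srcs):
            a, b, c = ins.srcs
            out.append(Instr("MUL", SCRATCH, (a, b)))        # scratch = a * b
            out.append(Instr("ADD", ins.dst, (SCRATCH, c)))  # dst = scratch + c
        else:
            out.append(ins)
    return out


def run(program: List[Instr], regs: Dict[str, float]) -> Dict[str, float]:
    """Tiny scalar interpreter, used only to check that the rewrite is equivalent."""
    regs = dict(regs)
    for ins in program:
        v = [regs[s] for s in ins.srcs]
        if ins.opcode == "MUL":
            regs[ins.dst] = v[0] * v[1]
        elif ins.opcode == "ADD":
            regs[ins.dst] = v[0] + v[1]
        elif ins.opcode == "MADD":
            regs[ins.dst] = v[0] * v[1] + v[2]
        else:
            raise ValueError(f"unknown opcode {ins.opcode}")
    return regs
```

Each MUL and ADD produced by the pass reads at most two temporary registers, but a program consisting entirely of such MADD instructions comes out exactly twice as long, which is the code-memory cost that makes this approach unattractive.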
Therefore, a need exists for a technique that accommodates the occurrence of instructions requiring a number of input operands greater than the output capacity of the temporary register memory. Stated more generally, such a technique should accommodate the occurrence of an instruction requiring (n+m) input operands with more than n of the input operands coming from an n-output data source. Additionally, such a technique should not require significant additions of, or modifications to, memory.
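As a rough counting aid for the general case, assume each access to an n-output data source delivers at most n operands. An instruction needing k operands from that source, with k greater than n, then cannot be fed in a single access; the minimum number of accesses is the ceiling of k/n. The Python function below, whose name is hypothetical, computes that count. It only quantifies the shortfall and does not represent any particular technique for handling it.

```python
def read_accesses(operands_from_source: int, n: int) -> int:
    """Minimum number of accesses to an n-output source needed to fetch the
    given number of operands, assuming each access delivers at most n operands."""
    if n <= 0:
        raise ValueError("n must be positive")
    if operands_from_source < 0:
        raise ValueError("operand count must be non-negative")
    return -(-operands_from_source // n)  # ceiling division on integers
```

For the case discussed above, a two-port temporary register memory (n = 2) and a MADD with all three operands in temporary registers (k = 3) give two accesses, one more than a single-cycle issue allows.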