1. Field of the Invention
The invention is in the field of computer graphics and, more particularly, relates to processing program instructions in a multi-pass graphics pipeline.
2. Description of the Related Art
Current multi-pass data processing methods are exemplified by systems and methods developed for computer graphics. This specialized field includes technology wherein data is processed through a multi-pass pipeline in which each pass typically performs a specific sequence of operations on the data and uses the output of one pass during processing of a subsequent pass. At the end of a first pass the output data is written to memory (local or host). During a subsequent pass the output data from the first pass is read from memory and processed.
Recent advances in graphics processors permit users to program graphics pipeline units using microcoded programs, called pixel or shader programs, to implement a variety of user-defined shading algorithms. Although these graphics processors are able to execute shader programs, the program instructions that they are capable of executing do not include loop and branch instructions. As a result, shader programs that repeat instructions, e.g., loop on different sets of data, must include the instructions for each iteration explicitly. For example, a loop comprising ten instructions that is executed five times becomes fifty program instructions without a loop instruction, compared with eleven instructions (ten plus the loop instruction) with a loop instruction. Longer shader programs require more storage resources (host or local memory) and more bandwidth to download from a host memory system to a local graphics memory.
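The arithmetic in the example above can be checked with a short sketch. The function names, and the model of a loop as exactly one extra instruction, are illustrative assumptions rather than part of any described instruction set.

```python
# Illustrative only: counts program instructions for a repeated body,
# with and without a loop instruction. All names are hypothetical.


def unrolled_length(body_len: int, iterations: int) -> int:
    """Instructions needed when every iteration is written out explicitly."""
    return body_len * iterations


def looped_length(body_len: int) -> int:
    """Instructions needed when one loop instruction wraps the body.

    Assumes the loop costs exactly one extra instruction, matching the
    ten-plus-one example in the text.
    """
    return body_len + 1


if __name__ == "__main__":
    print(unrolled_length(10, 5))  # 50
    print(looped_length(10))       # 11
```

Under this model the unrolled length grows with the iteration count, while the looped length does not depend on it.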
For the foregoing reasons, there is a need for a graphics system that supports the execution of loop instructions.
The present invention is directed to a system and method that satisfies the need for supporting the execution of loop instructions. Providing support for the execution of loop instructions enables users to write more efficient shader programs that require fewer lines of code to implement the same function and therefore need less memory for storage. The present invention also provides the ability to execute branch instructions.
Various embodiments of the invention include a graphics subsystem comprising a programmable shader including an instruction processing unit, a fragment selector, a program counter unit, and a loop count unit. The instruction processing unit converts shader program instructions and outputs a sequence of converted program instructions based upon available resources in the programmable shader. The fragment selector selects fragments, under control of the instruction processing unit, from a total number of fragments. The program counter unit computes and outputs a current program counter and the loop count unit computes and outputs a current loop count, each under control of the instruction processing unit. Additionally, the invention can include a host processor, a host memory, and a system interface configured to interface with the host processor.
The programmable shader optionally includes a program instruction buffer configured to store a portion of the program instructions comprising the shader program, under control of the instruction processing unit.
The current program counter, indicating the program instruction that is being executed, is stored in the program counter unit. A program counter computation unit calculates a computed program counter using the current program counter. A selector selects between a value output by the instruction processing unit and the computed program counter to determine a new current program counter. Likewise, the current loop count, indicating the loop iteration that is being executed, is stored in the loop count unit. A loop count computation unit calculates a computed loop count using the current loop count. A selector selects between an initial loop count and the computed loop count to determine a new current loop count.
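The two selector arrangements described above can be modeled in software as a rough sketch. The class and method names are hypothetical, and two details are assumptions the text does not state: the computed program counter is taken to be the current value plus one, and the computed loop count is taken to be the current value minus one (a count-down scheme).

```python
# Software model of the program counter unit and loop count unit.
# Hypothetical names; the +1 and -1 update rules are assumptions.
from typing import Optional


class ProgramCounterUnit:
    def __init__(self, start: int = 0) -> None:
        self.current = start  # stored current program counter

    def step(self, ipu_value: Optional[int] = None) -> int:
        # Program counter computation unit (assumed: simple increment).
        computed = self.current + 1
        # Selector: value from the instruction processing unit, if any,
        # otherwise the computed program counter.
        self.current = ipu_value if ipu_value is not None else computed
        return self.current


class LoopCountUnit:
    def __init__(self) -> None:
        self.current = 0  # stored current loop count

    def step(self, initial: Optional[int] = None) -> int:
        # Loop count computation unit (assumed: count down by one).
        computed = self.current - 1
        # Selector: initial loop count when a loop starts, otherwise
        # the computed loop count.
        self.current = initial if initial is not None else computed
        return self.current


if __name__ == "__main__":
    pc = ProgramCounterUnit()
    print(pc.step())             # sequential advance
    print(pc.step(ipu_value=7))  # branch target selected instead
    lc = LoopCountUnit()
    print(lc.step(initial=3))    # loop entry loads the initial count
    print(lc.step())             # later iterations use the computed count
```

In both units the stored value is the only state; each step computes a candidate from it and the selector decides whether that candidate or an externally supplied value becomes the new current value.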
Some embodiments of the system further comprise a read interface to read the program instructions from a graphics memory. The shader program instructions include loop and/or branch instructions and the current program counter can specify a location in local memory or in the program instruction buffer. Furthermore, the invention includes support for executing nested loop instructions. The current loop count can be used by the instruction processing unit as an index to access a storage resource or can be output by the instruction processing unit to graphics processing units within the programmable shader and used to read and/or write storage resources accessed by those graphics processing units. Still further, the current loop count can be output by the instruction processing unit for each fragment, pixel, sample, or group of fragments, pixels, or samples.
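As a minimal sketch of nested loop execution, the toy interpreter below saves the enclosing loop's state on a stack when an inner loop begins and restores it when the inner loop finishes. The instruction names (`LOOP`, `ENDLOOP`, `OP`), the stack, and the count-up convention are all assumptions made for illustration; the sketch also assumes every loop runs at least once.

```python
# Toy interpreter for nested loops. Instruction encoding is hypothetical:
#   ("LOOP", n)   start a loop of n iterations (n >= 1 assumed)
#   ("ENDLOOP",)  end of the innermost loop body
#   ("OP", name)  any other instruction


def run(program):
    pc = 0
    saved = []      # state of enclosing loops, saved when a nested loop starts
    current = None  # [body_start_pc, current_loop_count, total] of innermost loop
    trace = []      # (op name, innermost loop count) for each executed OP
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "LOOP":
            if current is not None:
                saved.append(current)  # store the outer loop state first
            current = [pc + 1, 0, instr[1]]
            pc += 1
        elif instr[0] == "ENDLOOP":
            current[1] += 1
            if current[1] < current[2]:
                pc = current[0]  # jump back to the start of the body
            else:
                current = saved.pop() if saved else None  # restore outer loop
                pc += 1
        else:
            trace.append((instr[1], current[1] if current else None))
            pc += 1
    return trace


if __name__ == "__main__":
    nested = [
        ("LOOP", 2), ("OP", "a"),
        ("LOOP", 3), ("OP", "b"), ("ENDLOOP",),
        ("ENDLOOP",),
    ]
    for entry in run(nested):
        print(entry)
```

The loop count recorded with each `OP` is the kind of value that could serve as an index into a storage resource, for example to select a different constant on each iteration.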
Some embodiments of the present invention include a method of executing shader program instructions in a programmable shader comprising the steps of (a) selecting a set of fragments from a total number of fragments, (b) converting a number of the program instructions and outputting a sequence of converted program instructions comprising a portion of the shader program based upon available resources in the programmable shader, (c) processing the selected fragments by executing the sequence of converted program instructions, (d) repeating steps (b) and (c) until all of the portions of the shader program are executed, and (e) repeating steps (a), (b), (c) and (d) until the total number of fragments is processed. The program instructions can include branch and/or loop instructions, where a loop instruction specifies a set of instructions to be executed for a number of iterations. Additionally, the method can use a computing system to execute the shader program instructions. Furthermore, the method can include receiving an initial loop count that specifies the number of iterations. Still further, a current loop count can be selected from the initial loop count and a computed loop count and stored, where the computed loop count is calculated using a previous current loop count. The current loop count is stored prior to the execution of a nested loop instruction. The method can include reading the program instructions from a local memory or a local storage resource. The method can also include outputting the current loop count for each processed fragment, pixel, sample, or group of processed fragments, pixels, or samples.
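Steps (a) through (e) can be sketched as two nested loops: an outer loop over sets of fragments and an inner loop over portions of the shader program. Everything specific here is an assumption for illustration: fragments are plain numbers, instructions are `("add", k)` or `("mul", k)` pairs, "conversion" turns each pair into a Python callable, and "available resources" is reduced to a fixed instruction budget per pass.

```python
# Sketch of method steps (a)-(e). All names and the instruction encoding
# are hypothetical; the resource model is a fixed per-pass budget.


def convert(instruction):
    """Step (b) helper: turn one program instruction into a callable."""
    op, operand = instruction
    if op == "add":
        return lambda value: value + operand
    if op == "mul":
        return lambda value: value * operand
    raise ValueError(f"unknown op: {op}")


def execute_shader(fragments, program, fragments_per_set, instructions_per_pass):
    processed = []
    # (e) repeat until the total number of fragments is processed
    for f_start in range(0, len(fragments), fragments_per_set):
        # (a) select a set of fragments from the total
        selected = list(fragments[f_start:f_start + fragments_per_set])
        # (d) repeat until every portion of the program has executed
        for i_start in range(0, len(program), instructions_per_pass):
            portion = program[i_start:i_start + instructions_per_pass]
            # (b) convert this portion into a sequence of converted instructions
            converted = [convert(instr) for instr in portion]
            # (c) process the selected fragments with the converted sequence
            for op in converted:
                selected = [op(value) for value in selected]
        processed.extend(selected)
    return processed


if __name__ == "__main__":
    shader = [("add", 1), ("mul", 2), ("add", 3)]
    print(execute_shader([1, 2, 3], shader, 2, 2))
```

Each selected set of fragments passes through the whole program, one portion per pass, before the next set is selected.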