Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (e.g., GPUs, etc.) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data. Generally, the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives and produce real-time rendered 3-D images.
In modern real-time 3-D graphics rendering, the functional units of the GPU need to be programmed in order to properly execute many of the more refined pixel shading techniques. These techniques require, for example, the blending of colors into a pixel in accordance with factors in a rendered scene which affect the nature of its appearance to an observer. Such factors include, for example, fogginess, reflections, light sources, and the like. In general, several graphics rendering programs (e.g., small specialized programs that are executed by the functional units of the GPU) influence a given pixel's color in a 3-D scene. Such graphics rendering programs are commonly referred to as shader programs, or simply shaders. In more modern systems, some types of shaders can be used to alter the actual geometry of a 3-D scene (e.g., Vertex shaders) and other primitive attributes.
In a typical GPU architecture, each of the GPU's functional units is associated with a low level, low latency internal memory (e.g., register set, etc.) for storing instructions that programmed the architecture for processing the primitives. The instructions typically comprise shader programs and the like. The instructions are loaded into their intended GPU functional units by propagating them through the pipeline. As the instructions are passed through the pipeline, when they reach their intended functional unit, that functional unit will recognize its intended instructions and store them within its internal registers.
Prior to being loaded into the GPU, the instructions are typically stored in system memory. Because the much larger size of the system memory, a large number of shader programs can be stored there. A number of different graphics processing programs (e.g., shader programs, fragment programs, etc.) can reside in system memory. The programs can each be tailored to perform a specific task or accomplish a specific result. In this manner, the graphics processing programs stored in system memory act as a library, with each of a number of shader programs configured to accomplish a different specific function. For example, depending upon the specifics of a given 3-D rendering scene, specific shader programs can be chosen from the library and loaded into the GPU to accomplish a specialized customized result.
The graphics processing programs, shader programs, and the like are transferred from system memory to the GPU through a DMA (direct memory access) operation. This allows GPU to selectively pull in the specific programs it needs. The GPU can assemble an overall graphics processing program, shader, etc. by selecting two or more of the graphics programs in system memory and DMA transferring them into the GPU.
A problem exists, however, in that the overall length of any given shader program is limited by the hardware resources of the GPU architecture. For example, as the GPU architecture is designed, its required resources are determined by the expected workload of the applications they architectures targeted towards. Based on the expected workload, design engineers include a certain number of registers, command table memory of a certain amount, cache memory of a certain amount, FIFOs of a certain depth, and the like. This approach leads to a built-in inflexibility. The so-called “high-end” architectures often include excessive amounts of hardware capability which goes unused in many applications. This leads to excessive cost, excessive power consumption, excessive heat, and the like. Alternatively, the so-called “midrange” or “low-end” architectures often have enough hardware capability to handle most applications, but cannot properly run the more demanding high-end applications. Unfortunately, these high-end applications often have the most compelling and most interactive user experiences, and thus tend to be popular and in high demand.
Thus, a need exists for graphics architecture that implements a shader program loading and execution process that can scale as graphics application needs require and provide added performance without incurring penalties such as increased processor overhead.