The present invention relates to the field of computer graphics. Many computer graphic images are created by mathematically modeling the interaction of light with a three-dimensional scene from a given viewpoint. This process, called rendering, generates a two-dimensional image of the scene from the given viewpoint, and is analogous to taking a photograph of a real-world scene.
As the demand for computer graphics, and in particular for real-time computer graphics, has increased, computer systems with graphics processing subsystems adapted to accelerate the rendering process have become widespread. In these computer systems, the rendering process is divided between a computer's general-purpose central processing unit (CPU) and the graphics processing subsystem. Typically, the CPU performs high-level operations, such as determining the position, motion, and collision of objects in a given scene. From these high-level operations, the CPU generates a set of rendering commands and data defining the desired rendered image or images. For example, rendering commands and data can define scene geometry, lighting, shading, texturing, motion, and/or camera parameters for a scene. The graphics processing subsystem creates one or more rendered images from the set of rendering commands and data.
Scene geometry is typically represented by geometric primitives, such as points, lines, polygons (for example, triangles and quadrilaterals), and curved surfaces, defined by one or more two- or three-dimensional vertices. Each vertex may have additional scalar or vector attributes used to determine qualities such as the color, transparency, lighting, shading, and animation of the vertex and its associated geometric primitives.
Many graphics processing subsystems are highly programmable, enabling implementation of, among other things, complicated lighting and shading algorithms. In order to exploit this programmability, applications can include one or more graphics processing subsystem programs, which are executed by the graphics processing subsystem in parallel with a main program executed by the CPU. Although not confined to merely implementing shading and lighting algorithms, these graphics processing subsystem programs are often referred to as shading programs or shaders.
One portion of a typical graphics processing subsystem is a vertex processing unit. To enable a variety of per-vertex algorithms, such as those used for visual effects, the vertex processing unit is highly programmable. The vertex processing unit executes one or more vertex shader programs in parallel with the CPU. While executing, each vertex shader program successively processes vertices and their associated attributes to implement the desired algorithms. Additionally, vertex shader programs can be used to transform vertices to a coordinate space suitable for rendering, for example a screen-space coordinate system. Vertex shader programs can implement algorithms using a wide range of mathematical and logical operations on vertices and data, and can include conditional and branching execution paths.
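For illustration only, the per-vertex transformation described above can be sketched in Python as a minimal model, not as the instruction set of any particular vertex processing unit. The names `Vertex`, `vertex_shader`, `run_vertex_stage`, and `mvp` are hypothetical, and the row-major matrix layout and the y-flipped viewport convention are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec4 = Tuple[float, float, float, float]
Mat4 = List[List[float]]  # row-major 4x4 matrix (assumed convention)


@dataclass
class Vertex:
    position: Vec4                                        # homogeneous object-space position
    color: Tuple[float, float, float] = (1.0, 1.0, 1.0)   # example per-vertex attribute


def mat_vec(m: Mat4, v: Vec4) -> Vec4:
    """Multiply a row-major 4x4 matrix by a column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def vertex_shader(v: Vertex, mvp: Mat4, width: int, height: int):
    """Hypothetical per-vertex program: transform one vertex to screen space."""
    x, y, z, w = mat_vec(mvp, v.position)       # object space -> clip space
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w   # perspective divide -> [-1, 1]
    sx = (ndc_x + 1.0) * 0.5 * width            # viewport transform, x to pixels
    sy = (1.0 - ndc_y) * 0.5 * height           # y flipped so row 0 is the top
    return (sx, sy, ndc_z), v.color             # attributes pass through unchanged


def run_vertex_stage(vertices: List[Vertex], mvp: Mat4, width: int, height: int):
    """Apply the same program to each vertex in turn."""
    return [vertex_shader(v, mvp, width, height) for v in vertices]
```

With an identity matrix and a 640 x 480 viewport, a vertex at the origin maps to the screen center (320, 240). Each call reads only the vertex's own attributes and the constant matrix, which is the restricted access pattern the following paragraph contrasts with arbitrary array access.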
Unfortunately, vertex shader programs typically cannot arbitrarily access data stored in memory. This prevents vertex shader programs from using data structures such as arrays. Using scalar or vector data stored in arrays would enable vertex shader programs to perform a variety of additional per-vertex algorithms, including but not limited to advanced lighting effects, geometry effects such as displacement mapping, and complex particle motion simulations. Arrays of data could also be used to implement per-vertex algorithms that are otherwise impossible, impractical, or inefficient to implement.
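As a hedged illustration of the kind of array access at issue, the following Python sketch models displacement mapping: each vertex is pushed along its normal by a height read from an array, and the array index is computed from per-vertex data at run time. The names `Vertex`, `displace`, `heights`, `scale`, and the attribute `u` are hypothetical, and nearest-entry lookup with clamping is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Vertex:
    position: Vec3
    normal: Vec3   # assumed to be unit length
    u: float       # hypothetical per-vertex parameter in [0, 1] selecting an array entry


def displace(v: Vertex, heights: List[float], scale: float) -> Vec3:
    """Offset one vertex along its normal by a height read from an array.

    The index depends on per-vertex data, so which element is read is not
    known until the program runs: a data-dependent array access.
    """
    last = len(heights) - 1
    i = min(max(int(v.u * last + 0.5), 0), last)   # nearest entry, clamped to the array
    h = heights[i] * scale
    px, py, pz = v.position
    nx, ny, nz = v.normal
    return (px + nx * h, py + ny * h, pz + nz * h)
```

Because `i` differs from vertex to vertex, a processing unit running this program cannot know in advance which element of `heights` will be needed; when the array is large and resides in external memory, each such read may incur the latency discussed in the next paragraph.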
One barrier to allowing vertex shader programs to arbitrarily access data in memory is that arbitrary memory accesses typically have long latencies, especially when accessing external memory. When the vertex processing unit must halt vertex shader program execution until data is returned from memory, performance degrades severely. Caches alone do little to reduce the occurrence of these pipeline stalls, because the arrays used by some per-vertex algorithms are too large to be cached entirely.
It is therefore desirable for a vertex processing unit of a graphics processing subsystem to enable vertex shader programs to arbitrarily access array data. It is further desirable that the vertex processing unit efficiently access array data while minimizing the occurrence and impact of pipeline stalls due to memory latency.