The rendering of three-dimensional graphical images is of interest in a variety of electronic games and other applications. Rendering is the general term that describes the overall multi-step process of transitioning from a database representation of a three-dimensional object to a two-dimensional projection of the object onto a viewing surface.
The rendering process involves a number of steps, such as, for example, setting up a polygon model that contains the information subsequently required by shading/texturing processes, applying linear transformations to the polygon mesh model, culling back-facing polygons, clipping the polygons against a view volume, scan converting/rasterizing the polygons to a pixel coordinate set, and shading/lighting the individual pixels using interpolated or incremental shading techniques.
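By way of illustration only, two of the steps listed above (culling back-facing polygons and projecting the surviving geometry onto a viewing surface) may be sketched as follows. The sketch assumes a camera at the origin looking down the negative z axis, counter-clockwise winding for front-facing triangles, and a simple pinhole projection; the function names and the `focal` parameter are hypothetical and are not drawn from any particular graphics API.

```python
# Minimal sketch of back-face culling followed by perspective projection.
# Assumptions (not from the source): eye at the origin, view direction -z,
# counter-clockwise vertex order marks a front-facing triangle.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_back_facing(tri, eye=(0.0, 0.0, 0.0)):
    """True when the triangle's normal points away from the eye."""
    v0, v1, v2 = tri
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, sub(v0, eye)) >= 0

def project(vertex, focal=1.0):
    """Pinhole projection of a 3-D point onto the plane z = -focal."""
    x, y, z = vertex
    return (focal * x / -z, focal * y / -z)

def visible_projections(triangles):
    """Cull back-facing triangles, then project the remaining vertices."""
    return [tuple(project(v) for v in tri)
            for tri in triangles
            if not is_back_facing(tri)]
```

Clipping, rasterization, and shading are omitted from the sketch; it is intended only to show how an early culling step reduces the geometry passed to later steps.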
Graphics Processing Units (GPUs) are specialized integrated circuit devices that are commonly used in graphics systems to accelerate the performance of a 3-D rendering application. GPUs are commonly used in conjunction with a central processing unit (CPU) to generate three-dimensional images for one or more applications executing on a computer system. Modern GPUs typically utilize a graphics pipeline for processing data.
Prior art FIG. 1 shows a diagram depicting the various stages of a traditional prior art pipeline 100. The pipeline 100 is a conventional “deep” pipeline having stages dedicated to performing specific functions. A transform stage 105 performs geometrical calculations on primitives and may perform a clipping operation. A setup/raster stage 110 rasterizes the primitives. A texture address stage 115 and a texture fetch stage 120 are utilized for texture mapping. A fog stage 130 implements a fog algorithm. An alpha test stage 135 performs an alpha test. A depth test stage 140 performs a depth test for culling occluded pixels. An alpha blend stage 145 performs an alpha blend color combination algorithm. A memory write stage 150 writes the output of the pipeline.
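For purposes of illustration, the later stages of such a pipeline (fog, alpha test, depth test, alpha blend, and memory write) may be modeled in software as a fixed sequence of per-fragment operations. The following is a minimal sketch under stated assumptions: the linear fog blend, the "greater than reference" alpha test, the "less than" depth comparison, and the source-over blend are common conventions chosen here for illustration and are not specified by FIG. 1. All function names and the fragment dictionary layout are hypothetical.

```python
# Software model of the later stages of a fixed-function "deep" pipeline.
# Each function stands in for one dedicated stage; the specific equations
# are illustrative assumptions, not those of any particular hardware.

def fog_stage(color, fog_color, fog_factor):
    # fog_factor = 1.0 means no fog, 0.0 means fully fogged.
    return tuple(fog_factor * c + (1.0 - fog_factor) * f
                 for c, f in zip(color, fog_color))

def alpha_test_stage(alpha, reference):
    return alpha > reference

def depth_test_stage(z, stored_z):
    # Smaller z is assumed to be closer to the viewer.
    return z < stored_z

def alpha_blend_stage(src, dst, alpha):
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))

def process_fragment(frag, color_buffer, depth_buffer,
                     fog_color=(0.5, 0.5, 0.5), alpha_ref=0.0):
    """Run one fragment through the stages in order.

    Returns True if the fragment reached the memory write stage.
    """
    x, y = frag["x"], frag["y"]
    color = fog_stage(frag["color"], fog_color, frag["fog_factor"])
    if not alpha_test_stage(frag["alpha"], alpha_ref):
        return False
    if not depth_test_stage(frag["z"], depth_buffer[y][x]):
        return False  # occluded pixel is culled
    # Memory write stage: commit blended color and new depth.
    color_buffer[y][x] = alpha_blend_stage(color, color_buffer[y][x],
                                           frag["alpha"])
    depth_buffer[y][x] = frag["z"]
    return True
```

In this model every fragment traverses the same stage sequence whether or not a given stage has useful work to do, which mirrors the fixed, dedicated-stage organization described above.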
The stages of the traditional GPU pipeline architecture illustrated in FIG. 1 are typically optimized for high-speed rendering operations (e.g., texturing, lighting, shading, etc.) using a widely implemented graphics programming API (application programming interface), such as, for example, the OpenGL™ graphics language, Direct3D™, and the like. The architecture of the pipeline 100 is configured as a multi-stage deep pipeline architecture in order to maximize the overall rendering throughput of the pipeline. Generally, deep pipeline architectures have sufficient data throughput (e.g., pixel fill rate, etc.) to implement fast, high quality rendering of even complex scenes.
There is an increasing interest in utilizing three-dimensional (3-D) graphics in portable handheld devices where cost and power consumption are important design requirements. Such devices include, for example, wireless phones, personal digital assistants (PDAs), and the like. However, the traditional deep pipeline architecture requires a significant chip area, resulting in greater cost than desired. Additionally, a deep pipeline consumes significant power, even if the stages are performing comparatively little processing. This is because many of the stages consume about the same amount of power regardless of whether they are processing pixels.
As a result of cost and power considerations, the conventional deep pipeline architecture illustrated in FIG. 1 is unsuitable for many graphics applications, such as implementing three-dimensional games on wireless phones and PDAs. Therefore, what is desired is a processor architecture suitable for graphics processing applications but with reduced power and size requirements.
In conventional GPUs, calculation of depth data and color data as well as texture coordinates may be hard coded. That is, portions of the GPU pipeline architecture are fixed in function. Consequently, results from the GPU pipeline architecture are stored in specific buffers associated with the respective depth, color, or texture coordinate data, and each stage has specific data write functions. As a result, software engineers are limited in the applications to which the GPU can be put.
In addition, conventional GPUs may write depth, color, or various texture coordinate data to system memory on a pixel-by-pixel and stage-by-stage basis. For low-power, small-screen handheld devices, this pixel-by-pixel transfer may present a bottleneck in data transfer, since pixel information is typically reduced in size to accommodate the handheld devices. As a result, the bandwidth of the GPU architecture is extremely limited, which may retard reads and writes to the system memory, slowing the overall speed and increasing the power consumption of the GPU pipeline architecture.
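As a rough illustration of how pixel-by-pixel, stage-by-stage writes scale, the number of memory write transactions per frame may be estimated as the pixel count multiplied by the number of separate writes issued for each pixel. The display size and write counts below are hypothetical values chosen only for the arithmetic; they are not measurements of any device.

```python
# Back-of-the-envelope count of memory write transactions per frame.
# All numbers are hypothetical and serve only to show the scaling.

def write_transactions(width, height, writes_per_pixel):
    """Transactions per frame when each pixel is written separately."""
    return width * height * writes_per_pixel

# Assumed 320x240 handheld display.
WIDTH, HEIGHT = 320, 240

# One separate write each for depth, color, and texture coordinate data.
per_stage = write_transactions(WIDTH, HEIGHT, 3)

# For comparison: a single write per pixel.
single = write_transactions(WIDTH, HEIGHT, 1)

print(per_stage, single)  # 230400 76800
```

Because each transaction carries only a small amount of pixel data on such devices, the count of transactions, rather than the number of bytes moved, may dominate the cost.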