The present invention relates to 3D graphics, and particularly to data flow through rendering processors.
Background: 3D Computer Graphics
One of the driving features in the performance of most single-user computers is computer graphics. This is particularly important in computer games and workstations, but is generally very important across the personal computer market.
For some years the most critical area of graphics development has been in three-dimensional (“3D”) graphics. The peculiar demands of 3D graphics are driven by the need to present a realistic view, on a computer monitor, of a three-dimensional scene. The pattern written onto the two-dimensional screen must therefore be derived from the three-dimensional geometries in such a way that the user can easily “see” the three-dimensional scene (as if the screen were merely a window into a real three-dimensional scene). This requires extensive computation to obtain the correct image for display, taking account of surface textures, lighting, shadowing, and other characteristics.
The starting point (for the aspects of computer graphics considered in the present application) is a three-dimensional scene, with specified viewpoint and lighting (etc.). The elements of a 3D scene are normally defined by sets of polygons (typically triangles), each having attributes such as color, reflectivity, and spatial location. (For example, a walking human, at a given instant, might be translated into a few hundred triangles which map out the surface of the human's body.) Textures are “applied” onto the polygons, to provide detail in the scene. (For example, a flat carpeted floor will look far more realistic if a simple repeating texture pattern is applied onto it.) Designers use specialized modelling software tools, such as 3D Studio, to build textured polygonal models.
The 3D graphics pipeline consists of two major stages, or subsystems, referred to as geometry and rendering. The geometry stage is responsible for managing all polygon activities and for converting three-dimensional spatial data into a two-dimensional representation of the viewed scene, with properly-transformed polygons. The polygons in the three-dimensional scene, with their applied textures, must then be transformed to obtain their correct appearance from the viewpoint of the moment; this transformation requires calculation of lighting (and apparent brightness), foreshortening, obstruction, etc.
However, even after these transformations and extensive calculations have been done, there is still a large amount of data manipulation to be done: the correct values for EACH PIXEL of the transformed polygons must be derived from the two-dimensional representation. (This requires not only interpolation of pixel values within a polygon, but also correct application of properly oriented texture maps.) The rendering stage is responsible for these activities: it “renders” the two-dimensional data from the geometry stage to produce correct values for all pixels of each frame of the image sequence.
The most challenging 3D graphics applications are dynamic rather than static. In addition to changing objects in the scene, many applications also seek to convey an illusion of movement by changing the scene in response to the user's input. The technical term for changing the database of geometry that defines objects in a scene is transformation. The operations involve moving an object in the X, Y, or Z direction, rotating it in relation to the viewer (camera), or scaling it to change the size. (The “X” coordinate can represent, for example, left-right position; “Y” the location in the top-to-bottom axis; and “Z” the position along the axis from “in front” to behind.)
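The transformation operations described above (moving, rotating, and scaling an object relative to the viewer) can be sketched as follows. This is a simplified illustration in Python; the function names, the choice of a Y-axis rotation, and the specific values are hypothetical and are not part of any disclosed architecture:

```python
import math

def rotate_y(vertex, angle_rad):
    """Rotate a vertex about the Y (top-to-bottom) axis, relative to the viewer."""
    x, y, z = vertex
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def translate(vertex, dx, dy, dz):
    """Move a vertex in the X, Y, or Z direction."""
    x, y, z = vertex
    return (x + dx, y + dy, z + dz)

def scale(vertex, factor):
    """Scale a vertex's position uniformly to change an object's size."""
    x, y, z = vertex
    return (x * factor, y * factor, z * factor)

# Move a vertex right, rotate it a quarter turn, then double its size.
v = (1.0, 0.0, 0.0)
v = translate(v, 1.0, 0.0, 0.0)   # now (2, 0, 0)
v = rotate_y(v, math.pi / 2)      # now approximately (0, 0, -2)
v = scale(v, 2.0)                 # now approximately (0, 0, -4)
```

In a real geometry stage these operations are composed into a single 4×4 matrix and applied to every vertex of every object in the scene, which is why a change of camera position forces so many calculations.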
Whenever a change in the orientation or position of the camera is desired, every object in a scene must be recalculated relative to the new view. As can be imagined, a fast-paced game needing to maintain a high frame rate will require many calculations and many memory accesses.
FIG. 2 shows a high-level overview of the processes performed in the overall 3D graphics pipeline. However, this is a very general overview, which ignores the crucial issues of what hardware performs which operations.
Texturing
There are different ways to add complexity to a 3D scene. Creating more and more detailed models, consisting of a greater number of polygons, is one way to add visual interest to a scene. However, adding polygons exacts a price: more geometry must be manipulated. 3D systems have what is known as a “polygon budget,” an approximate number of polygons that can be manipulated without unacceptable performance degradation. In general, fewer polygons yield higher frame rates.
The visual appeal of computer graphics rendering is greatly enhanced by the use of “textures.” A texture is a two-dimensional image which is mapped onto the data to be rendered. Textures provide a very efficient way to generate the level of minor surface detail which makes synthetic images realistic, without requiring transfer of immense amounts of data. Texture patterns provide realistic detail at the sub-polygon level, so the higher-level tasks of polygon-processing are not overloaded. See Foley et al., Computer Graphics: Principles and Practice (2nd ed. 1990, corr. 1995), especially at pages 741–744; Paul S. Heckbert, “Fundamentals of Texture Mapping and Image Warping,” Master's thesis, Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1989; Heckbert, “Survey of Texture Mapping,” IEEE Computer Graphics and Applications, November 1986, pp. 56–67; all of which are hereby incorporated by reference. Game programmers have also found that texture mapping is generally an efficient way to achieve very dynamic images without requiring a hugely increased memory bandwidth for data handling.
A typical graphics system reads data from a texture map, processes it, and writes color data to display memory. The processing may include mipmap filtering which requires access to several maps. The texture map need not be limited to colors, but can hold other information that can be applied to a surface to affect its appearance; this could include height perturbation to give the effect of roughness. The individual elements of a texture map are called “texels.”
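The basic texel lookup described above can be sketched as follows. This is a minimal nearest-neighbor illustration in Python, assuming a repeating (wrapped) texture pattern such as the carpet example; the function name and data layout are hypothetical, and a real system would add mipmap and bilinear filtering:

```python
def sample_texture(texture, u, v):
    """Nearest-neighbor lookup of a texel at texture coordinate (u, v).
    `texture` is a list of rows, each row a list of texel values."""
    height = len(texture)
    width = len(texture[0])
    # Wrap the coordinates so the pattern repeats, like a tiled carpet texture.
    tx = int(u * width) % width
    ty = int(v * height) % height
    return texture[ty][tx]

# A 2x2 checkerboard texture; the texels here are simple grayscale values,
# though as noted above texels may hold other surface information.
checker = [[0, 255],
           [255, 0]]
sample_texture(checker, 0.25, 0.25)  # top-left texel -> 0
sample_texture(checker, 0.75, 0.25)  # top-right texel -> 255
```

The appeal of this scheme is visible even in the toy example: a four-texel map decorates an arbitrarily large surface, which is why texturing adds detail without a corresponding growth in geometry data.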
Background: Pipelined and Message-Passing Architectures
A series of patents from 3Dlabs have disclosed a 3D graphics architecture in which pipelined rendering is implemented by a message-passing architecture. Examples of various embodiments, and of ancillary features, include U.S. Pat. Nos. 5,701,444, 5,727,192, 5,742,796, 5,764,228, 5,764,243, 5,774,133, 5,777,629, 5,798,770, 5,805,868, 5,815,166, 5,835,096, 6,025,853, 6,111,584, 6,285,373, 6,348,919, and 6,377,266, all of which are hereby incorporated by reference.
FIG. 4 shows how rendering-related messages flow in a pure-pipeline message-passing architecture. In this figure, messages flow down the pipeline from left to right. Any bottlenecks anywhere in the pipeline will soon stall the entire pipeline.
In this example, a transform/lighting stage 210 generates data for fragment vertices, which is then rasterized (by a stage not shown) and Z-buffered (by depth buffer operation stage 220). The resulting per-pixel data is then passed to texturing stage 230, which performs remaining per-pixel tasks. The completed set of pixel data (for each frame) is then handled by frame buffer operations stage 240, which passes the frame buffer data along to LUTDAC or other outputs.
Background: Workload Balancing
During the process of drawing an image in 3D graphics, at various points in the frame, different sections of the system have different workloads. In a typical application, a background is normally drawn first, using fairly large polygons. This will therefore present quite a small workload to the transformation and lighting (T&L) part of the system, but a high workload to the rasterization, Z-buffering and texturing parts of the system. After the background has been drawn, the foreground components generally have much larger polygon counts, and much smaller projected areas per triangle. This therefore places a heavier workload on the T&L part, and eases the workload on the Z-buffering and texturing parts. These foreground components may also obscure large amounts of the background; since the background has already been rasterized and textured, that initial work becomes wasted effort.
It is desirable that all parts of the system should be kept busy at all times, in order to achieve maximum performance, and to make the most cost-effective system.
The standard solution to such a problem is to include FIFOs in the design, to smooth out bubbles in the processing by allowing the T&L subsystem to run some number of fragments ahead of the rasterization and texturing subsystems. This, however, is only a short-term solution: only a small number of fragments can be buffered in such a design before the physical size of the FIFO becomes a limiting factor.
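The limitation of the FIFO approach can be illustrated with a short sketch. This Python model is purely illustrative (the class name, capacity, and fragment representation are assumptions, not part of any disclosed design); it shows how a bounded hardware FIFO stalls the upstream T&L stage once full:

```python
from collections import deque

class FragmentFIFO:
    """Bounded FIFO between the T&L stage and the rasterizer.
    A hardware FIFO of depth `capacity` can absorb only that many
    fragments before the upstream stage must stall."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def push(self, fragment):
        if len(self.queue) >= self.capacity:
            return False          # FIFO full: T&L must stall
        self.queue.append(fragment)
        return True

    def pop(self):
        return self.queue.popleft() if self.queue else None

fifo = FragmentFIFO(capacity=4)
accepted = sum(fifo.push(f) for f in range(10))
# Only 4 of the 10 fragments are buffered; producing the rest
# must wait until the downstream stage drains the queue.
```

A deeper FIFO merely postpones the stall; this is the short-term nature of the solution noted above, and the motivation for parking fragments in capacious external memory instead.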
3D Graphics with Optional Memory Write Before Texturing
The present application describes a 3D graphics architecture in which interfaces to memory are combined with pipeline processing. The rendering units are not all connected in a straight-through pipeline relationship: instead the rendering pipeline is “broken,” so that the stream of fragments (e.g. triangles) being processed is parked in memory.
This turns out to be surprisingly efficient as a way to separate rendering processes where the workload balance is different. The Z-buffering operations are less computationally intensive than the texturing operations. It is preferable to include the first memory interface before Z-buffering, and the second one after Z-buffering. In one notable class of embodiments, a first write to memory is performed after transformation and lighting calculations and before rasterization and Z-buffering, and a second write to memory is performed before texturing.
Since stippling is required for accurate Z-buffering (the depth buffer must not contain pixels which should have been stippled out), stippling too is preferably performed before the second memory access.
These interfaces to memory operate quite differently from the limited FIFO memory which is typically included in the hardware structure of any pipeline architecture. The memory interfaces which separate different groups of rendering processes are accesses to “open” memory, i.e. to main memory or to virtual memory. Space for these writes is typically provided by external memory devices, which are capacious and cheap. While there will inevitably be some size limit in any memory access, preferably these memory accesses are given enough memory allocation that their capacity is usually irrelevant to any one frame's rendering workload. (For example, in 2002-era PC architectures, several megabytes of storage may be assigned to each memory interface in the pipeline, so that sufficient fragments may be stored to keep each section of the pipeline busy.)
In a further teaching, a two-pass Z-buffering operation is performed before texturing: the first pass obtains the correct value for the maximum visible depth at each pixel, and the second pass discards fragments which are invisible at each pixel of the final scene. This saves on processing, since the texturing operation processes only those fragment pixels which have passed the second pass of the Z-buffering operation.
Note that the use of two-pass Z-buffering is particularly efficient, but it is difficult to obtain the full efficiency of two-pass operation in a straight-through pipelined architecture. The use of memory writes according to the present application facilitates this, and thus facilitates reducing the number of texturing operations to a bare minimum.
The use of additional off-chip memory accesses may seem paradoxical, since it slows down the minimum time to complete processing of a single pass through the pipeline. However, the surprising teaching of the present application is that these additional memory accesses can actually provide a net increase in average throughput. That is, performing two operations per pixel at the Z-buffering stage provides a net reduction in total burden, because far fewer pixels survive to reach the much more expensive texturing stage.
In a further class of embodiments, at least one of the memory interfaces is made optional. If two-pass Z-buffering or texturing is not being used for a particular rendering task, one or both of the memory interfaces can be switched off for that task. This economizes on memory bandwidth for such tasks, while retaining full flexibility for optimization in general.
Preferably the first memory access writes fragment data, not full pixel data. That is, data would be given for the three vertices of a triangle (locations, colors, etc.), but the values for the interior pixels would not yet be specified. Thus this interface consumes a relatively low memory bandwidth.
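The bandwidth advantage of writing fragment (vertex) data rather than pixel data can be made concrete with a back-of-the-envelope sketch. The figures below are illustrative assumptions only (8 floats per vertex, 4 per pixel), not disclosed parameters:

```python
def fragment_record_size(floats_per_vertex=8):
    """Per-fragment record written at the first memory interface:
    three triangle vertices, each carrying position, color, etc.
    (8 floats per vertex is an assumed illustrative figure.)"""
    return 3 * floats_per_vertex

def rasterized_size(pixel_count, floats_per_pixel=4):
    """Per-pixel data produced once the rasterizer fills the interior."""
    return pixel_count * floats_per_pixel

# A triangle covering 1,000 screen pixels: the vertex form occupies
# 24 floats in memory, while the rasterized form would need 4,000.
fragment_record_size()   # -> 24
rasterized_size(1000)    # -> 4000
```

The gap widens with projected triangle area, which is why deferring rasterization until after the first memory interface keeps that interface's bandwidth requirement low.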
Preferably the rasterization operation, which translates per-fragment data into per-pixel data for each fragment, is performed after the first memory interface. This permits the Z-buffering to be calculated correctly on a per-pixel basis. Since the two-step Z-buffering operation will filter out many of a fragment's pixels, shading and most further operations are preferably deferred until after the second memory interface.
These various embodiments are particularly advantageous in balancing workload between two-pass Z-buffering and the texturing and related processes. Since the texturing operations are much more computationally burdensome, there is NO cost to performing the second pass Z-buffer test before the memory interface.