This invention relates generally to 3D graphics systems, and more particularly the invention relates to texture maps and the prefetching of texture map texels for use in generating pixels for display of a three-dimensional object.
Graphics systems for portable personal computers (PCs) have advanced rapidly. Three-dimensional (3D) objects can be rendered for more realistic appearances. Arbitrarily complex textures can be applied to surfaces in correct perspective.
The texture of an object is stored in texture maps of varying levels of detail (LOD). Object surfaces closer to the user are shown in more detail by using a texture map with a high LOD, while far-away surfaces are shown with little texture detail by using a texture map with a coarser LOD. The texture elements (texels) in the texture map are used to modify or modulate the pixels of a displayed triangle in order to add the effects of surface material or environmental conditions. The displayed three-dimensional object comprises a plurality of interlocking triangles.
FIG. 1 shows texture maps with three different levels of detail. Note that LOD maps may vary from 1×1 texels to 1K×1K or more texels. The LOD=3 map is an 8×8 texture map with 8 rows of 8 texels, for a total of 64 texels. This level of detail is used for surfaces at a certain distance from the viewer. The LOD=2 map has only 4×4 or 16 texels. Each texel in the 4×4 map may be generated by averaging together four texels in the 8×8 map. Other methods for mip-map generation may be employed.
The coarsest LOD map shown is a 2×2 map with only four texels. The LOD=1 map may be generated from the LOD=2 map by averaging together groups of four texels in the 4×4 map for each texel in the 2×2 map. The LOD=1 map is used for more distant object surfaces that show little surface-texture detail.
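The averaging scheme described for FIG. 1 can be illustrated with a minimal sketch. It assumes square, power-of-two maps held as nested lists of single-channel float texels; the function names downsample and build_mip_chain are hypothetical and are not taken from the specification, which also allows other mip-map generation methods.

```python
def downsample(level):
    """Average each 2x2 group of texels into one texel of the next coarser map."""
    n = len(level) // 2
    return [
        [
            (level[2 * r][2 * c] + level[2 * r][2 * c + 1]
             + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(n)
        ]
        for r in range(n)
    ]


def build_mip_chain(base):
    """Return the maps from finest to coarsest, ending with a 1x1 map."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(downsample(chain[-1]))
    return chain


# Illustrative 8x8 base map (the LOD=3 map of FIG. 1) with made-up texel values.
base = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
chain = build_mip_chain(base)
sizes = [len(m) for m in chain]  # side length of each map in the chain
```

Each coarser map here is built from the map immediately finer than it, so the 2×2 map is derived from the 4×4 map rather than directly from the 8×8 map.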
When the displayed triangle is parallel to the viewer so that most pixels are the same distance (depth) from the viewer, the texels of a single LOD map may be applied directly to the pixels. However, most triangles are at an angle to the viewer so that some pixels on the triangle are at a greater distance from the viewer than others. A perspective correction is required if unpleasant artifacts are to be avoided.
The u,v coordinate values for the stored texels as calculated may not be exact integer values. A pixel may fall between texels on the texture map. Then the texture value for the pixel may be calculated, or reconstructed, by a distance-weighted average of the four closest texels to the exact u,v coordinate. This is known as bi-linear interpolation since interpolation is performed in the two dimensions of a single LOD texture map. Other more effective, though more expensive, methods of texture map reconstruction may be employed.
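A distance-weighted average of the four closest texels might be sketched as follows. The sketch assumes that texel centers sit at integer u,v coordinates, that u and v already lie within the map, and that the map is a nested list indexed as texmap[v][u]; the name bilinear_sample is hypothetical.

```python
import math


def bilinear_sample(texmap, u, v):
    """Blend the four texels surrounding (u, v), weighted by proximity."""
    size = len(texmap)
    u0 = int(math.floor(u))
    v0 = int(math.floor(v))
    # Clamp the far neighbors so samples on the last row/column stay in range.
    u1 = min(u0 + 1, size - 1)
    v1 = min(v0 + 1, size - 1)
    fu = u - u0  # fractional position between u0 and u1
    fv = v - v0  # fractional position between v0 and v1
    top = texmap[v0][u0] * (1.0 - fu) + texmap[v0][u1] * fu
    bottom = texmap[v1][u0] * (1.0 - fu) + texmap[v1][u1] * fu
    return top * (1.0 - fv) + bottom * fv


# Illustrative 2x2 map with made-up values.
tiny = [[0.0, 10.0],
        [20.0, 30.0]]
center = bilinear_sample(tiny, 0.5, 0.5)
```

A sample taken exactly on a texel returns that texel unchanged, and a sample at the midpoint of four texels returns their plain average.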
The LOD map selected depends on the rate of change, or derivative, of the u,v coordinates. Triangles at high or glancing angles to the viewer have many u,v points per screen pixel (x and y values) and thus have high derivatives, or rates of change, of u,v with respect to some x,y direction. The largest derivatives may be used to determine the LOD map to select for a pixel within a triangle. More effective, though more expensive, methods for LOD computation, such as anisotropic filtering, may be employed.
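One way to turn the largest derivative into a map choice is sketched below. It follows the numbering of FIG. 1, in which a larger LOD number means a finer map, and it assumes that the derivatives are expressed in texels of the finest map per screen pixel. The name select_lod and the rule of dropping one level per doubling of the derivative are illustrative assumptions, not a formula given in the text.

```python
import math


def select_lod(du_dx, dv_dx, du_dy, dv_dy, finest_lod):
    """Pick a map so that roughly one texel of that map covers one pixel."""
    rho = max(abs(du_dx), abs(dv_dx), abs(du_dy), abs(dv_dy))
    if rho <= 1.0:
        # At most one finest-map texel per pixel: use the finest map.
        return finest_lod
    # Each doubling of the largest derivative drops one level of detail.
    coarsen = int(math.floor(math.log2(rho)))
    return max(finest_lod - coarsen, 0)


# With the 8x8 map of FIG. 1 as the finest map (LOD=3):
facing = select_lod(1.0, 0.0, 0.0, 1.0, finest_lod=3)
glancing = select_lod(4.0, 0.0, 0.0, 1.0, finest_lod=3)
```

Using only the largest derivative keeps the computation cheap but can over-blur triangles seen at glancing angles, which is the case anisotropic filtering addresses.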
The texture maps may be stored in a dynamic-random-access memory (DRAM) such as the system memory or video buffer. However, texel access may be slower than desired for higher-performance 3D pipelines. The texel data may be rearranged within the DRAM by memory management software, such as described by Saunders in U.S. Pat. No. 5,871,197. However, the slow DRAM speed remains.
Hardware caches of texture maps may be necessary for highest performance. See Migdal et al., U.S. Pat. No. 5,760,783, and the background of Chimito, U.S. Pat. No. 5,550,961. These texture caches provide faster retrieval of texels than that from the frame buffer DRAM.
While such hardware texture caches are effective, hit rates are less than desired unless very large cache sizes are used. Caching texels is useful when the same texel is used again. However, new texels require lengthy miss processing to fetch them from the slower DRAM memory. On the other hand, the 3D pipeline needs to operate at the full clock rate in order to maximize performance. Prefetching is desirable, but it is not always apparent which texels to prefetch.
Since the texture caches are accessed by u,v texture-space addresses rather than x,y screen coordinates, a perspective correction of 1/w must be applied for each pixel to generate the u,v coordinates. This correction is not known until late in the pipeline. As triangles are rendered, pixels are rasterized for one span at a time. It is not immediately apparent how u,v changes as x,y changes for pixels within a triangle. Further, the level of detail may change as pixels in a span are being rendered, requiring a different LOD texture map to be loaded into the texture cache. Thus, determining how to prefetch texels is problematic.
What is desired is a 3D graphics system that prefetches texels into a texture cache. An intelligent prefetching mechanism for a texture cache is desired. It is desired to cache some but not all texture map texels in a higher-speed cache memory before or while a triangle is being rendered. It is desired to minimize texture-cache misses by prefetching texels that will soon be needed by the triangle pixel-rendering pipeline. It is further desired to minimize the bandwidth required for retrieving texels from DRAM. A minimum size texture cache with an organization optimized for a high-speed pixel pipeline is desired.
Heretofore, a 3D graphics system has been demonstrated that prefetches texels into a texture cache. Some but not all texture maps are cached in a higher-speed cache memory while a triangle is being rendered. Texture-cache misses are minimized by prefetching texels that will soon be needed by the triangle pixel rendering pipeline. The prefetching mechanism is not random. Prefetching occurs along a vector in the u,v texture space for texels in a span. The u,v space is corrected in perspective from the x,y pixel space. While the magnitude of u,v changes varies within a span of pixels in a triangle, the direction of changes is constant along a span. The vector in the u,v texture space is generated with the u,v coordinates of the first two pixels in a span. This vector points in the direction of further texels in the span. Blocks of texels are prefetched in this direction and not in other directions. Thus, although exact u,v information is not known in advance, the vector limits the possibilities for prefetching.
It has been realized that texture map data is ill-suited for linear address memory organization. Consequently, each texture map may be organized on 2-level hierarchical tiling. At the lowest hierarchy level, a map is sub-divided into blocks. Blocks represent the lowest level of texture organization of concern to the cache management. Each block is 4×4 texels in size. Although the texture maps differ in size as the LOD changes, the blocks are a constant 4×4 texels in size. Higher resolution LOD maps have more blocks than coarser LOD maps. Cache updating and management is simplified by the use of small, square blocks of texels that may closely match a triangle's area. Because texture and pixel coordinates are statistically unrelated or unaligned to each other, only square blocks offer direction-independent retrieval of texel data for screen aligned pixels.
At the highest hierarchy level, blocks are arranged in tiles. A tile is a square array of blocks, such that one tile is contained within a page of DRAM. For example, 8×8 blocks, where a block is 4×4 16-bit texels, are contained within one 16K-bit DRAM page. This tile arrangement ensures direction-independent DRAM retrieval of blocks of texels with minimum page misses. This, in turn, maximizes the bandwidth utilization of DRAM access. Further, in a 4-bank DRAM organization, tiles may be 4-way interleaved (2-way in u and 2-way in v), allowing the hiding of RAS (row access) cycles during the other bank's CAS (column access) cycles, thus approaching the bandwidth of SRAM devices.
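The two-level tiling can be made concrete with a small addressing sketch. The block size, tile size, and texel depth come from the example above; the function name tiled_address, the row-major ordering of tiles and blocks, and the bit assignment used to pick one of four banks are assumptions made only for illustration.

```python
BLOCK = 4             # texels per block side (from the text)
BLOCKS_PER_TILE = 8   # blocks per tile side (from the 8x8 example)
TEXEL_BITS = 16       # bits per texel (from the example)
TILE = BLOCK * BLOCKS_PER_TILE                   # 32 texels per tile side
COLUMN_BITS = BLOCK * BLOCK * TEXEL_BITS         # one block: 256 bits
PAGE_BITS = BLOCKS_PER_TILE ** 2 * COLUMN_BITS   # one tile: 16384 bits


def tiled_address(u, v, map_size):
    """Split integer texel coordinates into tile, bank, block, and texel."""
    tiles_per_row = max(map_size // TILE, 1)
    tile_u, tile_v = u // TILE, v // TILE
    block_u, block_v = (u % TILE) // BLOCK, (v % TILE) // BLOCK
    return {
        "tile": tile_v * tiles_per_row + tile_u,        # assumed row-major
        # Assumed interleave: low bit of tile_u and of tile_v select the bank,
        # so the four tiles meeting at any corner sit in four different banks.
        "bank": (tile_u & 1) | ((tile_v & 1) << 1),
        "column": block_v * BLOCKS_PER_TILE + block_u,  # block within tile
        "texel": (v % BLOCK) * BLOCK + (u % BLOCK),     # texel within block
    }


addr = tiled_address(35, 5, map_size=64)
```

With this layout a 4×4 block of 16-bit texels fills exactly one 256-bit column, and the 64 blocks of a tile fill one 16K-bit page, which matches the figures quoted in the text.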
It is further realized that while the magnitude of u,v changes varies within a span of pixels in a triangle, the direction of changes is constant along a span or any line contained within the surface. A vector in the u,v texture space can be generated with the u,v coordinates of the first two pixels in a span. This vector then points in the direction of further texels in the span. Blocks of texels can be prefetched in this direction and not in other directions. Thus, although exact u,v information is not known in advance, the vector limits the possibilities for prefetching. In a similar fashion, a vector along the dominant edge, derived from the first pixels of the first two spans, may facilitate the prefetching of blocks while advancing from span to span.
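The direction-only prefetch described above might look like the following sketch. It takes the u,v coordinates of the first two pixels of a span, normalizes their difference, and walks one texel at a time along that direction, collecting each new 4×4 block it enters. The name blocks_along_vector, the one-texel step, and the rule of stopping at the map edge are illustrative assumptions.

```python
import math

BLOCK = 4  # texels per block side (from the text)


def blocks_along_vector(uv0, uv1, count, map_size):
    """List up to `count` block coordinates met when walking from uv0 toward uv1."""
    du, dv = uv1[0] - uv0[0], uv1[1] - uv0[1]
    length = math.hypot(du, dv)
    if length == 0.0:
        # No direction available: only the starting block is known.
        return [(int(uv0[0]) // BLOCK, int(uv0[1]) // BLOCK)]
    du, dv = du / length, dv / length  # keep the direction, drop the magnitude
    blocks = []
    u, v = uv0
    while len(blocks) < count and 0 <= u < map_size and 0 <= v < map_size:
        block = (int(u) // BLOCK, int(v) // BLOCK)
        if not blocks or blocks[-1] != block:
            blocks.append(block)
        u += du
        v += dv
    return blocks


# First two pixels of a span that moves one texel in u per pixel.
prefetch = blocks_along_vector((0.5, 0.5), (1.5, 0.5), count=3, map_size=64)
```

Because only the direction is used, the number of blocks to request has to be supplied separately, here through the count argument.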
When fixed-size blocks are combined with u,v-vector prefetching, the texture cache can store and update blocks prefetched along the direction of the u,v vectors. The cache and prefetching logic are simplified.
The prefetch mechanism is best tailored for DRAM burst accessing by means of prefetching a number of blocks at once. For example, assume the DRAM is organized as 256-bit columns within 16K-bit rows (pages). Then, 16 new pixels of the 16 bpp (bits per pixel) type along the span may require 4 blocks of texels, or a burst of 4 column access cycles. A page miss may occur in about 1 of 5 cases. In order to mitigate the overhead associated with page misses, blocks within a map may be 4-way interleaved in u and v across 4 DRAM banks so as to allow the overlapping of new RAS cycles with current CAS (column access) cycles. This scheme may result in DRAM access approaching that of SRAM access.
The block prefetch mechanism accesses many more texels than required by a single span of pixels. The square blocks, in fact, pertain to 4 or more spans of pixels. Therefore, along longer spans, the extra texels pertaining to the other 3 spans are never used because the cache must be updated along the span as the storage capacity is depleted. This results in higher than necessary bandwidth demand.
An additional drawback arises from the use of only the direction of the u,v vector. The magnitude of the vector may change sufficiently to cause either insufficient or too much texel data to be prefetched. This has the effect of reducing the effective bandwidth available from the DRAM.
The present invention is directed to an improved prefetch mechanism including an algorithm for accessing u,v texel values in the stored texture maps based on updating texture blocks of texels for pixel tiles by using a current texel (u,v) address and calculated rates of change in (u,v) values with respect to changes in (x,y) pixel locations.
More particularly, in a three-dimensional graphics display system in which texture maps of an object are stored in memory for texels at (u,v) memory locations, the invention is a method of fetching texels for use in calculating (x,y) display pixel values comprising the steps of: a) identifying in (u,v) space a geometric shape to be displayed in (x,y) space, b) establishing tiles of pixels within the geometric shape for use in accessing texels, c) computing texel addresses at one side of a tile based on current addresses (topuc, topvc) and first and second derivatives of (u,v) as a function of (x) and a first derivative as a function of (y), d) computing texel addresses at an opposing side of the tile based on current addresses (u0,v0) and first and second derivatives of (u,v) as a function of (x) and a first derivative as a function of (y), and e) fetching texel blocks within the tiles as defined by the addresses in steps c) and d).
In a preferred embodiment, the geometric shape is a triangle, a tile comprises a quadrilateral having top and bottom pixel locations for two opposing sides, and step c) and step d) define corners of the quadrilateral in (u,v) space.
The invention and objects and features thereof will be more readily apparent from the following detailed description and appended claims when taken with the drawings.
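As a rough illustration of steps c) through e), the sketch below extrapolates the corners of a pixel tile into (u,v) space and lists the texel blocks that cover them. The summary above does not give the update equations, so the second-order Taylor step in x, the first-order step in y, the reuse of the same x derivatives on both sides of the tile, the bounding-box block selection, and all function and parameter names are assumptions made only for this example. Non-negative (u,v) addresses are also assumed.

```python
BLOCK = 4  # texels per block side, as in the block organization described earlier


def advance_x(uv, d1x, d2x, dx):
    """Assumed second-order step in x from a current (u, v) address."""
    return (uv[0] + d1x[0] * dx + 0.5 * d2x[0] * dx * dx,
            uv[1] + d1x[1] * dx + 0.5 * d2x[1] * dx * dx)


def tile_corners(top_uv, d1x, d2x, d1y, width, height):
    """Corners of a width x height pixel tile, mapped into (u, v) space."""
    top_left = top_uv
    top_right = advance_x(top_left, d1x, d2x, width)
    # Assumed first-order step in y to reach the opposing side of the tile.
    bottom_left = (top_uv[0] + d1y[0] * height, top_uv[1] + d1y[1] * height)
    bottom_right = advance_x(bottom_left, d1x, d2x, width)
    return [top_left, top_right, bottom_right, bottom_left]


def blocks_for_corners(corners):
    """Blocks covering the bounding box of the quadrilateral (a simplification)."""
    us = [c[0] for c in corners]
    vs = [c[1] for c in corners]
    bu0, bu1 = int(min(us)) // BLOCK, int(max(us)) // BLOCK
    bv0, bv1 = int(min(vs)) // BLOCK, int(max(vs)) // BLOCK
    return [(bu, bv) for bv in range(bv0, bv1 + 1) for bu in range(bu0, bu1 + 1)]


# Made-up derivatives: u accelerates in x, v advances one texel per pixel in y.
corners = tile_corners((0.0, 0.0), d1x=(1.0, 0.0), d2x=(0.5, 0.0),
                       d1y=(0.0, 1.0), width=4, height=4)
needed = blocks_for_corners(corners)
```

Unlike a direction-only vector, the derivative terms carry magnitude information, so the extent of the fetched region can follow how quickly (u,v) changes across the tile.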