1. Field of the Invention
The present invention relates generally to texture mapping in graphics systems, and more particularly to a system and method of block- and band-oriented traversal to achieve improved bandwidth in such systems.
2. Description of Background Art
Texture mapping is the process of mapping an image onto a surface in a three-dimensional graphics system. This technique is well-known in the art, and is described, for example, in J. Foley et al., Computer Graphics: Principles and Practice, 2d. ed., Addison-Wesley, 1990, at 741-44.
Referring now to FIG. 1, there is shown an example of texture mapping according to the prior art. The image to be mapped is referred to as a texture map 101, and its individual elements are referred to as texels. Texture map 101 is typically described in a rectangular coordinate scheme designated (u, v), and is ordinarily stored in some area of conventional memory, such as, for example, a conventional page-mode dynamic random-access memory (DRAM) or other paged memory. In the example of FIG. 1, four pages 110, 111, 112, 113 are shown, each corresponding to a portion of the image area containing a corresponding portion of texture map 101.
Surface 104 in three-dimensional space has its own coordinate system (s, t). In a typical three-dimensional graphics system, surface 104 may be a primitive such as a polygon; many such polygons may be defined in three-space to form a three-dimensional object or scene. Each such polygon would then have its own coordinate system (s, t) similar to the surface 104 indicated in FIG. 1. Based on the orientation of surface 104 in three-space, and on the position and orientation of the "camera", surface 104 is in turn mapped onto a two-dimensional display grid 103 stored in the frame buffer for display by the computer system. The mapping of surface 104 onto display grid 103 is accomplished by matrix transforms that are well-known in the art. Display grid 103 has coordinate system (x, y) and is typically implemented in an area of memory reserved for video display, such as video random-access memory (video RAM), e.g., VRAM or synchronous graphics random-access memory (SGRAM). Display grid 103 contains individual elements known as pixels, represented by distinct memory locations in video RAM.
Coordinates on display grid 103 are often considered to reside in "screen space". Similarly, coordinates in surface 104 are considered to reside in "surface space" and coordinates in texture map 101 are considered to reside in "texture space". The origins for each of the coordinate systems may be placed at any position, although typically the screen-space origin is placed either at bottom-left or top-left.
Each pixel in some region of display grid 103 maps onto a point on surface 104 and in turn to a point in texture map 101. Thus, in the example of FIG. 1, point A_xy of display grid 103 maps onto point A_st in the coordinates of surface 104 and to point A_uv in texture map 101, or a group of points forming a region in texture map 101. Each of the mappings among display grid 103, surface 104, and texture map 101 may be point-to-point, point-to-region, region-to-point, or region-to-region. In conventional implementations of texture mapping systems, the mapping from display grid 103 to surface 104 and in turn to texture map 101 also generates a value d representing the level of detail for the particular texel. Typically, d is a measure of the perceived distance of the point in the texture map, as determined by a z-value for the point in the frame buffer. Points that are perceived to be farther away have a lower level of detail representing decreased resolution. In retrieving texels from texture map 101, d is used to implement a multum in parvo map (MIP map) scheme wherein several texels may be averaged, or otherwise filtered, and mapped onto one pixel of the frame buffer. This filtering may be performed on-the-fly, or filtered pixels may be pre-calculated at several selected resolution levels and stored for later retrieval, resulting in improved performance. The higher the value of d, the lower the level of detail, and the more pixel-filtering is performed. In the limit, the entire texture map may theoretically be reduced to one pixel in the frame buffer.
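By way of illustration only, the pre-calculated variant of the MIP map scheme described above may be sketched as follows. The sketch assumes a square texture whose side is a power of two and a simple 2x2 box filter; the function names build_mip_chain and sample are hypothetical and are not drawn from any particular system.

```python
def build_mip_chain(texture):
    """Return a list of levels; level 0 is the full-resolution texture.

    `texture` is a square list of rows whose side is a power of two
    (an assumption of this sketch). Each coarser level averages 2x2
    blocks of the previous level (a box filter; other filters are possible).
    """
    levels = [texture]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        half = len(prev) // 2
        coarser = [
            [
                (prev[2 * v][2 * u] + prev[2 * v][2 * u + 1]
                 + prev[2 * v + 1][2 * u] + prev[2 * v + 1][2 * u + 1]) / 4.0
                for u in range(half)
            ]
            for v in range(half)
        ]
        levels.append(coarser)
    return levels


def sample(levels, u, v, d):
    """Fetch the pre-filtered texel covering full-resolution (u, v) at level d."""
    d = max(0, min(int(d), len(levels) - 1))  # clamp d to the available levels
    return levels[d][v >> d][u >> d]


# A 4x4 texture whose texel at (u, v) holds the value 4*v + u.
texture = [[4 * v + u for u in range(4)] for v in range(4)]
levels = build_mip_chain(texture)
```

For this 4x4 texture the chain holds three levels (4x4, 2x2, and 1x1), the last being the limiting case in which the entire texture map is reduced to a single value.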
Conventional rasterization engines draw the image into the frame buffer by the known technique of scan conversion of primitives such as polygons and lines (see, for example, Foley et al.). Scan conversion takes as its input primitives defined in terms of vertices and orientations, and provides as its output a series of pixels to be drawn on the screen. As each pixel is generated by scan conversion, a rasterization engine performs the necessary mapping calculations to determine which texel of texture map 101 corresponds to the pixel. The rasterization engine then issues whatever memory references are required (such as texture fetch, z-fetch, z-writeback, color fetch, color write-back, and the like) to retrieve texel information for writing to the pixel being processed. Thus, memory references are issued in the order generated by the scan conversion. Conventionally, such memory references are stored and managed according to a first-in first-out (FIFO) scheme using a FIFO queue.
It is known that conventional page-mode DRAM components incur access-time penalties when accessing memory locations from different memory pages. For example, in some memory architectures such as SGRAM, an access to an open page requires one cycle, a read from a page not open requires nine cycles, a write to a page not open requires six cycles, and an access to an open page on a different bank requires three cycles. Thus, the above-described scheme of issuing memory references in the order generated by scan conversion may incur such penalties, as the referenced areas of texture map 101 may lie in different pages. In fact, depending on the distortion of the texture boundaries resulting from the particular mapping transformation being employed, references generated in scan conversion order may require repeated page-switching back and forth, also known as "thrashing". Since memory bandwidth is generally the bottleneck in fast generation of three-dimensional images, such repeated page-swapping results in diminished performance.
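The effect of such penalties may be illustrated by a minimal sketch that charges one cycle for a read from the open page and nine cycles for a read from a page that is not open, per the example figures above. The sketch is a simplification: it assumes a single bank and read accesses only, and the function name read_cycles and the page labels are hypothetical.

```python
OPEN_PAGE_CYCLES = 1        # access to the currently open page
PAGE_MISS_READ_CYCLES = 9   # read from a page that is not open


def read_cycles(page_sequence):
    """Total cycles for a sequence of reads, given the page each read touches.

    Simplified model: one bank, reads only, exactly one page open at a time.
    """
    open_page = None
    total = 0
    for page in page_sequence:
        if page == open_page:
            total += OPEN_PAGE_CYCLES
        else:
            total += PAGE_MISS_READ_CYCLES
            open_page = page
    return total


# Eight reads alternating between two pages (thrashing), and the same
# eight reads grouped so that each page is opened only once.
thrashing = [110, 111] * 4
grouped = sorted(thrashing)
```

Under these assumptions, read_cycles(thrashing) is 72 cycles, whereas read_cycles(grouped) is 24 cycles for the same eight reads.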
In addition, many memory systems employ a burst-mode access scheme wherein a shared memory resource is made available for a particular period of time to one process, and is then unavailable to that process while it services other processes. In order to maximize data transfer, it is advantageous to avoid page breaks within a burst. In essence, once a process has access to the shared memory resource, it is efficient for the process to retain access until the desired data segment has been transferred; page breaks may cause access to be shifted to another process, thus diminishing performance further.
Tiling has been found to be useful in improving data transfer in burst-mode access and in reducing page breaks. Referring again to FIG. 1, texture map 101 is shown divided into four areas 110, 111, 112, 113 corresponding to pages in memory. This is an example of a typical tiling scheme wherein each area (or tile) is stored in a page of memory, so that any scanning done within a tile does not cause page breaks. In general, page breaks only occur when scanning of one tile is complete and scanning of another tile begins. Typically, the tiled storage scheme yields an improved traversal path which reduces page breaks as compared with linear traversal.
Tiles and associated pages may be of any size, such as, for example, 32×32 pixels, for a total of 1024 pixels. With 16-bit pixels, this corresponds to a page size of 2 KB.
Referring now to FIGS. 1A and 1B, there is shown an example of an advantage of a tiled addressing scheme. FIG. 1A shows primitive 120 represented in a linear addressing scheme. Depending on the width of the frame buffer and page size, each scan line 121 may be represented in memory on its own page. For example, with a frame buffer of width 1024 and height 768, if each pixel has a 16-bit width, a 2 KB page holds a single scan line. Thus, each line segment of primitive 120 would be stored on a different page, and up to eight page breaks would be required to render primitive 120.
By contrast, FIG. 1B shows primitive 120 represented in a tiled addressing scheme. If the entire primitive fits within a single tile 122, no page breaks are required to render primitive 120. Even if primitive 120 spans a plurality of tiles, in general fewer page breaks will be required than with the linear addressing scheme of FIG. 1A.
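The comparison between FIGS. 1A and 1B may be sketched numerically as follows, using the example figures given above: a frame buffer 1024 pixels wide, 16-bit pixels, 2 KB pages, and 32×32 tiles. The small triangle of eight scan lines and the names linear_page, tiled_page, and page_opens are hypothetical and serve only to illustrate the two addressing schemes.

```python
WIDTH = 1024          # frame buffer width in pixels
BYTES_PER_PIXEL = 2   # 16-bit pixels
PAGE_BYTES = 2048     # 2 KB page
TILE = 32             # 32 x 32 pixels x 2 bytes = 2048 bytes, one page per tile


def linear_page(x, y):
    """Page holding pixel (x, y) when scan lines are stored end to end."""
    return ((y * WIDTH + x) * BYTES_PER_PIXEL) // PAGE_BYTES


def tiled_page(x, y):
    """Page holding pixel (x, y) when each 32x32 tile occupies one page."""
    return (y // TILE) * (WIDTH // TILE) + (x // TILE)


def page_opens(pixels, page_of):
    """Number of times a page must be opened, counting the first access."""
    opens = 0
    current = None
    for x, y in pixels:
        page = page_of(x, y)
        if page != current:
            opens += 1
            current = page
    return opens


# Hypothetical small triangle covering eight scan lines, in scan order.
triangle = [(x, y) for y in range(40, 48) for x in range(40, 41 + (y - 40))]
```

With these figures, each scan line of the linear scheme falls on its own page, so page_opens(triangle, linear_page) is 8, while the whole triangle lies within one tile and page_opens(triangle, tiled_page) is 1.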
The above-described tiled addressing scheme can be applied to image storage, and may be extended to color buffers and/or z-buffers as well.
Other techniques have also been attempted in the prior art to minimize page breaks, both for linear and tiled access schemes. One example is the use of specialized memory in place of conventional page-mode memory components. See, for example, H. Fuchs and J. Poulton, "Pixel-Planes: A VLSI-Oriented Design for a Raster Graphics Engine," in VLSI Design vol. 2, no. 3, 1981; M. Deering et al., "FBRAM: A New Form of Memory Optimized for 3D Graphics," in Computer Graphics, Proceedings of SIGGRAPH, 1995; A. Schilling et al., "Texram: A Smart Memory for Texturing," in IEEE Computer Graphics and Applications, 1996. Such systems generally improve memory bandwidth by, for example, associating memory directly with processing on an application-specific integrated circuit (ASIC), or by associating logic with memory directly on a DRAM chip. See A. Schilling et al. However, such techniques require highly specialized components that are generally more expensive than conventional page-mode DRAM.
Another attempt to reduce memory bandwidth is described in K. Akeley, "RealityEngine Graphics," in Computer Graphics Proceedings of SIGGRAPH, 1993. Akeley describes a system of extreme memory interleaving. This technique requires significantly more memory chips than do conventional memory systems, and consequently is more expensive.
A texture cache may be used to improve access bandwidth to the texture map. However, texture cache hit rate is limited by the locality of the texel references. Poor locality may result in excessive swapping of texels into and out of the texture cache. Conventional scan-line oriented traversal algorithms can result in poor locality, in particular if the texture primitives are oddly shaped. For example, when triangles are used as texture primitives, as is conventional, a very long and narrow triangle may span sufficient length along an x- or y-axis that, by the time the end of a span has been reached, the beginning of the span is no longer cached in the texture cache. Thus, when traversal of the next span begins, the required texels are no longer cached and must be retrieved from main texture memory. In other words, texture cache swap-out based on locality can result in poor hit rate when rendering certain shapes (such as long, narrow triangles).
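The locality problem described above may be illustrated by a minimal simulation, offered as a sketch under stated assumptions rather than as a model of any actual texture cache. It assumes a least-recently-used (LRU) replacement policy, a cache holding individual texels, and spans that each read two adjacent texture rows (as might occur with filtering), so that consecutive spans share one row. The function name hit_rate and all parameter values are hypothetical.

```python
from collections import OrderedDict


def hit_rate(span_length, num_spans, cache_capacity):
    """Fraction of texel reads served by an LRU cache of `cache_capacity` texels.

    Assumed access pattern: span y reads texels (x, y) and (x, y + 1) for each
    x along the span, so row y + 1 is needed again by the following span.
    """
    cache = OrderedDict()   # keys are texel coordinates, ordered by recency
    hits = 0
    accesses = 0
    for y in range(num_spans):
        for x in range(span_length):
            for texel in ((x, y), (x, y + 1)):
                accesses += 1
                if texel in cache:
                    hits += 1
                    cache.move_to_end(texel)       # mark as most recently used
                else:
                    cache[texel] = True
                    if len(cache) > cache_capacity:
                        cache.popitem(last=False)  # evict least recently used
    return hits / accesses


short_spans = hit_rate(span_length=8, num_spans=4, cache_capacity=32)
long_spans = hit_rate(span_length=64, num_spans=4, cache_capacity=32)
```

Under these assumptions the short spans reuse the shared row and achieve a hit rate of 0.375, whereas with the long spans every texel of the shared row has been evicted by the time the next span begins, and the hit rate falls to 0.0.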
Texel reordering techniques may be used to improve locality. However, the effectiveness of this approach is limited by the size of the access buffer, and may still fail to solve the texture space locality problem in some situations where the graphics primitives are of irregular shape. Texel reordering may be enhanced and locality improved by traversing two or more scan lines as a group. This approach results in additional complexity, as scheduling issues arise in determining which scan line is to be processed next. Such enhanced texel reordering also requires extensive feedback from the sorting algorithm and may introduce significant hardware design issues resulting from latency and verification.
Though the above-described techniques may be effective in improving rendering efficiency somewhat, unnecessary page-switching still occurs. In particular, burst-mode access to texture space results in unnecessary page thrashing and limited re-use of cached areas due to poor locality of texel references.
Another problem arises when burst-mode access is employed to access texture storage. Burst-mode typically retrieves a data segment having fixed length corresponding to the width of the memory bus (e.g. 64 bits, or 128 bits). Thus, burst-mode access to texture space typically retrieves more data than is immediately needed. Conventional texture storage mechanisms do not adequately cache the unneeded data for later re-use. Thus, additional accesses to this data may be generated where improved locality would have resulted in more effective caching and reuse of the previously retrieved data segment.
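The cost of failing to re-use burst data may be sketched as follows. The sketch assumes 16-bit texels and a 64-bit burst segment (four texels per burst), and compares a scheme that discards each segment after use with an idealized scheme that retains every segment already fetched. The function name bursts_issued and the unbounded retention are assumptions made for illustration only.

```python
def bursts_issued(texel_addresses, bus_bits=64, texel_bits=16, keep_fetched=True):
    """Count the burst accesses needed to service a sequence of texel reads.

    Each burst returns one aligned segment of bus_bits // texel_bits texels.
    With keep_fetched=True, segments already retrieved are retained without
    limit (an idealization) and are not fetched again.
    """
    texels_per_burst = bus_bits // texel_bits
    held = set()   # segments retrieved so far
    issued = 0
    for address in texel_addresses:
        segment = address // texels_per_burst
        if keep_fetched and segment in held:
            continue   # texel already available from an earlier burst
        issued += 1
        held.add(segment)
    return issued


with_reuse = bursts_issued(range(8))
without_reuse = bursts_issued(range(8), keep_fetched=False)
```

For eight consecutive texels, two bursts suffice when retrieved segments are re-used, whereas eight bursts are issued when each segment is discarded after a single texel is consumed.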
What is needed is a system of reducing memory bandwidth by minimizing page-switching in conventional page-mode memory, so as to improve performance of graphic engines for a wide range of client algorithms without substantially increasing the number of components or the overall cost. Specifically, what is needed is a system of improving locality of texel references over existing edge-walking and span-walking traversal techniques, and an improved burst-access scheme for texture space that yields improved re-use of retrieved data.