The present application relates to computer graphics rendering systems and methods, and particularly to handling of texture data used by rendering accelerators for 3D graphics.
Background: 3D Computer Graphics
One of the driving features in the performance of most single-user computers is computer graphics. This is particularly important in computer games and workstations, but is generally very important across the personal computer market.
For some years the most critical area of graphics development has been in three-dimensional (“3D”) graphics. The peculiar demands of 3D graphics are driven by the need to present a realistic view, on a computer monitor, of a three-dimensional scene. The pattern written onto the two-dimensional screen must therefore be derived from the three-dimensional geometries in such a way that the user can easily “see” the three-dimensional scene (as if the screen were merely a window into a real three-dimensional scene). This requires extensive computation to obtain the correct image for display, taking account of surface textures, lighting, shadowing, and other characteristics.
The starting point (for the aspects of computer graphics considered in the present application) is a three-dimensional scene, with specified viewpoint and lighting (etc.). The elements of a 3D scene are normally defined by sets of polygons (typically triangles), each having attributes such as color, reflectivity, and spatial location. (For example, a walking human, at a given instant, might be translated into a few hundred triangles which map out the surface of the human's body.) Textures are “applied” onto the polygons, to provide detail in the scene. (For example, a flat carpeted floor will look far more realistic if a simple repeating texture pattern is applied onto it.) Designers use specialized modelling software tools, such as 3D Studio, to build textured polygonal models.
The 3D graphics pipeline consists of two major stages, or subsystems, referred to as geometry and rendering. The geometry stage is responsible for managing all polygon activities and for converting three-dimensional spatial data into a two-dimensional representation of the viewed scene, with properly-transformed polygons. The polygons in the three-dimensional scene, with their applied textures, must then be transformed to obtain their correct appearance from the viewpoint of the moment; this transformation requires calculation of lighting (and apparent brightness), foreshortening, obstruction, etc.
However, even after these transformations and extensive calculations have been done, there is still a large amount of data manipulation to be done: the correct values for EACH PIXEL of the transformed polygons must be derived from the two-dimensional representation. (This requires not only interpolation of pixel values within a polygon, but also correct application of properly oriented texture maps.) The rendering stage is responsible for these activities: it “renders” the two-dimensional data from the geometry stage to produce correct values for all pixels of each frame of the image sequence.
The most challenging 3D graphics applications are dynamic rather than static. In addition to changing objects in the scene, many applications also seek to convey an illusion of movement by changing the scene in response to the user's input. Whenever a change in the orientation or position of the camera is desired, every object in a scene must be recalculated relative to the new view. As can be imagined, a fast-paced game needing to maintain a high frame rate will require many calculations and many memory accesses.
FIG. 2 shows a high-level overview of the processes performed in the overall 3D graphics pipeline. However, this is a very general overview, which ignores the crucial issues of what hardware performs which operations.
Texturing
There are different ways to add complexity to a 3D scene. Creating more and more detailed models, consisting of a greater number of polygons, is one way to add visual interest to a scene. However, adding polygons necessitates paying the price of having to manipulate more geometry. 3D systems have what is known as a “polygon budget,” an approximate number of polygons that can be manipulated without unacceptable performance degradation. In general, fewer polygons yield higher frame rates.
The visual appeal of computer graphics rendering is greatly enhanced by the use of “textures.” A texture is a two-dimensional image which is mapped into the data to be rendered. Textures provide a very efficient way to generate the level of minor surface detail which makes synthetic images realistic, without requiring transfer of immense amounts of data. Texture patterns provide realistic detail at the sub-polygon level, so the higher-level tasks of polygon-processing are not overloaded. See Foley et al., Computer Graphics: Principles and Practice (2d ed. 1990, corr. 1995), especially at pages 741-744; Paul S. Heckbert, “Fundamentals of Texture Mapping and Image Warping,” Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1994; Heckbert, “Survey of Computer Graphics,” IEEE Computer Graphics, November 1986, pp. 56; all of which are hereby incorporated by reference. Game programmers have also found that texture mapping is generally a very efficient way to achieve very dynamic images without requiring a hugely increased memory bandwidth for data handling.
A typical graphics system reads data from a texture map, processes it, and writes color data to display memory. The processing may include mipmap filtering, which requires access to several maps. The texture map need not be limited to colors, but can hold other information that can be applied to a surface to affect its appearance; this could include height perturbation to give the effect of roughness. The individual elements of a texture map are called “texels.”
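To illustrate why mipmap filtering touches several maps, the following is a minimal sketch of trilinear filtering, one common form of mipmap filtering (the text above does not say which filter is used). It assumes a square, power-of-two, single-channel texture held as nested lists, a 2×2 box filter to build each smaller map, and a level of detail derived from a caller-supplied texels-per-pixel ratio. All function names are hypothetical.

```python
import math

def downsample(level):
    """Build the next mipmap level by averaging each 2x2 block of texels (box filter)."""
    h, w = len(level), len(level[0])
    return [[(level[2 * y][2 * x] + level[2 * y][2 * x + 1] +
              level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w // 2)] for y in range(h // 2)]

def build_mipmaps(base):
    """Return [base, 1/2 size, 1/4 size, ..., 1x1] for a square power-of-two texture."""
    levels = [base]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

def bilinear(level, u, v):
    """Sample one map at normalized (u, v) in [0, 1], clamping at the edges."""
    size = len(level)
    x = min(max(u * size - 0.5, 0.0), size - 1.0)
    y = min(max(v * size - 0.5, 0.0), size - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, size - 1), min(y0 + 1, size - 1)
    fx, fy = x - x0, y - y0
    top = level[y0][x0] * (1 - fx) + level[y0][x1] * fx
    bottom = level[y1][x0] * (1 - fx) + level[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def trilinear(levels, u, v, texels_per_pixel):
    """Blend bilinear samples from the two maps that bracket the level of detail."""
    lod = max(0.0, math.log2(max(texels_per_pixel, 1e-9)))
    lod = min(lod, len(levels) - 1)
    lo = int(math.floor(lod))
    hi = min(lo + 1, len(levels) - 1)
    frac = lod - lo
    return bilinear(levels[lo], u, v) * (1 - frac) + bilinear(levels[hi], u, v) * frac
```

Each output fragment therefore reads up to four texels from each of two maps, eight reads in all, which is the memory traffic the surrounding discussion is concerned with.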
Awkward side-effects of texture mapping occur unless the renderer can apply texture maps with correct perspective. Perspective-corrected texture mapping involves an algorithm that translates “texels” (pixels from the bitmap texture image) into display pixels in accordance with the spatial orientation of the surface. Since the surfaces are transformed (by the host or geometry engine) to produce a 2D view, the textures will need to be similarly transformed by a linear transform (normally projective or “affine”). (In conventional terminology, the coordinates of the object surface, i.e. the primitive being rendered, are referred to as an (s,t) coordinate space, and the map of the stored texture is referred to as a (u,v) coordinate space.) The transformation in the resulting mapping means that a horizontal line in the (x,y) display space is very likely to correspond to a slanted line in the (u,v) space of the texture map, and hence many additional reads will occur, due to the texturing operation, as rendering walks along a horizontal line of pixels.
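The effect described above can be seen in a small sketch of perspective-correct interpolation along one horizontal span. It uses the widely known technique of interpolating u/w, v/w and 1/w linearly in screen space and dividing per pixel; the span endpoints, their texture coordinates and their w (depth) values are assumed inputs, and the function name is hypothetical.

```python
def perspective_correct_span(x0, x1, uv0, uv1, w0, w1):
    """Return the (u, v) texture coordinate for each pixel x in [x0, x1].

    u/w, v/w and 1/w vary linearly across the screen, so they are interpolated
    linearly and then divided per pixel to recover the true u and v.
    """
    out = []
    n = x1 - x0
    for x in range(x0, x1 + 1):
        t = (x - x0) / n if n else 0.0
        inv_w = (1 - t) / w0 + t / w1
        u_over_w = (1 - t) * uv0[0] / w0 + t * uv1[0] / w1
        v_over_w = (1 - t) * uv0[1] / w0 + t * uv1[1] / w1
        out.append((u_over_w / inv_w, v_over_w / inv_w))
    return out

# A horizontal five-pixel span whose ends map to opposite corners of the texture,
# with the right-hand end three times farther from the viewer.
span = perspective_correct_span(0, 4, (0.0, 0.0), (1.0, 1.0), 1.0, 3.0)
```

Here the span walks diagonally through (u,v) space with uneven steps (u goes 0, 0.1, 0.25, 0.5, 1.0), so successive pixels land in different texel rows and columns, rather than sweeping one row of the texture map.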
Data and Memory Management
Due to the extremely high data rates required at the end of the rendering pipeline, many features of computer architecture take on new complexities in the context of computer graphics (and especially in the area of texture management).
Virtual Memory Management
One of the basic tools of computer architecture is “virtual” memory. This is a technique which allows application software to use a very large range of memory addresses, without knowing how much physical memory is actually present on the computer, nor how the virtual addresses correspond to the physical addresses which are actually used to address the physical memory chips (or other memory devices) over a bus.
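The correspondence between virtual and physical addresses is conventionally kept per page. The sketch below shows the basic lookup under the assumption of a flat dictionary page table and a 4 KB page; real CPUs use multi-level tables, TLBs and permission bits, none of which are modelled here, and the names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size; actual systems vary

class PageFault(Exception):
    """Raised when a virtual page currently has no physical frame."""

def translate(vaddr, page_table, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address through a per-page table."""
    vpn, offset = divmod(vaddr, page_size)   # virtual page number, byte offset
    frame = page_table.get(vpn)
    if frame is None:
        raise PageFault(vpn)                 # the OS would now bring the page in
    return frame * page_size + offset

page_table = {0: 7, 1: 3}                    # virtual page -> physical frame
physical = translate(4100, page_table)       # page 1, offset 4 -> frame 3
```

The application sees only `vaddr`; which frame backs each page, and whether it is present at all, is invisible to it.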
Some further discussion of virtual memory management can be found in Hennessy and Patterson, Computer Architecture: a Quantitative Approach (2d ed. 1996); Hwang and Briggs, Computer Architecture and Parallel Processing (1984); Subieta, Object-based virtual memory for PCs (1990); Carr, Virtual memory management (1984); Lau, Performance improvement of virtual memory systems (1982); and Loshin, Efficient Memory Programming (1998); all of which are hereby incorporated by reference. An excellent hypertext tutorial is found in the Web pages which start at http://cne.gmu.edu/Modules/VM/, and this hypertext tutorial is also hereby incorporated by reference. Another useful online resource is found at http://www.harlequin.com/mm/reference/faq.html, and this too is hereby incorporated by reference. Much current work can be found in the annual proceedings of the ACM International Symposium on Memory Management (ISMM), which are all hereby incorporated by reference.
AGP and GART
Beginning with the Pentium II, some Intel processors have included the capability for an Accelerated Graphics Port (AGP). The AGP provides a high-speed dedicated bus for fast transfer of graphics data. (Unlike the PCI bus, the AGP bus is pipelined, and allows only two devices on it.)
To support this high-speed bus, the Intel specification also provides a special protocol for “AGP memory.” This is not physically separate memory, but just dynamically-allocated system DRAM areas which the graphics chip can access quickly. The Intel chip set includes address translation hardware which makes the “AGP memory” look continuous to the graphics controller. This permits the graphics chip to access large texture bitmaps (e.g. 128 KB) as a single entity.
Intel's built-in chip set hardware is called the GART (Graphics Address Remapping Table). The GART hardware is somewhat similar in function to the paging hardware in the CPU chip, in that the processor “linear” virtual addresses get automatically translated into physical addresses (which may point to system RAM and local Frame Buffer memory, as well as the AGP RAM).
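The way a remapping table makes scattered DRAM pages look continuous can be modelled with a one-level table indexed by aperture page. This is a simplified sketch, not a description of the chip set's actual table format: the 4 KB page size, the bytearray standing in for system DRAM, and all names are assumptions.

```python
PAGE = 4096  # assumed remapping granularity

def build_gart(physical_pages):
    """Entry i maps aperture page i onto an arbitrary (scattered) physical page."""
    return list(physical_pages)

def aperture_to_physical(gart, aperture_offset):
    """Translate an offset inside the contiguous aperture to a DRAM address."""
    index, offset = divmod(aperture_offset, PAGE)
    return gart[index] * PAGE + offset

def read_aperture(memory, gart, start, length):
    """Read bytes that are contiguous in the aperture but scattered in DRAM."""
    return bytes(memory[aperture_to_physical(gart, a)]
                 for a in range(start, start + length))

memory = bytearray(8 * PAGE)          # stand-in for system DRAM
gart = build_gart([5, 2, 7])          # a 12 KB "texture" spread over pages 5, 2, 7
```

A read that crosses from aperture page 0 into aperture page 1 silently jumps from DRAM page 5 to DRAM page 2, so the graphics controller can treat the whole texture as one linear block.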
However, this translation is fairly inflexible, and completely out of the user's control. Thus it cannot be optimized for particular applications, software architectures, or graphic accelerator architectures.
Virtual Texture Memory
Virtualization of texture memory, like virtualization of host memory, gives the user the impression of a memory space which is larger than can be physically accommodated in real memory. This is achieved by partitioning the memory space into a small physical working set and a large virtual set with dynamic swapping between the two. For virtual memory management in CPUs the physical working set is main memory and the virtual set is disk storage.
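A minimal sketch of a small working set backed by a larger virtual set with dynamic swapping follows. The least-recently-used replacement policy is an assumption chosen for illustration (the text does not name a policy here), the dictionary backing store stands in for the slower level, and the class name is hypothetical.

```python
from collections import OrderedDict

class WorkingSet:
    """Small resident working set over a large backing store, swapped on demand."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity          # number of pages that fit physically
        self.backing = backing_store      # page id -> data (the "virtual set")
        self.resident = OrderedDict()     # page id -> data, least recently used first
        self.swaps_in = 0

    def access(self, page_id):
        if page_id in self.resident:
            self.resident.move_to_end(page_id)      # now most recently used
            return self.resident[page_id]
        if len(self.resident) >= self.capacity:
            victim, data = self.resident.popitem(last=False)  # evict the LRU page
            self.backing[victim] = data                        # write it back
        self.resident[page_id] = self.backing[page_id]         # swap the page in
        self.swaps_in += 1
        return self.resident[page_id]

ws = WorkingSet(2, {i: f"page{i}" for i in range(5)})
for p in [0, 1, 0, 2, 1]:
    ws.access(p)
```

In this trace, pages 0 and 1 are swapped in, page 0 is then hit, and the accesses to 2 and 1 each evict the least recently used page, leaving pages 2 and 1 resident after four swaps. The caller never decides what to evict, which is exactly the decision the list of driver problems below shows to be hard.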
The swapping required for virtual memory management is normally done automatically (as far as the application software is concerned). There is a vast amount of literature concerning CPU based virtual memory systems and their management.
The apparently larger virtual texture memory space increases performance, as the optimum set of textures (or parts of textures) is chosen for residence by the hardware. It also simplifies texture memory management in systems where the driver, the application, or both would otherwise manage the memory manually. This is akin to program overlays before the days of virtual memory on CPUs, when the program had to dynamically load and unload segments of itself.
The present inventor has realized that managing the texture memory in the driver or by the application is very difficult (or impossible) to do properly, because:
1. What does the driver/application do when it runs out of memory and needs to fit another texture in? Which texture(s) does it delete?
2. The texture has to be completely resident and physically contiguous so a large enough space must be made available.
3. A texture which is about to be used MUST NOT be deleted or moved: otherwise all command buffers will be outdated.
4. In some cases a texture map will not fit into memory even when all other textures are deleted (a 2K×2K 32bpp texture map takes 16 MBytes of memory).
5. The texture heap must be compacted to reclaim storage.
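Problems 2, 4 and 5 can be made concrete with a little arithmetic. The sketch below uses a first-fit search over the free holes of a contiguous texture heap; the hole layout and the 64 KB page size are illustrative assumptions, and the helper names are hypothetical.

```python
MB = 1 << 20

def texture_bytes(width, height, bits_per_texel):
    """Storage needed for one texture map (no mipmaps)."""
    return width * height * bits_per_texel // 8

def first_fit(free_blocks, size):
    """Return the start of the first free (start, length) hole that fits, else None."""
    for start, length in free_blocks:
        if length >= size:
            return start
    return None

def pages_needed(size, page_size):
    """Ceiling division: how many fixed-size pages a texture occupies."""
    return -(-size // page_size)

big = texture_bytes(2048, 2048, 32)                  # the 2Kx2K 32bpp case: 16 MB
free = [(0, 6 * MB), (10 * MB, 6 * MB), (20 * MB, 6 * MB)]
```

Here the heap has 18 MB free in total, yet `first_fit(free, big)` finds no single hole large enough, so the heap would have to be compacted first. If the texture were instead stored as 256 separate 64 KB pages, the 288 free pages would be enough without moving anything.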
The idea of applying virtual management techniques to textures in 3D graphics hardware appears to be suggested, for example, by U.S. Pat. No. 5,790,130 to Gannett. This patent suggests that “A graphics hardware device, coupled to the host computer, renders texture mapped images, and includes a local memory that stores at least a portion of the texture data stored in the system memory at any one time. A software daemon runs on the processor of the host computer and manages transferring texture data from the system memory to the local memory when needed by the hardware device to render an image.” (Abstract) This and/or other virtual texture memory schemes are believed to have been used in some products of HP and SGI. However, the present inventor has realized that these schemes are ill-suited for most personal computer applications (and many workstation applications). The main aim in these implementations seems to have been to allow very large texture maps (16M×16M or larger) to be used. By contrast, the innovations in the present application are motivated not only by a desire for such large maps, but by the need to remove the software problems in efficiently managing the comparatively small amount of texture storage available (vs. the large amounts of texture storage in SGI and HP machines). Thus it is possible that the architectural innovations disclosed herein can be used in combination with those used by SGI and HP.
Doubly-Virtualized Texture Memory
As noted above, virtual memory architectures have long been used in general-purpose computers. However, there turn out to be some surprising difficulties in using this idea in computer graphics (especially for texture memory). The present application discloses several innovations related to virtualization and caching of texture memory.
The present application claims the use of a two- or three-level memory hierarchy for texture memory. In a sample implementation of this architecture, the first level is the on-card memory, i.e. the private memory attached to the rasterizer chip. (The “zeroth” level in the memory hierarchy is the on-chip cache, which is not particularly involved with the virtual memory system.) The second level is the host physical memory, and the third level is the host's disks (or host virtual memory).
The normal scheme is for the page fault to cause the texture to be loaded from the host's physical memory, but it is possible to tag the non-resident pages with a ‘virtual host texture’ bit which will cause an interrupt. The host will service this interrupt and retrieve the texture page off disk (either explicitly or via the host OS virtual memory system), lock it down and then have the hardware fetch it. Each graphics process can have up to 256 MBytes of virtual textures so, in theory, would require up to 256 MB of host memory to be locked down (thereby removing this memory from the general memory pool). Personal computers with enough memory to reserve 256 MB for a texture backing store are still rare, but by using the third level of storage, slower memory (i.e. disk) can be substituted to provide the capacity.
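The decision made on a texture access can be sketched as follows. This is a behavioural model only: the page sets, the `Source` labels and the function name are hypothetical, on-card eviction is ignored, and the host's disk read is reduced to moving the page into the locked-down set.

```python
from enum import Enum

class Source(Enum):
    ON_CARD = "on-card memory"
    HOST_PHYSICAL = "hardware fetch from locked-down host memory"
    HOST_DISK = "interrupt; host reads page from disk first"

def service_texture_page(page, on_card, host_locked, virtual_host_bits):
    """Report where a texture page comes from, updating residency as a side effect.

    on_card:           pages resident in the rasterizer's private memory
    host_locked:       pages locked down in host physical memory
    virtual_host_bits: non-resident pages tagged with the 'virtual host texture' bit
    """
    if page in on_card:
        return Source.ON_CARD
    if page in virtual_host_bits:
        # Host services the interrupt: fetch from disk and lock the page down,
        # after which the hardware fetches it as in the normal case.
        virtual_host_bits.discard(page)
        host_locked.add(page)
        on_card.add(page)
        return Source.HOST_DISK
    if page in host_locked:
        on_card.add(page)  # normal page fault, serviced by hardware
        return Source.HOST_PHYSICAL
    raise ValueError(f"texture page {page} has no backing store")

on_card, host_locked, vbits = {1}, {2}, {3}
```

Only the tagged pages involve the host CPU; all other faults are satisfied by the hardware fetching directly from host physical memory.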
Note that the disclosed memory hierarchy has AT LEAST three levels, typically four or more:
(possibly on-chip cache, and:)
on-card,
host physical, and
host bulk storage.
Preferably the graphics memory controller is able to lock down at least a portion of the host memory space, as described in the detailed description. If the hardware is going to auto-fetch pages from the host, at least some host memory must be locked down so it won't be paged in and out. This implies that the graphics subsystem is denying the main system access to some of its resources! This is a surprising choice, which is very unfriendly toward general system resources. Therefore, to avoid tying up host space with unneeded lockdowns, the presently preferred embodiment allows lockdown of only half the memory less 24M.
In the presently preferred embodiment, the memory hierarchy is implemented with:
2KB on chip;
8-20M on card;
host physical memory of e.g. 20-200M; and
host bulk storage, typically many gigabytes.
Notable (and separately innovative) features of the virtual texture mapping architecture described in the present application include at least the following:
a single-chip solution is provided;
two or three levels of texture memory hierarchy are supported;
the page faulting is all done in hardware, with no host intervention;
the texture memory management function can be used to manage texture storage in the host memory in addition to the texture storage in our normal texture memory;
multiple memory pools are supported; and
multiple rasterizers can be supported capably.
The disclosed inventions will be described with reference to the accompanying drawings, which show important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:
FIG. 1 is an overview of a computer system, with a rendering subsystem, which incorporates the disclosed graphics memory management ideas.
FIG. 2 is a very high-level view of other processes performed in a 3D graphics computer system.
FIG. 3 shows a block diagram of a 3D graphics accelerator subsystem.
FIGS. 4A and 4B are a pair of flow charts which show how a texture is loaded, depending on whether a cache miss occurs.
FIG. 5 shows a 2-D coordinate space mapped to a 1-D address range.
FIG. 6 shows a 2×2 patch arrangement within a texture map.
FIGS. 7A and 7B show layouts in memory for the various supported formats.
FIG. 8 shows how the map level and address can be encoded into the least amount of bits.
FIG. 9 shows which texels the memory reads bring in and the corresponding output fragments they will satisfy.
FIG. 10 shows a block diagram of the Texture Read Unit.
FIG. 11 shows a block diagram of the Primary Cache Manager.
FIG. 12 shows a block diagram of the Cache Directory.
FIG. 13 shows a block diagram of the CAM Cell.
FIG. 14 shows a block diagram of the Translation Lookaside Buffer (TLB).
FIG. 15 shows a block diagram of an individual CAM cell.
FIG. 16 shows a sample configuration where two rasterizers are served by a common memory manager and bus interface chip.