The present application relates to computer graphics rendering systems and methods, and particularly to handling of texture data used by rendering accelerators for 3D graphics.
3D Computer Graphics
One of the driving features in the performance of most single-user computers is computer graphics. This is particularly important in computer games and workstations, but is generally very important across the personal computer market.
For some years the most critical area of graphics development has been in three-dimensional (“3D”) graphics. The peculiar demands of 3D graphics are driven by the need to present a realistic view, on a computer monitor, of a three-dimensional scene. The pattern written onto the two-dimensional screen must therefore be derived from the three-dimensional geometries in such a way that the user can easily “see” the three-dimensional scene (as if the screen were merely a window into a real three-dimensional scene). This requires extensive computation to obtain the correct image for display, taking account of surface textures, lighting, shadowing, and other characteristics.
The starting point (for the aspects of computer graphics considered in the present application) is a three-dimensional scene, with specified viewpoint and lighting (etc.). The elements of a 3D scene are normally defined by sets of polygons (typically triangles), each having attributes such as color, reflectivity, and spatial location. (For example, a walking human, at a given instant, might be translated into a few hundred triangles which map out the surface of the human's body.) Textures are “applied” onto the polygons, to provide detail in the scene. (For example, a flat carpeted floor will look far more realistic if a simple repeating texture pattern is applied onto it.) Designers use specialized modelling software tools, such as 3D Studio, to build textured polygonal models.
The 3D graphics pipeline consists of two major stages, or subsystems, referred to as geometry and rendering. The geometry stage is responsible for managing all polygon activities and for converting three-dimensional spatial data into a two-dimensional representation of the viewed scene, with properly-transformed polygons. The polygons in the three-dimensional scene, with their applied textures, must then be transformed to obtain their correct appearance from the viewpoint of the moment; this transformation requires calculation of lighting (and apparent brightness), foreshortening, obstruction, etc.
However, even after these transformations and extensive calculations have been done, there is still a large amount of data manipulation to be done: the correct values for EACH PIXEL of the transformed polygons must be derived from the two-dimensional representation. (This requires not only interpolation of pixel values within a polygon, but also correct application of properly oriented texture maps.) The rendering stage is responsible for these activities: it “renders” the two-dimensional data from the geometry stage to produce correct values for all pixels of each frame of the image sequence.
The most challenging 3D graphics applications are dynamic rather than static. In addition to changing objects in the scene, many applications also seek to convey an illusion of movement by changing the scene in response to the user's input. Whenever a change in the orientation or position of the camera is desired, every object in a scene must be recalculated relative to the new view. As can be imagined, a fast-paced game needing to maintain a high frame rate will require many calculations and many memory accesses.
FIG. 2 shows a high-level overview of the processes performed in the overall 3D graphics pipeline. However, this is a very general overview, which ignores the crucial issues of what hardware performs which operations.
Hardware Acceleration
Since rendering is a computationally intensive operation, numerous designs have offloaded it from the main CPU. An example of this is the GLINT chip described below.
Texturing
There are different ways to add complexity to a 3D scene. Creating more and more detailed models, consisting of a greater number of polygons, is one way to add visual interest to a scene. However, adding polygons necessitates paying the price of having to manipulate more geometry. 3D systems have what is known as a “polygon budget,” an approximate number of polygons that can be manipulated without unacceptable performance degradation. In general, fewer polygons yield higher frame rates.
The visual appeal of computer graphics rendering is greatly enhanced by the use of “textures.” A texture is a two-dimensional image which is mapped into the data to be rendered. Textures provide a very efficient way to generate the level of minor surface detail which makes synthetic images realistic, without requiring transfer of immense amounts of data. Texture patterns provide realistic detail at the sub-polygon level, so the higher-level tasks of polygon-processing are not overloaded. See Foley et al., Computer Graphics: Principles and Practice (2.ed. 1990, corr.1995), especially at pages 741-744; Paul S. Heckbert, “Fundamentals of Texture Mapping and Image Warping,” Thesis submitted to Dept. of EE and Computer Science, University of California, Berkeley, Jun. 17, 1994; Heckbert, “Survey of Computer Graphics,” IEEE Computer Graphics, November 1986, pp. 56; all of which are hereby incorporated by reference. Game programmers have also found that texture mapping is generally a very efficient way to achieve very dynamic images without requiring a hugely increased memory bandwidth for data handling.
A typical graphics system reads data from a texture map, processes it, and writes color data to display memory. The processing may include mipmap filtering which requires access to several maps. The texture map need not be limited to colors, but can hold other information that can be applied to a surface to affect its appearance; this could include height perturbation to give the effect of roughness. The individual elements of a texture map are called “texels.”
Awkward side-effects of texture mapping occur unless the renderer can apply texture maps with correct perspective. Perspective-corrected texture mapping involves an algorithm that translates “texels” (pixels from the bitmap texture image) into display pixels in accordance with the spatial orientation of the surface. Since the surfaces are transformed (by the host or geometry engine) to produce a 2D view, the textures will need to be similarly transformed by a linear transform (normally projective or “affine”). (In conventional terminology, the coordinates of the object surface, i.e. the primitive being rendered, are referred to as an (s,t) coordinate space, and the map of the stored texture is referred to as a (u,v) coordinate space.) The transformation in the resulting mapping means that a horizontal line in the (x,y) display space is very likely to correspond to a slanted line in the (u,v) space of the texture map, and hence many additional reads will occur, due to the texturing operation, as rendering walks along a horizontal line of pixels.
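The difference between affine and perspective-corrected texture coordinates along one span of pixels can be sketched as follows. This is a minimal illustration, not the method of the present application: the function names and the per-vertex homogeneous depth value w are assumptions introduced here. It uses the standard technique of interpolating u/w, v/w and 1/w linearly in screen space and dividing per pixel.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def affine_uv(uv0, uv1, t):
    """Interpolate (u, v) linearly in screen space, with no perspective correction."""
    return (lerp(uv0[0], uv1[0], t), lerp(uv0[1], uv1[1], t))


def perspective_uv(uv0, w0, uv1, w1, t):
    """Interpolate u/w, v/w and 1/w linearly in screen space, then divide.

    w0 and w1 are the (assumed) homogeneous depths of the two span endpoints.
    """
    inv_w = lerp(1.0 / w0, 1.0 / w1, t)
    u_over_w = lerp(uv0[0] / w0, uv1[0] / w1, t)
    v_over_w = lerp(uv0[1] / w0, uv1[1] / w1, t)
    return (u_over_w / inv_w, v_over_w / inv_w)


# Halfway along a span whose far end is three times deeper than its near end:
print(affine_uv((0.0, 0.0), (1.0, 1.0), 0.5))
print(perspective_uv((0.0, 0.0), 1.0, (1.0, 1.0), 3.0, 0.5))
```

At the screen-space midpoint the affine version returns the texture midpoint, whereas the perspective-corrected version returns a texel much nearer the close end of the span, which is the behaviour needed to avoid the side-effects mentioned above.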
Due to the extremely high data rates required at the end of the rendering pipeline, many features of computer architecture take on new complexities in the context of computer graphics (and especially in the area of texture management).
Virtual Memory Management
One of the basic tools of computer architecture is “virtual” memory. This is a technique which allows application software to use a very large range of memory addresses, without knowing how much physical memory is actually present on the computer, nor how the virtual addresses correspond to the physical addresses which are actually used to address the physical memory chips (or other memory devices) over a bus.
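As a generic illustration of the virtual-to-physical correspondence just described, the following sketch translates an address through a single-level page table. The page size, the dictionary-based table, and the function name are assumptions chosen for brevity; they do not describe any particular machine.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(virtual_address, page_table):
    """Map a virtual address to a physical address.

    page_table maps virtual page numbers to physical page numbers.
    A missing entry models a page fault.
    """
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page not in page_table:
        raise LookupError(f"page fault on virtual page {page}")
    return page_table[page] * PAGE_SIZE + offset


# Virtual page 1 is held in physical page 3 in this hypothetical table.
table = {0: 7, 1: 3}
print(translate(4100, table))
```

The application sees only the virtual address; which physical page backs it can change without the application being aware of it.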
Some further discussion of virtual memory management can be found in Hennessy and Patterson, Computer Architecture: A Quantitative Approach (2.ed. 1996); Hwang and Briggs, Computer Architecture and Parallel Processing (1984); Subieta, Object-based virtual memory for PCs (1990); Carr, Virtual memory management (1984); Lau, Performance improvement of virtual memory systems (1982); and Loshin, Efficient Memory Programming (1998); all of which are hereby incorporated by reference. An excellent hypertext tutorial is found in the Web pages which start at http://cne.gmu.edu/Modules/VM/, and this hypertext tutorial is also hereby incorporated by reference. Another useful online resource is found at http://www.harlequin.com/mm/reference/faq.html, and this too is hereby incorporated by reference. Much current work can be found in the annual proceedings of the ACM International Symposium on Memory Management (ISMM), which are all hereby incorporated by reference.
Texture Caching
A recurrent problem with texture mapping is the amount of data each texture map contains. If it is of high quality and detail it may require a substantial amount of storage space. The size of texture maps may be increased if mipmap filtering is supported. Simply moving textures from one physical storage location to another may be a time consuming operation. In a normal graphics system the time taken to transfer a texture from disk or system memory to the graphics system may be significantly more than the time taken to apply the texture. Network applications, in which the application and graphics system are on separate machines linked by a low bandwidth connection, aggravate this problem. Improvements can be made by caching the texture locally in the graphics system, but the time taken to transfer it just once may be prohibitive.
Caching would be particularly desirable for texture management in 3D graphics. The desirability for some form of texture caching is easily demonstrated by a simple calculation. If the target performance is to do trilinear filtering in a single cycle, then 8 texels per output fragment are required. If each texel is in true color (i.e. 32 bits per pixel), then the texture read bandwidth is 32 bytes per cycle, or (assuming a 100 MHz bus) 3.2 GB/s. With clever cache design this can be reduced to 1.25 texels read per pixel (assuming the texture maps are very much larger than will fit into the cache), i.e. 500 MB/s. (Note the trivial case where the texture maps fit into cache and are already loaded is an easy one to solve, but isn't useful with real world scenarios.) Caching texture maps is not a new idea of itself, but previous implementations leave room for improvement.
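The arithmetic in the preceding paragraph can be checked with a few lines of code. The constant and function names are illustrative only; the figures (8 texels, 32-bit texels, 100 MHz, 1.25 texels per pixel) are those stated above.

```python
BYTES_PER_TEXEL = 4        # 32-bit true color
CLOCK_HZ = 100_000_000     # 100 MHz, one output pixel per cycle


def texture_bandwidth(texels_per_pixel,
                      bytes_per_texel=BYTES_PER_TEXEL,
                      clock_hz=CLOCK_HZ):
    """Texture read bandwidth in bytes per second."""
    return texels_per_pixel * bytes_per_texel * clock_hz


uncached = texture_bandwidth(8)      # trilinear: 2 mip levels x 4 texels
cached = texture_bandwidth(1.25)     # with the cache described above

print(uncached / 1e9, "GB/s")   # 3.2 GB/s
print(cached / 1e6, "MB/s")     # 500.0 MB/s
```

The cache therefore reduces the external read bandwidth by a factor of 6.4 under these assumptions.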
DRAM Speed Improvements under Collocated Access
The access speed of DRAMs has generally not improved as fast as many system designers would like. One technique that has repeatedly surfaced, in various forms, is to provide improved access speeds for successive accesses to adjacent physical memory locations. In general, memory accesses often show some degree of spatial correlation, and an architecture which provides speed improvements for such accesses can be useful. (The accesses need not be strictly sequential; as long as successive accesses all fall within some small range of address values, they can be regarded as “collocated.”) If this range is small enough for the memory architecture to economize on setup times, some net improvement can be achieved. For example, in fast page mode DRAM the row address setup time is eliminated for any access which uses the same row address as the preceding access. (More precisely, a row address strobe (RAS) signal is held constantly active while the column address strobe (CAS) signal strobes in column addresses, to read successive cells in a single row.) Further developments of this have appeared, for example, in burst EDO DRAMs.
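The benefit of collocated access can be illustrated with a toy cost model of a single page-mode DRAM. The row size and the relative costs are assumptions made for illustration (the 10:1 ratio echoes the page break cost mentioned later in this document); real timings depend on the device.

```python
def access_cost(addresses, row_size=1024, hit_cost=1, miss_cost=10):
    """Total cost of a sequence of accesses to a page-mode DRAM.

    An access to the currently open row costs hit_cost; any other access
    must open a new row and costs miss_cost. All values are assumed units.
    """
    open_row = None
    total = 0
    for addr in addresses:
        row = addr // row_size
        if row == open_row:
            total += hit_cost
        else:
            total += miss_cost
            open_row = row
    return total


print(access_cost(range(8)))              # collocated: one row opened, then 7 hits
print(access_cost([0, 2048, 1, 2049]))    # alternating rows: every access misses
```

Eight collocated accesses cost less in this model than four accesses that alternate between two rows.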
Of course other improvements have continued; one active area in the 1990s was synchronization between memory and processor, e.g. in the SDRAM architectures. SDRAMs also typically include two or four banks of memory in each chip, as discussed below. Synchronization has also been introduced into specialty graphics memory in the SGRAM architecture, which includes block write functions for faster fill in graphics applications.
Multi-Pool Texture Memory Management
As noted above, virtual memory architectures have long been used in general-purpose computers. However, there turn out to be some surprising difficulties in using this idea in computer graphics (especially for texture memory). The present application discloses several innovations related to virtualization and caching of texture memory.
Notable (and separately innovative) features of the virtual texture mapping architecture described in the present application include at least the following: a single-chip solution is provided; two or three levels of texture memory hierarchy are supported; the page faulting is all done in hardware with no host intervention; the texture memory management function can be used to manage texture storage in the host memory in addition to the texture storage in our normal texture memory; multiple memory pools are supported; and multiple rasterizers can be supported.
The access times to the memory allocated (in level 1) for the working set are not all the same. For example, successive accesses to different DRAM pages in the same bank of memory will incur page break costs up to 10 times the cost of an access which doesn't cause a page break to occur. When mip mapping, it is very common for two texture maps to be accessed simultaneously. If both texture maps (which are just adjacent levels in the mip map set) are in the same bank, but in different DRAM pages (as is very likely except for the very lowest resolution maps), then a significant number of page breaks will occur. The hardware will try to make the best of a bad map layout by grouping accesses to each map together so the page break costs can be amortised over more texel reads, but the ideal solution is to lay the textures out so that adjacent map levels are in different banks of memory. When this can be done there are no page breaks when interleaving accesses to both maps.
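The three layouts discussed above (same bank with interleaved accesses, same bank with grouped accesses, and different banks) can be compared by counting page breaks in a simplified model. The model, in which each bank keeps exactly one page open, and all names and page numbers are assumptions for illustration.

```python
def count_page_breaks(accesses):
    """Count page breaks for a sequence of (bank, page) accesses.

    Each bank is assumed to hold one open page; a page break occurs when
    an access targets a different page in a bank that already has one open.
    """
    open_page = {}
    breaks = 0
    for bank, page in accesses:
        if bank in open_page and open_page[bank] != page:
            breaks += 1
        open_page[bank] = page
    return breaks


def interleave(level_a, level_b, n):
    """n pairs of accesses alternating between two mip levels."""
    out = []
    for _ in range(n):
        out.append(level_a)
        out.append(level_b)
    return out


same_bank = interleave((0, 5), (0, 9), 4)      # both levels in bank 0
grouped = [(0, 5)] * 4 + [(0, 9)] * 4          # same bank, accesses grouped
split_banks = interleave((0, 5), (1, 9), 4)    # adjacent levels in different banks

print(count_page_breaks(same_bank))    # 7
print(count_page_breaks(grouped))      # 1
print(count_page_breaks(split_banks))  # 0
```

Grouping amortises the break over several reads, but only the split-bank layout removes the breaks entirely while still allowing interleaved access.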
The virtual texture hardware can manage up to 4 pools of memory and will allocate a faulting texture page to the appropriate memory pool under control of the MemoryPool bits in the logical page table. A least-recently-used list of pages (used when deciding which page can be replaced) is maintained for each memory pool.
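A software analogue of per-pool least-recently-used replacement might look like the sketch below. It is not the hardware design: the class, its method, and the use of pool-local slot indices are hypothetical, and the pool argument simply stands in for the MemoryPool bits read from the logical page table.

```python
from collections import OrderedDict


class PoolAllocator:
    """Toy model: one independent LRU list of resident pages per memory pool."""

    def __init__(self, pages_per_pool):
        # pages_per_pool: number of physical pages in each pool (up to 4 pools)
        self.capacity = list(pages_per_pool)
        # For each pool: logical page -> pool-local physical slot, oldest first
        self.lru = [OrderedDict() for _ in pages_per_pool]

    def touch(self, logical_page, pool):
        """Access a page; return (slot, evicted_logical_page_or_None)."""
        resident = self.lru[pool]
        if logical_page in resident:
            resident.move_to_end(logical_page)      # now most recently used
            return resident[logical_page], None
        evicted = None
        if len(resident) < self.capacity[pool]:
            slot = len(resident)                    # a free slot remains
        else:
            evicted, slot = resident.popitem(last=False)  # replace the LRU page
        resident[logical_page] = slot
        return slot, evicted


alloc = PoolAllocator([2, 2])
alloc.touch("a", 0)
alloc.touch("b", 0)
alloc.touch("a", 0)            # "a" becomes most recent in pool 0
print(alloc.touch("c", 0))     # (1, 'b'): "b" is replaced, pool 1 is untouched
```

Because each pool has its own list, a fault in one pool can never evict a page belonging to another.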
Thus a benefit is derived from avoiding DRAM page breaks (i.e. trying to keep the DRAMs in their page mode).
There is also another important benefit from this idea, in two ways. If the texture map is mipmapped, any significant texture map will be far greater than typical cache size. Moreover, trilinear filtering implies sequential accesses in two localized areas, so with two FIFOs, correlated accesses can be grouped to optimize DRAM paging; however, this increases latency, which itself meets limits. Since an SDRAM chip is internally divided into banks (2 or 4, preferably 4), the user can keep open as many as four pages per bank. This permits the use of, e.g., two banks for textures, one for Z, one for color. More precisely, this can be imagined as: even levels in bank zero, odd in bank one.
Now if memory allocation were handled in software, the software would be cognizant of this; but since memory allocation is done in hardware, the hardware has to get the information. Each page of texture is assigned to a pool, with a strict relation forced, using a 2-bit field in the page table. (A pool is just a collection of pages to which some specific relation is assigned. The relation is hidden in how the allocation tables are built; at the start different linked lists are run for each pool.)
Note that odd and even levels are not identical in size: the highest resolution is level 0, so odd levels are smaller on average than their even counterparts.
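The even/odd assignment and the resulting size imbalance can be illustrated numerically. The sketch assumes a square, power-of-two mip chain and a hypothetical helper that maps a level number to a pool by its parity; neither detail is specified by the text above.

```python
def pool_for_level(level):
    """Assumed mapping: even mip levels to pool 0, odd levels to pool 1."""
    return level & 1


def mip_level_texels(base_size):
    """Texel count of each level of a square mip chain, level 0 first."""
    sizes = []
    size = base_size
    while size >= 1:
        sizes.append(size * size)
        size //= 2
    return sizes


levels = mip_level_texels(256)       # 256x256 down to 1x1, nine levels
even_total = sum(levels[0::2])       # levels 0, 2, 4, 6, 8
odd_total = sum(levels[1::2])        # levels 1, 3, 5, 7

print(even_total, odd_total)         # 69905 17476
```

For this chain the even pool needs roughly four times the storage of the odd pool, since each level holds a quarter of the texels of the level before it.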
The extra two pools are useful, e.g. for lighting in QUAKE, etc. Programmers are allowed to edit textures (using, e.g., the texsubimage feature in OpenGL) by editing the corresponding bit in host memory and telling hardware to invalidate its references to the edited space; see the section on “Editing Texture Maps” in the Detailed Description.
However, if one is editing textures at a high rate (e.g. with dynamic lighting in a game scenario, with a rocket roaring down a corridor), this requires burdensome synching (between host-memory updates and whatever is happening on the card); the question is how to avoid thrashing and synching. A reload command will use the same memory if the data is already present, but it isn't known whether the data is already present.
To avoid this problem, when it is detected that a texture map is to be used for the very first time, those pages are locked down into the working set by moving these pages into one of the two spare pools. Because they are in their own special pool, they will never get thrown out. Synching is only an issue if the location of the data is unknown, but if it is known that it is on card it can be solved with a reload.
For additional information, see the section on “Memory Pools” in the Detailed Description. Note also FIG. 10: the upper part of this diagram shows the organization for texture virtual memory management, and the bottom part shows the organization for texture caching.