1. Field of the Invention
This invention relates generally to the field of computer graphics and, more particularly, to a high performance graphics system which implements texture mapping.
2. Description of the Related Art
A computer system typically relies upon its graphics system for producing visual output on the computer screen or display device. Early graphics systems were only responsible for taking what the processor produced as output and displaying that output on the screen. In essence, they acted as simple translators or interfaces. Modern graphics systems, however, incorporate graphics processors with a great deal of processing power. They now act more like coprocessors than simple translators. This change is due to the recent increase in both the complexity and amount of data being sent to the display device. For example, modern computer displays have many more pixels, greater color depth, and are able to display images that are more complex with higher refresh rates than earlier models. Similarly, the images displayed are now more complex and may involve advanced techniques such as anti-aliasing and texture mapping.
As a result, without considerable processing power in the graphics system, the CPU would spend a great deal of time performing graphics calculations. This could rob the computer system of the processing power needed for performing other tasks associated with program execution and thereby dramatically reduce overall system performance. With a powerful graphics system, however, when the CPU is instructed to draw a box on the screen, the CPU is freed from having to compute the position and color of each pixel. Instead, the CPU may send a request to the video card stating: "draw a box at these coordinates". The graphics system then draws the box, freeing the processor to perform other tasks.
Generally, a graphics system in a computer is a type of video adapter that contains its own processor to boost performance levels. These processors are specialized for computing graphical transformations, so they tend to achieve better results than the general-purpose CPU used by the computer system. In addition, they free up the computer's CPU to execute other commands while the graphics system is handling graphics computations. The popularity of graphical applications, and especially multimedia applications, has made high performance graphics systems a common feature of computer systems. Most computer manufacturers now bundle a high performance graphics system with their systems.
Since graphics systems typically perform only a limited set of functions, they may be customized and therefore far more efficient at graphics operations than the computer's general-purpose central processor. While early graphics systems were limited to performing two-dimensional (2D) graphics, their functionality has increased to support three-dimensional (3D) wire-frame graphics, 3D solids, and now includes support for 3D graphics with textures and special effects such as advanced shading, fogging, alpha-blending, and specular highlighting.
While the number of pixels is an important factor in determining graphics system performance, another factor of equal import is the quality of the image. Various methods are used to improve the quality of images, including anti-aliasing, alpha blending, and fogging, among numerous others. While various techniques may be used to improve the appearance of computer graphics images, they also have certain limitations. In particular, they may introduce their own aberrations and are typically limited by the density of pixels displayed on the display device.
As a result, a graphics system is desired which is capable of utilizing increased performance levels to increase not only the number of pixels rendered but also the quality of the image rendered. In addition, a graphics system is desired which is capable of utilizing increases in processing power to improve graphics effects.
Therefore, a graphics system may be configured to receive a stream of vertices from a host application executing on a host computer. The vertices specify triangles in a 3D coordinate space. The triangles represent a collection of 3D objects in the 3D world coordinate space. The graphics system may operate on the triangles to generate a video stream which represents the view of a virtual camera (or virtual observer) in the 3D world coordinate space. In particular, the graphics system may compute color values for each pixel that resides within each triangle (i.e. the two-dimensional footprint of the triangle in screen space). This process of assigning color values to pixels (or samples) internal to triangles may be referred to herein as triangle rasterization.
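The rasterization step described above can be illustrated with a minimal sketch. It assumes a simple edge-function coverage test at pixel centers; the function names `edge` and `rasterize` are hypothetical and are not taken from the described system, which does not specify a particular rasterization algorithm.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of the edge A->B point P lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the set of pixels (x, y) whose centers fall inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Accept either winding order: all three terms share a sign (or are zero).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.add((x, y))
    return covered

# Screen-space footprint of one triangle on a 4x4 pixel grid.
pixels = rasterize([(0, 0), (4, 0), (0, 4)], 4, 4)
```

Each pixel in `pixels` would then be assigned a color value, optionally modified by one or more textures.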
To obtain images that are more realistic, some prior art graphics systems have implemented a process referred to as texture mapping. Thus, triangle rasterization may include the application of one or more textures. In other words, the graphics system may store one or more texture maps in a texture memory and may modify the color of pixels using the one or more texture maps. For example, pixels residing internal to a given triangle comprising part of a wall may be textured with a texture map, e.g., a texture map which gives the triangle the appearance of brick material.
In a graphics application, accesses of graphics data, such as texture map data, must be performed very quickly. Therefore, one goal of a graphics system is to improve the speed and efficiency of memory accesses of texture maps from a texture memory. One common method is to use a texture memory cache to improve the speed of accesses of texture maps from the texture memory. In general, accesses to texture maps exhibit considerable spatial locality. However, reliance on cache memories and caching techniques to take advantage of this spatial locality often results in fetching more data from memory than is required. This wastes memory bandwidth. In addition, in many instances the requested array of texels cannot be accessed from the texture buffer in a single read transaction. This results in access latencies which adversely affect system performance.
Therefore, an improved system and method is desired for optimally storing texture maps in a multi-memory system to guarantee access to all of the data in a single read transaction and/or with reduced or no over-fetching.
One embodiment of the invention comprises a graphics system and method for storing and accessing texture maps. The graphics system may include a plurality of memory devices for storing the texture maps. A graphics processor may couple to the plurality of memory devices to store texture maps in the memory devices and/or access texture maps from the memory devices.
In one embodiment, a texture map comprises a plurality of texels, wherein an N×M array of texels corresponds to each pixel. For example, each N×M array of texels may comprise a 2×2 array of texels. Neighboring pixels may share one or more texels. In other words, for two neighboring pixels, their respective N×M arrays of texels may include one or more common texels. Where a 2×2 array of texels corresponds to each pixel, then in one embodiment neighboring pixels share at least two texels.
At least portions of the texels may be stored in respective ones of the plurality of memory devices, wherein neighboring (or spatially adjacent) texels are stored in separate ones of the memory devices in an interleaved fashion. Due to the interleaved nature of the storage of the texels, for each of the plurality of pixels, the plurality of memory devices are operable to output a respective N×M array of texels for a respective pixel in parallel in a single read transaction. In other words, the texel data is interleaved in the plurality of memory devices to guarantee that, no matter which N×M array of texels is accessed, each texel in the array is present in a different memory device or chip and hence all of the texels are concurrently available. Thus the N×M array of texels may be output concurrently or simultaneously, regardless of which array is accessed, i.e., regardless of which pixel is addressed. The N×M array of texels may also be provided without over-fetching of non-requested texels.
When a read transaction is generated for an N×M array of texels for a respective pixel, the plurality of memory devices output the N×M array of texels for the respective pixel in parallel (concurrently) in a single cycle in response to the read transaction. Where the texels are addressed using a U,V addressing scheme, the texels in each 2×2 array of texels may have addresses U, V; U+1, V; U, V+1; and U+1, V+1. In other words, for any U,V address, a requesting device, such as the graphics processor, can access any N×M array of texels having addresses U,V; U+1, V; U, V+1; and U+1, V+1. The plurality of memory devices are operable to output a respective N×M array of texels having addresses U,V; U+1, V; U, V+1; and U+1, V+1; for a respective pixel in parallel in a single read transaction.
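One way to see why interleaving makes every 2×2 array available in a single read is the following sketch. It assumes a hypothetical bank-selection rule in which the low bit of U and the low bit of V choose one of four banks; the actual address mapping is not specified here, and the names `bank_2d` and `footprint_banks` are illustrative only.

```python
from itertools import product

def bank_2d(u, v):
    """Assumed mapping: low bit of U and low bit of V select one of 4 banks."""
    return (u & 1) | ((v & 1) << 1)

def footprint_banks(u, v):
    """Banks hit by the 2x2 array (U,V), (U+1,V), (U,V+1), (U+1,V+1)."""
    return [bank_2d(u + du, v + dv) for dv, du in product((0, 1), repeat=2)]

# Check every possible 2x2 array origin in a 16x16 region of the texture map.
all_conflict_free = all(
    len(set(footprint_banks(u, v))) == 4
    for u in range(16)
    for v in range(16)
)
```

Under this assumed mapping, any U,V address yields four texels in four different banks, so no bank is asked for two texels in the same transaction and no unrequested texel is fetched.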
In one embodiment, each texel comprises a first portion and a second portion, e.g., a first half and a second half. The first and second portions of a respective texel may be stored in separate memory devices comprising a pair of memory devices to allow the respective texel to be output from the pair of memory devices in a single read transaction. In other words, the plurality of memory devices may comprise a plurality of pairs of memory devices, wherein each of the pairs of memory devices comprises a first memory device and a second memory device, and wherein each pair of memory devices is independently addressable. A first portion of each texel may be stored in a first memory device of a pair of memory devices and a second portion of each texel may be stored in a second memory device of the pair of memory devices. Each respective pair of memory devices may be operable to output a texel in response to the respective pair of memory devices receiving a single address.
For example, in a first embodiment, where each N×M array of texels comprises a 2×2 array of texels, each texel is 32 bits, the plurality of memory devices comprise 8 memory devices, and the memory devices are 16 bit memories, then each of the memory devices stores 16 bits (one half) of a respective texel. Thus, 16 bits of each texel may be stored in one memory device of a pair, and the other 16 bits may be stored in another memory device of the pair. A texel may be stored in and accessed from a respective memory device pair as described above.
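The first embodiment can be sketched as follows, under stated assumptions: the eight devices are numbered 0 to 7 and grouped as four pairs, the pair is chosen by the low bits of U and V, and the upper 16 bits go to the first device of the pair. None of these choices is fixed by the description above; all function names are hypothetical.

```python
def pair_index(u, v):
    """Assumed mapping of a texel address to one of 4 device pairs."""
    return (u & 1) | ((v & 1) << 1)

def devices_for(u, v):
    """Device numbers (first, second) of the pair holding texel (u, v)."""
    p = pair_index(u, v)
    return 2 * p, 2 * p + 1

def split_texel(texel32):
    """Split a 32-bit texel into (upper, lower) 16-bit halves (assumed ordering)."""
    return (texel32 >> 16) & 0xFFFF, texel32 & 0xFFFF

def join_texel(upper16, lower16):
    """Reassemble the 32-bit texel from the two halves read from a pair."""
    return (upper16 << 16) | lower16

# Devices touched when reading the 2x2 array whose origin is (U, V) = (6, 3).
touched = [d for dv in (0, 1) for du in (0, 1) for d in devices_for(6 + du, 3 + dv)]
```

Because both devices of a pair receive the same address, one address per pair is enough to return a complete 32-bit texel, and the four pairs together return the full 2×2 array.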
In a second embodiment, for each of the plurality of pixels, the plurality of memory devices are operable to output a respective N×M array of texels for at least two respective neighboring pixels in parallel in response to a single read transaction. For example, in one exemplary embodiment, where each N×M array of texels comprises a 2×2 array of texels, each texel is 16 bits, the plurality of memory devices comprise 8 memory devices, and the memory devices are 16 bit memories, then each of the memory devices stores a respective 16 bit texel. In this exemplary embodiment, for each of the plurality of pixels, the plurality of memory devices are operable to output a respective 2×2 array of texels for each of two respective neighboring pixels in parallel in a single read transaction.
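A possible arrangement for the second embodiment is sketched below. It is only an illustration under assumptions not stated above: the two neighboring pixels are taken to be horizontal neighbors, and the device is chosen from the two low bits of U and the low bit of V. The names `bank_8`, `window_banks`, and `two_footprints` are hypothetical.

```python
def bank_8(u, v):
    """Assumed mapping: two low bits of U and one low bit of V pick 1 of 8 devices."""
    return (u & 3) | ((v & 1) << 2)

def window_banks(u, v, width=4, height=2):
    """Devices hit by a width x height window of texels whose origin is (u, v)."""
    return [bank_8(u + du, v + dv) for dv in range(height) for du in range(width)]

def two_footprints(u, v):
    """Texel addresses in the 2x2 arrays at (u, v) and at its neighbor (u + 1, v)."""
    first = {(u + du, v + dv) for dv in (0, 1) for du in (0, 1)}
    second = {(u + 1 + du, v + dv) for dv in (0, 1) for du in (0, 1)}
    return first | second  # the two arrays share two texels

# Any 4-wide by 2-high window lands on eight different devices.
window_ok = all(
    len(set(window_banks(u, v))) == 8 for u in range(16) for v in range(16)
)
# Hence two horizontally neighboring 2x2 arrays never collide on a device.
pair_ok = all(
    len({bank_8(a, b) for a, b in two_footprints(u, v)}) == len(two_footprints(u, v))
    for u in range(16)
    for v in range(16)
)
```

Vertical neighbors would need a different assignment (for example, one low bit of U and two low bits of V); the sketch covers only the horizontal case.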
In a third embodiment, where each texture map is a 3-D texture map, an N×M×O array (3-D array) of texels corresponds to each pixel. In this embodiment, for each of the plurality of pixels, the plurality of memory devices are operable to output a respective N×M×O array of texels for a respective pixel in parallel in a single read transaction. For example, in one exemplary embodiment, where each texture map is a 3-D texture map, a 2×2×2 array of texels corresponds to each pixel, each texel is 16 bits, the plurality of memory devices comprise 8 memory devices, and the memory devices are 16 bit memories, then each of the memory devices stores a respective 16 bit texel. In this exemplary embodiment, for each of the plurality of pixels, the plurality of memory devices are operable to output a respective 2×2×2 array of texels for a respective pixel in parallel in a single read transaction.
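The 3-D case extends the same idea by one coordinate. The sketch below assumes a third texel coordinate, here called W, and a hypothetical rule that uses the low bit of each of U, V, and W to select one of eight devices; neither the coordinate name nor the rule is specified above.

```python
def bank_3d(u, v, w):
    """Assumed mapping: low bits of U, V, and W select one of 8 devices."""
    return (u & 1) | ((v & 1) << 1) | ((w & 1) << 2)

def cube_banks(u, v, w):
    """Devices hit by the 2x2x2 array of texels whose origin is (u, v, w)."""
    return [
        bank_3d(u + du, v + dv, w + dw)
        for dw in (0, 1)
        for dv in (0, 1)
        for du in (0, 1)
    ]

# Every 2x2x2 array in an 8x8x8 region touches all eight devices exactly once.
cube_ok = all(
    len(set(cube_banks(u, v, w))) == 8
    for u in range(8)
    for v in range(8)
    for w in range(8)
)
```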
In one embodiment, each texture map comprises a body portion and a border portion. The body portion of the texture map (without borders) may have a size that is a power of 2 in each dimension. Thus, for a 2D texture map, the width and height of the texture map are each a power of 2 in size. For a 3D texture map, the width, height, and length of the texture map are each a power of 2 in size. Since memory pages are a power of 2 in size, the body portion of the texture map may be efficiently stored in memory. The border portion of the texture map may be a 1 texel-wide strip around the body portion of the texture map. Since the body portion of the texture map is a power of 2 in size, the body portion plus the border portion is not a power of 2 in size. Thus, storing the body and border portions of the texture map together as a single block would be inefficient.
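The size argument can be checked with a short calculation. The body width of 256 texels is an assumed example value, and rounding the row length up to the next power of 2 is one assumed way a combined body-plus-border block might be laid out; the helper names are hypothetical.

```python
def is_power_of_two(n):
    """True when n is a positive power of 2."""
    return n > 0 and (n & (n - 1)) == 0

def next_power_of_two(n):
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    return 1 << (n - 1).bit_length()

body = 256                               # assumed body width in texels
padded = body + 2                        # one border texel on each side
row_pitch = next_power_of_two(padded)    # pitch needed if rows were power-of-2 aligned
unused_per_row = row_pitch - padded      # texel slots left empty in each row
```

With these assumed numbers, a 258-texel row padded to a 512-texel pitch would leave 254 slots per row unused, which illustrates why a combined layout packs poorly.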
In one embodiment of the invention, the body and border portions of the texture map are stored in separate areas of memory (e.g., in different memory address spaces). However, the texel interleaving described above is preserved, whereby neighboring texels in the body and border portions may be stored in separate ones of the memory devices in an interleaved fashion. Thus, for respective pixels which include texels from both the body portion and border portion of the texture map, the plurality of memory devices are operable to output a respective N×M array of texels for the respective pixel in parallel in a single read transaction. In other words, the texel data is interleaved in the plurality of memory devices to guarantee that, no matter which N×M array of texels is accessed (even if the array requires some body texels and some border texels), each texel in the array is present in a different memory device or chip and hence all of the texels in the array are concurrently available. The upper address bits of texels in the border portion may not be interleaved to allow for efficient packing.
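The following sketch illustrates how interleaving can be preserved across the body and border. It assumes an 8×8 body at coordinates 0 to 7, border texels at coordinates -1 and 8, and the same hypothetical low-bit bank rule for both regions; only the region, not the bank, determines which memory area is used. All names and sizes are illustrative.

```python
BODY = 8  # assumed body size: BODY x BODY texels, a power of 2

def bank(u, v):
    """Assumed bank rule shared by body and border (-1 & 1 == 1 in Python)."""
    return (u & 1) | ((v & 1) << 1)

def region(u, v):
    """'body' for 0 <= u, v < BODY, otherwise 'border' (coordinate -1 or BODY)."""
    return "body" if 0 <= u < BODY and 0 <= v < BODY else "border"

def footprint(u, v):
    """(region, bank) for each texel of the 2x2 array whose origin is (u, v)."""
    return [(region(a, b), bank(a, b)) for b in (v, v + 1) for a in (u, u + 1)]

# A corner array that needs three border texels and one body texel.
mixed = footprint(-1, -1)

# Every 2x2 array, including those straddling the border, hits four distinct banks.
conflict_free = all(
    len({b for _, b in footprint(u, v)}) == 4
    for u in range(-1, BODY)
    for v in range(-1, BODY)
)
```

Because the bank depends only on the low coordinate bits, an array that straddles the body and border still draws each texel from a different device, even though the two regions occupy separate address ranges.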