1. Field of the Invention
This invention relates to the field of information storage and retrieval, and, more specifically, to a technique for improving cache memory performance.
Portions of the disclosure of this patent document contain material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever. ArtX, and all ArtX-based trademarks and logos are trademarks or registered trademarks of ArtX, Inc. in the United States and other countries.
2. Background Art
In a computer system there may be two types of memories: a large memory that is relatively slow, and a smaller memory, called a "cache" memory, that is relatively fast. Some portion of the larger memory is stored in the cache memory. When the computer system needs data, it first checks the cache to see if the data is already there (a cache "hit"). If the data is there, the computer gets it quickly (because the cache is fast) and can move on to some other step. If the data is not already in the cache (a cache "miss"), the computer retrieves the data from the large memory. This slows computer operation down. Prior art cache memory environments have a disadvantage of very slow operation when data is not in the cache memory. The disadvantages of prior art cache environments can be understood by reviewing computer operation, particularly with respect to the storage and display of graphical information.
Storing and Displaying Graphical Data
Computers are often used to display graphical information. In some instances, graphical data is "rendered" by executing instructions from an application that is drawing data to a display. A displayed image may be made up of a plurality of graphical objects. Examples of graphical objects include points, lines, polygons, and three dimensional solid objects.
If you looked closely at a television screen, computer display, magazine page, etc., you would see that an image is made up of hundreds or thousands of tiny dots, where each dot is a different color. These dots are known as picture elements, or "pixels" for short, when they are on a computer display and as dots when printed on a page. The color of each pixel is represented by a number value. To store an image in a computer memory, the number value of each pixel of the picture is stored. The number value represents the color and intensity of the pixel.
The accuracy with which a document can be reproduced is dependent on the "resolution" of the pixels that make up the document. The resolution of a pixel is the size of the number value used to describe that pixel. The size of the number value is limited by the number of "bits" in the memory available to describe each pixel (a bit is a binary number having a value of 1 or 0). The greater the number of bits available per pixel, the greater the resolution of the document. For example, when only one bit per pixel is available for storage, only two values are available for the pixel. If two bits are available, four levels of color or intensity are available. While greater resolution is desirable, it can lead to greater use of data storage. For example, if each pixel is represented by a 32-bit binary number, 320,000 bits of information would be required to represent a 100×100 pixel image. Such information is stored in what is referred to as a "Frame Buffer" (or "G array").
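The storage arithmetic in the preceding paragraph can be checked with a short sketch. The function names below are illustrative only and do not appear in this disclosure.

```python
def levels_per_pixel(bits_per_pixel):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bits_per_pixel


def frame_buffer_bits(width, height, bits_per_pixel):
    """Total bits needed to store every pixel of a width x height image."""
    return width * height * bits_per_pixel


print(levels_per_pixel(1))               # 2
print(levels_per_pixel(2))               # 4
print(frame_buffer_bits(100, 100, 32))   # 320000
```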
Pixel Rendering and Texture Mapping
The process of converting graphics data and instructions into a display image is known as "pixel rendering." During pixel rendering, color and other details can be applied to areas and surfaces of these objects using "texture mapping" techniques. In texture mapping, a texture image (also referred to as a texture map, or simply as a texture) is mapped to an area or surface of a graphical object to produce a visually modified object with the added detail of the texture image. A texture image may contain, for example, an array of RGB (red, green, blue) color values, intensity values, or opacity values.
As an example of texture mapping, given a featureless graphical object in the form of a cube and a texture image defining a wood grain pattern, the wood grain pattern of the texture image may be mapped onto one or more surfaces of the cube such that the cube appears to be made out of wood. Other examples of texture mapping include mapping of product logo texture images to computer-modeled products, or mapping of texture images containing vegetation and trees to a barren computer-modeled landscape. Textures mapped onto geometric surfaces may also be used to provide additional motion and spatial cues that surface shading alone might not be capable of providing. For example, a featureless sphere rotating about an axis appears static until an irregular texture image or pattern is mapped to its surface.
Texture mapping involves using a texture image having a function defined in texture space. Typically, the texture space is represented as a two dimensional space, with "S" and "T" indices defining orthogonal axes (e.g., horizontal and vertical). A texture image is represented in texture space as an array in S and T of discrete texture elements or values called "texels." The texture image is warped or mapped from the texture space into an image space having an array of picture elements called "pixels." The pixels are associated with orthogonal axis coordinates "X" and "Y" in the image space which define a viewing plane for display. Based on the particular mapping function, a correspondence is generated between pixels representing an object or primitive in the image space and texels representing a texture image in the texture space.
Typically, a two-dimensional texture or pattern image is mapped onto a two or three-dimensional surface. For a two-dimensional surface, X and Y coordinates may be sufficient for defining a mapping function between pixels forming the surface and texels forming the texture image. For a three-dimensional surface, a perspective coordinate or other depth cueing mechanism may be provided to indicate distance from the viewing plane defined by the X and Y axes. The perspective coordinate may then be applied to the mapping function. For example, as the perspective coordinate value for a surface region increases (i.e., the surface region is further from the viewing plane), the mapping of the texture image may be darkened and/or compressed (i.e., neighboring pixels in the surface region may span an increased number of texels in the texture image), or otherwise warped, relative to surface regions having a lower perspective coordinate value. Through the application of depth cueing, the viewer is provided with a sense of distance or depth when viewing the rendered pixels.
FIGS. 1A-1C illustrate a mapping of a brick-pattern texture image in texture space to a triangle primitive (100) in image space. FIG. 1A illustrates triangle primitive 100 in image space prior to texture mapping. FIG. 1B illustrates the brick-pattern texture image in texture space. FIG. 1C illustrates triangle primitive 100 in image space after texture mapping has completed.
In FIG. 1A, triangle primitive 100 is defined by vertices at X,Y coordinate pixel locations PA(XA,YA), PB(XB,YB) and PC(XC,YC), where X is the horizontal axis and Y is the vertical axis. Pixels defining the perimeter of triangle primitive 100 may be explicitly stored in memory, or, to reduce storage requirements for individual primitives, the perimeter pixels may be linearly interpolated from the vertices defined by pixel locations PA(XA,YA), PB(XB,YB) and PC(XC,YC). The interior of triangle primitive 100 is formed by those pixel locations that lie within the defined perimeter. The surface of triangle primitive 100 in this example comprises the union of the perimeter pixels (including the vertices) and the interior pixels.
In FIG. 1B, a brick pattern is stored as a texture image referenced by S and T coordinates. The brick pattern may be mapped to pixels in image space by accessing texels at integer S and T coordinates. In accordance with a particular mapping function, pixel vertices PA, PB and PC of the triangle primitive correspond to S and T coordinates (SA,TA), (SB,TB) and (SC,TC), respectively. The orientation of the mapped vertices indicates rotation and scaling of triangle primitive 100 with respect to the S and T texture space.
FIG. 1C shows triangle primitive 100 having pixel vertices with corresponding X,Y coordinates for the image space, as well as texture space S,T coordinates for extracting texel values. The pixel vertices are PA(XA,YA; SA,TA), PB(XB,YB; SB,TB) and PC(XC,YC; SC,TC). The brick pattern of the texture image of FIG. 1B appears within triangle primitive 100 at slightly reduced scale, and at an approximately forty-five degree rotational offset from the X-axis. Other texture images may be similarly texture mapped to surfaces in render operations.
As the pixels defining a surface are rendered, S and T coordinate values are generated for each pixel based on the mapping function. The generated S and T coordinate values are then used to obtain a texel value for each rendered pixel in the image space. However, the generated S and T coordinate values are generally fractional values (i.e., not integer values). Consequently, the generated S and T coordinate values often correspond to a location in the texture space that falls between the texels of the texture image array.
Several options exist for selecting a texture value, given real S and T coordinate values. One of the simplest options is to round the S and T coordinate values to the nearest integers, and then select the texel corresponding to the rounded integer coordinate values. A more accurate representation is produced by interpolating between the four nearest samples that surround the real (S,T) location. For example, a bilinear interpolation algorithm (i.e., bilinear filtering), or higher-order interpolation algorithm, may be used to interpolate texel values for fractional S and T coordinates. Bilinear interpolation is illustrated in FIG. 2.
In FIG. 2, a pixel PN is mapped to S and T coordinates (L+α,M+β). The four nearest texels in texture space are TXL(L,M), TXL(L+1,M), TXL(L,M+1) and TXL(L+1,M+1). To perform bilinear interpolation (or filtering), a linear interpolation is performed between the texel pairs [TXL(L,M), TXL(L+1,M)] and [TXL(L,M+1), TXL(L+1,M+1)] to determine intermediate pixel values P′N(L+α,M) and P″N(L+α,M+1), respectively. These linear interpolation functions are performed to implement equations (1) and (2) below.
$$P'_N(L+\alpha,\,M) = (1-\alpha)\,\mathrm{TXL}(L,M) + \alpha\,\mathrm{TXL}(L+1,M) \tag{1}$$
$$P''_N(L+\alpha,\,M+1) = (1-\alpha)\,\mathrm{TXL}(L,M+1) + \alpha\,\mathrm{TXL}(L+1,M+1) \tag{2}$$
A third linear interpolation operation is performed on intermediate pixel values P′N(L+α,M) and P″N(L+α,M+1) to obtain PN(L+α,M+β) in accordance with the following equation (3). The linear interpolation operations for the intermediate pixels may be performed along the opposite axis as well, or the linear interpolation operations may be combined to implement a form of equation (4) below.
$$P_N(L+\alpha,\,M+\beta) = (1-\beta)\,P'_N(L+\alpha,\,M) + \beta\,P''_N(L+\alpha,\,M+1) \tag{3}$$
$$P_N(L+\alpha,\,M+\beta) = (1-\alpha)(1-\beta)\,\mathrm{TXL}(L,M) + \alpha(1-\beta)\,\mathrm{TXL}(L+1,M) + \beta(1-\alpha)\,\mathrm{TXL}(L,M+1) + \alpha\beta\,\mathrm{TXL}(L+1,M+1) \tag{4}$$
The above equations (1)-(4) may be implemented in any equivalent form, or the equations may be approximated, to optimize the calculation apparatus for speed and/or complexity.
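As an illustration of the texel selection options described above, the following sketch implements nearest-texel selection and bilinear interpolation. It is a minimal example under stated assumptions, not the apparatus of any particular system: the texture is assumed to be a list of rows indexed as texture[T][S], coordinates are assumed non-negative, and indices at the texture edge are clamped. All function names are hypothetical.

```python
import math


def _clamp(i, size):
    """Clamp an index into [0, size - 1] (edge handling is an assumption)."""
    return max(0, min(i, size - 1))


def nearest_sample(texture, s, t):
    """Round S and T to the nearest integers and return that texel."""
    row = _clamp(math.floor(t + 0.5), len(texture))
    col = _clamp(math.floor(s + 0.5), len(texture[0]))
    return texture[row][col]


def bilinear_sample(texture, s, t):
    """Interpolate between the four texels surrounding the real (S, T)."""
    l, m = math.floor(s), math.floor(t)
    alpha, beta = s - l, t - m                  # fractional parts
    l0, l1 = _clamp(l, len(texture[0])), _clamp(l + 1, len(texture[0]))
    m0, m1 = _clamp(m, len(texture)), _clamp(m + 1, len(texture))
    # Two interpolations along S, one per texel row.
    p1 = (1 - alpha) * texture[m0][l0] + alpha * texture[m0][l1]
    p2 = (1 - alpha) * texture[m1][l0] + alpha * texture[m1][l1]
    # Third interpolation along T between the two intermediate values.
    return (1 - beta) * p1 + beta * p2


tex = [[0, 10],
       [20, 30]]
print(nearest_sample(tex, 0.25, 0.75))   # 20
print(bilinear_sample(tex, 0.25, 0.75))  # 17.5
```

The same bilinear result follows from weighting the four texels directly by (1−α)(1−β), α(1−β), β(1−α) and αβ.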
Using the texel selection processes described above, severe aliasing of the texture may occur if the surface being texture-mapped is far from the viewing plane. This aliasing is caused when the reduced pixel resolution provides insufficient sampling of texture images that have higher frequency components (e.g., fast transitioning color or intensity values). The interpolated (S,T) values may skip over large areas of the texture. A technique known as MIP-mapping is often performed to prevent aliasing by precomputing multiple, filtered copies of the texture at successively lower resolutions. For example, a texture image comprising a 256×256 texel array would be filtered and resampled to obtain further texel arrays (or maps) at 128×128, 64×64, 32×32, 16×16, 8×8, 4×4, and 2×2 resolutions. The cost of storing the additional texel arrays is an increase of approximately thirty percent in memory size.
The particular size of the texel array that is used during pixel rendering is chosen based on a computed parameter known as the "level of detail." The level of detail represents the relative distance between the interpolated S and T values. Each texel array size represents an integer level of detail, and the computed level of detail values are real numbers. High quality texture mapping is obtained by performing bilinear interpolation in each of the two texel arrays representing the integer levels of detail immediately above and below the computed level of detail of each pixel. Next, a linear interpolation is performed between the integer levels of detail to obtain the texture value at the non-integer level of detail. This process is known as trilinear MIP-mapping.
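The MIP-map sizes and storage cost quoted above can be reproduced with the sketch below. The 2×2 box filter used in downsample is an assumption made for illustration; the text does not specify which filter is used. All names are hypothetical.

```python
def mip_chain(size, smallest=2):
    """Edge lengths of the texel arrays, halving down to `smallest`."""
    sizes = []
    while size >= smallest:
        sizes.append(size)
        size //= 2
    return sizes


def mip_overhead(size, smallest=2):
    """Extra storage for the reduced maps, as a fraction of the base map."""
    chain = mip_chain(size, smallest)
    base = chain[0] ** 2
    extra = sum(n * n for n in chain[1:])
    return extra / base


def downsample(texture):
    """Halve a square texture with a 2x2 box filter (assumed filter)."""
    half = len(texture) // 2
    return [[(texture[2 * r][2 * c] + texture[2 * r][2 * c + 1] +
              texture[2 * r + 1][2 * c] + texture[2 * r + 1][2 * c + 1]) / 4
             for c in range(half)]
            for r in range(half)]


print(mip_chain(256))   # [256, 128, 64, 32, 16, 8, 4, 2]
print(round(mip_overhead(256), 3))
```

For a 256×256 base texture the additional maps hold 21,844 texels against 65,536 in the base map, or roughly one third more storage.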
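Trilinear MIP-mapping as described above can be sketched as two bilinear lookups followed by one linear blend. The conventions here are assumptions for illustration: level 0 is the full-resolution map, each higher integer level halves the resolution, and S and T are given in level-0 texel units and scaled by a power of two for each level. The names are hypothetical.

```python
import math


def bilinear(texture, s, t):
    """Bilinear lookup in texture[T][S] with edge clamping (assumed)."""
    w, h = len(texture[0]), len(texture)
    l, m = math.floor(s), math.floor(t)
    alpha, beta = s - l, t - m
    l0, l1 = max(0, min(l, w - 1)), max(0, min(l + 1, w - 1))
    m0, m1 = max(0, min(m, h - 1)), max(0, min(m + 1, h - 1))
    top = (1 - alpha) * texture[m0][l0] + alpha * texture[m0][l1]
    bottom = (1 - alpha) * texture[m1][l0] + alpha * texture[m1][l1]
    return (1 - beta) * top + beta * bottom


def trilinear_sample(mip_levels, s, t, lod):
    """Blend bilinear samples from the two integer levels around `lod`."""
    lod = min(max(lod, 0.0), len(mip_levels) - 1)
    lo = math.floor(lod)                       # integer level below
    hi = min(lo + 1, len(mip_levels) - 1)      # integer level above
    frac = lod - lo
    below = bilinear(mip_levels[lo], s / 2 ** lo, t / 2 ** lo)
    above = bilinear(mip_levels[hi], s / 2 ** hi, t / 2 ** hi)
    return (1 - frac) * below + frac * above


levels = [[[0, 10], [20, 30]],   # level 0: 2x2
          [[100]]]               # level 1: 1x1 (arbitrary value for the demo)
print(trilinear_sample(levels, 0.5, 0.5, 0.5))  # 57.5
```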
Texture Map Storage
To facilitate texture mapping, a texture image may be stored in a dynamic random access memory (DRAM) device. The texel values of the texture image are accessed from the DRAM as needed to determine pixel values for a rendered image in the frame buffer. Unfortunately, DRAM devices are inefficient when performing data transfer operations (e.g., data reads) for individual data values. Peak efficiency is achieved when transferring multiple data values, especially data values that are in adjacent memory locations. For example, for a burst transfer of data in consecutive locations, a DRAM device may support a transfer rate of eight bytes per clock cycle. The same DRAM device may have a transfer rate of one byte per nine clock cycles for arbitrary single byte transfers. These performance characteristics are not well-suited to texture mapping.
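The gap between the two example transfer rates can be made concrete with a short calculation. The rates are the example figures given above; the function names and the 64-byte transfer size are illustrative assumptions.

```python
import math


def burst_clocks(num_bytes, bytes_per_clock=8):
    """Clock cycles to read consecutive bytes in a burst transfer."""
    return math.ceil(num_bytes / bytes_per_clock)


def single_byte_clocks(num_bytes, clocks_per_byte=9):
    """Clock cycles to read the same bytes as arbitrary single-byte reads."""
    return num_bytes * clocks_per_byte


n = 64
print(burst_clocks(n))                            # 8
print(single_byte_clocks(n))                      # 576
print(single_byte_clocks(n) // burst_clocks(n))   # 72
```

At these example rates, scattered single-byte reads take 72 times as many clock cycles as a burst of the same size.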
In texture mapping operations, pixel values for a frame buffer are often determined in a particular scan order, such as by scanning in the direction of the X axis. However, texels associated with consecutive pixels are rarely in a predictable scan order with respect to texture space. For example, in the texture mapping process of FIGS. 1A-1C, a scan along the X axis in image space results in a scan pattern in the texture space that includes multiple passes from the left edge (T axis) of the texture image towards the upper right of the texture image. FIG. 3 illustrates the scan direction in texture space for the texture mapping of FIGS. 1A-1C. As shown, each scan arrow represents texel accesses that frequently traverse, or "skip", rows, including large skips between the ending of one scan arrow and the beginning of the next scan arrow based on the boundaries defined by the primitive in image space.
For a linearly configured DRAM, for example, because the texels in a texture image are not typically scanned in a linear path along the S axis, consecutive pixels will access texels that are widely dispersed across memory. For a 1024×1024 texture image in which each texel is one byte wide, a traversal of one integer T coordinate may translate to a skip of 1024 bytes in DRAM. These memory skips are not easily predictable because the skips are dependent upon the size of the image, the width of a texel, the rotational angle between the S,T axes and the X,Y axes, etc. Texture mapping may also be nonlinear for irregular surfaces, further dispersing memory access operations.
FIG. 4 illustrates an example pixel scan line progressing through a portion of a texel array. The texel array shown encompasses the range [L,L+5] in the S direction and [M,M+4] in the T direction. The pixels that form the scan line comprise PN, PN+1, PN+2, PN+3, PN+4, PN+5 and PN+6. PN lies within the texel neighborhood formed by texels at (L,M+1), (L+1,M+1), (L,M+2) and (L+1,M+2). Pixel PN+1 has a texel neighborhood of texels at (L+1,M), (L+2,M), (L+1,M+1), and (L+2,M+1). Each of pixels PN+2, PN+3, PN+4, PN+5 and PN+6 has a similar texel neighborhood. These texels may be used to determine the nearest neighbor for approximating the desired texel value. Also, as described above, interpolation may be performed on the texel neighborhood of each pixel. Assuming a linear memory in S, and memory access of a texel neighborhood in the order of (top-left, top-right, bottom-left, bottom-right), the memory transfers for the texels associated with pixels PN, PN+1, PN+2, PN+3, PN+4, PN+5 and PN+6 may occur as shown in the following table (where the texture image size is W(width)×H(height), and the base address "B" of the texel array is at (L,M)):
Associated with each of the pixels above (PN-PN+6) is a skip in the DRAM texel access of approximately the width of the texture image which is caused by the two-dimensional nature of the texel neighborhood. Even larger skips are introduced when the scan pattern crosses multiple integer coordinates in T for consecutive pixels. The speed of the texture mapping process may be significantly reduced by the performance of DRAM data transfers with frequent address skips of this nature.
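The address skips described above can be illustrated for the two texel neighborhoods that the text states explicitly, those of PN and PN+1. The sketch assumes a memory that is linear in S with one byte per texel, so that the texel at (L+i, M+j) sits at address B + j·W + i. The function names and the example values B = 0 and W = 1024 are assumptions chosen for illustration.

```python
def texel_address(base, width, s_offset, t_offset, bytes_per_texel=1):
    """Linear address of the texel at (L + s_offset, M + t_offset)."""
    return base + (t_offset * width + s_offset) * bytes_per_texel


def neighborhood_addresses(base, width, s_offset, t_offset):
    """Addresses of a 2x2 neighborhood, first texel row then second."""
    return [texel_address(base, width, s_offset + ds, t_offset + dt)
            for dt in (0, 1) for ds in (0, 1)]


B, W = 0, 1024
pn = neighborhood_addresses(B, W, 0, 1)    # (L,M+1) ... (L+1,M+2)
pn1 = neighborhood_addresses(B, W, 1, 0)   # (L+1,M) ... (L+2,M+1)
print(pn)    # [1024, 1025, 2048, 2049], i.e. B+W, B+W+1, B+2W, B+2W+1
print(pn1)   # [1, 2, 1025, 1026], i.e. B+1, B+2, B+W+1, B+W+2
print(pn[2] - pn[1])   # 1023, a skip of W-1 between the two texel rows
```

Each neighborhood therefore spans two short runs of adjacent addresses separated by nearly the full width of the texture image.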
Prior Art Buffer Memory
Prior art texture mapping schemes attempt to overcome the limitations of DRAM data transfer characteristics by using a smaller, faster buffer memory to hold data between DRAM transfers. Buffering consists of loading a block of contiguous data into buffer memory for use by the processor performing the texel processing. A new block of data is loaded from DRAM into the buffer when it is needed.
FIG. 5 illustrates buffering applied to a texture image. In FIG. 5, texture image 500 has dimensions W×H, and base address 502. An N-byte buffer is used to hold N-byte buffered block 501 of texture image data having starting address M. Buffered block 501 is loaded as a linear block of texture image data from memory, or as a multidimensional tile of contiguous texture image data.
A buffering apparatus is illustrated in FIG. 6. DRAM 600 is coupled to N-byte buffer 601 to receive address information 604, and to transfer read data block 603 into the buffer memory. The data transferred from DRAM 600 comprises the N DRAM locations M through M+N−1, where M is supplied as read address 604. The N-byte contiguous block of data in buffer 601 is available via bus 605 for texel processing component 602 to access the texture image data in buffer 601. The texture image data is used to produce output 606, such as rendered pixels for display. When the texture image data required by the texel processing component 602 is not located in buffer 601, a new contiguous buffered block of texture image data is retrieved from DRAM 600 and placed in buffer 601.
Rather than performing the transfer of data as a single block to the buffer, the data may be streamed through the buffer, for example, in a FIFO (first in, first out) arrangement. The streaming data has an accessibility lifetime in the buffer based on the time required to shift a data element through the buffer and out. This lifetime is directly related to the size of the buffer itself.
If the buffered block 501 is configured as a contiguous one-dimensional (or linear) block of data, the buffered data is strongly biased along the S axis direction. Therefore, for two-dimensional graphics applications such as texel processing, buffered block 501 in buffer 601 requires frequent transfers from DRAM 600 to track texels when scan patterns produced by a particular mapping have a strong T component that causes frequent skips. Any performance gain achieved by storing a contiguous memory block in buffer 601 is countered by the need to make frequent data transfers of different blocks from DRAM 600 to buffer 601. Due to the contiguous nature of a buffer, a buffer needs to be very large to encompass large skips within the buffer, particularly for large images.
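The S-axis bias of a linear buffered block can be seen in a small simulation. This is a simplified model with hypothetical names, not the apparatus of FIG. 6: it assumes that on each buffer miss a new N-byte block is loaded starting at the requested address, and it counts how many block loads a scan requires.

```python
class BlockBuffer:
    """Holds one contiguous N-byte block of a linear memory."""

    def __init__(self, dram, n):
        self.dram = dram
        self.n = n
        self.start = None      # address of the first buffered byte
        self.block = b""
        self.reloads = 0       # number of block transfers from DRAM

    def read(self, addr):
        in_buffer = (self.start is not None
                     and self.start <= addr < self.start + self.n)
        if not in_buffer:
            # Assumption: the new block begins at the requested address.
            self.start = addr
            self.block = self.dram[addr:addr + self.n]
            self.reloads += 1
        return self.block[addr - self.start]


W = H = 64                                  # 64x64 texture, one byte per texel
dram = bytes(i % 256 for i in range(W * H))

along_s = BlockBuffer(dram, 64)
for s in range(W):                          # scan one row along S
    along_s.read(s)

along_t = BlockBuffer(dram, 64)
for t in range(H):                          # scan one column along T
    along_t.read(t * W)

print(along_s.reloads)   # 1
print(along_t.reloads)   # 64
```

Under these assumptions a scan along S is served by a single block load, while a scan along T of the same length forces a new block load for every texel.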
In the prior art, U.S. Pat. No. 5,548,709, issued to Hannah et al. on Aug. 20, 1996, discloses a semiconductor chip, referred to as TRAM (texture random access memory), that integrates texture memory, interpolation and resampling logic on the same substrate. Textures are input to the chip and stored in a main memory. The interpolator produces an output texel by interpolating from the textures stored in memory.
The invention provides a method of operating a cache memory so that operation is optimized. Instead of fetching data immediately upon a cache miss, the present invention continues with subsequent cache accesses. The data for cache misses is fetched into the cache separately, decoupled from cache access. During operation, for each request in a sequence of data requests, it is determined if the requested data can be found in cache memory. If the data is not found in the cache, the next request in the sequence is processed without first retrieving the data pending from the earlier request. A miss list is generated for each of the requests in the sequence of requests whose data is not found in the cache. The data that is associated with the requests in the miss list is obtained from DRAM and used to satisfy the requests. Some cache lines may have one or more pending hits to data associated with the cache line. Those requests are kept in a pending hits list and processed in order as required. There may also be pending misses kept in a pending misses list where the list contains one or more pending misses to data associated with the cache line. A flag or indicator is set for a cache line when there are misses associated with the cache line.
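The general idea of continuing past a miss and fetching the missed data separately can be sketched as follows. This is a heavily simplified illustration and not the claimed implementation: the class and method names are hypothetical, the cache never evicts lines, the backing memory is a Python list standing in for DRAM, and a single per-line list of waiting requests stands in for the pending lists described above. A line counts as flagged whenever it has an entry in that waiting list.

```python
from collections import deque


class DeferredMissCache:
    """Records misses in a list instead of stalling on each one."""

    def __init__(self, backing, line_size=4):
        self.backing = backing        # stands in for DRAM
        self.line_size = line_size
        self.lines = {}               # line tag -> cached data
        self.miss_list = deque()      # line tags awaiting a fetch
        self.waiting = {}             # line tag -> [(request id, address)]

    def request(self, req_id, addr):
        """Return the data on a hit; on a miss, record it and return None."""
        tag = addr // self.line_size
        if tag in self.lines:
            return self.lines[tag][addr % self.line_size]
        if tag not in self.waiting:   # first miss to this line
            self.miss_list.append(tag)
            self.waiting[tag] = []
        self.waiting[tag].append((req_id, addr))
        return None

    def service_misses(self):
        """Fetch every listed line and satisfy the requests waiting on it."""
        completed = []
        while self.miss_list:
            tag = self.miss_list.popleft()
            start = tag * self.line_size
            self.lines[tag] = self.backing[start:start + self.line_size]
            for req_id, addr in self.waiting.pop(tag):
                completed.append(
                    (req_id, self.lines[tag][addr % self.line_size]))
        return completed


memory = list(range(100, 132))
cache = DeferredMissCache(memory)
for req_id, addr in enumerate([5, 6, 20]):
    cache.request(req_id, addr)       # all three miss; none of them stalls
print(list(cache.miss_list))          # [1, 5]
print(cache.service_misses())         # [(0, 105), (1, 106), (2, 120)]
print(cache.request(3, 5))            # 105
```

In this sketch the second request to the same line does not add a second miss-list entry; it simply waits on the fetch already listed for that line.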