This invention relates generally to computer memory systems. More particularly, the invention relates to methods and apparatus for enhancing memory access performance. The invention has particularly beneficial application with regard to frame buffer memories in computer graphics systems.
Frame buffer memories and the bandwidth problem. A frame buffer memory is typically used in a computer graphics system to store all of the color information necessary to control the appearance of each pixel on a display device. Color information is usually stored in terms of RGBA components (a red intensity component, a green intensity component, a blue intensity component, and an “alpha” transparency value). In addition, the frame buffer memory is often used to store non-color information that is accessed during the rendering and modification of images. For example, “Z” or “depth” values may be stored in the frame buffer memory to represent the distance of pixels from the viewpoint, and stencil values may be stored in the frame buffer memory to restrict drawing to certain areas of the screen. In operation, upstream graphics hardware issues a stream of read and write commands with accompanying addresses directed to the frame buffer memory. In turn, a frame buffer memory controller receives the command stream and responds to each command by operating the memory devices that make up the frame buffer memory itself. Depending on the rendering modes enabled at any given time, a single frame buffer memory access command issued by upstream hardware may result in numerous accesses to the frame buffer memory by the frame buffer memory controller. For further background regarding frame buffer memories and their uses, see James D. Foley et al., Computer Graphics: Principles and Practice chapter 18 (2d ed., Addison-Wesley 1990) and Mason Woo et al., OpenGL Programming Guide chapter 10 (2d ed., Addison-Wesley 1997).
Over time, the resolution capabilities of display devices have increased, and consequently so has the amount of information (both color and non-color) that must be stored in the frame buffer memory. In addition, refresh cycles of display devices have become shorter. The result has been that access rates for modern frame buffer memories have become extremely high. Due to cost, the vast majority of frame buffer memories are constructed using dynamic random access memories (“DRAMs”) instead of static random access memories (“SRAMs”) or specially-ported video random access memories (“VRAMs”). Unfortunately, DRAMs present certain performance problems related to, for example, the need to activate and deactivate pages, and the need to refresh storage locations regularly. Although DRAM memory device clock frequencies have increased over time, their latency characteristics have not improved so dramatically. Thus, numerous techniques have been proposed to increase DRAM frame buffer memory bandwidth.
Memory devices: banks, bursts, SDR and DDR. One technique that has been employed to increase DRAM frame buffer memory bandwidth has been to divide the memory devices internally into independently-operating banks, each bank having its own set of row (page) and column addresses. The use of independent banks improves memory bandwidth because, to the extent bank accesses can be interleaved with proper memory mapping, a row in one bank can be activated or precharged while a row in a different bank is being accessed. When this is possible, the wait time required for row activation and precharge may be concealed so that it does not negatively impact memory bandwidth.
Another technique has been to employ memory devices that support burst cycles. In a burst memory cycle, multiple words of data (each corresponding to a different but sequential address) are transferred into or out of the memory even though only a single address was specified at the beginning of the burst. The memory device itself increments or decrements the addresses appropriately during the burst based on the initially specified address. Burst operation increases memory bandwidth because it creates “free” command cycles during the burst that otherwise would have been occupied by the specification of sequential addresses. The free command cycles so created may be used, for example, to precharge and activate rows in other banks in preparation for future memory accesses.
In a single-data-rate (“SDR”) memory device, data may be transferred only once per clock cycle. A double-data-rate (“DDR”) memory device, on the other hand, is capable of transferring data on both phases of the clock. Both SDR and DDR devices are capable of burst-mode memory accesses. For SDR devices, the minimum burst length that can create a free command cycle is two consecutive words (column addresses). The absolute minimum burst length for SDR devices is one word (column address). An example of an SDR device is the NEC uPD4564323 synchronous DRAM, which is capable of storing 64 Mbits organized as 524,288 words × 32 bits × 4 banks. For double-data-rate devices, the minimum burst length that can create a free command cycle is four consecutive words (column addresses). The absolute minimum burst length for DDR devices is two consecutive words (column addresses). An example of a DDR device is the SAMSUNG KM416H430T hyper synchronous DRAM, which is capable of storing 64 Mbits organized as 1,048,576 words × 16 bits × 4 banks.
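The burst-length and capacity figures above can be checked with a small model. The following is only an illustrative sketch: it assumes one command slot per clock cycle and one slot consumed by the address command that starts each burst, and the names `WORDS_PER_CLOCK`, `free_command_cycles`, and `capacity_bits` are hypothetical rather than taken from any device data sheet.

```python
# Words transferred per clock cycle (SDR: one edge, DDR: both edges).
WORDS_PER_CLOCK = {"SDR": 1, "DDR": 2}


def free_command_cycles(kind: str, burst_len: int) -> int:
    """Command slots left unused during one burst.

    Assumption: one command slot per clock, and the first slot of the
    burst is taken by the address command that starts it.
    """
    clocks, remainder = divmod(burst_len, WORDS_PER_CLOCK[kind])
    if remainder:
        raise ValueError("burst must occupy a whole number of clocks")
    return clocks - 1


def capacity_bits(words_per_bank: int, bits_per_word: int, banks: int) -> int:
    """Total storage of a device organized as words x bits x banks."""
    return words_per_bank * bits_per_word * banks


MBIT = 1024 * 1024

# Minimum bursts (1 word SDR, 2 words DDR) leave no free command cycle;
# doubling the burst length frees exactly one.
sdr_min, sdr_free = free_command_cycles("SDR", 1), free_command_cycles("SDR", 2)
ddr_min, ddr_free = free_command_cycles("DDR", 2), free_command_cycles("DDR", 4)

# Both example organizations come to 64 Mbits.
sdr_bits = capacity_bits(524_288, 32, 4)
ddr_bits = capacity_bits(1_048_576, 16, 4)
```

Under these assumptions the model reproduces the stated thresholds: a free command cycle first appears at a burst of two words for SDR and four words for DDR.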
The problem of column coherency in a graphics command stream. In order to capitalize on the burst-mode capabilities of frame buffer memory devices, prior art graphics systems depended on the natural occurrence of sequential column addresses in the various streams of read and write commands issued by upstream hardware. For example, with coherent triangle rendering and appropriate mapping of x,y screen space to RAM address space, many pairs of sequential column addresses could be made to occur naturally in the stream of pixel commands requested by a rasterizer. Indeed, such a solution worked adequately in times when DDR memory devices were not available.
Now, however, DDR memory devices are often used to construct the frame buffer memory. For prior art systems to capitalize on the burst-mode capabilities of a DDR device, a substantial number of quadruplets of sequential column addresses would have to occur naturally in the command stream; but the natural production of a substantial number of quadruplets of sequential column addresses is difficult if not impossible to achieve with mere memory mapping. This is especially true now that graphics applications are capable of drawing smaller triangles (having fewer pixels per triangle) than did the applications of the past.
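To make the difficulty concrete, the sketch below counts how many pairs and how many quadruplets of sequential column addresses happen to occur back to back in a short command stream. It is a simplified illustration: the stream is invented, the function name `count_natural_bursts` is hypothetical, and any burst-alignment requirement of a real device is ignored.

```python
def count_natural_bursts(columns, burst_len):
    """Count non-overlapping runs of `burst_len` sequential column
    addresses that appear consecutively in the command stream."""
    count = 0
    i = 0
    while i + burst_len <= len(columns):
        window = columns[i:i + burst_len]
        sequential = all(
            window[k + 1] == window[k] + 1 for k in range(burst_len - 1)
        )
        if sequential:
            count += 1
            i += burst_len  # consume the whole run
        else:
            i += 1          # slide forward by one command
    return count


# Invented stream of column addresses, as small triangles might produce.
stream = [8, 9, 20, 21, 22, 23, 40, 41, 7]

pairs = count_natural_bursts(stream, 2)   # 4 naturally occurring pairs
quads = count_natural_bursts(stream, 4)   # 1 naturally occurring quadruplet
```

In this example the stream offers four opportunities for a two-word burst but only one for a four-word burst, which is the gap that memory mapping alone struggles to close.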
A need therefore exists for a technique that creates opportunities for burst-mode frame buffer memory accesses when sequential column addresses do not occur naturally in the command stream.
The problem of page coherency in a graphics command stream. Changing from one row to another row in the same bank of a memory device (also known as a same-bank page change) requires wait time for closing the previous page and activating the new page. Prior art graphics systems employed two techniques in attempting to avoid this performance penalty. First, the mapping of x,y screen space to RAM address space was constructed so as to make same-bank page changes occur as infrequently as possible. Second, memory access commands were sorted into FIFO buffers according to bank: Specifically, two FIFOs per memory device bank were employed so that access commands directed to the same bank of a memory device could be further sorted according to page. Of course, if only two FIFOs per bank are employed in this manner, then grouping is only possible for up to two different pages within a single bank. If a memory access command appeared in the command stream directed to a third page within the bank, then one of the FIFOs would have to be flushed. Adding more FIFOs per bank in such a system might provide added efficiency because it would allow page-wise grouping for more than two of the bank's pages at one time. On the other hand, such a solution would be expensive because of the number of FIFOs required to implement it, particularly in the case of the newer 4-bank memory devices. Moreover, the solution would be wasteful because the FIFOs so provided would rarely all be full at the same time.
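The two-FIFOs-per-bank arrangement described above can be modeled in a few lines. This is a sketch under stated assumptions, not a description of any particular prior art controller: the class name `BankPageSorter` is hypothetical, and the choice of which FIFO to flush when a third page arrives (here, the one assigned earliest) is an assumption, since the text does not specify it.

```python
from collections import deque


class BankPageSorter:
    """Sorts commands into a fixed number of FIFOs per bank, one page per FIFO."""

    def __init__(self, banks, fifos_per_bank=2):
        # Each FIFO is [page_tag, queued_commands]; page_tag None means unused.
        self.fifos = {
            b: [[None, deque()] for _ in range(fifos_per_bank)]
            for b in range(banks)
        }
        self.flushes = 0

    def push(self, bank, page, cmd):
        """Queue a command; return any commands that had to be flushed."""
        slots = self.fifos[bank]
        for slot in slots:              # FIFO already holding this page?
            if slot[0] == page:
                slot[1].append(cmd)
                return []
        for slot in slots:              # otherwise claim an unused FIFO
            if slot[0] is None:
                slot[0] = page
                slot[1].append(cmd)
                return []
        # Every FIFO holds some other page: flush one to make room.
        # Assumed policy: flush the FIFO that was assigned earliest.
        victim = slots.pop(0)
        self.flushes += 1
        slots.append([page, deque([cmd])])
        return list(victim[1])


sorter = BankPageSorter(banks=4)
sorter.push(0, 5, "a")
sorter.push(0, 6, "b")
sorter.push(0, 5, "c")               # grouped with "a" on page 5
flushed = sorter.push(0, 7, "d")     # third page in bank 0 forces a flush
```

Here the command for page 7 forces the page 5 FIFO to be flushed even though it holds only two commands, illustrating why grouping is limited to two pages per bank.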
Batching and the problem of pixel collisions. Changing from read mode to write mode presents another kind of memory performance penalty because it requires memory dead cycles. In part for this reason, prior art graphics systems have attempted to group as many read operations together as possible before transitioning to write operations, rather than to freely interleave writes with reads when it is not necessary to do so. Such a grouping of memory access commands together is known as “batching.” As alluded to above, in certain rendering modes one frame buffer memory access command issued by upstream hardware may result in numerous frame buffer accesses by the frame buffer controller. For example, in image read-modify-write mode with z test enabled, one frame buffer memory write command may result in four frame buffer accesses: a z buffer read, a z buffer write, an image buffer read, and an image buffer write. Thus, prior art systems have also attempted to batch as many z reads together as possible, as many z writes together as possible, as many image reads together as possible, and as many image writes together as possible.
Such prior art batching systems yielded memory bandwidth efficiencies to the extent that they decreased the frequency of read-to-write transitions and changes from one buffer to another. However, they suffered from at least the following limitation: accesses to the same pixel location had to be placed in separate batches; otherwise the result would be a “pixel collision.” This meant that, depending on the vagaries of the command stream, a developing batch might have to be cut short simply because a second access to the same pixel location occurred within relatively few commands of the first access to that pixel location. The result was a decreased average batch size. This problem is even greater in modern graphics systems because modern applications utilize greater depth complexity, so pixel collisions occur more frequently than in the past.
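The effect of pixel collisions on batch size can be illustrated with a short sketch. It assumes a batch is simply closed whenever an incoming command targets a pixel already in the developing batch, or when the batch reaches a maximum size; the function name `build_batches` and the example stream are hypothetical.

```python
def build_batches(pixel_addresses, max_batch):
    """Split a stream of pixel accesses into batches such that no batch
    contains two accesses to the same pixel location."""
    batches = []
    current = []
    seen = set()
    for addr in pixel_addresses:
        # A repeated pixel (collision) or a full batch closes the batch.
        if addr in seen or len(current) == max_batch:
            batches.append(current)
            current, seen = [], set()
        current.append(addr)
        seen.add(addr)
    if current:
        batches.append(current)
    return batches


# Eight accesses that would fit in one batch if no pixel repeated.
batches = build_batches([1, 2, 3, 2, 4, 5, 4, 6], max_batch=8)
# batches == [[1, 2, 3], [2, 4, 5], [4, 6]]
```

Two collisions split what could have been a single batch of eight accesses into three batches, lowering the average batch size from 8 to about 2.7.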
In one aspect, a specially-designed buffer facilitates reordering of memory access commands in a memory access command stream so as to create column coherencies that may be exploited with burst-mode memory cycles. A bus receives memory access commands that include data and a column address. A multi-column data storage buffer is provided. Storage control circuitry stores data associated with a memory access command into the multi-column data storage buffer at a column that corresponds to at least one of the LSBs of the column address associated with the memory access command. Flush control circuitry flushes the data storage buffer, when required, in column order.
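As a rough software analogy of this aspect, the sketch below stores incoming data at a buffer column selected by the low-order bits of the column address and then flushes the buffer in column order. It is a minimal sketch, assuming a four-column buffer indexed by the two LSBs; the class name `ColumnBuffer` is hypothetical, and an empty column is marked with `None` purely for simplicity.

```python
class ColumnBuffer:
    """Multi-column data buffer indexed by the LSBs of the column address."""

    def __init__(self, lsb_bits=2):
        self.lsb_bits = lsb_bits
        self.slots = [None] * (1 << lsb_bits)   # None marks an empty column

    def store(self, column_address, data):
        # The buffer column is chosen by the low-order address bits.
        index = column_address & ((1 << self.lsb_bits) - 1)
        self.slots[index] = data

    def flush(self):
        """Return (column, data) pairs in column order and empty the buffer."""
        out = [(i, d) for i, d in enumerate(self.slots) if d is not None]
        self.slots = [None] * len(self.slots)
        return out


buf = ColumnBuffer()
# Commands arrive out of column order...
buf.store(0x13, "d3")
buf.store(0x10, "d0")
buf.store(0x12, "d2")
buf.store(0x11, "d1")
# ...but leave the buffer in column order, ready for a burst.
ordered = buf.flush()
```

Because the four commands leave the buffer as columns 0 through 3, they can be issued as one sequential burst even though they arrived in a different order.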
In another aspect, each entry in the data storage buffer is associated with a unique valid bit. At flush time, the flush control circuitry analyzes the valid bits to determine an appropriate burst type for executing the memory access commands represented by the flushed buffer contents. The flush control circuitry may indicate the determined burst type to memory controller hardware by means of a burst type flag.
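One way the valid bits might drive the choice of burst is sketched below. The summary does not enumerate the burst types here, so the encoding is an assumption made for illustration: a four-column buffer line, a single four-word burst when every entry is valid, and otherwise a two-word burst for each aligned pair that contains at least one valid entry. The function name `choose_bursts` is hypothetical.

```python
def choose_bursts(valid):
    """Pick bursts for one four-column buffer line from its valid bits.

    Returns a list of (start_column, burst_length) tuples.
    Assumed policy: one burst of 4 if all entries are valid; otherwise a
    burst of 2 for each aligned pair holding at least one valid entry.
    """
    if len(valid) != 4:
        raise ValueError("this sketch models a four-column line only")
    if all(valid):
        return [(0, 4)]
    bursts = []
    for start in (0, 2):
        if valid[start] or valid[start + 1]:
            bursts.append((start, 2))
    return bursts


full_line = choose_bursts([True, True, True, True])       # [(0, 4)]
lower_pair = choose_bursts([True, True, False, False])    # [(0, 2)]
straddling = choose_bursts([False, True, True, False])    # [(0, 2), (2, 2)]
```

In this sketch the returned tuples play the role of the burst type flag passed to the memory controller hardware; an entry within a burst that is not valid would need to be masked, which the sketch does not model.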
In another aspect, the data storage buffer may include multiple lines. Each line is associated with one line in a multi-line column address storage buffer. The storage control circuitry stores at least some of the MSBs of the data column addresses in the column address storage buffer. Column address comparison and select circuitry determines whether at least some of the MSBs of the column address of an incoming memory access command match those currently stored in a line of the column address storage buffer, and if so, selects the matching line of the data storage buffer, but if not, selects an unused line of the data storage buffer. In still another aspect, a row/bank address storage buffer may be provided. Lines in the data storage buffer and in the column address storage buffer may be dynamically associated with row/bank addresses stored in the row/bank address storage buffer.
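The line-selection behavior of the column address comparison and select circuitry can be approximated in software as follows. This is a sketch only: it assumes the two LSBs select the column within a line and all remaining bits form the stored tag, it omits the row/bank association, and it simply reports when no unused line remains. The class name `LineSelector` is hypothetical.

```python
class LineSelector:
    """Chooses a data-buffer line by comparing column-address MSBs."""

    def __init__(self, num_lines=4, lsb_bits=2):
        self.lsb_bits = lsb_bits
        self.tags = [None] * num_lines   # stored MSBs per line; None = unused

    def select(self, column_address):
        """Return the line index for this address, or None if no line is free."""
        tag = column_address >> self.lsb_bits    # MSBs of the column address
        if tag in self.tags:                     # match: reuse that line
            return self.tags.index(tag)
        if None in self.tags:                    # no match: take an unused line
            line = self.tags.index(None)
            self.tags[line] = tag
            return line
        return None   # all lines in use; a flush would be needed (not modeled)


selector = LineSelector(num_lines=2)
first = selector.select(0x10)    # new tag, takes line 0
second = selector.select(0x13)   # same MSBs as 0x10, matches line 0
third = selector.select(0x20)    # different MSBs, takes line 1
fourth = selector.select(0x30)   # no match and no unused line
```

Addresses 0x10 and 0x13 share their upper bits and therefore land in the same line, which is what allows their data to be gathered into one burst at flush time.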