In computer systems, memory access time is an important factor affecting overall performance. Memory access time is affected by, among other things, the inherent access time of the array of memory devices used in the memory system, and also by queuing delays, i.e., delays that result when a memory request is forced to wait for access to the memory array. Both of these types of delay tend to increase as the size and complexity of the memory system increase, with a concomitant decrease in performance.
It has been known to employ buffers or caches in order to reduce delays associated with memory requests. Such devices are placed between the memory system and the requestor, which may be for example a central processing unit (CPU). These buffer devices tend to be significantly faster than the memory array, and therefore they reduce the delay experienced by memory requests that they service. Most higher-performance computer systems therefore employ both a large DRAM memory array and a smaller cache or buffer situated between the memory array and one or more requestors.
While such an intermediate buffer can therefore contribute to greater performance, it also introduces more complexity into the data path of the computer system. This is because the memory array must be able to supply data to both the requestor and the buffer, and the requestor must generally be able to accept data from either the memory array or the buffer. If the data path is not carefully designed, it may contain bottlenecks that unnecessarily limit memory system performance. For example, if there were only one data port for data into and out of the buffer, it would be impossible to fill the buffer from the memory at the same time that a requestor is obtaining data from the buffer. If the rest of the system were capable of such concurrent operation, the data path would be a bottleneck acting to limit performance. It is generally desirable to eliminate such bottlenecks in the interest of achieving maximum memory system performance.
While data path concerns are common to systems employing any type of intermediate buffer as described above, other concerns arise when the buffer is of a particular type. One special type of intermediate buffer is known as a stream buffer. A stream buffer is designed to improve the average access time of a stream of sequential memory accesses. It does this in part by prefetching data from the memory array that sequentially follows requested data, storing the prefetched data into the stream buffer, and providing the prefetched data to the requestor from the stream buffer when it is subsequently requested. The buffer also operates in first-in-first-out (FIFO) fashion, so that newly-prefetched data can be stored in the stream buffer as soon as previously-prefetched data is used.
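By way of illustration only, the stream buffer behavior described above may be sketched in software as follows. This is a minimal model under stated assumptions, not a description of any particular hardware: the class name `StreamBuffer`, the `depth` parameter, the list-based `memory`, and the policy of restarting the stream on a miss are all hypothetical choices made for the sketch.

```python
from collections import deque


class StreamBuffer:
    """Toy FIFO stream buffer in front of a slower memory (illustrative model)."""

    def __init__(self, memory, depth=4):
        self.memory = memory      # backing store: a list indexed by address
        self.depth = depth        # assumed capacity, in prefetched entries
        self.fifo = deque()       # (address, data) pairs, oldest first
        self.next_addr = None     # next sequential address to prefetch

    def _refill(self):
        # Prefetch sequentially following data until the FIFO is full
        # or the end of the modeled memory is reached.
        while len(self.fifo) < self.depth and self.next_addr < len(self.memory):
            self.fifo.append((self.next_addr, self.memory[self.next_addr]))
            self.next_addr += 1

    def read(self, addr):
        """Return (data, hit), where hit is True if the buffer serviced the read."""
        if self.fifo and self.fifo[0][0] == addr:
            # Hit: the oldest prefetched entry is consumed, which frees a
            # slot that is immediately refilled with newly prefetched data.
            _, data = self.fifo.popleft()
            self._refill()
            return data, True
        # Miss: read the memory array directly and (as an assumed policy)
        # restart the stream at the next sequential address.
        self.fifo.clear()
        self.next_addr = addr + 1
        self._refill()
        return self.memory[addr], False
```

In this sketch, a first read of address 3 misses and is serviced by the memory, while subsequent reads of addresses 4 and 5 are serviced from the buffer; a read of a non-sequential address misses and restarts the stream.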
One general concern with stream buffers is achieving the optimum amount of prefetching. If insufficient data is prefetched during a string of sequential memory requests, the stream buffer cannot service as many requests as it otherwise could, and so its beneficial impact on memory access time is diminished or lost. Conversely, if too much prefetching is performed, the likelihood that the prefetched data will be used diminishes. In such a case, the memory bandwidth devoted to the excess prefetching would be better spent on servicing requests for data that is actually needed.
To achieve optimum prefetching, it has been known to use a history buffer to improve the chances that prefetched data will actually be used. At any time, the history buffer retains the addresses of one or more of the most recent memory requests. When a new request occurs, its address is compared with the contents of the history buffer to determine whether a pattern of sequential accesses is occurring. If the address is sequential to any of the addresses in the history buffer, the stream buffer begins prefetching starting at the next sequential address. This checking increases the likelihood that prefetched data will later be requested as part of the same stream of sequential memory accesses.
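The history buffer check described above may be sketched as follows. This is a simplified illustration, assuming unit-stride addresses (so that "sequential" means an address exactly one greater than a retained address); the class name `HistoryBuffer`, the `size` parameter, and the `observe` method are hypothetical names introduced for the sketch.

```python
from collections import deque


class HistoryBuffer:
    """Retains the addresses of the most recent requests (illustrative model)."""

    def __init__(self, size=4):
        # A bounded deque discards the oldest address automatically.
        self.recent = deque(maxlen=size)

    def observe(self, addr):
        """Record a request; return the prefetch start address, or None.

        A start address is returned only when addr is sequential to
        some address already held in the history buffer.
        """
        sequential = (addr - 1) in self.recent
        self.recent.append(addr)
        return addr + 1 if sequential else None
```

With a two-entry history, requests to addresses 10 and 50 trigger no prefetching, whereas a following request to address 11 is recognized as sequential to 10 and would start prefetching at address 12.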
Another technique that has been used to optimize prefetching finds application in memory systems that are interleaved. In interleaved memory systems, each memory array contains only a portion of the entire memory contents, and that portion is interleaved with the portions contained by all the other memory arrays in the memory system. For example, each array in a 4-way interleaved memory system holds one-fourth of the entire memory contents, and services requests only for every fourth data element in the memory. In such an interleaved system, a memory access to one array can often be hidden underneath sequential accesses to the other arrays. As a result, an interleaved memory system generally does not benefit from the same degree of prefetching as does a non-interleaved system. Accordingly, it has been known to limit the amount of data that is prefetched so that it is generally inversely proportional to the degree of interleaving. Such a scheme reduces the likelihood of excessive prefetching, and therefore improves the use of available memory bandwidth.
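The relationship between interleaving and prefetch amount may be illustrated with the short sketch below. It assumes the simplest low-order interleaving, in which consecutive data elements map to consecutive arrays, and it models "inversely proportional" as an integer division with a floor of one element. Both function names and the `base_depth` parameter are hypothetical, and actual systems may use a different mapping or limit.

```python
def array_for_address(addr, ways):
    """Return which memory array services addr in a ways-way interleaved system.

    Assumes low-order interleaving: each array services every
    ways-th data element.
    """
    return addr % ways


def prefetch_depth(base_depth, ways):
    """Return a prefetch amount inversely proportional to the interleave degree.

    base_depth is the assumed amount for a non-interleaved (1-way) system;
    at least one element is always prefetched in this sketch.
    """
    return max(1, base_depth // ways)
```

Under these assumptions, a 4-way interleaved system maps addresses 0 through 7 to arrays 0, 1, 2, 3, 0, 1, 2, 3, and a base prefetch amount of 8 elements is reduced to 2.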
Another factor that influences the performance of memory systems, including those with stream buffers, is the manner in which refreshing of the DRAM array is performed. Refresh is the means by which weakly-held charges that represent data in the array are periodically restored, so that the data is not lost. While refresh is clearly a necessary function, it nonetheless renders the array unavailable for normal accesses when it is being performed, and thus can have a negative performance impact. Accordingly, it is generally desirable to somehow limit the impact of refresh on normal data traffic, so that performance is not unduly reduced. Many techniques have been used to improve the scheduling of refresh to minimize its performance impact.
While the aforementioned techniques have indeed improved the performance of memory systems employing them, there nevertheless remains a need for improved memory system performance. Additionally, it is desirable to increase the efficiency of memory system components so that maximum performance can be squeezed out of the fixed costs that they contribute to the memory system.