This invention relates to memory systems for computers, and more particularly to a method for enhancing performance of a memory stream buffer.
Referring to FIG. 1, a typical memory system contains three main parts:
Instructions from a CPU 16 travel along the bus to the memory controller 14. The memory controller 14 in turn supplies the DRAMs with address data so that the desired information can be retrieved or stored.
The information transferred between the CPU and DRAMs must conform with certain timing relationships between the request signal and the information on the bus. Retrieving or storing data in the DRAMs, however, can take a number of "cycles". A cycle is the time interval between the instant at which the DRAMs receive a request from the CPU and the instant the information is available for use by the CPU. The number of cycles required can affect the system timing and thus system performance.
One objective of computer design is to provide the CPU with information at the fastest possible rate. To increase processing speed, many computer systems now employ what is called a "cache" memory. A cache is a high speed memory which holds a subset of the data in the main memory and is used to decrease the need to access the DRAMs for each CPU command. When the CPU issues a command, the cache is checked first to see if it contains the requested information. If the cache contains the requested data (a "hit"), then that data is sent to the CPU. If that data is not in the cache (a "miss"), then that data must be retrieved from the DRAMs.
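The hit and miss behavior described above can be illustrated with a minimal sketch. The direct-mapped organization, the class name `DirectMappedCache`, and the sizes (4 lines of 32 bytes) are assumptions chosen for illustration only; the text does not specify any particular cache organization.

```python
class DirectMappedCache:
    """Toy direct-mapped cache. All sizes are illustrative assumptions."""

    def __init__(self, num_lines=4, line_size=32):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per line; None means empty
        self.hits = 0
        self.misses = 0

    def read(self, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line_addr = address // self.line_size
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the data would have to be retrieved from the DRAMs.
        self.misses += 1
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
# Byte addresses 0 and 128 map to the same line, so they evict each other.
results = [cache.read(a) for a in (0, 4, 32, 0, 128, 0)]
print(results, cache.hits, cache.misses)
```

In this sketch the final read of address 0 misses because address 128 has replaced it in the same line, which is one reason a cache alone does not eliminate accesses to the DRAMs.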
As the speed of processors increases, the latency time for access to the DRAMs has become a major problem. For example, a high speed reduced instruction set computer (RISC) of the type disclosed in pending application Ser. No. 547,630, filed Jun. 29, 1990, assigned to Digital Equipment Corporation, may be constructed to operate at a CPU cycle time of 5 nanoseconds or less, and execute an instruction during each cycle. If the main memory (usually composed of DRAMs) has a cycle time of 300 nanoseconds, for example, it can be calculated that the CPU could spend much of its time waiting for memory even when the system uses a cache. In efforts to bring the memory performance on par with the CPU, the cache memory can be made hierarchical, providing primary, secondary, and in some cases third level caches, and the speed of the cache memory can be increased as much as is economical. In addition, the bandwidth of the memory bus can be increased by using a wider data path. Nonetheless, there is still a need to reduce the amount of time the CPU spends waiting on memory, to achieve acceptable performance of these high speed CPUs.
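The calculation referred to above can be worked through with the example figures of a 5 nanosecond CPU cycle and a 300 nanosecond memory cycle. The 95 percent hit rate and the one-cycle hit time below are assumptions introduced purely for illustration; they do not appear in the text.

```python
def stall_cycles_per_miss(memory_ns, cpu_cycle_ns):
    """CPU cycles that elapse during one main memory access."""
    return memory_ns // cpu_cycle_ns


def average_access_cycles(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average cycles per memory access for a single-level cache."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty_cycles


# Figures from the text: 300 ns memory cycle, 5 ns CPU cycle.
penalty = stall_cycles_per_miss(300, 5)

# Assumed figures: 95% hit rate, 1 cycle per cache hit.
avg = average_access_cycles(0.95, 1, penalty)
print(penalty, round(avg, 2))
```

Under these assumptions each miss costs 60 CPU cycles, and the average access takes roughly four cycles instead of one, so a CPU capable of one instruction per cycle would spend most of its memory access time waiting.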
The cache memory operates in accordance with the "principle of locality"; that is, if a memory location is addressed by the CPU, it and nearby memory locations will probably be addressed again soon. The principle of locality suggests that cache lines will often be accessed in sequence. When two sequential cache lines are accessed, there is a reasonable probability that sequential accesses will continue. The frequency of sequential read operations lends itself to the creation of a buffering system that detects such sequences and uses them to prefetch additional data.
"Stream buffers" can be used to access data more quickly in the case of sequential read requests. A stream buffer holds read data prefetched from addresses following a sequential read access from the CPU. Placing stream buffers in the memory controller provides faster access to sequential data located on memory modules installed on a multi-node memory interconnect or "bus." By taking advantage of the "fast page mode" capabilities of the DRAMs, sequential memory accesses are detected. In response to these sequential address requests, memory data from the next sequential location is prefetched in advance of the actual request for that data by the hosting computer. That data is placed in a high speed memory device. As a result, when the host computing system requests data from the next sequential location, the data can be delivered to the host computing system much faster than if the data had to be delivered directly from the DRAMs on the memory modules, thus increasing the processing speed.
The stream buffers are located on the memory module itself, rather than in the CPU. The stream buffer memory can be placed on the memory modules so the buffers can be filled without using the system bus (which is shared with other resources), thereby conserving system memory interconnect bandwidth and throughput. Also, a significant performance advantage can be realized by filling the stream buffers using the fast page mode operation of the DRAM devices. By placing the stream buffer memory within the logic domain covered by the memory module error detection and correction logic, the reliability, availability, and data integrity are enhanced.
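The sequential detection and prefetch behavior described above can be sketched in software as follows. This is only a behavioral model under stated assumptions, not the circuit of the invention: the class name `StreamBuffer`, the buffer depth of four lines, the rule that two consecutive line addresses trigger allocation, and the choice to discard the buffer on any non-sequential read are all hypothetical.

```python
from collections import deque


class StreamBuffer:
    """Behavioral model of one stream buffer (assumed depth of 4 lines)."""

    def __init__(self, depth=4):
        self.depth = depth
        self.last_line = None      # line address of the previous read
        self.buffer = deque()      # prefetched line addresses, oldest first
        self.buffer_hits = 0
        self.demand_dram_reads = 0

    def read(self, line):
        """Return 'buffer' if the line was prefetched, else 'dram'."""
        if self.buffer and self.buffer[0] == line:
            # Deliver from the buffer and prefetch one more line behind it.
            self.buffer.popleft()
            self.buffer.append(line + self.depth)
            self.buffer_hits += 1
            source = "buffer"
        else:
            self.demand_dram_reads += 1
            if self.last_line is not None and line == self.last_line + 1:
                # Two sequential lines seen: allocate and fill the buffer.
                self.buffer = deque(range(line + 1, line + 1 + self.depth))
            else:
                # Assumed policy: a non-sequential read discards the buffer.
                self.buffer.clear()
            source = "dram"
        self.last_line = line
        return source


sb = StreamBuffer()
sources = [sb.read(n) for n in (10, 11, 12, 13, 40, 41, 42)]
print(sources, sb.buffer_hits, sb.demand_dram_reads)
```

In this model the first two reads of each sequence go to the DRAMs, and later reads in the same sequence are served from the buffer. The jump from line 13 to line 40 leaves prefetched lines unused, which illustrates the cost of filling a buffer for a stream that does not continue.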
However, stream buffers are sometimes allocated unnecessarily to transactions which do not benefit from the reduced read latency of stream buffering. Frequently, a sequential address stream is detected, causing a buffer to be allocated and filled with prefetch data that is never used. This unnecessary allocation of stream buffers reduces the memory's availability to other bus transactions. Therefore, the stream detection logic can contain buffer enable and invalidation circuits which prevent the unnecessary allocation of stream buffers and thus reduce latency on read streams and increase system performance.