This invention relates to cache memory systems for use in a parallel-pipelined computer system and in particular to such a cache memory having an efficient interface with main memory.
A conventional computer system includes a central processing unit (CPU), a memory subsystem, and an input/output (I/O) subsystem. In some computer systems, the principal connection among these three entities is a single bidirectional bus. In other systems, the CPU and memory are connected by one bus while the I/O subsystem is connected to the memory by another bus. It is well known that, in either configuration, the performance of the system depends at least partly on the speed at which data can be transferred between the CPU and the memory.
In general terms, this limitation has been addressed by increasing the "bandwidth" of the bus. As used herein, the bandwidth of a bus is a measure of the amount of data that can be transferred over the bus in a unit time interval.
There are many ways in which the apparent bandwidth of the memory bus may be increased. One method is to provide a wider bus, that is, one which conveys a greater number of bits in parallel between the CPU and the memory. Another method is to provide faster memory, either by using higher-performance memory components or by partitioning the memory so that multiple memory fetch operations may proceed in parallel.
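As a worked illustration with hypothetical figures (a bus performing 25 million transfers per second, chosen only for round numbers), the peak bandwidth of a bus is the number of bytes moved per transfer multiplied by the transfer rate, so widening the bus from 32 to 64 bits at the same rate doubles the peak bandwidth:

```latex
B = \frac{w}{8}\,f,
\qquad
B_{32} = \frac{32}{8}\times\left(25\times10^{6}\ \mathrm{s^{-1}}\right) = 100\times10^{6}\ \mathrm{B/s},
\qquad
B_{64} = \frac{64}{8}\times\left(25\times10^{6}\ \mathrm{s^{-1}}\right) = 200\times10^{6}\ \mathrm{B/s}
```

Here w is the bus width in bits and f is the number of transfers per second. These are peak figures; the bandwidth actually sustained also depends on how quickly the memory can supply the data.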
Interleaving is a particular type of memory partitioning. In an interleaved memory subsystem, memory cells having successive addresses are assigned to different partitions. This partitioning allows the fetch operations for consecutively addressed words to be overlapped. In a four-way interleaved memory subsystem, for example, data for any four memory cells having consecutive addresses may be fetched or stored in parallel. Thus, as long as a significant portion of memory operations access consecutive memory cells, the apparent memory cycle time of an interleaved memory is a fraction of the cycle time of the actual memory devices.
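A minimal sketch of the address mapping, assuming simple low-order interleaving in which the partition (bank) number is the word address modulo the number of banks. The text does not specify the mapping, and the names `NUM_BANKS`, `bank_of`, and `offset_in_bank` are hypothetical.

```python
NUM_BANKS = 4  # four-way interleaving, as in the example above


def bank_of(address: int, num_banks: int = NUM_BANKS) -> int:
    """Partition (bank) holding a word: consecutive addresses go to consecutive banks."""
    return address % num_banks


def offset_in_bank(address: int, num_banks: int = NUM_BANKS) -> int:
    """Position of the word within its bank."""
    return address // num_banks


# Any four consecutive addresses land in four different banks,
# so their fetches can be overlapped.
mapping = [(addr, bank_of(addr), offset_in_bank(addr)) for addr in range(8)]
```

Under this mapping, addresses 0 to 3 occupy word 0 of banks 0 to 3, addresses 4 to 7 occupy word 1 of the same banks, and so on.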
Partitioning does not speed up all memory accesses, however, since successive memory access requests which map onto the same memory partition can be serviced no faster than the memory cycle time of the actual memory devices.
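The effect can be sketched with a toy timing model. This illustrates the point above under stated assumptions and does not describe any particular memory: low-order interleaving as before, each bank busy for a hypothetical `cycle` of 4 time units per access, requests started in order no faster than one per time unit, and a request to a busy bank waiting until that bank is free. The function name `total_time` is hypothetical.

```python
def total_time(addresses, num_banks: int = 4, cycle: int = 4, issue_interval: int = 1) -> int:
    """Time until the last access finishes, under a simple in-order bank-busy model."""
    bank_free = [0] * num_banks  # earliest time each bank can start a new access
    next_issue = 0               # earliest time the next request may start
    finish = 0
    for addr in addresses:
        bank = addr % num_banks
        start = max(next_issue, bank_free[bank])  # wait if the target bank is still busy
        bank_free[bank] = start + cycle
        finish = max(finish, start + cycle)
        next_issue = start + issue_interval       # requests start in program order
    return finish


sequential = total_time(range(8))                   # addresses 0..7 spread over all 4 banks
same_bank = total_time(range(0, 32, 4))             # stride 4: every address maps to bank 0
no_interleave = total_time(range(8), num_banks=1)   # a single, uninterleaved memory
```

With these assumptions, eight consecutive addresses complete in 11 time units, whereas eight addresses with a stride of 4 all map to bank 0 and take 32 units, exactly as long as the same eight accesses to an uninterleaved memory.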
Another way in which the bandwidth of data transfers between the CPU and memory may be increased is to use high-performance devices in the memory subsystem. This method tends to be of limited utility, since both the devices themselves and the hardware needed to support them increase the cost of the computer system without producing a proportional increase in performance.
High-performance memory devices of this type, however, are widely used in cache memory systems which interface between the CPU and the main memory subsystem. A cache memory attempts to take advantage of any tendency toward temporal or spatial locality of reference in computer programs by fetching data from the cells surrounding each memory cell accessed by the program. This data is held in a relatively small, high-speed memory local to the CPU until it is overwritten by data fetched for other memory access requests. While the data is in the cache, the CPU may access it without substantial delay.
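A minimal sketch of how fetching neighbouring cells exploits spatial locality. It assumes a direct-mapped organization (the text does not specify one) with a hypothetical 8 lines of 4 words each, and tracks only tags, not data. A miss brings the whole line containing the requested word into the cache, overwriting whatever that line held, so later accesses to neighbouring words hit. The class and attribute names are illustrative.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only tags (no data), for counting hits and misses."""

    def __init__(self, num_lines: int = 8, words_per_line: int = 4):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.tags = [None] * num_lines  # tag held in each cache line, None if empty
        self.hits = 0
        self.misses = 0
        self.words_fetched = 0  # words transferred from main memory on misses

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, fetch the whole line containing address."""
        block = address // self.words_per_line  # which line-sized block of memory
        index = block % self.num_lines          # cache line the block maps to
        tag = block // self.num_lines           # distinguishes blocks sharing that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag                  # overwrite whatever the line held
        self.words_fetched += self.words_per_line
        return False


# Sequential scan of 16 consecutive words: one miss per 4-word line.
cache = DirectMappedCache()
for addr in range(16):
    cache.access(addr)
```

For this scan the cache records 4 misses and 12 hits: only the first word of each line has to wait for main memory.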
While cache memories may increase system performance for programs that frequently access data in relatively small data sets, they present problems when the program controlling the CPU accesses large volumes of data. For these programs, data values in the cache memory are continually being replaced to satisfy the requests issued by the CPU. Because the cache also fetches the cells surrounding each accessed cell, each of these requests produces multiple memory access requests, which may delay subsequent requests. In addition, when new data to be added to the cache must replace some of the existing data, a data replacement algorithm is invoked to determine which of the existing data entries in the cache is to be replaced by the new data. This replacement algorithm may significantly delay memory accesses by the CPU, or it may entail the use of relatively costly dedicated electronic devices.
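The behaviour described above can be sketched with a small fully associative cache using least-recently-used (LRU) replacement. LRU is chosen here only as a common example; the text does not name a replacement algorithm, and the sizes (4 lines of 4 words) and the names `LRUCache` and `cyclic_scan` are hypothetical. When a program cycles through more lines than the cache holds, LRU evicts each line just before it is needed again, so every access misses and each miss fetches a whole line from main memory.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used replacement (tags only)."""

    def __init__(self, num_lines: int = 4, words_per_line: int = 4):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.lines = OrderedDict()  # resident block numbers, least recently used first
        self.hits = 0
        self.misses = 0
        self.evictions = 0
        self.words_fetched = 0      # words transferred from main memory on misses

    def access(self, address: int) -> bool:
        block = address // self.words_per_line
        if block in self.lines:
            self.lines.move_to_end(block)   # now the most recently used line
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # replacement step: evict the LRU line
            self.evictions += 1
        self.lines[block] = None
        self.words_fetched += self.words_per_line
        return False


def cyclic_scan(num_blocks: int, passes: int = 3) -> LRUCache:
    """Touch the first word of each of num_blocks blocks, repeated passes times."""
    cache = LRUCache()
    for _ in range(passes):
        for block in range(num_blocks):
            cache.access(block * cache.words_per_line)
    return cache


fits = cyclic_scan(4)     # working set equals the cache capacity
too_big = cyclic_scan(5)  # working set one line larger than the cache
```

With these sizes, cycling three times through 4 blocks gives 4 misses and 8 hits, while cycling through 5 blocks makes all 15 accesses miss, performs 11 replacements, and fetches 60 words from main memory to deliver 15.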