FIG. 1 is a block diagram of a multi-processing system 100 which employs a shared memory architecture. System 100 includes processors 101a-101c, dedicated cache memories 102a-102c, dedicated cache controllers 103a-103c, memory bus 104, global main memory 105 and memory controller 106. Processors 101a-101c share main memory 105 through common parallel memory bus 104. Cache memories 102a-102c are typically constructed using relatively high speed SRAM arrays. Main memory 105 is typically constructed using relatively low speed and low cost DRAM arrays. Systems such as system 100 are described in the following references: (1) "Protocols Keep Data Consistent", John Gallant, EDN Mar. 14, 1991, pp. 41-50 and (2) "High-Speed Memory Systems", A. V. Pohm and O. P. Agrawal, Reston Publishing, 1983, pp. 79-83. Dedicated cache memories 102a-102c reduce the frequency with which each of processors 101a-101c accesses main memory 105. This reduces the amount of traffic on memory bus 104 and thereby enhances the performance of system 100. However, cache memories 102a-102c are relatively expensive. In system 100, an expensive cache memory must be added for each added processor. In addition, system 100 requires control logic to maintain the consistency of data in cache memories 102a-102c and main memory 105 (i.e., cache coherence). The problem of cache coherence is described in more detail in "Scalable Shared Memory Multiprocessors", M. Dubois and S. S. Thakkar, Kluwer Academic Publishers, 1992, pp. 153-166. The control logic required to provide cache coherence increases the cost and decreases the performance of system 100.
FIG. 2 is a block diagram of another conventional multi-processor system 200 which includes a global main memory 204 which is divided into modules 206a-206c. Each of main memory modules 206a-206c is attached to a corresponding cache memory module 205a-205c, respectively. Each of cache memory modules 205a-205c is attached to a main memory bus 202. Processors 201a-201c are also attached to main memory bus 202. Processors 201a-201c share cache memory modules 205a-205c and main memory modules 206a-206c. System 200 is described in "High-Speed Memory Systems", Pohm et al., pp. 75-79.
In system 200, the number of cache and main memory modules can be different from the number of processors. Since both main memory modules 206a-206c and cache memory modules 205a-205c are global, system 200 is inherently coherent. However, the globalization of cache memory modules 205a-205c requires that the control logic for cache memory modules 205a-205c be common to all of processors 201a-201c. Consequently, in systems where the number of processors approximately equals the number of cache entries in the cache module, cache thrashing can occur. Cache thrashing refers to the constant replacement of cache lines. Cache thrashing substantially degrades system performance.
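The constant replacement of cache lines described above can be illustrated with a small, hypothetical model. The sketch below assumes a single shared, fully associative cache with LRU replacement, processors that issue accesses strictly round-robin, and a private working set of line addresses per processor. The function name `hit_rate`, the LRU policy and the round-robin interleaving are illustrative assumptions, not details taken from system 200 or the cited references.

```python
from collections import OrderedDict


def hit_rate(num_entries, working_sets, rounds):
    """Hit rate of a shared, fully associative LRU cache (hypothetical model).

    working_sets: one list of line addresses per processor.
    In each round, every processor in turn issues one access, cycling
    through its own working set.
    """
    cache = OrderedDict()  # keys are cached line addresses, oldest first
    hits = accesses = 0
    for r in range(rounds):
        for ws in working_sets:
            addr = ws[r % len(ws)]
            accesses += 1
            if addr in cache:
                hits += 1
                cache.move_to_end(addr)  # mark as most recently used
            else:
                if len(cache) >= num_entries:
                    cache.popitem(last=False)  # evict least recently used line
                cache[addr] = True
    return hits / accesses
```

With four entries and four processors that each cycle through two lines, `hit_rate(4, [[2 * p, 2 * p + 1] for p in range(4)], 10)` returns `0.0`: every access evicts a line that another processor is about to need. With only two such processors the combined working set fits, and `hit_rate(4, [[0, 1], [2, 3]], 10)` returns `0.8`, the remaining misses being the initial compulsory misses.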
To minimize the cost of SRAM cache memories, some prior art systems use additional prefetch buffers for instructions and data. These prefetch buffers increase the cache-hit rate without requiring large cache memories. Such prefetch buffers are described in PCT Patent Application PCT/US93/01814 (WO 93/18459), entitled "Prefetching Into a Cache to Minimize Main Memory Access Time and Cache Size in a Computer System" by Karnamadakala Krishnamohan et al. The prefetch buffers are used in a traditional separate cache memory configuration, and memory bandwidth is consumed by both the prefetch operations and the caching operations. A robust prefetch algorithm (with a consistently high probability of prefetching the correct information) and an adequate cache size and organization (to provide a high cache hit rate) are required to deliver any performance improvement over traditional caching schemes.
Other conventional systems use the sense amplifiers of a DRAM array as a cache memory. (See, e.g., PCT Patent Publication PCT/US91/02590, by M. Farmwald et al.) Using the sense amplifiers of a DRAM array as cache memory has the advantages of low cost and high transfer bandwidth between the main memory and the cache memory. The cache hit access time, equal to the time required to perform a CAS (column access) operation, is relatively short. However, the cache miss access time of such a system is substantially longer than the normal memory access time of the DRAM array (without using the sense amplifiers as a cache memory). This is because when the sense amplifiers are used as cache memory, the DRAM array is kept in the page mode (or activated mode) even when the DRAM array is not being accessed. A cache miss therefore requires that the DRAM array perform a precharge operation followed by RAS (row access) and CAS (column access) operations. The time required to perform the precharge operation (i.e., the precharge time) is approximately twice as long as the time required to perform the RAS operation. The total memory access time for a cache miss is therefore equal to the sum of the precharge time, the RAS access time and the CAS access time of the DRAM array. In contrast, during normal operation of the DRAM array, the DRAM array is in precharged mode when it is not being accessed, and the memory access time is equal to the RAS access time plus the CAS access time of the DRAM array.
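The difference between keeping the array in page mode and keeping it precharged can be sketched by replaying a sequence of row addresses against a single DRAM array. In this minimal sketch, the function name `trace_time`, the use of row addresses only, and the timing values in the usage note (40, 20 and 10 arbitrary time units, chosen so that the precharge time is twice the RAS time) are all hypothetical assumptions.

```python
def trace_time(rows, t_pre, t_ras, t_cas, open_page):
    """Total access time for a sequence of row addresses in one DRAM array.

    open_page=True: the sense amplifiers act as a cache. The last row stays
    latched, so a same-row access costs t_cas, while a different-row access
    costs t_pre + t_ras + t_cas.
    open_page=False: normal operation. The array is precharged after every
    access, so each access costs t_ras + t_cas.
    """
    total = 0
    open_row = None  # row currently latched in the sense amplifiers
    for row in rows:
        if not open_page:
            total += t_ras + t_cas
        elif row == open_row:
            total += t_cas  # cache hit in the sense amplifiers
        elif open_row is None:
            total += t_ras + t_cas  # first access: array starts precharged
        else:
            total += t_pre + t_ras + t_cas  # miss: precharge, then RAS and CAS
        open_row = row
    return total
```

For a trace that stays in one row, `trace_time([5, 5, 5, 5], 40, 20, 10, True)` gives 60 against 120 with `open_page=False`. For a trace that changes row on every access, `trace_time([1, 2, 3, 4], 40, 20, 10, True)` gives 240 against 120, which is the cache-miss penalty described above.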
Mathematically, the average access time (Tav) in a simplified model of the sense amplifier cache system is the weighted average of the cache-hit access time (Tcas) and the cache-miss access time (Tpre+Tras+Tcas). Thus, for cached accesses:

$$T_{av} = H \cdot T_{cas} + (1 - H)\,(T_{pre} + T_{ras} + T_{cas}) \qquad (1)$$
where Tav is the average access time of the DRAM array, H is the average cache hit rate (i.e., the probability of a cache hit), Tcas is the column access time from the sense amplifiers, Tpre is the precharge time of the sense amplifiers, and Tras is the time required to transfer data from a row of DRAM cells to the sense amplifiers.
For a DRAM array which does not use the sense amplifiers as a cache memory:

$$T_{av} = T_{ras} + T_{cas} \qquad (2)$$
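Equations (1) and (2) can be evaluated numerically to compare the two modes at a given hit rate. This is a minimal sketch: the function names are hypothetical, and the timing values in the usage note are arbitrary units chosen so that Tras = 0.5*Tpre.

```python
def tav_cached(h, t_pre, t_ras, t_cas):
    """Equation (1): average access time with the sense amplifiers as cache."""
    return h * t_cas + (1 - h) * (t_pre + t_ras + t_cas)


def tav_uncached(t_ras, t_cas):
    """Equation (2): access time with the array kept precharged."""
    return t_ras + t_cas


def break_even_hit_rate(t_pre, t_ras):
    """Hit rate H at which equations (1) and (2) give the same Tav.

    Tcas cancels out, so the result depends only on Tpre and Tras.
    """
    return t_pre / (t_pre + t_ras)
```

With Tpre = 40, Tras = 20 and Tcas = 10, `break_even_hit_rate(40, 20)` is about 0.667. At H = 0.5, `tav_cached` gives 40, worse than the 30 from `tav_uncached(20, 10)`. At H = 0.9 it gives about 16.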
For most available DRAM, Tras ≈ 0.5*Tpre. Consequently, the average cache hit rate (H) must be greater than approximately 67 percent for the average access time of the cached system to be less than the average access time of the non-cached system. An even higher cache hit rate (H) is necessary to justify the complexity and cost introduced by the control logic required for the sense amplifier cache memory. Such a high average cache hit rate (H) is difficult to achieve and maintain in a multi-processor system due to frequent task switching. Such a high average cache hit rate is also difficult to achieve because the cache memory is inherently direct-mapped, owing to the direct coupling between the sense amplifiers and the memory cells of the DRAM array. For applications such as computer graphics, in which the amount of memory used is limited and there are several processors or processes, the cache hit rate can be quite low. In such applications, the short cache hit access time can be more than offset by the longer cache miss access time.
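The 67 percent threshold follows from requiring the cached average access time of equation (1) to be below the uncached access time of equation (2). Within the same simplified model, the terms in Tcas cancel and the ratio Tras ≈ 0.5*Tpre is substituted in the last step:

$$\begin{aligned}
H\,T_{cas} + (1-H)(T_{pre}+T_{ras}+T_{cas}) &< T_{ras}+T_{cas} \\
T_{cas} + (1-H)(T_{pre}+T_{ras}) &< T_{ras}+T_{cas} \\
(1-H)(T_{pre}+T_{ras}) &< T_{ras} \\
H &> \frac{T_{pre}}{T_{pre}+T_{ras}} \approx \frac{T_{pre}}{1.5\,T_{pre}} = \frac{2}{3} \approx 67\%
\end{aligned}$$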
It is therefore desirable to have a cache memory for a multi-processor system which eliminates the dedicated cache memories of system 100, minimizes the cache thrashing problems of system 200, and minimizes the average memory access time.