The various elements of data processing systems have understandably advanced at differing rates and in differing directions in recent years. Processors, specifically microprocessors, have become more powerful and much faster, and are able to run at very high clock speeds. Memories, on the other hand, have not become significantly faster, but their bit capacity has increased manyfold and their cost per bit has fallen. This generalization is particularly true for dynamic random access memories (DRAMs). Many strategies have therefore been proposed and developed to enable these high density memories to be accessed at a speed more compatible with the speed at which a microprocessor can call for, use, and send back data. One such strategy has been to utilize a cache memory to store portions of the data from the main memory system. This strategy can succeed when at least two conditions are met: the memory used as cache memory must have a significantly faster access time than the main memory, and the portion of data stored in the cache memory must have a high probability of being accessed by the microprocessor, such an access being known in the art as a "hit". The implementation of these cache memory systems has been well developed in the art.
Static random access memory (SRAM) devices have been used for cache memory because of their fast access times relative to DRAM devices. A typical access time for a DRAM, for example, is 120 nanoseconds, while the access time for an SRAM is typically 20 to 40 nanoseconds. However, SRAM device architecture presently requires a high ratio of chip space per bit; hence, SRAM devices are largely unsuitable for high density main storage. SRAM devices also typically consume significantly more power than comparable DRAM devices.
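The interplay of the two conditions above (a faster cache and a high hit probability) can be illustrated with a simple weighted average. The following is only a sketch under stated assumptions: the function name `effective_access_ns` is hypothetical, the 30 nanosecond SRAM figure is simply the midpoint of the 20 to 40 nanosecond range quoted above, and a miss is assumed to cost exactly one DRAM access with no additional penalty.

```python
def effective_access_ns(hit_ratio, cache_ns, main_ns):
    """Average access time when a hit costs cache_ns and a miss costs main_ns.

    Simplifying assumption: a miss costs exactly one main-memory access,
    with no extra penalty for refilling the cache.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * main_ns


DRAM_NS = 120.0  # typical DRAM access time quoted in the text
SRAM_NS = 30.0   # assumed midpoint of the 20 to 40 ns SRAM range

for hit_ratio in (0.0, 0.5, 0.9, 1.0):
    average = effective_access_ns(hit_ratio, SRAM_NS, DRAM_NS)
    print(f"hit ratio {hit_ratio:.1f}: about {average:.1f} ns per access")
```

Under these assumptions a hit ratio of 0.5 still leaves an average of 75 nanoseconds per access, which suggests why a fast cache is of little value unless the hit probability is also high.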
It has been proposed, however, that SRAM cache memories be located on DRAM memory arrays. This approach offers some solution to the speed problem encountered in accessing DRAM devices. The drawbacks of this approach are as follows: 1) In order to achieve a high probability of hits, it has been believed that a relatively large cache must be constructed. Because of the space needed for SRAM cells, building a reasonably sized cache on a DRAM chip takes an unacceptable amount of space. 2) The logic and register support necessary to implement a cache memory system also occupies significant physical space on a chip. This additional space is probably not acceptable on a DRAM chip, but locating the support logic off-chip would sacrifice the speed advantages of on-chip placement by requiring a bus interconnect and forgoing most parallel communication.
Goodman and Chiang, "The Use of Static Column RAM as a Memory Hierarchy," The 11th Annual Symposium on Computer Architecture, IEEE Computer Society Press, 1984, pp. 167-174, suggest the use of the sense-amplifying row, or static row buffer, of the modern static column decode DRAM device as a cache memory. This suggestion avoids spending an unacceptable amount of chip space on low density SRAM cache memory, since the static row buffer is already present on the device. This approach suffers the serious drawback, however, that it offers only one row of cache memory, albeit a row containing as many memory cells as the main memory DRAM array has columns. Therefore the "hit" probability is typically not acceptably high.
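The single-row limitation can be made concrete with a small simulation. This is a minimal sketch, not the scheme as described by Goodman and Chiang: the function name `row_buffer_hits`, the default of 1024 columns per row, and the mapping of a cell address to a row by integer division are all assumptions made for illustration.

```python
def row_buffer_hits(addresses, columns=1024):
    """Count hits when one static row buffer serves as the only cache row.

    An access hits if it falls in the row currently held in the buffer;
    otherwise it misses and the addressed row replaces the buffer contents.
    The address-to-row mapping (addr // columns) is an assumption.
    """
    open_row = None
    hits = 0
    for addr in addresses:
        row = addr // columns
        if row == open_row:
            hits += 1
        else:
            open_row = row  # miss: load the newly addressed row
    return hits


# Sequential accesses stay within a row for long stretches.
sequential = list(range(2048))

# Two interleaved streams in different rows evict each other on every access.
interleaved = [addr for i in range(100) for addr in (i, 1024 + i)]

print(row_buffer_hits(sequential), "hits out of", len(sequential))
print(row_buffer_hits(interleaved), "hits out of", len(interleaved))
```

In this model a strictly sequential trace hits almost every time, while a program that alternates between two rows, for example instructions in one row and data in another, never hits at all.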
A further refinement of the Goodman and Chiang suggestion is to use "by 2" or "by 4" memory devices instead of "by 1" devices. For example, to obtain 1M bit capacity, instead of using a device having one DRAM array of 1024 × 1024 memory cells with one static row buffer 1024 cells in length, one would use a device having four 256K bit arrays, each of 512 × 512 memory cells and each having a static row buffer 512 cells in length. This configuration allows four separately addressable "cache" rows by using the four static row buffers. However, this solution has a number of drawbacks: such "by 4" devices are more costly and less available than "by 1" devices; they are extremely difficult to error correct using standard error correcting codes and procedures; they require more I/O pins, and therefore a larger physical package, than "by 1" devices; they consume more power than "by 1" devices; they require more on-chip addressing logic; they require off-chip addressing or demultiplexing functions beyond those required by "by 1" arrays; and they could require more than twice the physical space of "by 1" devices to house the four static row buffers.
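The benefit of four separately addressable static row buffers can be sketched in the same way. The model below is an illustration under stated assumptions only: the function name `multi_buffer_hits` is hypothetical, and the address mapping (the array selected by the high-order part of a cell address, then the row within that array) is one possible choice and not necessarily how an actual "by 4" device is addressed.

```python
ARRAYS = 4                        # four 256K bit arrays, as in the example above
ROWS = COLS = 512                 # each array is 512 x 512 cells
CELLS_PER_ARRAY = ROWS * COLS


def multi_buffer_hits(addresses):
    """Count hits when each array keeps its own static row buffer.

    Assumed mapping: array = addr // CELLS_PER_ARRAY, then
    row = (addr % CELLS_PER_ARRAY) // COLS within that array.
    """
    open_rows = [None] * ARRAYS   # the row currently held by each buffer
    hits = 0
    for addr in addresses:
        if not 0 <= addr < ARRAYS * CELLS_PER_ARRAY:
            raise ValueError(f"address {addr} is outside the 1M bit range")
        array, offset = divmod(addr, CELLS_PER_ARRAY)
        row = offset // COLS
        if open_rows[array] == row:
            hits += 1
        else:
            open_rows[array] = row  # miss: only this array's buffer is reloaded
    return hits


# Two interleaved streams that fall in different arrays keep both rows open.
different_arrays = [a for i in range(100) for a in (i, CELLS_PER_ARRAY + i)]

# Two interleaved streams in different rows of the same array still conflict.
same_array = [a for i in range(100) for a in (i, COLS + i)]

print(multi_buffer_hits(different_arrays), "hits out of", len(different_arrays))
print(multi_buffer_hits(same_array), "hits out of", len(same_array))
```

In this model, interleaved streams hit only when they happen to fall in different arrays; streams that share an array evict each other just as with a single row buffer, so the four buffers improve the hit probability without eliminating such conflicts.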