The present invention is generally related to cache memory for central processing units (CPUs) and, more particularly, to the use of cache memory in small computing systems without the significant performance difficulties resulting from the very limited bandwidth normally available to reload the cache from main memory on a cache miss.
The use of a cache to improve performance on large computing systems is well known and widespread. In computing systems, large and small, that make use of a cache, one of the major limitations on system performance is the time required to load a block into the cache from main memory on a cache miss. Large systems attempt to minimize this time by using a very wide, but otherwise ordinary, bus between main memory and cache, together with complex "Latin Square" mapping for a "late select" cache to allow multiple writing into the cache array. This increases the width of the data path. The speed is improved by interleaving several (e.g., four to sixteen) memory modules.
Small systems usually cannot afford such large busses, complex mapping, or a large number of memory modules, and thus must seek other solutions. The fundamental requirement on the cache is to provide a moderately small (32- to 64-bit) but very fast (at CPU cycle time) data path between cache and the processor, and a very wide (block size) but only moderately fast (at main memory access time) data path between cache and main memory. When the cache is a separate physical entity constructed from only a few high speed array chips, the bus width between cache and main memory is severely limited by the small signal pin count available on the cache chips. For instance, using a 64K-bit static FET-RAM chip, two such chips give a 16K-byte cache, and the 128K-bit per chip version gives a 32K-byte cache on two chips. If a block size of 64 bytes (512 bits) is used, a data path of 512 lines would be desirable between main memory and cache or, for the least restrictive organization, 256 data pins per chip. It is currently not possible to provide this number of pins; a number like 16 or 32 is more likely. Because of this, the cache reload time is becoming a significant problem in many small, high performance systems.
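The arithmetic in the preceding paragraph can be checked with a short sketch. The function names below are hypothetical, and the sketch assumes that each data pin carries one bit per transfer and that the block is moved in equal-width transfers; neither assumption is stated in the text above.

```python
K = 1024  # "K" as used for chip capacities above


def cache_bytes(bits_per_chip, chips):
    """Total cache capacity in bytes for a given number of identical chips."""
    return bits_per_chip * chips // 8


def transfers_per_block(block_bytes, pins_per_chip, chips):
    """Number of bus transfers needed to reload one block.

    Assumption of this sketch: one bit per data pin per transfer.
    """
    block_bits = block_bytes * 8
    path_bits = pins_per_chip * chips
    return -(-block_bits // path_bits)  # ceiling division


# Capacities quoted in the text: two chips of 64K bits or 128K bits each.
small_cache = cache_bytes(64 * K, 2)    # 16K bytes
large_cache = cache_bytes(128 * K, 2)   # 32K bytes

# A 64-byte block over 256, 32, or 16 data pins per chip (two chips).
ideal = transfers_per_block(64, 256, 2)   # full-width path: 1 transfer
likely_32 = transfers_per_block(64, 32, 2)  # 8 transfers
likely_16 = transfers_per_block(64, 16, 2)  # 16 transfers
```

Under these assumptions, a realistic pin count of 16 or 32 per chip turns a single block-wide transfer into 16 or 8 sequential transfers, which illustrates why the reload time grows.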
In U.S. Pat. No. 4,382,278 issued to Daren R. Appelt, a computer system is proposed in which a plurality of registers and at least one workspace are provided in main memory. In addition, there is a workspace cache memory made up of registers within the CPU. Those registers correspond to the registers in the workspace in the main memory. Computer operations are implemented using the contents of the workspace cache registers, whose contents are transmitted to the corresponding registers in the workspace of the main memory in the event of a context switch. The architecture of this workspace system achieves high speed register-to-register operations and high speed context switching.
A high density memory system comprising a plurality of memory array boards, each having two memory chip arrays, is disclosed by William P. Ward in U.S. Pat. No. 4,183,095. The chip arrays are comprised of 576 memory elements, such as charge coupled devices (CCDs) or other such devices, with each of the memory elements having a 256K bit storage density. A function driver and buffer circuit is operatively associated with a corresponding memory chip array. Ward does not, however, teach the use of the buffer circuits as a distributed associative cache.
The present invention is advantageously used with dynamic random access memory (DRAM) chips, and U.S. Pat. No. 3,969,706 issued to Robert J. Proebsting and Robert S. Green discloses an example of a DRAM chip. More specifically, this patent discloses a MISFET dynamic RAM chip wherein information from an addressed row is read and transferred to a column register. One bit in the column register is then selected by the column address decoder so that data is transferred from that bit to a data output latch. Upon completion of the row address strobe cycle, each cell in the addressed row is automatically refreshed by the data in the respective bit positions of the column register.
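The read sequence described for this DRAM chip (row to column register, one bit to the output latch, then refresh of the row) can be illustrated with a small behavioral model. This is only a sketch: the class and method names are hypothetical, the `write` helper exists solely to load test data, and modelling the addressed row as emptied by the read is an assumption of the sketch rather than something taken from the cited patent.

```python
class DramChipModel:
    """Behavioral sketch of a DRAM array with a column register."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.column_register = [0] * cols
        self.output_latch = 0

    def write(self, row, col, bit):
        # Hypothetical helper, used only to load data into the model.
        self.cells[row][col] = bit

    def read(self, row, col):
        # Row address strobe: the whole addressed row is transferred
        # to the column register.
        self.column_register = list(self.cells[row])
        # Assumption of this sketch: the read empties the cells.
        self.cells[row] = [None] * len(self.column_register)
        # Column address decoder: one bit goes to the data output latch.
        self.output_latch = self.column_register[col]
        # End of the strobe cycle: every cell in the addressed row is
        # refreshed from the matching column register position.
        self.cells[row] = list(self.column_register)
        return self.output_latch


chip = DramChipModel(rows=4, cols=8)
chip.write(2, 5, 1)
bit = chip.read(2, 5)
```

After the read, the column register still holds the entire addressed row, not just the selected bit.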