As computer clock speeds increase, the rate at which data in the main memory of a computer can be accessed becomes all-important in determining overall system performance. In modern computers, cache memories, or “caches,” are used to store a portion of the contents of main memory that is likely to be reused. Caches are typically smaller and faster than main memory, and are used to hide the latency of storing and retrieving memory operands in main memory. Typical cache access times are about five to thirty times shorter than main memory access times, which significantly increases overall system performance. Thus, while cache memories are not limited to use with central processing units (CPUs), their primary application is to store memory operands required by one or more CPUs (as opposed to other users of data) for rapid recall, obviating the need to access the slower main memory.
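To see why faster cache access translates into higher system performance, one common way to reason about it is the average memory access time: every access pays the cache access time, and the fraction of accesses that miss additionally pays a main-memory access. The sketch below assumes that simple model, and the timing figures (2 ns cache, 40 ns main memory, 5% miss rate) are hypothetical values chosen only for illustration.

```python
def average_access_time(hit_time_ns: float, miss_rate: float, memory_time_ns: float) -> float:
    """Simple model: every access costs the cache time; misses also cost a main-memory access."""
    return hit_time_ns + miss_rate * memory_time_ns

# Hypothetical figures: main memory is 20x slower than the cache, 5% of accesses miss.
with_cache = average_access_time(hit_time_ns=2.0, miss_rate=0.05, memory_time_ns=40.0)
without_cache = 40.0  # every access goes to main memory

print(f"average with cache: {with_cache:.1f} ns")
print(f"speedup: {without_cache / with_cache:.1f}x")
```

Under these assumed numbers the average access drops from 40 ns to about 4 ns; the benefit shrinks quickly as the miss rate grows, which is why the strategy for choosing what to keep in the cache matters.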
There can be more than one cache used to speed up access to main memory in a computer system. In fact, it is well known in the art to provide multiple levels of caches. For example, a CPU may be provided with a level one (L1) cache on the same integrated circuit as the CPU, and a larger, slower level two (L2) cache in the same module as the CPU. Alternatively, the L2 cache may be provided as a completely separate set of memory circuitry, apart from the CPU module. The L2 cache is typically used to speed up access to the main computer memory (i.e., accesses to the main memory are “cached,” or stored, by the L2 cache), while the L1 cache is typically used to speed up access to the L2 cache (i.e., accesses to the L2 cache are cached by the L1 cache). In the discussion that follows, it will be assumed that memory operands are loaded into a single cache from main memory. However, it should be understood that such operands may also be loaded from a lower level cache into a higher level cache, if appropriate.
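The L1/L2 arrangement described above can be pictured as a chain of lookups in which each level caches the next-slower one. The following is a minimal Python model under stated assumptions: the class name `Level` is hypothetical, each level is an unbounded dictionary (no capacity limit, indexing, or eviction), and a value fetched on a miss is filled into every level it passes through on the way back.

```python
from typing import Dict, List, Union

class Level:
    """One cache level backed by the next-slower level (hypothetical model, unbounded)."""

    def __init__(self, name: str, backing: Union["Level", Dict[int, int]]):
        self.name = name
        self.backing = backing            # next level down: another Level, or main memory (a dict)
        self.lines: Dict[int, int] = {}   # address -> cached value

    def read(self, addr: int, trace: List[str]) -> int:
        if addr in self.lines:
            trace.append(f"{self.name} hit")
            return self.lines[addr]
        trace.append(f"{self.name} miss")
        if isinstance(self.backing, Level):
            value = self.backing.read(addr, trace)   # ask the next cache level
        else:
            trace.append("main memory")
            value = self.backing[addr]               # fall through to main memory
        self.lines[addr] = value                     # fill this level on the way back
        return value

main_memory = {0x100: 42}
l2 = Level("L2", main_memory)   # L2 caches main memory
l1 = Level("L1", l2)            # L1 caches L2
trace: List[str] = []
l1.read(0x100, trace)   # first access misses in both levels
l1.read(0x100, trace)   # second access is served by L1
print(trace)
```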
Since cache memories are typically smaller than the main memories to which they are coupled, a strategy may be used to determine which contents of the main memory are to be stored in the cache. One of the simplest such cache organizations is the direct-mapped cache organization.
A cache is usually organized in “lines,” or groups of bytes, rather than as a single group of individual bytes. Thus, each cache line is used to store a small contiguous range of main memory contents, such as 32 or 64 bytes. In a direct-mapped cache, a portion of the main memory address is used as an index, and the remainder of the main memory address (not including any bits of the main memory address that select bytes within a cache line) is used as a tag. The number of bits used for the index corresponds to the number of lines in the cache. For example, a direct-mapped cache having 64 cache lines will have a corresponding six-bit index (i.e., since 2^6 = 64). When a read operation occurs and the memory operand is not in the cache (i.e., the tag stored for the indexed line does not match, or there is a “cache miss”), the memory operand is fetched from main memory and stored in the cache line corresponding to the index, and the tag is stored in a tag field associated with the cache line. If the memory operand is still in the cache the next time a read operation occurs (i.e., the tags match, or there is a “cache hit”), the memory operand will be retrieved directly from the cache.
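The address split and hit/miss behavior described above can be sketched for a read-only direct-mapped cache with 64 lines of 32 bytes each (five offset bits, six index bits). This is a minimal Python illustration, not a hardware design: the names `split_address` and `DirectMappedCache` are hypothetical, main memory is modeled as a `bytearray`, and writes (and therefore dirty lines) are left out.

```python
from typing import List, Optional, Tuple

LINE_SIZE = 32   # bytes per cache line (one of the sizes mentioned above)
NUM_LINES = 64   # 64 lines -> 6-bit index
OFFSET_BITS = (LINE_SIZE - 1).bit_length()   # 5 bits select a byte within a line
INDEX_BITS = (NUM_LINES - 1).bit_length()    # 6 bits select the cache line

def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

class DirectMappedCache:
    def __init__(self, memory: bytearray):
        self.memory = memory
        self.tags: List[Optional[int]] = [None] * NUM_LINES   # tag field per line; None = empty
        self.data: List[bytes] = [bytes(LINE_SIZE)] * NUM_LINES

    def read(self, addr: int) -> Tuple[int, bool]:
        """Return (byte value, hit?) for a one-byte read."""
        tag, index, offset = split_address(addr)
        hit = self.tags[index] == tag
        if not hit:
            # Cache miss: fetch the whole line from main memory into the one line
            # this address can map to, overwriting its old contents, and record the tag.
            base = addr & ~(LINE_SIZE - 1)
            self.data[index] = bytes(self.memory[base:base + LINE_SIZE])
            self.tags[index] = tag
        return self.data[index][offset], hit

memory = bytearray(range(256)) * 32   # 8 KiB of main memory; byte i holds i % 256
cache = DirectMappedCache(memory)
```

Address `0x1234` splits into tag 2, index 17, offset 20. Address `0x1234 + 2048` has the same index but a different tag, so alternating reads between the two miss every time: with only one possible line per address, the two ranges keep evicting each other.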
Continuing to use the example of a direct-mapped cache, for any given byte in the main memory, there is only one cache line in which the byte can be stored. Therefore, if that cache line is already in use, its old contents must be replaced by the new contents. If the old contents are the result of a previous memory operand write operation and have not yet been copied back to main memory, the cache line is known in the art as a “dirty” cache line, and must be written back to main memory before the new contents can be stored therein. This replacement process is characteristic of what is known as a “write-back” cache. However, if the old contents in the cache line are identical to the contents in main memory (because they were written to main memory at about the same time they were written to the cache line), the old contents may be overwritten (i.e., evicted) directly, without having to write back to main memory. This approach, which makes each write slower because main memory must also be updated, but keeps main memory content up to date at all times, is characteristic of what is known as a “write-through” cache.
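The difference between the two write policies shows up at eviction time. The Python sketch below is a hypothetical model, not taken from any particular design: the names `Line` and `TinyCache` are illustrative, each line holds a single word, the cache is direct-mapped on `addr % num_lines`, and a write miss first loads the line (a write-allocate policy, assumed here for simplicity). It counts main-memory writes under each policy.

```python
from typing import Dict, Optional

class Line:
    def __init__(self) -> None:
        self.tag: Optional[int] = None
        self.value = 0
        self.dirty = False   # only ever set in write-back mode

class TinyCache:
    """Direct-mapped, one word per line; contrasts write-back with write-through."""

    def __init__(self, memory: Dict[int, int], num_lines: int, write_back: bool):
        self.memory = memory
        self.num_lines = num_lines
        self.write_back = write_back
        self.lines = [Line() for _ in range(num_lines)]
        self.memory_writes = 0

    def _locate(self, addr: int):
        return self.lines[addr % self.num_lines], addr // self.num_lines

    def _fill(self, line: Line, addr: int, tag: int) -> None:
        # Evicting a dirty line requires copying it back to main memory first;
        # a clean line (always the case in write-through mode) is simply overwritten.
        if line.tag is not None and line.dirty:
            old_addr = line.tag * self.num_lines + addr % self.num_lines
            self.memory[old_addr] = line.value
            self.memory_writes += 1
        line.tag, line.value, line.dirty = tag, self.memory.get(addr, 0), False

    def write(self, addr: int, value: int) -> None:
        line, tag = self._locate(addr)
        if line.tag != tag:
            self._fill(line, addr, tag)
        line.value = value
        if self.write_back:
            line.dirty = True          # defer the main-memory update until eviction
        else:
            self.memory[addr] = value  # write-through: update main memory immediately
            self.memory_writes += 1

    def read(self, addr: int) -> int:
        line, tag = self._locate(addr)
        if line.tag != tag:
            self._fill(line, addr, tag)
        return line.value
```

Writing address 0 three times and then reading address 4 (which maps to the same line in a four-line cache) costs one main-memory write in write-back mode, performed at eviction, and three in write-through mode, where the eviction itself is free. In write-back mode, main memory holds a stale value until that eviction.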
Designers have a choice of memory types that can be used to construct on-chip cache memory circuitry. Instead of Static Random Access Memory (SRAM), on-chip Dynamic Random Access Memory (DRAM) may be chosen to save valuable chip surface area for other functions, but DRAM requires periodic refresh activity, which complicates the design. In addition, if refresh operations do not occur in a timely fashion, the entire cache is invalidated, which increases the number of cache misses and thereby degrades processor performance. While selective invalidation using a REFRESH bit for each cache entry has been offered in an attempt to circumvent the need for refreshing a DRAM cache, such designs have been limited to instruction-only (i.e., read-only) or write-through caches, ensuring that an up-to-date copy of every cache entry always resides outside of the cache. Excluding write-back operation from a DRAM cache allows entry invalidation and/or refresh operations to proceed without the need for complicated write-back activity. However, in applications where performance is paramount, a write-back cache may be highly desirable, especially if no L2 cache is available.
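Why selective invalidation has been restricted to read-only or write-through caches can be illustrated with a small model. The details below are assumptions, not a description of any specific prior design: each entry carries a hypothetical `refreshed` flag standing in for the REFRESH bit, set whenever the entry is accessed or refreshed, and a periodic `retention_sweep` (an illustrative name) invalidates entries whose flag is still clear and then clears the flags of the survivors. Invalidating a clean entry is harmless because main memory holds a copy; invalidating a dirty entry discards the only up-to-date copy.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Entry:
    value: int
    refreshed: bool = True   # hypothetical REFRESH bit: set when the entry is accessed or refreshed
    dirty: bool = False      # only possible in a write-back cache

def retention_sweep(entries: Dict[int, Entry]) -> List[int]:
    """Invalidate entries not refreshed since the last sweep.

    Returns the addresses of dirty entries that were invalidated, i.e. data
    that had no up-to-date copy outside the cache and is now lost.
    """
    lost: List[int] = []
    for addr in list(entries):
        entry = entries[addr]
        if not entry.refreshed:
            if entry.dirty:
                lost.append(addr)     # a write-back cache would need to write this back first
            del entries[addr]         # selective invalidation of just this entry
        else:
            entry.refreshed = False   # must be refreshed again before the next sweep
    return lost

entries = {
    0: Entry(10, refreshed=False),              # clean, stale: safe to drop
    1: Entry(11, refreshed=False, dirty=True),  # dirty, stale: data would be lost
    2: Entry(12),                               # recently refreshed: kept
}
print(retention_sweep(entries), sorted(entries))
```

In a read-only or write-through cache the `dirty` case never arises, so the sweep can always drop stale entries; supporting write-back would require writing dirty entries back before (or instead of) invalidating them.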
Thus, there is a need in the art for methods and apparatus that foster the extensive use of DRAM as part of cache memory to conserve valuable circuit real estate. Such methods and apparatus should also provide cache designers with the option of using a write-back cache whenever that function is needed or desired, without undue interference with processor data processing activity.