1. Field of the Invention
The present invention is directed to a method and apparatus for using a cache memory in a computer system.
2. Description of the Related Art
Many computers today include one, and usually two, levels of cache memory. A cache memory is a high-speed memory that is positioned between a central processing unit (CPU) and main memory in a computer system in order to improve system performance. Cache memories (or caches) store copies of portions of main memory data that are actively being used by the CPU while a program is running. Since the access time of a cache can be faster than that of main memory, the overall access time can be reduced.
Many microprocessor-based systems implement a "direct mapped" cache memory. In general, a direct mapped cache memory comprises a high speed data Random Access Memory (RAM) and a parallel high speed tag RAM. The RAM address of each line in the data cache is the same as the low-order portion of the main memory line address to which the entry corresponds. The high-order portion of the main memory address is stored in the tag RAM. Thus, if main memory is thought of as 2^m blocks of 2^n lines of one or more bytes each, the i'th line in the cache data RAM will be a copy of the i'th line of one of the 2^m blocks in main memory. The identity of the main memory block that the line came from is stored in the i'th location in the tag RAM.
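As a rough illustration of the mapping just described, the following sketch splits a main memory line address into a block number (the value a tag RAM would hold) and a line index (the cache data RAM address). The function name and the choice of n = 4 are assumptions made for the example and do not come from any particular system.

```python
def split_line_address(line_address: int, n: int) -> tuple:
    """Split a main memory line address into (block, index).

    n is the number of low-order bits selecting a line within a block,
    so each block holds 2**n lines (hypothetical parameter).
    """
    index = line_address & ((1 << n) - 1)  # low-order bits: cache data RAM address
    block = line_address >> n              # high-order bits: value kept in the tag RAM
    return block, index


# Example with n = 4, i.e. 16 lines per block.
block, index = split_line_address(0x2A7, n=4)
```

Here line address 0x2A7 falls in block 0x2A at line index 7, so it could only occupy line 7 of the cache data RAM.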
When a CPU requests data from memory, the low-order portion of the line address is supplied as an address to both the cache data and cache tag RAMs. The tag for the selected cache entry is compared with the high-order portion of the CPU's address and, if it matches, then a "cache hit" is indicated and the data from the cache data RAM is enabled onto a data bus of the system. If the tag does not match the high-order portion of the CPU's address, or the tag data is invalid, then a "cache miss" is indicated and the data is fetched from main memory. The fetched data is also placed in the cache for potential future use, overwriting the previous entry. Typically (although not in all systems), an entire line is read from main memory and placed in the cache on a cache miss, even if only a byte is requested. On a data write from the CPU, either the cache or main memory or both may be updated.
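The read sequence above can be sketched as a small software model. This is a minimal sketch under stated assumptions, not a description of any actual hardware: the class name, the use of a dictionary for main memory, and the use of None to model an invalid tag are all illustrative choices.

```python
class DirectMappedCache:
    """Toy direct mapped cache addressed by line address (illustrative only)."""

    def __init__(self, index_bits, main_memory):
        self.index_bits = index_bits
        self.num_lines = 1 << index_bits
        self.tags = [None] * self.num_lines  # tag RAM; None models an invalid entry
        self.data = [None] * self.num_lines  # data RAM
        self.memory = main_memory            # dict: line address -> line contents

    def read(self, line_address):
        """Return (hit, line). On a miss, fill the entry from main memory."""
        index = line_address & (self.num_lines - 1)  # low-order portion
        tag = line_address >> self.index_bits        # high-order portion
        if self.tags[index] == tag:
            return True, self.data[index]            # cache hit
        line = self.memory[line_address]             # cache miss: fetch from main memory
        self.tags[index] = tag                       # overwrite the previous entry
        self.data[index] = line
        return False, line


memory = {addr: f"line-{addr}" for addr in range(64)}
cache = DirectMappedCache(index_bits=3, main_memory=memory)
first = cache.read(10)   # miss, fills index 2
second = cache.read(10)  # hit
third = cache.read(18)   # same index 2, different tag: miss, replaces line 10
fourth = cache.read(10)  # miss again, because line 18 displaced line 10
```

Line addresses 10 and 18 share the same low-order three bits, so they compete for the same cache entry.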
Accordingly, in a direct mapped cache, each "line" of secondary memory can be mapped to one and only one line in the cache. In a "fully associative" cache, a particular line of secondary memory may be mapped to any of the lines in the cache. In the fully associative cache, all of the tags may be compared to the address in order to determine whether a cache hit or miss has occurred. "K-way set associative" cache architectures also exist which represent a compromise between direct mapped caches and fully associative caches. In a k-way set associative cache architecture, each line of secondary memory may be mapped to any of k lines in the cache. In this case, k tags must be compared to the address during a cacheable secondary memory access in order to determine whether a cache hit or miss has occurred. Caches may also be "sector buffered" or "sub-block" type caches, in which several cache data lines, each with its own valid bit, correspond to a single cache tag RAM entry.
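The difference between these organizations can be illustrated with a simplified model that tracks tags only. The class below is a sketch under stated assumptions: the names are hypothetical, and first-in, first-out replacement is chosen purely for brevity, since the text above does not specify a replacement policy. Setting k to 1 models a direct mapped cache, and setting the number of sets to 1 models a fully associative cache.

```python
class SetAssociativeCache:
    """Toy k-way set associative cache that stores tags only (illustrative)."""

    def __init__(self, num_sets, k):
        self.num_sets = num_sets
        self.k = k
        # Each set holds up to k tags, oldest first (FIFO replacement is an assumption).
        self.sets = [[] for _ in range(num_sets)]

    def access(self, line_address):
        """Return True on a hit; on a miss, install the line and return False."""
        set_index = line_address % self.num_sets
        tag = line_address // self.num_sets
        ways = self.sets[set_index]
        if tag in ways:           # compares against up to k stored tags
            return True
        if len(ways) == self.k:
            ways.pop(0)           # set is full: evict the oldest tag
        ways.append(tag)
        return False


two_way = SetAssociativeCache(num_sets=4, k=2)
two_way_results = [two_way.access(a) for a in (2, 6, 2, 6)]

direct = SetAssociativeCache(num_sets=4, k=1)
direct_results = [direct.access(a) for a in (2, 6, 2, 6)]
```

Line addresses 2 and 6 map to the same set. With two ways they can both stay resident, so the repeated accesses hit; with one way each access displaces the other line and every access misses.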
When the CPU executes instructions that modify the contents of the cache, these modifications must also be made in the main memory or the data in main memory will become "stale." There are two conventional techniques for keeping the contents of the main memory consistent with that of the cache: (1) the write-through method and (2) the write-back or copy-back method. In the write-through method, on a cache write hit, data is written to the main memory immediately after or while data is written into the cache. This keeps the contents of the main memory always valid and consistent with that of the cache. In the write-back method, on a cache write hit, the system writes data into the cache and sets a "dirty bit" which indicates that the data has been written into the cache but not into the main memory. A cache controller checks for a dirty bit before overwriting a line of data in the cache, and if set, writes the line of cache data out to main memory before loading the cache with new data.
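The two policies can be contrasted with a deliberately tiny model holding a single cache line. This is a sketch under stated assumptions: the class and attribute names are hypothetical, and main memory is modeled as a dictionary.

```python
class SingleLineCache:
    """One cache line in front of a dict-based main memory (illustrative only)."""

    def __init__(self, memory, write_back):
        self.memory = memory
        self.write_back = write_back
        self.address = None  # which main memory line is currently cached
        self.data = None
        self.dirty = False

    def load(self, address):
        """Bring a line into the cache, copying back a dirty line first."""
        if self.dirty:
            self.memory[self.address] = self.data  # write out before overwriting
            self.dirty = False
        self.address = address
        self.data = self.memory[address]

    def write_hit(self, value):
        """Handle a CPU write that hits the cached line."""
        self.data = value
        if self.write_back:
            self.dirty = True                  # main memory is now stale
        else:
            self.memory[self.address] = value  # write-through: update memory now


wt_mem = {0: "a", 1: "b"}
wt = SingleLineCache(wt_mem, write_back=False)
wt.load(0)
wt.write_hit("A")        # main memory is updated at once

wb_mem = {0: "a", 1: "b"}
wb = SingleLineCache(wb_mem, write_back=True)
wb.load(0)
wb.write_hit("A")        # only the cache and the dirty bit change
stale_value = wb_mem[0]  # main memory still holds the old data
wb.load(1)               # replacement forces the dirty line out to main memory
```

In the write-back case main memory is brought up to date only when the dirty line is about to be replaced, which is why the dirty bit must be recorded on every write hit to a clean line.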
A computer system can have more than one level of cache memory for a given address space. For example, in a two-level cache system, the level one (L1) cache is logically adjacent to the host processor. The level two (L2) cache is logically behind the L1 cache. Main memory (e.g. DRAM) is located logically behind the L2 cache. When the CPU performs an access to an address in the memory address space, the L1 cache responds if possible. If the L1 cache cannot respond (for example, because of an L1 cache miss), then the L2 cache responds if possible. If the L2 cache also cannot respond, then the access is made to main memory. The CPU does not need to know how many levels of caching are present in the system or indeed that any caching exists at all. Similarly, the L1 cache does not need to know whether a second level of caching exists. Thus, to the CPU, the combination of both caches and main memory is considered merely as a single main memory structure. Similarly, to the L1 cache, the combination of the L2 cache and main memory is considered simply as a single main memory structure. In fact, a third level of caching could be included between the L2 cache and the main memory, and the L2 cache may still consider the combination of L3 and main memory as a single main memory structure.
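The layering described above, in which each level sees everything behind it as a single memory, can be sketched with chained read functions. This is a minimal sketch under stated assumptions: the function names are hypothetical, each cache level is modeled as a dictionary of unlimited capacity, and a trace list is added only to show which levels were consulted.

```python
def make_level(name, contents, next_level):
    """Build a read function for one cache level backed by next_level."""
    def read(address, trace):
        trace.append(name)
        if address in contents:
            return contents[address]        # this level can respond
        value = next_level(address, trace)  # otherwise defer to whatever is behind it
        contents[address] = value           # keep a copy for future accesses
        return value
    return read


def main_memory(address, trace):
    """Last level of the hierarchy; always able to respond."""
    trace.append("DRAM")
    return f"data@{address}"


l2 = make_level("L2", {}, main_memory)
l1 = make_level("L1", {}, l2)  # L1 only knows that "something" is behind it

trace_first = []
value_first = l1(5, trace_first)    # misses in L1 and L2, served by main memory
trace_second = []
value_second = l1(5, trace_second)  # served by L1 alone
```

Because each level is built only from a reference to the next one, inserting a third level between L2 and main memory would require no change to the L1 or L2 functions.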
As microprocessor technology has advanced, level one caches have been included on the microprocessor chip itself. In systems employing a microprocessor with an on-chip L1 cache, an L2 cache is typically part of a supporting chip set. One drawback of some L2 cache designs is that they incur a performance penalty for setting the dirty bit. For example, one L2 cache that can be used as a write-back cache includes a tag store for storing L2 tags and a dirty bit store for storing L2 dirty bits. Typically, the tag store and dirty bit store are SRAMs. Some systems use separate SRAMs for the dirty bit store and the tag store. However, to save resources (cost, space, power, heat, silicon, etc.), the tag store and dirty bit store can be combined in one SRAM. For example, a 32K x 9 SRAM can be used to store eight-bit tags and a dirty bit for each line in the L2 cache, or a 32K x 8 SRAM can be used to store a seven-bit tag and a dirty bit.
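The combined storage arrangement can be illustrated by packing an eight-bit tag and a dirty bit into one nine-bit word. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and placing the dirty bit in the most significant position is an arbitrary choice, since the text above does not say which bit of the SRAM word holds it.

```python
TAG_BITS = 8
DIRTY_MASK = 1 << TAG_BITS  # bit 8 holds the dirty bit (placement is an assumption)
TAG_MASK = DIRTY_MASK - 1   # bits 0-7 hold the tag


def pack_entry(tag, dirty):
    """Combine an eight-bit tag and a dirty flag into one nine-bit word."""
    return (tag & TAG_MASK) | (DIRTY_MASK if dirty else 0)


def unpack_entry(word):
    """Recover (tag, dirty) from a nine-bit word."""
    return word & TAG_MASK, bool(word & DIRTY_MASK)


clean_word = pack_entry(0xA5, dirty=False)
dirty_word = pack_entry(0xA5, dirty=True)
tag, dirty = unpack_entry(dirty_word)
```

Setting the dirty bit for a line changes only one bit of the stored word, but with a combined store it still requires a write to the same SRAM that is read for every tag comparison.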
Many SRAMs include one set of bidirectional pins for both reading and writing data (rather than separate input and output pins) in order to reduce costs. The default state of such pins is likely to be output. When writing dirty bits to a dirty bit store implemented with an SRAM having bidirectional data pins, extra time is needed to assert a write enable in order to change the dirty bit pin to an input pin and set up for the write operation. Thus, the bus cycle can be extended by two clocks or more to accommodate the writing of dirty bits. In some designs, setting the dirty bit causes a performance penalty even if the dirty bit store has separate data input and output pins. For example, the time needed to write the new dirty bit can itself be a performance penalty.
Since computers are judged by their price and performance, computer makers strive to increase the speed and efficiency of their computers without significantly increasing costs. Thus, there is a need for a system of writing to a cache memory that reduces the performance penalty for setting a dirty bit.