1. Field of the Invention
The present invention is directed to a system for using a dirty bit in a cache memory.
2. Description of the Related Art
In a typical IBM PC/AT-compatible computer system, a host processing unit is coupled to a host bus and most I/O peripheral devices are coupled to a separate I/O bus. The host processing unit typically comprises an Intel i386, i486 or Pentium™ microprocessor, and the I/O bus typically conforms to a standard known as ISA (Industry Standard Architecture). I/O interface circuitry, which usually comprises one or more chips in a "core logic chip set", provides an interface between the two buses. A typical system also includes a memory subsystem, which usually comprises a large array of DRAMs and perhaps a cache memory.
General information on the various forms of IBM PC/AT-compatible computers can be found in IBM, "Technical Reference, Personal Computer AT" (1985); in Sanchez, "IBM Microcomputers: A Programmer's Handbook" (McGraw-Hill: 1990); in MicroDesign Resources, "PC Chip Sets" (1992); and in Solari, "AT Bus Design" (San Diego: Annabooks, 1990). See also the various data books and data sheets published by Intel Corporation concerning the structure and use of the 80x86 family of microprocessors, including Intel Corp., "Pentium™ Processor", Preliminary Data Sheet (1993); Intel Corp., "Pentium™ Processor User's Manual" (1994); Intel Corp., "i486 Microprocessor Hardware Reference Manual" (1990); Intel Corp., "386 SX Microprocessor", data sheet (1990); and Intel Corp., "386 DX Microprocessor", data sheet (1990). In addition, a typical core logic chipset includes the OPTi 82C802G and either the 82C601 or the 82C602. The 82C802G is described in OPTi, Inc., "OPTi PC/AT Single Chip 82C802G Data Book", Version 1.2a (Dec. 1, 1993), and the 82C601 and 82C602 are described in OPTi, Inc., "PC/AT Data Buffer Chips, Preliminary, 82C601/82C602 Data Book", Version 1.0e (Oct. 10, 1993). All the above references are incorporated herein by reference.
Many IBM PC/AT-compatible computers today include one, and usually two, levels of cache memory. A cache memory is a high-speed memory that is positioned between a microprocessor and main memory in a computer system in order to improve system performance. Cache memories (or caches) store copies of portions of main memory data that are actively being used by the central processing unit (CPU) while a program is running. Since a cache can be accessed faster than main memory, the average memory access time is reduced whenever the requested data is found in the cache. Descriptions of various uses of and methods of employing caches appear in the following articles: Kaplan, "Cache-based Computer Systems," Computer, Mar. 1973, at 30-36; Rhodes, "Caches Keep Main Memories From Slowing Down Fast CPUs," Electronic Design, Jan. 21, 1982, at 179; Strecker, "Cache Memories for PDP-11 Family Computers," in Bell, "Computer Engineering" (Digital Press), at 263-67, all incorporated herein by reference. See also the description at pp. 6-1 through 6-11 of the "i486 Microprocessor Hardware Reference Manual" incorporated above.
Many microprocessor-based systems implement a "direct mapped" cache memory. In general, a direct mapped cache memory comprises a high speed data Random Access Memory (RAM) and a parallel high speed tag RAM. The RAM address of each line in the data cache is the same as the low-order portion of the main memory line address to which the entry corresponds, the high-order portion of the main memory address being stored in the tag RAM. Thus, if main memory is thought of as $2^m$ blocks of $2^n$ "lines" of one or more bytes each, the i'th line in the cache data RAM will be a copy of the i'th line of one of the $2^m$ blocks in main memory. The identity of the main memory block that the line came from is stored in the i'th location in the tag RAM.
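The address split described above can be sketched as follows. This is a minimal illustration, assuming byte addressing and power-of-two sizes; the function name `split_address` and the parameters `index_bits` (the $n$ above) and `offset_bits` (log2 of the line size in bytes) are hypothetical. Whatever bits remain above the index form the tag, i.e. the $m$-bit block identity kept in the tag RAM.

```python
from typing import NamedTuple


class DirectMappedAddress(NamedTuple):
    tag: int     # high-order bits: which of the 2**m main-memory blocks
    index: int   # low-order line-address bits: which of the 2**n cache lines
    offset: int  # byte position within the line


def split_address(addr: int, index_bits: int, offset_bits: int) -> DirectMappedAddress:
    """Split a byte address for a hypothetical direct mapped cache holding
    2**index_bits lines of 2**offset_bits bytes each."""
    offset = addr & ((1 << offset_bits) - 1)
    line_addr = addr >> offset_bits              # main memory line address
    index = line_addr & ((1 << index_bits) - 1)  # low-order portion -> cache RAM address
    tag = line_addr >> index_bits                # high-order portion -> tag RAM
    return DirectMappedAddress(tag, index, offset)
```

For example, with an assumed 32-byte line and 2^15 lines, `split_address(0x12345678, 15, 5)` yields tag `0x123`, index `0x22B3` and offset `0x18`.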
When a CPU requests data from memory, the low-order portion of the line address is supplied as an address to both the cache data and cache tag RAMs. The tag for the selected cache entry is compared with the high-order portion of the CPU's address and, if it matches, then a "cache hit" is indicated and the data from the cache data RAM is enabled onto a data bus of the system. If the tag does not match the high-order portion of the CPU's address, or the tag data is invalid, then a "cache miss" is indicated and the data is fetched from main memory. It is also placed in the cache for potential future use, overwriting the previous entry. Typically, an entire line is read from main memory and placed in the cache on a cache miss, even if only a byte is requested. On a data write from the CPU, either the cache or main memory or both may be updated.
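The read path just described (index both RAMs with the low-order line address, compare the stored tag, and on a miss fetch the whole line and overwrite the previous entry) might be modeled as below. This is a sketch only: the class `DirectMappedCache`, its parameters, and the use of a `bytearray` as main memory are illustrative assumptions, and writes are deliberately left out.

```python
class DirectMappedCache:
    """Read-only model of a direct mapped cache with per-line valid bits."""

    def __init__(self, memory: bytearray, index_bits: int, offset_bits: int):
        self.memory = memory
        self.index_bits = index_bits
        self.offset_bits = offset_bits
        self.line_size = 1 << offset_bits
        n_lines = 1 << index_bits
        self.valid = [False] * n_lines  # invalid entries always miss
        self.tags = [0] * n_lines       # models the tag RAM
        self.lines = [bytearray(self.line_size) for _ in range(n_lines)]  # data RAM

    def _split(self, addr: int):
        offset = addr & (self.line_size - 1)
        line_addr = addr >> self.offset_bits
        index = line_addr & ((1 << self.index_bits) - 1)
        return line_addr >> self.index_bits, index, offset

    def read(self, addr: int):
        """Return (byte, hit)."""
        tag, index, offset = self._split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return self.lines[index][offset], True  # cache hit
        # Cache miss: fetch the entire line from main memory, even though
        # only one byte was requested, overwriting the previous entry.
        base = addr & ~(self.line_size - 1)
        self.lines[index][:] = self.memory[base:base + self.line_size]
        self.tags[index] = tag
        self.valid[index] = True
        return self.lines[index][offset], False
```

Because the whole line is loaded on a miss, a subsequent read of a neighbouring byte in the same line hits, while a read of an address that maps to the same index with a different tag evicts it.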
Accordingly, in a direct mapped cache, each "line" of secondary memory can be mapped to one and only one line in the cache. In a "fully associative" cache, a particular line of secondary memory may be mapped to any of the lines in the cache; in this case, in a cacheable access, all of the tags must be compared to the address in order to determine whether a cache hit or miss has occurred. "K-way set associative" cache architectures also exist which represent a compromise between direct mapped caches and fully associative caches. In a k-way set associative cache architecture, each line of secondary memory may be mapped to any of k lines in the cache. In this case, k tags must be compared to the address during a cacheable secondary memory access in order to determine whether a cache hit or miss has occurred. Caches may also be "sector buffered" or "sub-block" type caches, in which several cache data lines, each with its own valid bit, correspond to a single cache tag RAM entry.
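The spectrum from direct mapped to fully associative can be illustrated with a single tag-only model, sketched below under stated assumptions: the class name `SetAssociativeTags` is hypothetical, and the least-recently-used replacement policy is an assumption, since the text does not specify how a victim is chosen within a set. Setting `ways=1` gives a direct mapped cache; setting `num_sets=1` gives a fully associative one.

```python
class SetAssociativeTags:
    """Tag-only model of a k-way set associative cache (LRU replacement assumed)."""

    def __init__(self, num_sets: int, ways: int, line_size: int):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set holds up to `ways` tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr: int) -> bool:
        """Return True on a cache hit, False on a miss (which allocates the line)."""
        line_addr = addr // self.line_size
        index = line_addr % self.num_sets
        tag = line_addr // self.num_sets
        candidates = self.sets[index]
        if tag in candidates:  # compares the address against all k tags in the set
            candidates.remove(tag)
            candidates.append(tag)  # mark as most recently used
            return True
        if len(candidates) == self.ways:
            candidates.pop(0)  # evict the least recently used line
        candidates.append(tag)
        return False
```

With two ways, two lines that map to the same set can coexist; with one way (direct mapped), they evict each other on every access.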
When the CPU executes instructions that modify the contents of the cache, these modifications must also be made in the main memory or the data in main memory will become "stale." There are two conventional techniques for keeping the contents of the main memory consistent with that of the cache: (1) the write-through method and (2) the write-back or copy-back method. In the write-through method, on a cache write hit, data is written to the main memory immediately after or while data is written into the cache. This keeps the contents of the main memory always valid and consistent with those of the cache. In the write-back method, on a cache write hit, the system writes data into the cache and sets a "dirty bit" which indicates that the data has been written into the cache but not into the main memory. A cache controller checks the dirty bit before overwriting any line of data in the cache and, if it is set, writes the line of data out to main memory before loading the cache with new data.
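The difference between the two policies, and the role of the dirty bit, can be sketched with a toy direct mapped cache. This is a minimal illustration, not any particular controller's design: the class `WordCache` is hypothetical, each line holds a single word to keep the model short, and allocating a line on a write miss (write-allocate) is an assumption, since the text only discusses write hits.

```python
class WordCache:
    """Direct mapped cache with one-word lines (simplifying assumption),
    using either a write-through or a write-back (copy-back) policy."""

    def __init__(self, memory: dict, num_lines: int, write_back: bool):
        self.memory = memory          # main memory: address -> value
        self.num_lines = num_lines
        self.write_back = write_back
        self.lines = {}               # index -> [tag, value, dirty]
        self.memory_writes = 0

    def _fill(self, addr: int) -> list:
        index, tag = addr % self.num_lines, addr // self.num_lines
        line = self.lines.get(index)
        if line is not None and line[0] == tag:
            return line  # hit
        if line is not None and line[2]:
            # Dirty bit set: copy the victim back before overwriting it.
            victim_addr = line[0] * self.num_lines + index
            self.memory[victim_addr] = line[1]
            self.memory_writes += 1
        line = [tag, self.memory.get(addr, 0), False]
        self.lines[index] = line
        return line

    def read(self, addr: int) -> int:
        return self._fill(addr)[1]

    def write(self, addr: int, value: int) -> None:
        line = self._fill(addr)  # write-allocate on a miss (assumption)
        line[1] = value
        if self.write_back:
            line[2] = True  # set the dirty bit; main memory is now stale
        else:
            self.memory[addr] = value  # write-through keeps memory consistent
            self.memory_writes += 1
```

Ten writes to the same word cost ten main memory writes under write-through, but none under write-back until the dirty line is evicted, at which point a single copy-back occurs. This saving is why the cost of setting the dirty bit on each write hit matters.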
A computer system can have more than one level of cache memory for a given address space. For example, in a two-level cache system, the "level one" (L1) cache is logically adjacent to the host processor. The second level (L2) cache is logically behind the L1 cache, and main memory (e.g. DRAM) is located logically behind the L2 cache. When the host processor performs an access to an address in the memory address space, the L1 cache responds if possible. If the L1 cache cannot respond (for example, because of an L1 cache miss), then the L2 cache responds if possible. If the L2 cache also cannot respond, then the access is made to main memory. The host processor does not need to know how many levels of caching are present in the system or indeed that any caching exists at all. Similarly, the L1 cache does not need to know whether a second level of caching exists prior to main memory. Thus, to the host processing unit, the combination of both caches and main memory appears simply as a single main memory structure. Likewise, to the L1 cache, the combination of the L2 cache and main memory appears simply as a single main memory structure. In fact, a third level of caching could be included between the L2 cache and the main memory, and the L2 cache would still treat the combination of the L3 cache and main memory as a single main memory structure.
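The layering described above, in which each level sees everything behind it as a single memory, can be sketched by giving every level the same `read` interface. The names `MainMemory` and `CacheLevel` are hypothetical, and the read-only, one-word-line, direct mapped organization is a simplifying assumption chosen only to keep the example short.

```python
class MainMemory:
    def __init__(self, data: dict):
        self.data = data
        self.accesses = 0

    def read(self, addr: int) -> int:
        self.accesses += 1
        return self.data.get(addr, 0)


class CacheLevel:
    """Read-only direct mapped level with one-word lines (simplifying assumptions).
    `backing` may be another CacheLevel or MainMemory; this level cannot tell
    which, because both expose the same read() interface."""

    def __init__(self, num_lines: int, backing):
        self.num_lines = num_lines
        self.backing = backing
        self.lines = {}  # index -> (tag, value)
        self.hits = 0

    def read(self, addr: int) -> int:
        index, tag = addr % self.num_lines, addr // self.num_lines
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            self.hits += 1
            return entry[1]
        value = self.backing.read(addr)  # miss: pass the access down one level
        self.lines[index] = (tag, value)
        return value
```

Building the hierarchy as `l1 = CacheLevel(4, CacheLevel(16, memory))` gives the processor a single object to read from; inserting an L3 level would only change the constructor chain, not the L1 or L2 code.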
As the x86 family of microprocessors has advanced, additional functions have been included on the microprocessor chip itself. For example, while i386-compatible microprocessors did not include any cache memory on-chip, the i486-compatible microprocessors did. Specifically, these microprocessors included a level one (L1) write-through cache memory.
Pentium-compatible microprocessors also include a level one cache on-chip. This cache is divided into a data cache and a separate code cache. Unlike the cache included on the i486-compatible microprocessor chips, the data cache on a Pentium chip follows a write-back policy. The cache is actually programmable on a line-by-line basis to follow a write-through or a write-back policy, but special precautions must be taken externally to the chip as long as even one line is to follow a write-back policy. The data cache on a Pentium chip implements a "modified/exclusive/shared/invalid" (MESI) write-back cache consistency protocol, whereas the code cache only supports the "shared" and "invalid" states of the MESI protocol. The MESI protocol is described in Intel, "Pentium Processor User's Manual, Vol. 1: Pentium Processor Databook" (1993), incorporated herein by reference.
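The four MESI states and the connection between the Modified state and the dirty bit discussed earlier can be sketched as follows. This is a simplified illustration of the generic protocol, not a model of the Pentium's exact behaviour: the names `Mesi`, `must_write_back` and `local_write_hit` are hypothetical, and the handling of a write hit on a Shared line is left unmodeled because it differs between MESI variants and depends on external signals.

```python
from enum import Enum
from typing import Optional


class Mesi(Enum):
    MODIFIED = "M"   # line differs from main memory (dirty); no other cached copy
    EXCLUSIVE = "E"  # line matches main memory; no other cached copy
    SHARED = "S"     # line matches main memory; other caches may hold copies
    INVALID = "I"    # line holds no valid data


# Per the text: the data cache uses all four states, the code cache only S and I.
DATA_CACHE_STATES = frozenset(Mesi)
CODE_CACHE_STATES = frozenset({Mesi.SHARED, Mesi.INVALID})


def must_write_back(state: Mesi) -> bool:
    """Only a Modified line is dirty and must be copied back before eviction."""
    return state is Mesi.MODIFIED


def local_write_hit(state: Mesi) -> Optional[Mesi]:
    """New state for a write hit that completes inside the cache, or None when
    the line is Shared and the write must first appear on the bus."""
    if state is Mesi.INVALID:
        raise ValueError("an Invalid line cannot produce a write hit")
    if state is Mesi.SHARED:
        return None  # other copies may exist; variant-specific handling not modeled
    return Mesi.MODIFIED  # E or M: update locally and mark the line dirty
```

In this sketch, the Exclusive-to-Modified transition plays the same role as setting a dirty bit in a write-back cache.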
In Pentium-based systems, an L2 cache typically is part of a supporting chip set. One drawback of current L2 cache designs is that they incur a performance penalty for setting the dirty bit. For example, one L2 cache that can be used is a write-back cache which includes a tag store for storing L2 tags and a dirty bit store for storing L2 dirty bits. Typically, the tag store and dirty bit store are SRAMs. To save resources (cost, space, power, heat, silicon, etc.), the tag store and dirty bit store can be combined in one SRAM. For example, a 32k×9 SRAM can be used to store an 8-bit tag and a dirty bit for each line in the L2 cache, or a 32k×8 SRAM can be used to store a 7-bit tag and a dirty bit. However, such a configuration can slow down the system during some memory cycles. A 32k×9 SRAM will typically include one set of bidirectional pins for both reading and writing the tags and dirty bit, and these pins are likely to default to the output (read) state. When a dirty bit is to be written, one half of a clock cycle is needed to assert a write enable, which changes the dirty bit pin to an input, and a further clock cycle is needed to actually write the dirty bit. Thus, using a bidirectional pin to write to and read from the dirty bit store can cause a performance penalty. In some designs, setting the dirty bit can cause a performance penalty even if the dirty bit store is separate from the tag store and/or has separate data input and output pins.
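The combined tag/dirty-bit SRAM word can be sketched as below. The function names are hypothetical, and placing the dirty bit in the most significant bit of the 9-bit (or 8-bit) word is an assumption; the text does not say which bit position is used. The last function shows the trade-off between the two SRAM configurations for a direct mapped cache: each tag value selects one block the size of the cache, so one fewer tag bit halves the cacheable address range.

```python
from typing import Tuple


def pack_tag_entry(tag: int, dirty: bool, tag_bits: int) -> int:
    """Pack a tag and its dirty bit into one (tag_bits + 1)-bit SRAM word.
    Dirty bit in the most significant position is an assumption."""
    if not 0 <= tag < (1 << tag_bits):
        raise ValueError("tag does not fit in the tag field")
    return (int(dirty) << tag_bits) | tag


def unpack_tag_entry(word: int, tag_bits: int) -> Tuple[int, bool]:
    """Return (tag, dirty) from a packed SRAM word."""
    return word & ((1 << tag_bits) - 1), bool((word >> tag_bits) & 1)


def cacheable_bytes(entries: int, line_bytes: int, tag_bits: int) -> int:
    """Direct mapped cache: 2**tag_bits blocks, each the size of the cache."""
    return entries * line_bytes * (1 << tag_bits)
```

Assuming, purely for illustration, 32-byte lines, the 32k entries describe a 1 MiB cache; an 8-bit tag (32k×9 SRAM) then covers 256 MiB of main memory, while a 7-bit tag (32k×8 SRAM) covers 128 MiB.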
Since computers are judged by their price and performance, computer makers strive to increase the speed and efficiency of their computers without increasing costs. Thus, there is a need for a system of writing to a cache memory that reduces the performance penalty for setting a dirty bit.