1. Field of the Invention
The present invention relates to microprocessor cache subsystems in computer systems, and more specifically to a method and apparatus for decreasing the snooping requirements and reducing latency problems in a cache system.
2. Description of the Related Art
The driving force behind computer system innovation has been the demand for faster and more powerful personal computers. A major bottleneck in personal computer speed has historically been the speed with which data can be accessed from memory, referred to as the memory access time. The microprocessor, with its relatively fast processor cycle times, has generally been delayed by the use of wait states during memory accesses to account for the relatively slow memory access times. Therefore, improvement in memory access times has been one of the major areas of research in enhancing computer performance.
In order to bridge the gap between fast processor cycle times and slow memory access times, cache memory was developed. A cache is a small amount of very fast, and expensive, zero wait state memory that is used to store a copy of frequently accessed code and data from main memory. The microprocessor can operate out of this very fast memory and thereby reduce the number of wait states that must be interposed during memory accesses. When the processor requests data from memory and the data resides in the cache, then a cache read hit takes place, and the data from the memory access can be returned to the processor from the cache without incurring wait states. If the data is not in the cache, then a cache read miss takes place. In a cache read miss, the memory request is forwarded to the system, and the data is retrieved from main memory, as would normally be done if the cache did not exist. On a cache miss, the data that is retrieved from memory is provided to the processor and is also written into the cache due to the statistical likelihood that this data will be requested again by the processor.
An efficient cache yields a high "hit rate," which is the percentage of memory accesses that result in cache hits. When a cache has a high hit rate, the majority of memory accesses are serviced with zero wait states. The net effect of a high cache hit rate is that the wait states incurred on a relatively infrequent miss are averaged over a large number of zero wait state cache hit accesses, resulting in an average of nearly zero wait states per access. Also, since a cache is usually located on the local bus of the microprocessor, cache hits are serviced locally without requiring use of the system bus. Therefore, a processor operating out of its local cache has a much lower "bus utilization." This reduces system bus bandwidth used by the processor, making more bandwidth available for other devices, such as intelligent bus masters, which can independently gain access to the bus.
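The averaging effect described above can be illustrated with a short calculation. The following is a minimal sketch only; the function name and the example figures (a 95% hit rate and 4 wait states per miss) are illustrative assumptions and are not taken from any particular system.

```python
def average_wait_states(hit_rate, miss_wait_states, hit_wait_states=0.0):
    """Expected wait states per memory access.

    hit_rate is a fraction between 0.0 and 1.0. Hits are assumed to cost
    hit_wait_states (zero by default) and misses miss_wait_states.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * hit_wait_states + (1.0 - hit_rate) * miss_wait_states


# Hypothetical figures: 95% of accesses hit, each miss costs 4 wait states.
average = average_wait_states(0.95, 4)
print(f"average wait states per access: {average:.2f}")
```

With these assumed figures the average is roughly one fifth of a wait state per access, even though each individual miss costs four.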
Another important feature of caches is that the processor can operate out of its local cache when it does not have control of the system bus, thereby increasing the efficiency of the computer system. In systems without microprocessor caches, the processor generally must remain idle while it does not have control of the system bus. This reduces the overall efficiency of the computer system because the processor cannot do any useful work at this time. However, if the processor includes a cache placed on its local bus, it can retrieve the necessary code and data from its cache to perform useful work while other devices have control of the system bus, thereby increasing system efficiency.
Important considerations in cache performance are the organization of the cache and the cache management policies that are employed in the cache. A cache can generally be organized into either a direct-mapped or set-associative configuration. In a direct-mapped organization, the physical address space of the computer is conceptually divided up into a number of equal pages, with the page size equaling the size of the cache. The cache is partitioned into a number of sets, with each set having a certain number of lines. The line size is generally a plurality of dwords, wherein a dword is 32 bits. Each of the conceptual pages in main memory has a number of lines equivalent to the number of lines in the cache, and each line from a respective page in main memory corresponds to a similarly located line in the cache. An important characteristic of a direct-mapped cache is that each memory line from a conceptual page in main memory, referred to as a page offset, can only reside in the equivalently located line or page offset in the cache. Due to this restriction, the cache only need refer to a certain number of the upper address bits of a memory address, referred to as a tag, to determine if a copy of the data from the respective memory address resides in the cache because the lower order address bits are pre-determined by the page offset of the memory address.
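The way a direct-mapped cache derives a tag and a page offset from a memory address can be sketched as follows. This is a simplified illustration, not a description of any particular controller: the 64 kbyte cache size, the 16-byte line (four dwords), and all names are assumptions, and the grouping of lines into sets is omitted.

```python
CACHE_SIZE = 64 * 1024  # assumed cache size, equal to the conceptual page size
LINE_SIZE = 16          # assumed line size: four 32-bit dwords


def split_address(address):
    """Return (tag, line_index, byte_in_line) for a physical address."""
    tag = address // CACHE_SIZE                       # conceptual page number
    line_index = (address % CACHE_SIZE) // LINE_SIZE  # page offset, in lines
    byte_in_line = address % LINE_SIZE
    return tag, line_index, byte_in_line


class DirectMappedCache:
    """Tracks only which line is resident at each page offset (no data)."""

    def __init__(self):
        self.tags = {}  # line_index -> tag of the line currently resident

    def lookup(self, address):
        """Return True if the line containing address is in the cache."""
        tag, line_index, _ = split_address(address)
        return self.tags.get(line_index) == tag

    def fill(self, address):
        """Place the line containing address, replacing any previous occupant."""
        tag, line_index, _ = split_address(address)
        self.tags[line_index] = tag
```

Because each page offset has exactly one possible location, comparing the stored tag is sufficient to decide between a hit and a miss, and two addresses with the same page offset but different tags displace each other.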
Whereas a direct-mapped cache is organized as one bank of memory that is equivalent in size to a conceptual page in main memory, a set-associative cache includes a number of banks, or ways, of memory that are each equivalent in size to a conceptual page in main memory. Accordingly, a page offset in main memory can be mapped to a number of locations in the cache equal to the number of ways in the cache. For example, in a 4-way set associative cache, a line or page offset from main memory can reside in the equivalent page offset location in any of the four ways of the cache. As with a direct-mapped cache, each of the ways in a multiple way cache is partitioned into a number of sets each having a certain number of lines. A set-associative cache also generally includes a replacement algorithm, such as a least recently used (LRU) algorithm, that determines which bank, or way, is filled with data when a read miss occurs.
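A set-associative lookup with LRU replacement can be sketched in the same style. The class below is a hypothetical illustration; the default sizes and all names are assumptions, and only tags (not data) are tracked.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Tag-only model of an N-way cache with LRU replacement."""

    def __init__(self, num_ways=4, way_size=16 * 1024, line_size=16):
        self.num_ways = num_ways
        self.way_size = way_size    # each way equals one conceptual page
        self.line_size = line_size
        self.slots = {}             # line_index -> OrderedDict of tags, LRU first

    def access(self, address):
        """Return True on a hit; on a miss, fill a way and return False."""
        tag = address // self.way_size
        line_index = (address % self.way_size) // self.line_size
        ways = self.slots.setdefault(line_index, OrderedDict())
        if tag in ways:
            ways.move_to_end(tag)       # mark as most recently used
            return True
        if len(ways) >= self.num_ways:
            ways.popitem(last=False)    # evict the least recently used way
        ways[tag] = True
        return False
```

In this sketch a line is displaced only after all ways at its page offset are occupied, and the way chosen for replacement is the one that has gone longest without being accessed.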
Cache management is generally performed by a device referred to as a cache controller. One cache management duty performed by the cache controller is the handling of processor writes to memory. The manner in which write operations are handled determines whether a cache is designated as "write-through" or "write-back." When the processor initiates a write to main memory, the cache is first checked to determine if a copy of the data from this location resides in the cache. If a processor write hit occurs in a write-back cache design, then the cache location is updated with the new data, and main memory is only updated later if this data is requested by another device, such as a bus master. Alternatively, the cache maintains the correct or "clean" copy of data thereafter, and the main memory is only updated when a flush operation occurs. In a write-through cache, the main memory location is generally updated in conjunction with the cache location on a processor write hit. If a processor write miss occurs to a write-through cache, the cache controller may either ignore the write miss or may perform a "write-allocate," whereby the cache controller allocates a new line in the cache in addition to passing the data to the main memory. In a write-back cache design, the cache controller generally allocates a new line in the cache when a processor write miss occurs. This generally involves reading the remaining entries from main memory to fill the line in addition to allocating the new write data.
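The difference between the two write policies can be sketched as follows. This is a deliberately simplified model with assumed names: each line holds a single value, so the line fill that accompanies a write-back allocation is not shown, and the write-through variant is assumed to ignore write misses rather than write-allocate.

```python
class WriteCache:
    """Minimal model of processor writes under either write policy."""

    def __init__(self, memory, write_back):
        self.memory = memory          # dict standing in for main memory
        self.write_back = write_back  # True: write-back, False: write-through
        self.lines = {}               # address -> cached value
        self.modified = set()         # addresses owned by the cache

    def write(self, address, value):
        """Perform a processor write; return True if it was a cache hit."""
        hit = address in self.lines
        if self.write_back:
            self.lines[address] = value   # update on hit, allocate on miss
            self.modified.add(address)    # main memory is now out of date
        else:
            if hit:
                self.lines[address] = value
            self.memory[address] = value  # memory always updated
        return hit

    def flush(self):
        """Copy every modified line back to main memory."""
        for address in sorted(self.modified):
            self.memory[address] = self.lines[address]
        self.modified.clear()
```

In the write-back case main memory is left unchanged until `flush` is called, which is why another device reading memory in the interim would need the cache controller to intervene.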
The cache controller includes a directory that holds an associated entry for each set in the cache. In a write-through cache, this entry generally has three components: a tag, a tag valid bit, and a number of line valid bits equaling the number of lines in each cache set. The tag acts as a main memory page number, and it holds the upper address bits of the particular page in main memory from which the copy of data residing in the respective set of the cache originated. The status of the tag valid bit determines whether the data in the respective set of the cache is considered valid or invalid. If the tag valid bit is clear, then the entire set is considered invalid. If the tag valid bit is set, then an individual line within the set is considered valid or invalid depending on the status of its respective line valid bit. In a write-back cache, the entries in the cache directory generally comprise a tag and a number of tag state bits for each of the lines in each set. As before, the tag comprises the upper address bits of the particular page in main memory from which the copy originated. The tag state bits determine the status of the data for each respective line, i.e., whether the data is invalid, modified (owned), or clean.
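The two kinds of directory entry can be modeled as simple records. The sketch below is illustrative only; the class names, the field names, and the figure of four lines per set are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

LINES_PER_SET = 4  # assumed number of lines in each cache set


class LineState(Enum):
    INVALID = 0
    CLEAN = 1
    MODIFIED = 2  # the cache owns this line


@dataclass
class WriteThroughEntry:
    tag: int = 0
    tag_valid: bool = False
    line_valid: list = field(default_factory=lambda: [False] * LINES_PER_SET)

    def is_valid(self, line):
        # A clear tag valid bit invalidates the whole set.
        return self.tag_valid and self.line_valid[line]


@dataclass
class WriteBackEntry:
    tag: int = 0
    line_state: list = field(
        default_factory=lambda: [LineState.INVALID] * LINES_PER_SET
    )

    def is_valid(self, line):
        return self.line_state[line] is not LineState.INVALID
```

The write-through entry needs only valid bits because main memory always holds current data, whereas the write-back entry must also record which lines are modified.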
A principal cache management policy is the preservation of cache coherency. Cache coherency refers to the requirement that any copy of data in a cache must be identical to (or actually be) the owner of that location's data. The owner of a location's data is generally defined as the respective location having the most recent or the correct version of data. The owner of data is generally either an unmodified location in main memory, or a modified location in a write-back cache.
In computer systems where independent bus masters can access memory, there is a possibility that a bus master, such as a direct memory access controller, network or disk interface card, or video graphics card, might alter the contents of a main memory location that is duplicated in the cache. When this occurs, the cache is said to hold "stale," "dirty" or invalid data. Also, when the processor executes a cache write hit operation to a write-back cache, the cache receives the new data, but main memory is not updated until a later time, if at all. In this instance, the cache contains a "clean" or correct version of the data and is said to own the location, and main memory holds invalid or "dirty" data. Problems would arise if the processor were allowed to access dirty data from the cache, or if a bus master were allowed to access dirty data from main memory. Therefore, in order to maintain cache coherency, i.e., in order to prevent a device such as a processor or bus master from inadvertently receiving incorrect or dirty data, it is necessary for the cache controller to monitor the system bus for bus master accesses to main memory when the processor does not control the system bus. This method of monitoring the bus is referred to as snooping.
In a write-back cache design, the cache controller must monitor the system bus during memory reads by a bus master because of the possibility that the cache may own the location, i.e., the cache may contain the only correct copy of data for this location, referred to as modified data. This is referred to as read snooping. On a read snoop hit where the cache contains modified data, the cache controller generally provides the respective data to main memory, and the requesting bus master generally reads this data en route from the cache controller to main memory, this operation being referred to as snarfing. Alternatively, the cache controller provides the respective data directly to the bus master and not to main memory. In this alternative scheme, the main memory would perpetually contain erroneous or "dirty" data until a cache flush occurred.
In both write-back and write-through cache designs, the cache controller must also monitor the system bus during bus master writes to memory because the bus master may write to or alter a memory location having data that resides in the cache. This is referred to as write snooping. On a write snoop hit to a write-through cache, the cache entry is generally marked invalid in the cache directory by the cache controller, signifying that this entry is no longer correct. In a write-back cache, the cache is updated along with main memory, and the tag state bits are set to indicate that the respective cache location now includes a clean copy of the data. Alternatively, a write-back cache may invalidate the entire line on a write snoop hit. Therefore, in a write-back cache design, the cache controller must snoop both bus master reads and writes to main memory. In a write-through cache design, the cache controller need only snoop bus master writes to main memory.
The process of snooping generally entails that the cache controller latch the system bus address and perform a cache look-up in the tag directory corresponding to the page offset location where the memory access occurred to see if a copy of data from the main memory location being accessed also resides in the cache. If a copy of the data from this location does reside in the cache, then the cache controller takes the appropriate action depending on whether a write-back or write-through cache design has been implemented, or whether a read or write snoop hit has occurred. This prevents incompatible data from being stored in main memory and the cache, thereby preserving cache coherency.
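The snoop actions described above can be summarized as a small decision function. This is a sketch under stated assumptions rather than a definitive controller design: the function name and string labels are hypothetical, the write-back write snoop uses the update variant (not the alternative invalidate variant), and marking a modified line clean after a read snoop hit is an assumption corresponding to the case where the data is also written to main memory.

```python
def snoop(policy, access, state):
    """Return (action, new_state) for a bus master access that finds `state`.

    policy: "write-through" or "write-back"
    access: "read" or "write"
    state:  "invalid", "clean", or "modified"
    """
    if state == "invalid":
        return ("none", state)              # snoop miss: nothing to do
    if access == "read":
        if policy == "write-back" and state == "modified":
            # The cache owns the location and must supply the data.
            return ("supply-data", "clean")
        return ("none", state)              # main memory is already correct
    if policy == "write-through":
        return ("invalidate", "invalid")    # cached copy is no longer correct
    return ("update", "clean")              # write-back: update with memory
```

For example, `snoop("write-through", "write", "clean")` yields an invalidation, while a read never requires action from a write-through cache, which is why such a cache need only snoop writes.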
However, the requirement that a cache snoop every non-processor memory access, or every memory write in a write-through cache, considerably impairs the efficiency of the processor working out of its cache while it does not have control of the system bus, because processor accesses to the cache are continually being interrupted by snoop accesses. Therefore, a method and apparatus are desired to reduce the snooping requirements of a cache so that the processor can operate out of its cache more efficiently when the processor does not have control of the bus.
Some background on multilevel cache systems is appropriate here. Caches have generally been designed independently of the microprocessor. The cache is placed on the local bus of the microprocessor and interfaced between the processor and the system bus during the design of the computer system. However, with the development of higher transistor density computer chips, many processors are currently being designed with an on-chip cache in order to meet performance goals with regard to memory access times. The on-chip cache used in these processors is generally small, an exemplary size being 8 kbytes. The smaller, on-chip cache is generally faster than a large off-chip cache and reduces the gap between fast processor cycle times and the relatively slow access times of large caches.
In computer systems that utilize processors with on-chip caches, an external, second level cache is often added to the system to further improve memory access time. The second level cache is generally much larger than the on-chip cache, and, when used in conjunction with the on-chip cache, provides a greater overall hit rate than the on-chip cache would provide by itself.
In systems that incorporate multiple levels of caches, when the processor requests data from memory, the on-chip or first level cache is first checked to see if a copy of the data resides there. If so, then a first level cache hit occurs, and the first level cache provides the appropriate data to the processor. If a first level cache miss occurs, then the second level cache is checked. If a second level cache hit occurs, then the data is provided from the second level cache to the processor. If a second level cache miss occurs, then the data is retrieved from main memory. Write operations are similar, with mixing and matching of the operations discussed above being possible.
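The read sequence described above can be sketched as a lookup that falls through the levels in order. The function name and the use of dictionaries for the caches and main memory are assumptions made for illustration; copying the data into the faster levels on a miss is likewise an assumption of this sketch.

```python
def read(address, l1, l2, memory):
    """Return (value, source), where source names the level that supplied it.

    l1, l2, and memory are dicts mapping addresses to values.
    """
    if address in l1:
        return l1[address], "L1"      # first level cache hit
    if address in l2:
        value = l2[address]           # second level cache hit
        l1[address] = value           # assumed: also fill the first level
        return value, "L2"
    value = memory[address]           # miss in both caches
    l2[address] = value               # assumed: fill both levels
    l1[address] = value
    return value, "memory"
```

Under these assumptions, a first read of an address is served from main memory and later reads of the same address are served from the first level cache.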
In many instances where multilevel cache hierarchies exist with multiple processors, a property referred to as multilevel inclusion may be implemented in the hierarchy. Multilevel inclusion provides that the second level cache is guaranteed to have a copy of what is inside the first level, or on-chip cache. When this occurs, the second level cache is said to hold a superset of the first level cache. When multilevel inclusion is implemented, and certain criteria are met, it is possible for the second level cache to perform the snooping responsibilities for both caches. For more information on this feature, please see related application Ser. No. 07/538,874 filed Jun. 15, 1990 titled "Multilevel Inclusion in Multilevel Cache Hierarchies," which is hereby incorporated by reference. In multilevel cache systems where multilevel inclusion is not implemented, it is generally necessary for each cache to snoop the system bus during memory accesses by other bus masters in order to maintain cache coherency.
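How multilevel inclusion allows the second level cache to snoop on behalf of the first can be sketched as follows. This is a hypothetical illustration with assumed names: the caches are modeled as sets of resident line addresses, a write snoop hit is assumed to invalidate the line in both levels, and the additional criteria mentioned above are not modeled.

```python
def inclusion_holds(l1_lines, l2_lines):
    """True if every line in the first level cache is also in the second."""
    return l1_lines <= l2_lines


def snoop_write(address, l1_lines, l2_lines):
    """Handle a bus master write; return True if L1 had to be disturbed."""
    if address not in l2_lines:
        # Under inclusion, a second level miss implies a first level miss,
        # so the first level cache is not interrupted at all.
        return False
    l2_lines.discard(address)
    l1_lines.discard(address)   # forward the invalidation to L1
    return True
```

The benefit in this sketch is that the first level cache is touched only on second level snoop hits; if inclusion did not hold, the early return would be unsafe and each cache would have to snoop the bus itself.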