1. Field of the Invention
This invention relates to computer systems and, more specifically, to multiple level cache memories used in computer systems.
2. Background Information
Most computer systems include one or more central processing units (CPUs), one or more main memories, and one or more input/output (I/O) subsystems, all of which may be interconnected by a system bus. The CPUs typically fetch instructions and data from the main memories and execute those instructions. Although the processing speeds of CPUs have increased dramatically over the years, the speed of memory systems has not increased at the same rate. As a result, CPUs are often forced to wait, sometimes for significant periods of time, to receive instructions and/or data from the memories. Such delays can significantly degrade the computer system's performance. To reduce such delays, cache memories, which are often simply referred to as caches, were developed.
A cache is a small, fast memory module that is placed in close proximity to the CPU. Many caches are static random access memories (SRAMs), which are faster, but more expensive, than dynamic random access memories (DRAMs), which are often used for main memory. The cache is used to store frequently accessed instructions and/or data. That is, the cache contains an image or subset of the information from main memory. Instructions and/or data can be quickly accessed by a CPU from its respective cache, thereby reducing delays. When the CPU issues a request for instructions and/or data, a search is first made to see whether the requested instructions and/or data are resident in the CPU's cache. If they are, a cache "hit" is said to occur and the desired instructions and/or data are retrieved from the cache and provided to the CPU for processing. If the requested instructions and/or data are not found in the cache, a cache "miss" is said to occur. In this case, a request must be sent to main memory for the desired instructions and/or data. While this request is being processed, the CPU remains in a wait state. If the wait is expected to be especially long, some CPUs may begin executing a new process or thread.
The desired instructions and/or data are read out of main memory and sent to the CPU. When the instructions and/or data are received at the CPU, they are typically placed in the cache. To reduce the number of cache misses, computer architects have devised numerous schemes to try to anticipate what instructions and/or data a CPU is likely to request in the future. These "anticipated" instructions and/or data are prefetched and placed in the cache.
In most computer systems, the instructions and data at main memory are organized into units typically referred to as "blocks" or "cache lines", each of which is separately addressable. Instructions and data are typically moved about the computer system in terms of one or more blocks or cache lines. Furthermore, cache lines can be mapped into a cache in a number of different ways. The three most widely used mapping procedures are known as "fully associative mapping", "direct mapping" and "set-associative mapping". With fully associative mapping, a cache line can be placed in any location or entry of the cache. With direct mapping, every cache line maps to a single, specific cache entry. With set-associative mapping, a particular cache line can be placed in a restricted, predetermined set of entries within the cache. Thus, a cache line is first mapped into a particular set, and is then placed in any available or free cache entry within that set.
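The three mapping procedures can be viewed as one scheme that differs only in the number of entries per set. The following is a minimal illustrative sketch, not a description of any particular cache: the function name `candidate_entries`, its parameters, and the choice of a simple modulo to select the set are all assumptions made for the example.

```python
def candidate_entries(line_addr, num_entries, ways):
    """Return the cache entries in which a cache line may be placed.

    ways == 1            -> direct mapping (one fixed entry)
    ways == num_entries  -> fully associative mapping (any entry)
    otherwise            -> set-associative mapping (any entry of one set)
    """
    if ways < 1 or num_entries % ways != 0:
        raise ValueError("ways must be positive and divide num_entries")
    num_sets = num_entries // ways
    # Assumption: the set is chosen by the line address modulo the set count.
    set_index = line_addr % num_sets
    first = set_index * ways
    return list(range(first, first + ways))


direct = candidate_entries(13, num_entries=8, ways=1)
two_way = candidate_entries(13, num_entries=8, ways=2)
fully = candidate_entries(13, num_entries=8, ways=8)
```

With an eight-entry cache, line address 13 maps to exactly one entry under direct mapping, to a two-entry set under two-way set-associative mapping, and to all eight entries under fully associative mapping.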
Due to the significant advantages that they provide, nearly all, if not all, computer systems include caches. In fact, many computer systems have multiple caches organized into levels in order to improve their efficiency even further. A multilevel cache system may include, for example, two cache memory modules disposed between the CPU and the main memory. A first level (L1) cache may be directly coupled to the CPU. The L1 cache is typically very small, but very fast. A second level (L2) cache may be disposed between the L1 cache and the main memory. The L2 cache is typically larger but somewhat slower than the L1 cache, although it is still faster than accessing main memory.
In most multilevel cache systems, the contents of each cache level are also present at each higher level. That is, all of the instructions and/or data at the L1 cache are also present in the L2 cache. Each higher level cache (e.g., L2), however, typically contains more information than the preceding level (e.g., L1). This arrangement, which is known as the "subset rule", is used in order to speed up searches of the cache levels typically initiated by other CPUs in multiprocessor computer systems. More specifically, if a first CPU issues a request to main memory for a particular cache line, a search may also be made of the caches associated with the other CPUs in the computer system. If the cache hierarchy associated with each CPU follows the subset rule, a search need only be made of the highest cache level for each CPU. If the cache line is not present at the highest level cache, then it cannot be present in any of the lower levels. The computer system can thus locate particular cache lines relatively quickly. If each cache level for each CPU had to be searched, the performance of the computer system could be severely degraded.
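The benefit of the subset rule for searches initiated by other CPUs can be illustrated with a small sketch. It assumes, purely for illustration, that each cache level is modeled as a Python set of line addresses, ordered from L1 (closest to the CPU) to the highest level; the names `obeys_subset_rule` and `snoop` are hypothetical.

```python
def obeys_subset_rule(levels):
    """True if every level's contents are also present in the next higher level.

    levels[0] models L1; levels[-1] models the highest cache level.
    """
    return all(lower <= higher for lower, higher in zip(levels, levels[1:]))


def snoop(levels, line_addr):
    """Probe only the highest level on behalf of another CPU.

    A negative answer is conclusive only when the subset rule holds.
    """
    return line_addr in levels[-1]


hierarchy = [{0x40}, {0x40, 0x80}]   # L1 is a subset of L2
broken = [{0x40}, {0x80}]            # L1 holds a line that L2 lacks
```

For `hierarchy`, a single probe of L2 answers the question for both levels. For `broken`, a probe of L2 alone would wrongly report that line 0x40 is absent, which is why every level would have to be searched without the subset rule.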
Before issuing a request to main memory, a CPU first searches all of its cache levels for the desired cache line. In particular, each cache level is searched until either a hit is obtained or a miss occurs at the highest level. That is, a search is first made of the lowest level cache (e.g., L1). If there is no hit at the L1 cache, a search is made of the next level cache (e.g., L2) and so on. If there is no cache hit following the search of the highest level cache, then a request for the desired cache line is issued to main memory. As indicated above, the highest cache level of the other CPUs may also be searched for the requested cache line. If the cache line is not present at any of the caches, it is fetched from main memory and sent to the requesting CPU.
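The level-by-level search described above can be sketched as follows. This is an illustrative model only: each level is assumed to be a dictionary from line address to data, ordered from L1 upward, main memory is likewise a dictionary, and the names `lookup` and `read` are hypothetical. Searching the caches of other CPUs is omitted.

```python
def lookup(levels, line_addr):
    """Search from L1 upward; return (level_number, data), or None on a full miss."""
    for number, level in enumerate(levels, start=1):
        if line_addr in level:
            return number, level[line_addr]
    return None


def read(levels, main_memory, line_addr):
    """Return the data for line_addr, going to main memory only after every level misses."""
    hit = lookup(levels, line_addr)
    if hit is not None:
        return hit[1]
    data = main_memory[line_addr]
    # Assumption: the fetched line is placed in every level, preserving the subset rule.
    for level in levels:
        level[line_addr] = data
    return data


memory = {0x40: "A", 0x80: "B", 0xC0: "C"}
caches = [{0x40: "A"}, {0x40: "A", 0x80: "B"}]
first = lookup(caches, 0x80)        # misses in L1, hits in L2
before = lookup(caches, 0xC0)       # misses at every level
fetched = read(caches, memory, 0xC0)
after = lookup(caches, 0xC0)        # now present in L1
```

Note that the cost of a full miss grows with the number of levels, since each one is probed before the request to main memory is issued.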
Because more than one CPU may request a copy of the same cache line from main memory, cache coherency protocols have been developed to ensure that no CPU relies on a cache line that has become stale, typically due to a change or update performed to the cache line by some other CPU. Many cache coherency protocols associate a state with each cache line. A given cache line, for example, may be in a shared state in which copies of the cache line may be present in the cache hierarchies associated with multiple CPUs. When a cache line is in the shared state, a CPU may read from, but not write to, the respective cache line. To support write operations, a cache line may be in an exclusive state. In this case, the cache line is owned by a single CPU which may write to the cache line.
When a CPU wishes to obtain exclusive ownership over a cache line that is currently in the shared state (i.e., copies of the cache line are present in the cache hierarchies of other CPUs), invalidate requests are typically issued to those other CPUs. When an invalidate request is received by a given CPU, each level of the respective cache hierarchy is searched for the cache line specified by the invalidate request. If the specified cache line is found, it is transitioned to an invalid state. Many caches assign or associate a valid bit with each cache line stored in the cache. If the bit is asserted, then the cache line is considered to be valid and may be accessed and processed by the CPU. When a cache line is initially received from main memory, the valid bit is typically asserted and the cache line is stored in the cache. When an invalidate request is received, the valid bit of the respective cache line is deasserted, thereby reflecting that the cache line is no longer valid.
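A minimal sketch of the valid-bit handling described above follows. The `Entry` class, the dictionary-per-level model, and the function names `invalidate` and `is_hit` are assumptions introduced for illustration and do not correspond to any specific hardware design.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    data: bytes
    valid: bool = True  # asserted when the line is first received from main memory


def invalidate(levels, line_addr):
    """Deassert the valid bit for line_addr at every level; return how many levels changed."""
    changed = 0
    for level in levels:
        entry = level.get(line_addr)
        if entry is not None and entry.valid:
            entry.valid = False
            changed += 1
    return changed


def is_hit(level, line_addr):
    """A conventional lookup treats an entry with a deasserted valid bit as a miss."""
    entry = level.get(line_addr)
    return entry is not None and entry.valid


cache_levels = [
    {0x40: Entry(b"a")},
    {0x40: Entry(b"a"), 0x80: Entry(b"b")},
]
changed = invalidate(cache_levels, 0x40)
```

After the call, line 0x40 misses at both levels while line 0x80 is unaffected.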
Suppose, after invalidating a cache line within its hierarchy, a CPU issues its own request for this same cache line (requesting either shared or exclusive ownership). As described above, each level of the cache hierarchy is searched for the cache line starting at the lowest cache level. Because the state of the cache line has been transitioned to the invalid state, however, a cache miss is returned by each level. That is, when the valid bit associated with a given cache line is deasserted, there can be no match to the cache line and a search of the next highest level is performed. This process is repeated until a cache miss is returned from the highest cache level. Because the cache line is invalidated from each level in which it was originally present, a miss will occur at all levels of the cache hierarchy. At this point, a request is issued to main memory and/or the other CPUs of the computer system for the respective cache line.
This arrangement of having multilevel caches and including within each higher level cache all of the information from the next lower cache has proven very useful and is now incorporated into many computer system designs. For computer systems having many cache levels, however, the searching of each level can consume a significant amount of time, during which the CPU often waits in an inactive or idle state.
Briefly, the invention relates to a system for adaptively bypassing one or more higher cache levels following a miss in a lower cache level of a cache hierarchy. Each cache level is preferably configured to store information, such as data and/or instructions, to be accessed or operated upon by a processor or entity that is associated with the cache hierarchy. The information within each cache level is preferably organized into lines or blocks and each cache level includes a tag store. The tag store contains an address and a state for each cache line resident in the respective cache. A cache line may be in any number of states, including a shared state, in which the cache line can be owned by multiple processors, and an exclusive state in which the cache line is owned by only a single processor. When a processor wishes to obtain exclusive ownership of a given cache line that is in the shared state, invalidate requests are sent to those other processors that have a shared copy of the cache line.
When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state. The address of the cache line, however, is left in the tag store. Thereafter, if the processor or entity issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level. Since the address of the invalidated cache line was left in the respective tag store, a match will be detected for the address of the requested cache line at one of the cache levels, although the corresponding state of this cache line is invalid. In accordance with the present invention, such a distinction (i.e., an address hit, but an invalid state) is specifically detected by the cache hierarchy and is considered to be an "inval_miss" occurrence. In response to an inval_miss, the cache hierarchy calls off searching any higher cache levels, and instead, issues a request to main memory for the desired cache line. That is, the searching of higher cache levels is bypassed upon detecting an address match of an invalid cache line. Accordingly, time is not wasted searching the higher cache levels.
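The inval_miss bypass can be illustrated with the following sketch, which is a software model under stated assumptions rather than the claimed implementation. Each tag store is assumed to be a dictionary from line address to a state string, ordered from the lowest level upward; the names `Result`, `probe`, and `search` are hypothetical.

```python
from enum import Enum


class Result(Enum):
    HIT = "hit"
    MISS = "miss"
    INVAL_MISS = "inval_miss"


def probe(tag_store, line_addr):
    """Classify one level: no address match, match with invalid state, or valid match."""
    state = tag_store.get(line_addr)
    if state is None:
        return Result.MISS
    if state == "invalid":
        return Result.INVAL_MISS
    return Result.HIT


def search(tag_stores, line_addr):
    """Search upward from the lowest level; return (result, number_of_levels_probed).

    The search stops on a hit or on an inval_miss, so higher levels are bypassed.
    """
    probed = 0
    for tag_store in tag_stores:
        probed += 1
        result = probe(tag_store, line_addr)
        if result is not Result.MISS:
            return result, probed
    return Result.MISS, probed


# Line 0x40 was invalidated, but its address was left in the tag stores.
stores = [{0x40: "invalid"}, {0x40: "invalid"}, {0x80: "shared"}]
bypassed = search(stores, 0x40)
ordinary = search(stores, 0xC0)
```

Here the request for line 0x40 is resolved after probing a single level, whereas the ordinary miss for line 0xC0 requires probing all three levels before a request is issued to main memory.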
In a further aspect of the present invention, invalidate requests include the identity of the source processor or entity that is seeking exclusive ownership of the respective cache line. When such an invalidate request is received and a match is detected in a given cache level for the cache line specified by the invalidate request, the cache hierarchy not only changes the state of the cache line to invalid, it also stores the source processor identifier in the respective cache line. In other words, the cache hierarchy overwrites the cache line, or a portion thereof, with the source processor identifier. Since the state of the cache line is invalid, the cache line is no longer reliable and thus of no use to the processor. Thereafter, if the processor or entity associated with the cache hierarchy requests this same cache line and an inval_miss condition is detected, the source processor identifier that was stored in the cache line is retrieved. The cache hierarchy then issues a request for the cache line directly to the identified source processor, either in addition to or in place of the request that is sent to main memory.
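The further aspect described above, in which the invalidated cache line is reused to hold the source processor identifier, might be modeled as in the sketch below. The `Line` class, the string states, the tuple return values, and the function names `handle_invalidate` and `request_target` are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Line:
    state: str       # "shared", "exclusive", or "invalid"
    payload: object  # cached data while valid; source processor ID once invalidated


def handle_invalidate(levels, line_addr, source_cpu):
    """Mark the line invalid at every level and overwrite its contents with the requester's ID."""
    for level in levels:
        line = level.get(line_addr)
        if line is not None:
            line.state = "invalid"
            line.payload = source_cpu  # the stale data is of no further use


def request_target(levels, line_addr):
    """Decide where a later request for line_addr should go.

    Returns ("hit", data), ("cpu", source_id) on an inval_miss,
    or ("memory", None) when no level holds the address.
    """
    for level in levels:
        line = level.get(line_addr)
        if line is None:
            continue
        if line.state == "invalid":
            return ("cpu", line.payload)
        return ("hit", line.payload)
    return ("memory", None)


hier = [{0x40: Line("shared", b"old")}, {0x40: Line("shared", b"old")}]
handle_invalidate(hier, 0x40, source_cpu=3)
target = request_target(hier, 0x40)
```

In this model the later request for line 0x40 is directed to processor 3, the processor that obtained exclusive ownership. As the text notes, such a request may be sent in addition to, or in place of, the request to main memory; the sketch shows only the selection of the target.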