This invention relates generally to computer cache memories, and more particularly to a cache-coherence system and a method for allowing purging of mid-level cache entries without purging lower-level cache entries.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Silicon Graphics Incorporated, All Rights Reserved.
Parallel computer systems provide economic, scalable, and high-availability approaches to computing solutions. From the point of view of managing computer systems including parallel-processor systems, there is a need for a cache coherence system and control in order to obtain desired system operation.
Conventional hierarchical cache systems provide small fast cache memories next to fast information processing units, and larger slower memories that are further away in time and space. It is too expensive to make a fast memory large enough to hold all of the data for a large computer program, and when memories are made larger, the access times slow down and heat dissipation also becomes a problem.
Modern computer systems thus typically include a hierarchy of memory systems. For example, a processor might have an L0 cache on the same chip as the processor. This L0 cache is the smallest, perhaps 16 to 256 kilobytes (KB), and runs at the fastest speed since there are no chip-boundary crossings. An L1 cache might be placed next to the processor chip on the same chip carrier. This L1 cache is the next smallest, perhaps 0.5 to 8 megabytes (MB), and runs at the next fastest speed since there are chip-boundary crossings but no card-boundary crossings. An L2 cache, if implemented, might be placed next to the processor card in the same box but on a different chip carrier. This L2 cache is typically still larger than the L1 and runs at the next fastest speed since there are card-boundary crossings but no box-boundary crossings. A large main memory, typically implemented using RDRAMs (RAMBUS™ dynamic random-access memories) or DDR SDRAMs (double-data-rate synchronous dynamic random-access memories), is then typically provided. Beyond that, a disc array provides mass storage at a slower speed than main memory, and a tape farm can even be provided to hold truly enormous amounts of data, accessible within seconds, minutes or hours. At each level moving further from the processor, there is typically a larger store running at a slower speed. For each level of storage, the level closer to the processor thus contains a proper subset of the data in the level further away. For example, in order to purge data in the main memory while leaving that data in the disc storage, one must first purge all of the portions of that data that may reside in the L0, L1, and/or L2 levels of cache. Conventionally, this may not lead to any performance problems, since the processor is finished with the data by the time that the main memory is purged.
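By way of illustration only, the conventional behavior described above, in which purging data from one level requires purging it from every level closer to the processor, can be sketched in Python as follows. The helper names `make_hierarchy` and `conventional_purge`, the level names, and the modeling of each level as a set of addresses are assumptions made for this sketch and do not appear in the disclosure.

```python
def make_hierarchy(addr):
    """Return levels ordered closest-to-processor first, each holding addr.

    Hypothetical model: each level is a (name, set_of_addresses) pair.
    """
    names = ["L0", "L1", "L2", "main", "disc"]
    return [(name, {addr}) for name in names]


def conventional_purge(hierarchy, level_name, addr):
    """Purge addr from level_name and from every level closer to the processor.

    Levels further from the processor than level_name keep their copy.
    Returns the names of the levels from which addr was actually removed.
    """
    names = [name for name, _ in hierarchy]
    idx = names.index(level_name)
    purged = []
    for name, lines in hierarchy[: idx + 1]:
        if addr in lines:
            lines.discard(addr)
            purged.append(name)
    return purged
```

In this sketch, purging an address from "main" also removes it from "L0", "L1", and "L2", while the copy in "disc" remains.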
However, as more processors and more caches are added to a system, there can be more competition for scarce cache resources. There is a need to maintain coherence of data (i.e., ensuring that, as data is modified, all cached copies are timely and properly updated) among the various cache types, levels, and locations. Thus there is a need for improved methods and apparatus to improve system performance while also maintaining system integrity and cache coherence.
The present invention provides solutions to the above-described shortcomings in conventional approaches, as well as other advantages apparent from the description and appendices below.
The present invention provides a method and apparatus for purging data (e.g., a first cache line) from a middle cache level without purging the corresponding data from a lower cache level (i.e., a cache level closer to the processor using the data), and replacing the purged first data in the middle-level cache with other data (e.g., with another cache line) of a different memory address than the purged first data, while leaving the data of the first cache line in the lower cache level. In some embodiments, in order to allow such mid-level purging, the first cache line must be in the “shared state” that allows reading of the data, but does not permit modifications to the data. If it is desired to modify the data, a directory facility will issue a purge to all caches of the shared-state data for that cache line, and then the processor that wants to modify the data will request an exclusive-state copy to be fetched to its lower-level cache and to all intervening levels of cache. Later, when the data in the lower cache level is modified, the modified data can be moved back to the original memory from the caches.
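By way of illustration only, the mid-level purge and the subsequent upgrade to an exclusive-state copy might be modeled along the following lines. This is a minimal Python sketch under stated assumptions, not the claimed implementation: the class `NodeCaches`, the functions `replace_mid` and `request_exclusive`, and the use of one dictionary per cache level are hypothetical names and simplifications introduced here.

```python
SHARED, EXCLUSIVE = "shared", "exclusive"


class NodeCaches:
    """Hypothetical two-level model: a lower cache (closer to the processor)
    and a middle cache, each mapping address -> coherence state."""

    def __init__(self):
        self.lower = {}
        self.mid = {}

    def fetch(self, addr, state=SHARED):
        # A fetch fills the lower level and the intervening middle level.
        self.mid[addr] = state
        self.lower[addr] = state

    def replace_mid(self, victim, new_addr):
        """Purge victim from the middle level only and reuse its slot for
        new_addr, leaving any lower-level copy of victim in place."""
        if self.mid.get(victim) != SHARED:
            # Assumption of this sketch: only shared-state (read-only)
            # lines are eligible for a mid-level-only purge.
            raise ValueError("victim must be a shared-state mid-level line")
        del self.mid[victim]
        self.mid[new_addr] = SHARED

    def purge_all(self, addr):
        # Remove addr from every level of this node's caches.
        self.lower.pop(addr, None)
        self.mid.pop(addr, None)


def request_exclusive(all_nodes, writer, addr):
    """Directory-style sequence: purge every shared copy of addr, then
    fetch an exclusive-state copy into the writer's cache levels."""
    for node in all_nodes:
        node.purge_all(addr)
    writer.fetch(addr, EXCLUSIVE)
```

In this sketch, after `replace_mid(0x100, 0x200)` the line at address 0x100 remains readable in the lower level even though the middle level now holds 0x200 in its place. A later `request_exclusive` purges the remaining shared copies before the writer obtains its exclusive-state copy.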