This invention relates generally to computer cache memories, and more particularly to a cache-coherence system and a method for converting between a first cache line type used on a multiprocessor system portion and a second cache line type used at each processor.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Silicon Graphics Incorporated, All Rights Reserved.
Parallel computer systems provide economic, scalable, and high-availability approaches to computing solutions. From the point of view of managing computer systems including parallel-processor systems, there is a need for a cache coherence system and control in order to obtain desired system operation.
Conventional hierarchical cache systems provide small fast cache memories next to fast information processing units, and larger slower memories that are further away in time and space. It is too expensive to make a fast memory large enough to hold all of the data for a large computer program, and when memories are made larger, the access times slow down and heat dissipation also becomes a problem.
Modern computer systems thus typically include a hierarchy of memory systems. For example, a processor might have an L0 cache on the same chip as the processor. This L0 cache is the smallest, perhaps 16 to 256 kilobytes (KB), and runs at the fastest speed since there are no chip-boundary crossings. An L1 cache might be placed next to the processor chip on the same chip carrier. This L1 cache is the next smallest, perhaps 0.5 to 8 megabytes (MB), and runs at the next fastest speed since there are chip-boundary crossings but no card-boundary crossings. An L2 cache, if implemented, might be placed next to the processor card in the same box but on a different chip carrier. This L2 cache is typically still larger than the L1 and runs at the next fastest speed since there are card-boundary crossings but no box-boundary crossings. A large main memory, typically implemented using RDRAMs (RAMBUS™ dynamic random-access memories) or DDR SDRAMs (double-data-rate synchronous dynamic random-access memories), is then typically provided. Beyond that, a disc array provides mass storage at a slower speed than main memory, and a tape farm can even be provided to hold truly enormous amounts of data, accessible within seconds, minutes or hours. At each level moving further from the processor, there is typically a larger store running at a slower speed. For each level of storage, the level closer to the processor thus contains a proper subset of the data in the level further away. For example, in order to purge data in the main memory, leaving that data only in the disc storage, one must first purge all of the portions of that data that may reside in the L0, L1, and/or L2 levels of cache. Conventionally, this may not lead to any performance problems, since the processor is finished with the data by the time that the main memory is purged.
However, as more processors and more caches are added to a system, there can be more competition for scarce cache resources. There is a need to maintain coherence of data (i.e., ensuring that, as data is modified, all cached copies are timely and properly updated) among the various cache types, levels, and locations. Thus there is a need for improved methods and apparatus to improve system performance while also maintaining system integrity and cache coherence.
The present invention provides solutions to the above-described shortcomings in conventional approaches, as well as other advantages apparent from the description and appendices below.
The present invention provides a method and apparatus for converting from a system-level cache line (e.g., in one embodiment, a 128-byte, directory-based cache-coherence model) to a different processor-level cache line (e.g., in one embodiment, a 64-byte, snoop-based cache-coherence model).
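As an illustration only, the size relationship between the two line types can be sketched in Python. The sketch assumes naturally aligned lines and uses the 128-byte and 64-byte sizes of the embodiment named above; the function names are hypothetical and are not taken from this disclosure. It shows only the address arithmetic, under which each system-level line spans two processor-level lines, and it does not model the directory-based or snoop-based coherence protocols themselves.

```python
SYSTEM_LINE_BYTES = 128     # system-level line size in the example embodiment
PROCESSOR_LINE_BYTES = 64   # processor-level line size in the example embodiment


def system_line_base(addr):
    """Return the base address of the system-level line containing addr.

    Assumes lines are naturally aligned (an assumption of this sketch).
    """
    return addr & ~(SYSTEM_LINE_BYTES - 1)


def processor_line_base(addr):
    """Return the base address of the processor-level line containing addr."""
    return addr & ~(PROCESSOR_LINE_BYTES - 1)


def processor_lines_in_system_line(addr):
    """List the base addresses of the processor-level lines that together
    cover the system-level line containing addr."""
    base = system_line_base(addr)
    return list(range(base, base + SYSTEM_LINE_BYTES, PROCESSOR_LINE_BYTES))


if __name__ == "__main__":
    addr = 0x1274
    print(hex(system_line_base(addr)))
    print(hex(processor_line_base(addr)))
    print([hex(a) for a in processor_lines_in_system_line(addr)])
```

Under these assumptions, an operation that targets one system-level line (for example, an invalidation) would have to be applied to both of the processor-level lines returned by `processor_lines_in_system_line`.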