1. Field of the Invention
The present invention relates generally to enhancing the performance of computer processors, and more particularly to methods for reducing the redundant storage of data in caches of chip multiprocessors (CMPs).
2. Description of Related Art
A conventional chip multiprocessor (CMP) is a computer processor composed of two or more single-threaded or multi-threaded processor cores on a single chip. Typically, each processor core of the CMP includes at least one first level cache, herein referred to as an L1 cache or a core cache. An L1 cache can be further subdivided into L1 sub-caches, such as an instruction (I) cache and a data (D) cache.
The processor cores typically share a single second level cache, herein referred to as a shared L2 cache, also on the chip. The shared L2 cache allows for data communication and data sharing between threads running on different processor cores. Some shared L2 caches are further subdivided into L2 sub-caches, sometimes referred to as banks. Typically, communication occurs between the L1 caches of the processor cores and the shared L2 cache via a crossbar. Where a shared L2 cache is banked, the crossbar determines the bank to be accessed in the shared L2 cache.
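By way of illustration only, the bank selection performed by the crossbar can be sketched as follows. The line size, the number of banks, the use of low-order line-address bits, and the function name l2_bank_for_address are assumptions made for this sketch and are not taken from any particular CMP.

```python
LINE_BYTES = 64   # assumed cache line size in bytes
NUM_BANKS = 4     # assumed number of L2 banks


def l2_bank_for_address(addr: int) -> int:
    """Return the (hypothetical) L2 bank index for a physical address."""
    line_number = addr // LINE_BYTES   # drop the byte-offset bits within a line
    return line_number % NUM_BANKS     # low-order line-address bits pick the bank


if __name__ == "__main__":
    for addr in (0x0000, 0x0040, 0x0080, 0x00C0, 0x0100):
        print(hex(addr), "-> bank", l2_bank_for_address(addr))
```

Under these assumptions, consecutive data lines map to consecutive banks, so that accesses to neighboring lines are spread across the banks.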
A cache, such as an L1 cache or a shared L2 cache, is a memory structure that stores data for use by the CMP. As used herein, the term data refers both to program data and to program instructions. Typically, a cache is smaller in storage capacity than a main memory of a computer system, and stores copies of data and instructions from main memory that are more frequently used by a CMP.
As a cache is usually closer to the processor core than a main memory of a computer system, data in the cache can typically be accessed more quickly than the same data can be accessed from main memory. For example, in a conventional CMP, the L1 caches and the shared L2 cache are typically on the same chip, allowing for faster data access than an access of the same data from main memory.
Data stored in a cache is typically stored in a data store area of the cache, and the stored data is commonly referred to as a data line or a cache line. The cache further includes a cache directory that includes one or more cache directory entries that individually reference a different data line stored in the cache.
In conventional CMPs, each data line stored in an L1 cache has an associated L1 cache directory entry in the L1 cache directory that identifies the data line and where the data line is stored in the L1 data store of the L1 cache. Similarly, each data line stored in a shared L2 cache has an associated L2 cache directory entry in the shared L2 cache directory that identifies the data line and where the data line is stored in the shared L2 cache. Conventionally, data that is used by a requesting processor core and not used by other processor cores is termed private data, whereas data that is used by more than one processor core is termed shared data.
A conventional L1 cache directory entry in an L1 cache of a conventional CMP typically includes a valid value followed by a tag value. The valid value, for example, one or more bits, indicates whether the data line in the L1 cache is valid or not valid.
For example, a valid data line is a data line that is the current version or state of the data line, and can be used by a processor core. Conversely, an invalid data line is a data line that is not the current version or state of the data line, and cannot be used by the processor core without first updating the data line.
The tag value, for example, forty (40) bits, identifies a data line and the location of the data line in the L1 cache data store. Valid values and tag values in conventional L1 cache directory entries are well known to those of skill in the art and are not further described herein to avoid detracting from the principles of the present invention.
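For illustration, a conventional L1 cache directory entry of the kind described above can be modeled as a packed integer. The sketch assumes a single valid bit placed immediately above a forty (40) bit tag; the bit positions and the helper names pack_l1_entry, unpack_l1_entry, and l1_hit are hypothetical.

```python
from typing import Tuple

TAG_BITS = 40                    # tag width used as the example in the text
TAG_MASK = (1 << TAG_BITS) - 1


def pack_l1_entry(valid: bool, tag: int) -> int:
    """Pack a valid bit followed by a 40-bit tag (assumed layout)."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 40 bits")
    return (int(valid) << TAG_BITS) | tag


def unpack_l1_entry(entry: int) -> Tuple[bool, int]:
    """Return (valid, tag) from a packed directory entry."""
    return bool((entry >> TAG_BITS) & 1), entry & TAG_MASK


def l1_hit(entry: int, requested_tag: int) -> bool:
    """A lookup hits only if the entry is valid and the tags match."""
    valid, tag = unpack_l1_entry(entry)
    return valid and tag == requested_tag
```

In this sketch, an entry whose valid bit is clear never produces a hit, even when its tag matches the requested tag, which corresponds to a data line that cannot be used without first being updated.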
A conventional shared L2 cache directory entry in a conventional shared L2 cache of a conventional CMP typically includes a memory coherence protocol (MCP) value followed by a tag value identifying a particular data line.
The MCP value, for example, one or more bits, indicates one or more memory states of the associated data line in accordance with a particular cache memory coherence protocol. Examples of memory coherence protocols include MOESI, MSI, MESI, and MOSI protocols.
The tag value, for example, forty (40) bits, identifies a data line and the location of the data line in the shared L2 cache data store. Memory coherence protocols and tag values in conventional shared L2 cache directory entries are well known to those of skill in the art and are not further described herein to avoid detracting from the principles of the present invention.
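As a further illustration, a shared L2 cache directory entry can be modeled as an MCP value followed by a tag value. The sketch assumes the MESI protocol encoded in two bits above a forty (40) bit tag; the numeric state encoding, the bit layout, and the names MesiState, pack_l2_entry, and unpack_l2_entry are assumptions made only for this example.

```python
from enum import IntEnum
from typing import Tuple


class MesiState(IntEnum):
    """Assumed 2-bit encoding of the four MESI states."""
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3


MCP_BITS = 2
TAG_BITS = 40
TAG_MASK = (1 << TAG_BITS) - 1
MCP_MASK = (1 << MCP_BITS) - 1


def pack_l2_entry(state: MesiState, tag: int) -> int:
    """Pack an MCP value followed by a 40-bit tag (assumed layout)."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 40 bits")
    return (int(state) << TAG_BITS) | tag


def unpack_l2_entry(entry: int) -> Tuple[MesiState, int]:
    """Return (state, tag) from a packed L2 directory entry."""
    return MesiState((entry >> TAG_BITS) & MCP_MASK), entry & TAG_MASK
```

A protocol with more states, such as MOESI, would need a wider MCP field than the two bits assumed here.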
Typically, conventional L1 caches are either write-through caches or write-back caches. If a requesting L1 cache is a conventional write-through cache, all data to be stored is written to the shared L2 cache. The requesting L1 cache has no ability to store the modified data.
The version of the data in the requesting L1 cache can be updated, but the data line is owned by and stored in the shared L2 cache. Thus, stored data is held in both the requesting L1 cache and in the shared L2 cache. When the stored data is private to the requesting L1 cache, the shared L2 cache is polluted with the private data.
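The write-through behavior described above can be sketched with a toy model. The class name WriteThroughL1, the use of dictionaries as data stores, and the choice to allocate a local copy on every store are assumptions of this sketch and do not represent any particular hardware design.

```python
class WriteThroughL1:
    """Toy write-through L1 cache in front of a shared L2 data store."""

    def __init__(self, shared_l2: dict):
        self.lines = {}        # local L1 copies: address -> value
        self.l2 = shared_l2    # shared L2 store: address -> value

    def store(self, addr: int, value: int) -> None:
        self.lines[addr] = value   # local copy is updated (assumed allocate-on-store)
        self.l2[addr] = value      # every store is also written to the shared L2


if __name__ == "__main__":
    l2 = {}
    core0 = WriteThroughL1(l2)
    core0.store(0x40, 7)           # data used only by core 0, i.e. private data
    print("L1:", core0.lines, "L2:", l2)
```

After the store, the same value is held in both the L1 store and the shared L2 store even though only one core uses it, which is the duplication of private data described above.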
In contrast to a write-through cache, if a requesting L1 cache is a conventional write-back cache, all data to be stored is initially written to the requesting L1 cache. The shared L2 cache may or may not hold a copy of the data, but any such copy is an old copy, as the newest copy is owned by and stored in the requesting L1 cache.
If another processor core needs the stored data, the other processor core has to obtain the data from the storing L1 cache via the shared L2 cache. Thus, the data stored in the L1 cache is now shared data, and a requesting L1 cache must transact through the shared L2 cache to obtain the data. Further, the shared L2 cache is polluted with old copies of the data.
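The write-back behavior can likewise be sketched with a toy model. The class name WriteBackCmp, the dictionary-based stores, the invalidation of other cores' copies on a store, and the fill value of 0 for data absent from the shared L2 cache are all assumptions made for illustration.

```python
class WriteBackCmp:
    """Toy model of write-back L1 caches in front of one shared L2 cache."""

    def __init__(self, num_cores: int):
        self.l1 = [{} for _ in range(num_cores)]         # per core: address -> value
        self.dirty = [set() for _ in range(num_cores)]   # per core: modified addresses
        self.l2 = {}                                     # shared L2: address -> value

    def store(self, core: int, addr: int, value: int) -> None:
        # Assumed invalidation step: other cores drop their copies of the line.
        for other in range(len(self.l1)):
            if other != core:
                self.l1[other].pop(addr, None)
                self.dirty[other].discard(addr)
        self.l1[core][addr] = value    # the store is written to the L1 only
        self.dirty[core].add(addr)     # the L1 now owns the newest copy

    def load(self, core: int, addr: int) -> int:
        if addr in self.l1[core]:
            return self.l1[core][addr]
        # Miss: an owning L1, if any, supplies its newest copy via the shared L2.
        for owner in range(len(self.l1)):
            if addr in self.dirty[owner]:
                self.l2[addr] = self.l1[owner][addr]
                self.dirty[owner].discard(addr)
        value = self.l2.setdefault(addr, 0)   # assumed fill value of 0 from memory
        self.l1[core][addr] = value
        return value
```

In this model, a store by one core leaves any copy in the shared L2 store stale until another core loads the same address, at which point the data passes through the shared L2 store on its way to the requesting core.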
Thus, in conventional CMP designs, each processor core can retain private data in the shared L2 cache in addition to retaining the private data in the processor core's own L1 cache. Consequently, competition for storage space in the shared L2 cache increases as private data of one processor core competes with private data of another processor core for the limited space in the shared L2 cache. This competition for storage space in the shared L2 cache can lead to an increase in the L2 cache miss rate if there is not enough storage space for a requested data line in the shared L2 cache.
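The effect of this competition on the L2 cache miss rate can be illustrated with a small simulation. The sketch assumes a fully associative shared L2 cache with least-recently-used (LRU) replacement and a capacity of four data lines; the function name count_l2_misses and the synthetic access traces are hypothetical.

```python
from collections import OrderedDict


def count_l2_misses(trace, capacity):
    """Count misses for a fully associative LRU cache (assumed organization)."""
    cache = OrderedDict()                  # keys are line addresses, oldest first
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


if __name__ == "__main__":
    # Four shared lines reused ten times each: they fit in a 4-line cache.
    shared_only = [line for _ in range(10) for line in range(4)]
    # The same accesses interleaved with private lines that are never reused.
    polluted = []
    for i, line in enumerate(shared_only):
        polluted.extend([line, 1000 + i])
    print("misses without private data:", count_l2_misses(shared_only, 4))
    print("misses with private data:   ", count_l2_misses(polluted, 4))
```

Under these assumptions, the private lines evict the reused shared lines before they can be accessed again, so accesses that would otherwise hit in the shared L2 cache become misses.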
Further, a processor core that issues many unused prefetches of data can pollute the shared L2 cache with storage of unused data and displace the storage of more useful data for other processor cores from the shared L2 cache, again leading to an increase in the L2 cache miss rate. An increase in the L2 cache miss rate in turn leads to an increase in off-chip bandwidth usage to retrieve the requested data, such as from an L3 cache or from main memory, which can lead to an increase in the L2 cache miss latency. Increases in the L2 cache miss rate and in the L2 cache miss latency are usually highly detrimental to a CMP's performance.
As most stores of data are of data that is private to a strand, the current protocols are wasteful of on-chip resources. Further, as all stores in each strand and each core conventionally go through the shared L2 cache, a growing amount of transaction pressure is placed on the crossbar and the shared L2 cache.