1. Field of the Invention
The present invention relates generally to enhancing performance of computer processors, and more particularly to methods for reducing the redundant storage of data in caches of chip multiprocessors (CMPs).
2. Description of Related Art
A conventional chip multiprocessor (CMP) is a computer processor composed of two or more single-threaded or multi-threaded processor cores on a single chip. Typically, each processor core of the CMP includes at least one first level cache, herein referred to as an L1 cache or core cache. An L1 cache can be further subdivided into L1 sub-caches, such as an instruction (I) cache and a data (D) cache.
The processor cores typically share a single second level cache, herein referred to as a shared L2 cache, also on the chip. The shared L2 cache allows for data communication and data sharing between threads running on different processor cores. Some shared L2 caches are further subdivided into L2 sub-caches, sometimes referred to as banks. Typically, communication occurs between the L1 caches of the processor cores and the shared L2 cache via a crossbar.
Where a shared L2 cache is banked, the crossbar determines the bank to be accessed in the shared L2 cache.
A cache, such as an L1 cache or a shared L2 cache, is a memory structure that stores data for use by the CMP. As used herein, the term data refers to both program data and program instructions. Typically, a cache is smaller in storage capacity than the main memory of a computer system and stores copies of the data from main memory that the CMP uses more frequently. Because a cache is usually closer to the processor core than main memory, data in the cache can typically be accessed more quickly than the same data in main memory. For example, in a conventional CMP, the L1 caches and the shared L2 cache are on the same chip as the processor cores, allowing faster access than an access of the same data from main memory.
Data stored in a cache is typically stored in a data store area of the cache, and each unit of stored data is commonly referred to as a data line. The cache further includes a cache directory containing one or more cache directory entries, each of which references a different data line stored in the cache.
In conventional CMPs, each data line stored in an L1 cache has an associated L1 cache directory entry in the L1 cache directory that identifies the data line and where it is stored in the L1 data store. Similarly, each data line stored in a shared L2 cache has an associated L2 cache directory entry in the shared L2 cache directory that identifies the data line and where it is stored in the shared L2 cache. Thus, in the shared L2 cache of a conventional CMP, there is an associative one-to-one mapping between L2 cache directory entries and the data lines stored in the L2 cache.
An L2 cache directory entry in a shared L2 cache of a conventional CMP typically includes a memory coherence protocol (MCP) value followed by a tag value identifying a particular data line. The MCP value, for example, one or more bits, indicates one or more memory states of the associated data line in accordance with a particular cache memory coherence protocol. Examples of memory coherence protocols include MOESI, MSI, MESI, and MOSI protocols. Memory coherence protocols are well known to those of skill in the art. The tag value, for example, forty (40) bits, identifies a data line and the location of the data line in the shared L2 cache data store.
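To make the entry layout concrete, the following Python sketch packs an MCP value and a tag into a single integer directory entry. The 40-bit tag width is taken from the example above; the 3-bit MCP field, the MOESI state codes, and the names `pack_entry` and `unpack_entry` are illustrative assumptions, not part of any particular CMP design. The MCP bits are placed above the tag to mirror the MCP value being followed by the tag.

```python
from enum import IntEnum
from typing import Tuple

TAG_BITS = 40                     # example tag width from the text
MCP_BITS = 3                      # assumed: enough to encode five MOESI states
TAG_MASK = (1 << TAG_BITS) - 1
MCP_MASK = (1 << MCP_BITS) - 1


class MOESI(IntEnum):
    """Assumed (hypothetical) 3-bit encoding of the MOESI coherence states."""
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    OWNED = 3
    MODIFIED = 4


def pack_entry(state: MOESI, tag: int) -> int:
    """Build a directory entry: MCP value in the high bits, tag below it."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in TAG_BITS")
    return (int(state) << TAG_BITS) | tag


def unpack_entry(entry: int) -> Tuple[MOESI, int]:
    """Recover the (MCP state, tag) pair from a packed entry."""
    return MOESI((entry >> TAG_BITS) & MCP_MASK), entry & TAG_MASK


if __name__ == "__main__":
    entry = pack_entry(MOESI.EXCLUSIVE, 0x123456789A)
    print(hex(entry), unpack_entry(entry))
```

Unpacking a packed entry returns the original state and tag, so a directory lookup can compare tags and read the coherence state from the same word.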
When a process is executed by a conventional CMP, at least one of the processor cores of the CMP typically requests a read access or a write access to data. When a read access request is issued, typically the requesting processor core requests a data line from the processor core's L1 cache. If the data line is present in the L1 cache, commonly called an L1 cache hit, the data line is returned to the requesting processor core. Otherwise, if the data line is not present in the L1 cache, commonly called an L1 cache miss, the L1 cache requests the data line from the shared L2 cache.
If the data line is present in the shared L2 cache, commonly called an L2 cache hit, the data line is returned to the requesting L1 cache. Otherwise, if the data line is not present in the shared L2 cache, commonly called an L2 cache miss, the shared L2 cache requests the data line from an off-chip source, such as a lower-level cache, e.g., an L3 cache, if present, or the main memory of the computer system. When the data line is obtained, it is returned to the shared L2 cache and then to the requesting L1 cache.
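The read-access sequence just described can be sketched as a small Python model. It is a hypothetical illustration only: caches are modeled as unbounded dictionaries keyed by address, a single dictionary stands in for the off-chip source (an L3 cache or main memory), and the class and method names are invented for this example.

```python
from typing import Dict, List, Tuple


class ReadPathModel:
    """Toy model of the L1 -> shared L2 -> off-chip read path (unbounded caches)."""

    def __init__(self, num_cores: int, memory: Dict[int, bytes]):
        self.l1: List[Dict[int, bytes]] = [dict() for _ in range(num_cores)]
        self.l2: Dict[int, bytes] = {}
        self.memory = memory  # stands in for an L3 cache or main memory

    def read(self, core: int, address: int) -> Tuple[bytes, str]:
        """Return (data line, level that supplied it)."""
        l1 = self.l1[core]
        if address in l1:                  # L1 cache hit
            return l1[address], "L1"
        if address in self.l2:             # L1 miss, L2 hit
            line = self.l2[address]
            l1[address] = line             # return the line to the requesting L1
            return line, "L2"
        line = self.memory[address]        # L2 miss: fetch from the off-chip source
        self.l2[address] = line            # line is returned to the shared L2 first...
        l1[address] = line                 # ...and then to the requesting L1
        return line, "memory"
```

For example, a first read of a line by core 0 is supplied by memory; a repeat read by core 0 then hits in its L1, and a read of the same line by core 1 hits in the shared L2. After the L2 miss, the line is held in both core 0's L1 and the shared L2.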
Currently, when a data line is obtained for a processor core in response to a read access request, the data line is stored in the L1 cache of the requesting processor core and is also stored in the shared L2 cache, regardless of whether that data line is used only by the requesting processor core or also by other processor cores.
Thus, in conventional CMP designs, each processor core can retain private data in the shared L2 cache in addition to retaining the private data in the processor core's own L1 cache. Herein a data line that is used by one or more threads on a particular processor core, but not used by any of the threads on any of the other processor cores, is termed private data, or data private to that particular processor core. A data line that is used by one or more threads on more than one processor core is termed shared data.
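The private/shared distinction can be expressed as a simple classification over an access trace. In this sketch, the trace format of (core, address) pairs and the function name `classify_lines` are assumptions; threads are assumed to have already been mapped to their processor cores, so several threads on one core count as a single user of a line.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def classify_lines(trace: Iterable[Tuple[int, int]]) -> Dict[int, str]:
    """Label each data line 'private' (used by one core) or 'shared' (used by several)."""
    users: Dict[int, Set[int]] = defaultdict(set)  # address -> cores that used it
    for core, address in trace:
        users[address].add(core)
    return {
        address: "private" if len(cores) == 1 else "shared"
        for address, cores in users.items()
    }
```

For instance, a trace in which only core 0 touches address 0x40, only core 1 touches 0x80, and both cores touch 0xC0 classifies the first two lines as private and the third as shared.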
Consequently, competition for storage space in the shared L2 cache increases as the private data of one processor core competes with the private data of other processor cores for the limited space in the shared L2 cache. This competition can increase the L2 cache miss rate when there is not enough storage space to retain requested data lines in the shared L2 cache.
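The effect of this competition can be illustrated with a toy simulation of a fully associative shared L2 using least-recently-used (LRU) replacement. The capacity, the access pattern, the fully associative organization, and the function name `l2_miss_rate` are assumptions chosen for clarity; the resulting numbers are illustrative, not measurements of any CMP.

```python
from collections import OrderedDict
from typing import Iterable


def l2_miss_rate(addresses: Iterable[int], capacity: int) -> float:
    """Miss rate of a fully associative LRU cache holding `capacity` lines."""
    cache: OrderedDict = OrderedDict()  # address -> None, LRU first, MRU last
    accesses = misses = 0
    for address in addresses:
        accesses += 1
        if address in cache:
            cache.move_to_end(address)      # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used line
            cache[address] = None
    return misses / accesses if accesses else 0.0


if __name__ == "__main__":
    core0 = [0, 1, 2, 3] * 10               # core 0's private working set
    core1 = [100, 101, 102, 103] * 10       # core 1's private working set
    mixed = [a for pair in zip(core0, core1) for a in pair]
    print(l2_miss_rate(core0, capacity=4), l2_miss_rate(mixed, capacity=4))
```

With a capacity of four lines, core 0 cycling through its four private lines alone misses only on the first pass (4 misses in 40 accesses). Interleaving core 1's four private lines gives eight distinct lines cycling through four slots, and under LRU every access then misses.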
Further, a processor core that issues many unused prefetches can pollute the shared L2 cache with unused data and displace more useful data belonging to other processor cores, again increasing the L2 cache miss rate. A higher L2 cache miss rate in turn increases off-chip bandwidth usage to retrieve the requested data, such as from an L3 cache or main memory, which can increase the L2 cache miss latency. Increases in the L2 cache miss rate and in the L2 cache miss latency are usually highly detrimental to a CMP's performance.
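Prefetch pollution can be sketched in the same toy style: core 0 repeatedly reuses two lines while core 1 inserts never-used prefetched lines into the same LRU-managed shared L2. The function name `demand_misses`, the capacity, the prefetch rate, and the address values are hypothetical parameters chosen only to show the mechanism.

```python
from collections import OrderedDict


def demand_misses(capacity: int, prefetches_per_access: int, rounds: int = 50) -> int:
    """Count core 0's misses on a reused two-line working set while core 1
    inserts never-used prefetched lines into the same LRU shared L2."""
    cache: OrderedDict = OrderedDict()  # address -> None, LRU first, MRU last
    next_prefetch = 1_000               # core 1 streams through fresh addresses
    misses = 0

    def insert(address: int) -> None:
        if len(cache) >= capacity:
            cache.popitem(last=False)   # evict the least recently used line
        cache[address] = None

    for _ in range(rounds):
        for address in (0, 1):          # core 0's demand accesses
            if address in cache:
                cache.move_to_end(address)
            else:
                misses += 1
                insert(address)
            for _ in range(prefetches_per_access):
                insert(next_prefetch)   # unused prefetch issued by core 1
                next_prefetch += 1
    return misses
```

With a four-line L2 and no prefetching, core 0 misses only twice in 100 demand accesses (its two cold misses). With two unused prefetches inserted after each of core 0's accesses, five more recently used lines separate consecutive uses of each of core 0's lines, so under LRU all 100 demand accesses miss.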