1. Technical Field
The present invention relates generally to cache coherence in multiprocessor data processing systems, and more particularly to enhancing operation of caches with an algorithm for selecting a cache directory entry.
2. Discussion of the Background Art
A computer system node may be divided into a memory subsystem and a processor subsystem. The memory subsystem includes the main Dynamic Random Access Memory (DRAM) and provides data from memory in response to requests from any number of connected processors. Normally, the amount of time spent to access data in the memory subsystem is quite long relative to the processor""s speed and therefore processors are often built with caches to improve their performance. The processor subsystem includes the processors and one or more caches. A cache is a small memory, connected between the processor and main memory, that stores recently-used data from the main memory. A cache is much faster to access than the main memory subsystem, and is usually much smaller. The smallest unit of data that can be transferred into and out of a cache is called a cached xe2x80x9cline.xe2x80x9d The data in memory that corresponds to a cached line is called a memory line. A data line refers to either a cached line or a memory line.
All caching architectures divide main memory into physically consecutive segments comprising one or a series of memory lines, many of which correspond to a pluralities of cached lines. Accessing a cached line requires a segment tag to identify the segment that corresponds to the line and a line index to identify the line within the segment. Those skilled in the art will recognize that if a segment has only one line then a line index is not required. If a processor requests a data line that is already contained in the local cache, then that data line is delivered to the processor. Otherwise, the processor gets the data line from main memory.
Set-associative and fully associative caches are xe2x80x9cmultiplexe2x80x9d ways, meaning a directory entry references multiple cached lines that have the same memory segment index but are from different segments. This, compared to a direct-mapped cache, can improve the cache-hit rate because the multiple-way directory reduces contention between active cache lines that map to the same way. Direct mapping of cache lines avoids the question of selecting a directory to replace when the directory is needed to reference a newly requested cached line, but fully-associative and set-associative cache mapping schemes require a replacement protocol to select a directory referencing a particular cached line that should be replaced. The most popular protocol is the Least Recently Used (LRU) protocol, which replaces the cache line that has not been used for the longest time.
Typically, a set-associative cache is four- to eight-way while a fully-associative cache is thirty-two- to sixty-four-way.
In a shared-memory multiprocessor system, each processor usually has its own cache, so the system has multiple caches. Since each cache can hold a copy of a given data line, it is important to keep the states of all different cached lines consistent and up-to-date with the latest version written by any one of the processors. A memory subsystem is usually responsible for returning, from the caches or main memory, the correct value as prescribed by the processor""s memory model, which includes a cache-coherence protocol having a set of rules to govern the operation of caches.
To maintain cache coherence across the system, the cache-coherence protocol uses a directory that contains cache-coherence control information. The directory, usually part of the memory subsystem, has an entry for each main memory location with state information indicating whether the memory data may also exist in a cache elsewhere in the system. The coherence protocol specifies all transitions and transactions to be taken in response to a memory request. Any action taken on a cache line is reflected in the state stored in the directory. A common cache coherence scheme uses three permanent states to accomplish this:
Invalid: Line is not cached anywhere. Main memory has the only copy.
Shared: Line is valid in at least one cache at a remote node.
Dirty: Line is valid in one cache at a remote node. The copy may be modified by the processor in that remote node. The main memory may contain old data.
The coherence protocol may use other transient states to indicate that a line is in transition. Given enough time, these transient states revert to one of the above permanent states.
On every memory request from a processor, a memory subsystem must look at all cache tags to identify the segment that stores the memory line corresponding to the cached line. Each cache in a xe2x80x9csnoopy protocolxe2x80x9d can xe2x80x9csnoopxe2x80x9d every request and then signal to the memory subsystem if it has the most recent version of the cached line. Alternatively, the memory subsystem can keep a duplicate of each cache""s tags to find the location of the most recent version of the cached line. A duplicate tag-based method is sometimes called a xe2x80x9cdirectory based cache-coherence protocol.xe2x80x9d
FIG. 1 shows a prior art system 100 including multiple CPUs 102A, 102B, 102C, and 102D having respective local caches 110A, 110B, 110C, and 110D connected by a bus 118 to a memory controller 120 for the main DRAM memory 122. In this example, main memory 122 has, for each memory line, a space reserved for a directory 124 entry, and therefore wastes memory spaces because the total number of cached lines, which determines the number of entries in directory 124, is usually much smaller than the total number of memory lines in memory 122. Further, the cache coherence protocols for prior art system 100 are deficient in that, as the number of caches 110 and size of memory 122 increase, the size of directory 124 becomes objectionably large.
System 100 may be improved by using a sparse directory, which is a cache of directory entries. However, a replacement algorithm to find a directory entry for referencing a new cached line without regard to the state of the existing cached line can cause heavy data traffic between memory 122 and caches 110, and thus degrade system performance.
Therefore, what is needed is a replacement algorithm for use in a sparse directory that can solve the above deficiencies.
The present invention provides an algorithm to allocate a directory entry to store the state of a cached line in response to a memory request from a processor. The algorithm thus searches the directory for an entry. If at least one free entry is available, then the algorithm uses one of the available entries. Otherwise, the algorithm searches for a xe2x80x9csharedxe2x80x9d entry, and if at least one shared entry is found, then the algorithm uses preferably a xe2x80x9cleast recently usedxe2x80x9d (LRU) criteria to search among the available shared entries. Otherwise, the algorithm searches for a xe2x80x9cdirtyxe2x80x9d entry. If at least one dirty entry is found, then the algorithm uses preferably the LRU criteria to search among the available dirty entries. The algorithm uses an LRU criteria because entries that were allocated long ago and that have not been used recently are more likely to be stale. To increase system performance, the algorithm preferably searches for a shared entry before searching for a dirty entry.
These and other advantages of the invention will become apparent to those skilled in the art from the following detailed description and the accompanying drawings.