1. Field of the Invention
The present invention relates to the design of multiprocessor systems. More specifically, the present invention relates to a method and an apparatus for using a reverse directory located at a lower-level cache to facilitate operations involving higher-level caches that perform accesses through the lower-level cache.
2. Related Art
In order to achieve high rates of computational performance, computer system designers are beginning to employ multiple processors that operate in parallel to perform a single computational task. One common multiprocessor design includes a number of processors 151-154 coupled to level one (L1) caches 161-164 that share a single level two (L2) cache 180 and a memory 183 (see FIG. 1A). During operation, if a processor 151 accesses a data item that is not present in local L1 cache 161, the system attempts to retrieve the data item from L2 cache 180. If the data item is not present in L2 cache 180, the system first retrieves the data item from memory 183 into L2 cache 180, and then from L2 cache 180 into L1 cache 161.
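The access sequence described above can be sketched as follows. This is a minimal illustration only: the function name `load` and the use of dictionaries in place of caches are assumptions made for the example, and capacity limits and eviction are ignored.

```python
# Hypothetical model of the two-level lookup described above.
# Dictionaries stand in for the caches and memory; eviction is ignored.

def load(addr, l1, l2, memory):
    """Return the data item at addr, filling L2 and then L1 on a miss."""
    if addr in l1:                 # hit in the local L1 cache
        return l1[addr]
    if addr not in l2:             # miss in L2: retrieve from memory into L2 first
        l2[addr] = memory[addr]
    l1[addr] = l2[addr]            # then retrieve from L2 into L1
    return l1[addr]

memory = {0x40: "data"}
l1_cache, l2_cache = {}, {}
value = load(0x40, l1_cache, l2_cache, memory)
```

After the call, the data item is present in both the L1 and L2 stand-ins, mirroring the order of retrieval described above.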
Note that coherence problems can arise if a copy of the same data item exists in more than one L1 cache. In this case, modifications to a first version of a data item in L1 cache 161 may cause the first version to differ from a second version of the data item in L1 cache 162.
In order to prevent coherency problems, computer systems often provide a coherency protocol that operates across bus 170. A coherency protocol typically ensures that if one copy of a data item is modified in L1 cache 161, other copies of the same data item in L1 caches 162-164, in L2 cache 180 and in memory 183 are updated or invalidated to reflect the modification.
Coherence protocols typically perform invalidations by broadcasting invalidation messages across bus 170. If such invalidations occur frequently, these invalidation messages can potentially tie up bus 170, and can thereby degrade overall system performance.
In order to remedy this problem, some designers have begun to explore the possibility of maintaining directory information within L2 cache 180. This directory information specifies which L1 caches contain copies of specific data items. This allows the system to send invalidation information to only the L1 caches that contain the data item instead of sending a broadcast message to all L1 caches. (This type of system presumes that there exist separate communication pathways for invalidation messages to each of the L1 caches 161-164, unlike the example illustrated in FIG. 1A, which uses a single shared bus 170 to communicate with L1 caches 161-164.)
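The difference between a directed invalidation and a broadcast can be illustrated with a short sketch. The names `directory` and `invalidation_targets` are hypothetical, and excluding the cache that performed the modification is an assumption of this example rather than something stated above.

```python
# Hypothetical sketch: a per-line set of sharers kept with the L2 cache.
NUM_L1_CACHES = 4

def invalidation_targets(directory, addr, writer):
    """Return the ids of the L1 caches that hold addr, other than the writer."""
    return sorted(directory.get(addr, set()) - {writer})

directory = {0x80: {0, 2}}   # line 0x80 is held by L1 caches 0 and 2
targets = invalidation_targets(directory, 0x80, writer=0)

# Without directory information, every other L1 cache receives the message.
broadcast_targets = [c for c in range(NUM_L1_CACHES) if c != 0]
```

Here only one L1 cache receives an invalidation message, where a broadcast would reach three.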
However, note that storing directory information for each entry in L2 cache 180 is wasteful because L2 cache 180 typically has many more entries than L1 caches 161-164. This means that most of the entries for directory information in L2 cache 180 will be empty.
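A rough calculation shows the scale of the waste. The cache sizes below are hypothetical values chosen only for illustration and do not come from the description above.

```python
# Hypothetical sizes, chosen only for illustration.
L2_LINES = 65536            # lines in the L2 cache
L1_LINES_PER_CACHE = 256    # lines in each L1 cache
NUM_L1_CACHES = 4

# At most this many L2 lines can have a copy in some L1 cache at one time.
max_tracked = NUM_L1_CACHES * L1_LINES_PER_CACHE

# Upper bound on the fraction of per-L2-entry directory slots in use.
max_occupancy = max_tracked / L2_LINES
```

Under these assumed sizes, no more than 1024 of the 65536 directory entries can be in use at once, so at least 63 of every 64 entries are empty.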
Furthermore, note that L1 caches 161-164 are typically set-associative. Hence, when an invalidation message is received by L1 cache 161, a lookup and comparison must be performed in L1 cache 161 to determine the way location of the data item. For example, in a four-way set-associative L1 cache, a data item that belongs to a specific set (that is specified by a portion of the address) can be stored in one of four possible "ways". Consequently, tags from each of the four possible ways must be retrieved and compared to determine the way location of the data item. This lookup is time-consuming and can degrade system performance.
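The lookup that an invalidation message triggers can be sketched as follows. The line size, the number of sets, and the names `split_address` and `find_way` are assumptions made for this example; only the four-way organization comes from the text above.

```python
# Hypothetical four-way set-associative L1 cache geometry.
LINE_BYTES = 64
NUM_SETS = 128
NUM_WAYS = 4

def split_address(addr):
    """Split an address into (tag, set index)."""
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS

def find_way(tags, addr):
    """Compare the tag stored in every way of the selected set.

    Returns the matching way, or None if the line is not present.
    """
    tag, set_index = split_address(addr)
    for way in range(NUM_WAYS):
        if tags[set_index][way] == tag:
            return way
    return None

tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
addr = 0x12340
tag, set_index = split_address(addr)
tags[set_index][3] = tag        # place the line in way 3 of its set
```

Every call to `find_way` reads up to four tags before the way location is known, which is the cost the text above identifies.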
What is needed is a method and an apparatus for maintaining directory information for L1 caches without wasting memory.
Furthermore, what is needed is a method and an apparatus for invalidating an entry in an L1 cache without performing a lookup to determine the way location of the entry.
One embodiment of the present invention provides a multiprocessor system that includes a number of processors with higher-level caches that perform memory accesses through a lower-level cache. This multiprocessor system also includes a reverse directory coupled to the lower-level cache, which includes entries corresponding to lines in the higher-level caches, wherein each entry identifies an associated entry in the lower-level cache.
In one embodiment of the present invention, the lower-level cache is configured to receive a request from a higher-level cache to retrieve a line from the lower-level cache. If the line is present within the lower-level cache, the system sends the line to the higher-level cache so that the line can be stored in the higher-level cache. The system also stores information in the reverse directory to indicate that the line is stored in the higher-level cache.
In a variation on this embodiment, the higher-level cache is an N-way set-associative cache, and storing the information in the reverse directory involves storing way information identifying a way location in the higher-level cache in which the line is to be stored. The multiprocessor system is additionally configured to use this way information during a subsequent invalidation operation to invalidate the line in the higher-level cache without having to perform a lookup in the higher-level cache to determine the way location of the line in the higher-level cache.
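The following sketch illustrates how way information recorded at fill time can later identify the line to invalidate without a tag comparison in the higher-level cache. The data layout, the names `record_fill` and `invalidations_for`, the cache sizes, and the representation of a lower-level cache location as a (way, set) tuple are all assumptions of this example. The sketch also assumes that the set index in the higher-level cache can be derived from the address of the line.

```python
# Hypothetical sizes for the higher-level (L1) caches.
NUM_L1_CACHES = 4
L1_SETS = 128
L1_WAYS = 4

# reverse_dir[cache][set][way] holds the lower-level cache location of the
# line stored in that L1 slot, or None when the slot is not tracked.
reverse_dir = [[[None] * L1_WAYS for _ in range(L1_SETS)]
               for _ in range(NUM_L1_CACHES)]

def record_fill(cache_id, l1_set, l1_way, l2_location):
    """Record that L1 cache `cache_id` stores the line at (l1_set, l1_way)."""
    reverse_dir[cache_id][l1_set][l1_way] = l2_location

def invalidations_for(l2_location, l1_set):
    """Return (cache_id, l1_set, l1_way) for every L1 copy of the line.

    The way is read from the reverse directory, so the L1 cache does not
    need to compare tags to locate the line.
    """
    return [(c, l1_set, w)
            for c in range(NUM_L1_CACHES)
            for w in range(L1_WAYS)
            if reverse_dir[c][l1_set][w] == l2_location]

record_fill(cache_id=1, l1_set=13, l1_way=3, l2_location=(2, 525))
record_fill(cache_id=2, l1_set=13, l1_way=0, l2_location=(2, 525))
targets = invalidations_for((2, 525), l1_set=13)
```

Each resulting target names the exact way to invalidate, so the receiving cache can act on it directly.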
In one embodiment of the present invention, the lower-level cache is additionally configured to generate a miss to pull the line into the lower-level cache, if the line is not present within the lower-level cache.
In one embodiment of the present invention, upon receiving an update request that causes a target entry in the lower-level cache to be updated, the system performs a lookup in the reverse directory to determine if the target entry is contained in one or more higher-level caches. For each higher-level cache that contains the target entry, the system sends an invalidation request to the higher-level cache to invalidate the target entry, and updates a corresponding entry in the reverse directory to indicate that the target entry has been invalidated in the higher-level cache.
Note that this update request can include a load miss, a store miss, and a store hit on the target entry. If the update request is a store hit, the lookup in the reverse directory involves looking up the target entry in all higher-level caches, except for a higher-level cache that caused the store hit.
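The handling of an update request can be sketched as follows. The function `handle_update`, the string names for the request kinds, and the dictionary representation of the reverse directory are hypothetical. The linear scan is a software convenience for the example and is not meant to suggest how a hardware lookup would be organized.

```python
# Hypothetical sketch of update handling.
# reverse_dir maps (cache_id, l1_set, l1_way) -> lower-level cache location
# for entries that are currently valid.

def handle_update(reverse_dir, l2_location, kind, requester):
    """Invalidate higher-level copies of the target entry.

    kind is "load_miss", "store_miss" or "store_hit". On a store hit, the
    higher-level cache that caused the hit is skipped.
    Returns the invalidation requests as (cache_id, l1_set, l1_way) tuples.
    """
    requests = []
    for slot, location in list(reverse_dir.items()):
        cache_id = slot[0]
        if location != l2_location:
            continue
        if kind == "store_hit" and cache_id == requester:
            continue
        requests.append(slot)
        del reverse_dir[slot]      # the entry no longer tracks a valid copy
    return sorted(requests)

reverse_dir = {
    (0, 13, 1): (2, 525),
    (1, 13, 3): (2, 525),
    (2, 40, 0): (1, 77),
}
sent = handle_update(reverse_dir, (2, 525), "store_hit", requester=0)
```

In this example the requesting cache keeps its copy and its reverse directory entry, while the other holder of the line is invalidated and its entry is cleared.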
In one embodiment of the present invention, the reverse directory includes a fixed entry corresponding to each entry in each of the higher-level caches.
In one embodiment of the present invention, each entry in the reverse directory includes information specifying a location of a corresponding entry in the lower-level cache.
In one embodiment of the present invention, the lower-level cache is organized as an M-way set associative cache. In this embodiment, each entry in the reverse directory includes: a way identifier that identifies a way location of a corresponding entry within the lower-level cache; a set identifier that identifies a set location of the corresponding entry within the lower-level cache, wherein the set identifier does not include set information that can be inferred from a location of the entry within the reverse directory; and a valid flag indicating whether the entry in the reverse directory is valid.
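One possible encoding of such an entry is sketched below. The bit widths, the field order, and the functions `encode_entry` and `decode_entry` are assumptions made for illustration. The sketch further assumes that the low-order bits of the lower-level set index equal the higher-level set index, so that those bits can be inferred from the position of the entry in the reverse directory.

```python
# Hypothetical geometry: 4-way L2 with 1024 sets, L1 caches with 128 sets.
L2_WAY_BITS = 2
L2_SET_BITS = 10
L1_SET_BITS = 7
STORED_SET_BITS = L2_SET_BITS - L1_SET_BITS   # only the non-inferable bits

def encode_entry(l2_way, l2_set, valid=True):
    """Pack valid flag, way identifier and truncated set identifier."""
    set_hi = l2_set >> L1_SET_BITS
    return ((int(valid) << (L2_WAY_BITS + STORED_SET_BITS))
            | (l2_way << STORED_SET_BITS)
            | set_hi)

def decode_entry(entry, l1_set):
    """Recover (valid, l2_way, l2_set); l1_set comes from the entry's position."""
    set_hi = entry & ((1 << STORED_SET_BITS) - 1)
    l2_way = (entry >> STORED_SET_BITS) & ((1 << L2_WAY_BITS) - 1)
    valid = bool(entry >> (L2_WAY_BITS + STORED_SET_BITS))
    return valid, l2_way, (set_hi << L1_SET_BITS) | l1_set

l2_set = 525                       # upper three bits are 4, lower seven are 13
entry = encode_entry(l2_way=2, l2_set=l2_set)
decoded = decode_entry(entry, l1_set=l2_set & ((1 << L1_SET_BITS) - 1))
```

With these assumed widths each entry occupies six bits, compared with thirteen bits if the full set identifier were stored.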
In one embodiment of the present invention, the multiprocessor system is located on a single semiconductor chip.
In one embodiment of the present invention, the lower-level cache is an L2 cache, and each of the higher-level caches is an L1 cache.
In one embodiment of the present invention, the higher-level caches are organized as write-through caches, so that updates to the higher-level caches are immediately written through to the lower-level cache.
In one embodiment of the present invention, the lower-level cache includes multiple banks that can be accessed in parallel.