1. Field of the Invention
The present invention relates to the design of multiprocessor computer systems. More specifically, the present invention relates to a method and an apparatus for performing multiple cache line invalidations at the same time.
2. Related Art
In order to achieve high rates of computational performance, computer system designers are beginning to employ multiple processors that operate in parallel to perform a single computational task. One common multiprocessor design includes a number of processors 151-154 with a number of level one (L1) caches 161-164 that share a single level two (L2) cache 180 and a memory 183 (see FIG. 1A). During operation, if a processor 151 accesses a data item that is not present in its local L1 cache 161, the system attempts to retrieve the data item from L2 cache 180. If the data item is not present in L2 cache 180, the system first retrieves the data item from memory 183 into L2 cache 180, and then from L2 cache 180 into L1 cache 161.
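The read path described above can be sketched as follows. This is only an illustrative software model, not the hardware of FIG. 1A: the class name `Hierarchy`, the dictionary-based caches, and the returned source labels are assumptions introduced for this example.

```python
# Hypothetical model of the L1 -> L2 -> memory read path described above.
# Caches are plain dictionaries; capacity and eviction are deliberately ignored.
class Hierarchy:
    def __init__(self, memory, num_l1=4):
        self.memory = dict(memory)               # backing store: address -> value
        self.l2 = {}                             # single shared L2 cache
        self.l1 = [{} for _ in range(num_l1)]    # one private L1 cache per processor

    def read(self, cpu, addr):
        """Return (value, source), where source says where the item was found."""
        l1 = self.l1[cpu]
        if addr in l1:
            return l1[addr], "L1 hit"
        if addr in self.l2:
            source = "L2 hit"
        else:
            # Miss in L2: first bring the item from memory into L2.
            self.l2[addr] = self.memory[addr]
            source = "memory"
        # Then copy the item from L2 into the requesting processor's L1.
        l1[addr] = self.l2[addr]
        return l1[addr], source


h = Hierarchy({0x40: 7})
first = h.read(0, 0x40)    # misses in L1 and L2, filled from memory
second = h.read(0, 0x40)   # now present in processor 0's L1
third = h.read(1, 0x40)    # processor 1 misses in its L1 but hits in L2
```

After the third read, the same data item is held in two L1 caches, which is the situation in which coherence must be maintained.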
Note that coherence problems can arise if a copy of the same data item exists in more than one L1 cache. In this case, modifications to a first version of a data item in L1 cache 161 may cause the first version to differ from a second version of the data item in L1 cache 162.
In order to prevent coherency problems, computer systems often provide a coherency protocol that operates across bus 170. A coherency protocol typically ensures that if one copy of a data item is modified in L1 cache 161, other copies of the same data item in L1 caches 162-164, in L2 cache 180 and in memory 183 are updated or invalidated to reflect the modification.
Coherence protocols typically perform invalidations by broadcasting invalidation messages across bus 170. If such invalidations occur frequently, these invalidation messages can potentially tie up bus 170, and can thereby degrade overall system performance.
In order to remedy this problem, some designers have begun to explore the possibility of maintaining directory information within L2 cache 180. This directory information specifies which L1 caches contain copies of specific data items. This allows the system to send invalidation information to only the L1 caches that contain the data item, instead of sending a broadcast message to all L1 caches. (This type of system presumes that there exist separate communication pathways for invalidation messages to each of the L1 caches 161-164. These communication pathways are not present in the system illustrated in FIG. 1A.) Note that if more communication pathways are provided between L1 caches 161-164 and L2 cache 180, it becomes possible for multiple processors to perform accesses that cause invalidations at the same time. Hence, L1 caches 161-164 may receive multiple invalidation requests at the same time.
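The difference between broadcast and directory-based invalidation can be illustrated with a small sketch. The class `L2Directory`, its method names, and the use of a Python set per address are assumptions made for this example; the text above does not specify how the directory information is encoded.

```python
# Hypothetical directory kept alongside the L2 cache: for each address it
# records which L1 caches currently hold a copy.
class L2Directory:
    def __init__(self, num_l1=4):
        self.num_l1 = num_l1
        self.sharers = {}    # address -> set of L1 cache indices holding a copy

    def record_fill(self, l1_id, addr):
        """Note that L1 cache l1_id has loaded a copy of addr."""
        self.sharers.setdefault(addr, set()).add(l1_id)

    def targets_on_write(self, writer, addr):
        """Return only the L1 caches that must be invalidated for this write."""
        holders = self.sharers.get(addr, set())
        targets = holders - {writer}
        # After the invalidations, at most the writer still holds a copy.
        self.sharers[addr] = holders & {writer}
        return sorted(targets)

    def broadcast_targets(self, writer):
        """For comparison: a broadcast scheme notifies every other L1 cache."""
        return [i for i in range(self.num_l1) if i != writer]


d = L2Directory()
d.record_fill(0, 0x80)
d.record_fill(2, 0x80)
directed = d.targets_on_write(0, 0x80)    # only cache 2 holds another copy
broadcast = d.broadcast_targets(0)        # caches 1, 2 and 3 would all be notified
repeat = d.targets_on_write(0, 0x80)      # no other copies remain
```

In this sketch the directed scheme sends one message where the broadcast scheme sends three, at the cost of maintaining the sharer sets.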
What is needed is a method and an apparatus that facilitates performing multiple invalidations at an L1 cache at the same time.
Furthermore, note that L1 caches 161-164 are typically set-associative. Hence, when an invalidation message is received by L1 cache 161, a lookup and comparison must be performed in L1 cache 161 to determine the way location of the data item. For example, in a four-way set-associative L1 cache, a data item that belongs to a specific set can be stored in one of four possible “ways”. Consequently, tags from each of the four possible ways must be retrieved and compared to determine the way location of the data item. This lookup is time-consuming and can degrade system performance.
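The lookup cost described above can be seen in a minimal sketch of address-based invalidation in a four-way set-associative cache. The geometry (64 sets, 64-byte lines), the address split, and the names `split`, `L1Cache` and `invalidate_by_address` are assumptions chosen for illustration. Hardware would typically compare the four tags in parallel; the loop below only counts how many tags must be read and compared.

```python
NUM_SETS = 64      # assumed geometry, for illustration only
NUM_WAYS = 4
LINE_BYTES = 64


def split(addr):
    """Split a byte address into (set index, tag)."""
    line = addr // LINE_BYTES
    return line % NUM_SETS, line // NUM_SETS


class L1Cache:
    def __init__(self):
        self.tags = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.valid = [[False] * NUM_WAYS for _ in range(NUM_SETS)]

    def fill(self, addr, way):
        """Place the line containing addr into the given way of its set."""
        s, t = split(addr)
        self.tags[s][way] = t
        self.valid[s][way] = True

    def invalidate_by_address(self, addr):
        """Invalidate by address: every way's tag is retrieved and compared."""
        s, t = split(addr)
        found = None
        for way in range(NUM_WAYS):
            if self.valid[s][way] and self.tags[s][way] == t:
                found = way
        if found is not None:
            self.valid[s][found] = False
        return found, NUM_WAYS    # (way that matched, number of tag comparisons)


cache = L1Cache()
cache.fill(0x1040, way=3)
way, compares = cache.invalidate_by_address(0x1040)
```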
Hence, what is needed is a method and an apparatus for invalidating an entry in an L1 cache without performing a lookup to determine the way location of the entry.
One embodiment of the present invention provides a multiprocessor system that supports multiple cache line invalidations within the same cycle. This multiprocessor system includes a plurality of processors and a lower-level cache that is configured to support multiple concurrent operations. It also includes a plurality of higher-level caches coupled to the plurality of processors, wherein a given higher-level cache is configured to support multiple concurrent invalidations of lines within the given higher-level cache.
In one embodiment of the present invention, the lower-level cache includes a plurality of banks that can be accessed in parallel to support multiple concurrent operations.
In a variation on the above embodiment, the multiprocessor system includes a switch that is configured to couple the plurality of banks of the lower-level cache with the plurality of higher-level caches.
In a variation on the above embodiment, each line in a given higher-level cache includes a valid bit that can be used to invalidate the line. These valid bits are contained in a memory that is organized into a plurality of banks that are associated with the plurality of banks of the lower-level cache. Moreover, each bank containing valid bits is hardwired to an associated bank of the lower-level cache, so that the given higher-level cache can receive multiple concurrent invalidation signals from the lower-level cache.
In a variation on this embodiment, each bank containing valid bits includes a first port and a second port, wherein the first port can be used to read or write a first location in the bank while the second port is used to invalidate a second location in the bank. This can be accomplished by providing each bank containing valid bits with its own decoder that selects a wordline for the bank's second port, and by sharing a single decoder that selects a single wordline for the first port across all the banks. In a further variation, a wordline of the second port causes a memory element to be reset without coupling the memory element to a corresponding bitline.
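The banked, dual-ported organization of the valid bits can be modeled behaviorally as follows. This is a sketch under stated assumptions, not a circuit description: the class `ValidBitBanks`, the notion of a software "cycle", the bank and entry counts, and the error raised when both ports target the same location are all introduced here for illustration.

```python
# Hypothetical behavioral model: valid bits are split into banks, one per
# bank of the lower-level cache. In a single cycle each bank may receive one
# invalidation on its second port, while the first port performs one ordinary
# write to a different location.
class ValidBitBanks:
    def __init__(self, num_banks=4, entries_per_bank=8):
        self.banks = [[False] * entries_per_bank for _ in range(num_banks)]

    def cycle(self, invalidations, port1_write=None):
        """Apply one cycle of operations.

        invalidations: dict mapping bank index -> entry index (second port);
                       the dict form allows at most one invalidation per bank.
        port1_write:   optional (bank, entry, value) applied through the first port.
        """
        if port1_write is not None:
            bank, entry, value = port1_write
            if invalidations.get(bank) == entry:
                # Assumed policy: the two ports must address different locations.
                raise ValueError("both ports target the same location")
            self.banks[bank][entry] = value
        for bank, entry in invalidations.items():
            # Second port: reset the selected valid bit.
            self.banks[bank][entry] = False


v = ValidBitBanks()
for bank in v.banks:
    bank[:] = [True] * len(bank)    # start with every line valid
v.banks[1][4] = False               # except one line, which is about to be filled

# Three invalidations from three different banks, plus one first-port write,
# all applied in the same cycle.
v.cycle({0: 1, 2: 5, 3: 0}, port1_write=(1, 4, True))
remaining = sum(sum(bank) for bank in v.banks)
```

Because each bank has its own invalidation input, the number of invalidations per cycle in this model is bounded by the number of banks rather than by a single shared port.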
In one embodiment of the present invention, a given invalidation signal received by a given higher-level cache includes a set location of a line to be invalidated in the given higher-level cache, and a way location of the line to be invalidated in the given higher-level cache.
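When the invalidation signal carries both the set location and the way location, the receiving cache can clear the valid bit directly, with no tag retrieval or comparison. A minimal sketch follows; the `Invalidation` record, the function `apply_invalidation`, and the 8-set, 4-way geometry are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invalidation:
    """Hypothetical invalidation signal: identifies the line by set and way."""
    set_index: int
    way: int


def apply_invalidation(valid, msg):
    """Clear exactly one valid bit; no tags are read or compared."""
    valid[msg.set_index][msg.way] = False


# Assumed geometry for illustration: 8 sets, 4 ways, all lines initially valid.
valid = [[True] * 4 for _ in range(8)]
apply_invalidation(valid, Invalidation(set_index=5, way=2))
still_valid = sum(sum(row) for row in valid)
```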
In one embodiment of the present invention, the multiprocessor system is located on a single semiconductor chip.
In one embodiment of the present invention, the lower-level cache is an L2 cache, and each of the plurality of higher-level caches is an L1 cache.
In one embodiment of the present invention, the plurality of higher-level caches are organized as write-through caches, so that updates to the plurality of higher-level caches are immediately written through to the lower-level cache.
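The write-through behavior can be sketched as below. The class `WriteThroughL1`, the dictionary standing in for the lower-level cache, and the choice not to allocate a line in the higher-level cache on a write miss are assumptions made for this example; the text above specifies only that updates are immediately written through.

```python
# Hypothetical write-through higher-level cache backed by a shared
# lower-level cache, modeled here as a dictionary.
class WriteThroughL1:
    def __init__(self, l2):
        self.lines = {}    # address -> value held locally
        self.l2 = l2       # shared lower-level cache

    def write(self, addr, value):
        if addr in self.lines:
            # Update the local copy only if the line is already present
            # (no allocation on a write miss is an assumption of this sketch).
            self.lines[addr] = value
        # Every write is immediately propagated to the lower-level cache.
        self.l2[addr] = value


l2 = {0x10: 1}
l1 = WriteThroughL1(l2)
l1.lines[0x10] = 1      # the line is already cached locally
l1.write(0x10, 9)       # updates both the local copy and the L2 copy
l1.write(0x20, 5)       # write miss: only the L2 copy is updated
```

Because the lower-level cache sees every write in this model, it always holds current data and is in a position to issue the corresponding invalidations.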