1. Field of the Invention
The present invention relates to the design of cache memories in computer systems. More specifically, the present invention relates to a method and an apparatus for facilitating flow control in order to support pipelined accesses to and from a cache memory.
2. Related Art
As microprocessor clock speeds continue to increase at an exponential rate, it is becoming increasingly difficult to provide sufficient data transfer rates between functional units on a microprocessor chip. For example, data transfers between a Level Two (L2) cache and a Level One (L1) cache can potentially require a large number of processor clock cycles. Moreover, the processor will be severely underutilized if it has to wait a large number of clock cycles to complete each access to L2 cache. Hence, in order to keep the processor busy, it is necessary to pipeline data transfers between L2 cache and the processor.
However, pipelining introduces problems. In a pipelined architecture, a number of accesses from the processor to the L2 cache can potentially be in flight at any given time. Furthermore, service times for accesses to the L2 cache are unpredictable because each access can potentially cause a cache miss if the desired data item is not present in L2 cache. Hence, what is needed is a mechanism for halting subsequent accesses to the L2 cache, as well as a mechanism for queuing in-flight transactions in case preceding accesses generate time-consuming cache misses.
Additionally, there are limitations on the number of outstanding cache misses that can be pending at any given time. Caches are typically designed with a set-associative architecture that uses a number of address bits from a request to determine a “set” to which the request is directed. A set-associative cache stores a number of entries for each set, and these entries are typically referred to as “ways”. For example, a four-way set-associative cache contains four entries for each set. This means that a four-way set-associative cache essentially provides a small four-entry cache for each set.
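The mapping from address bits to a set can be illustrated with a minimal sketch. The line size, the number of sets, and the name split_address are assumptions chosen for illustration only; the text above does not specify them.

```python
# Illustrative geometry only; none of these values come from the specification.
LINE_BYTES = 64    # assumed cache-line size in bytes (power of two)
NUM_SETS = 1024    # assumed number of sets (power of two)
NUM_WAYS = 4       # four-way set-associative, as in the example above

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2(LINE_BYTES)
INDEX_BITS = NUM_SETS.bit_length() - 1      # log2(NUM_SETS)


def split_address(addr):
    """Split an address into (tag, set_index, line_offset).

    The low bits select a byte within the line, the next bits select
    the set, and the remaining high bits form the tag that is compared
    against the entries (ways) stored for that set.
    """
    offset = addr & (LINE_BYTES - 1)
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset
```

Under these assumed parameters, two addresses that differ by a multiple of LINE_BYTES * NUM_SETS fall into the same set and therefore compete for the same four ways.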
Note that it is desirable not to allow more than four outstanding miss operations to be pending on any given set in a four-way set-associative cache. For example, if a system allows five outstanding misses, the five misses could potentially return at about the same time, and there would only be room to accommodate four of them. In this case, one of the returned cache lines would immediately be kicked out of the cache. Dealing with this problem can greatly complicate the design of a cache. Hence, what is needed is a mechanism for halting subsequent accesses to the L2 cache when a given set has too many pending miss operations.
One embodiment of the present invention provides a system that facilitates flow control to support pipelined accesses to a cache memory. When an access to the cache memory generates a miss, the system increments a number of outstanding misses that are currently in process for a set in the cache to which the miss is directed. If the number of outstanding misses is greater than or equal to a threshold value, the system stalls generation of subsequent accesses to the cache memory until the number of outstanding misses for each set in the cache memory falls below the threshold value. Upon receiving a cache line from a memory subsystem in response to an outstanding miss, the system identifies a set that the outstanding miss is directed to. The system then installs the cache line in an entry associated with the set. The system also decrements a number of outstanding misses that are currently in process for the set. If the number of outstanding misses falls below the threshold value as a result of decrementing, and if no other set has a number of outstanding misses that is greater than or equal to the threshold value, the system removes the stall condition so that subsequent accesses can be generated for the cache memory.
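The counting and stall behavior described above can be sketched in software as follows. This is a behavioral model under stated assumptions, not the hardware of the embodiment; the class name MissFlowControl and its method names are hypothetical.

```python
class MissFlowControl:
    """Behavioral sketch of per-set miss counting with a global stall."""

    def __init__(self, num_sets, threshold):
        self.threshold = threshold
        self.outstanding = [0] * num_sets   # outstanding misses per set
        self.stalled = False                # True while new accesses are held off

    def on_miss(self, set_index):
        # A miss increments the count for the set it is directed to.
        self.outstanding[set_index] += 1
        if self.outstanding[set_index] >= self.threshold:
            self.stalled = True

    def on_fill(self, set_index):
        # A returning cache line is installed in the set (not modeled here)
        # and the count for that set is decremented.
        if self.outstanding[set_index] == 0:
            raise ValueError("fill received with no outstanding miss")
        self.outstanding[set_index] -= 1
        # The stall is removed only when every set is below the threshold.
        if all(count < self.threshold for count in self.outstanding):
            self.stalled = False
```

The model scans every set on each fill, which is simple to read but would be costly in hardware; it is used here only to make the stall and release conditions concrete.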
In one embodiment of the present invention, the system determines whether to remove the stall condition by examining a state machine. This state machine keeps track of a number of outstanding misses that cause sets in the cache memory to meet or exceed the threshold value.
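One possible reading of this state machine is a single counter of the misses that leave a set at or above the threshold, so that the stall can be released without scanning every set. The sketch below follows that reading; the class name StallTracker and the field name excess are hypothetical, and the actual state machine may differ.

```python
class StallTracker:
    """Tracks how many outstanding misses keep some set at or above the threshold."""

    def __init__(self, num_sets, threshold):
        self.threshold = threshold
        self.outstanding = [0] * num_sets
        self.excess = 0   # misses that brought a set to or beyond the threshold

    @property
    def stalled(self):
        # The stall condition holds while any such miss is still outstanding.
        return self.excess > 0

    def on_miss(self, set_index):
        self.outstanding[set_index] += 1
        if self.outstanding[set_index] >= self.threshold:
            self.excess += 1

    def on_fill(self, set_index):
        # If the set was at or above the threshold before this fill,
        # the fill retires one of the counted misses.
        if self.outstanding[set_index] >= self.threshold:
            self.excess -= 1
        self.outstanding[set_index] -= 1
```

In this sketch the counter is zero exactly when every set is below the threshold, so checking one value replaces a comparison across all sets.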
In one embodiment of the present invention, the system additionally replays the access that caused the cache line to be retrieved.
In one embodiment of the present invention, the system increments the number of outstanding misses that are currently in process for the set by setting a prior miss bit that is associated with an entry for a specific set and way in the cache memory. This prior miss bit indicates that an outstanding miss is in process and will eventually fill the entry for the specific set and way. In a variation on this embodiment, the prior miss bit is stored along with a tag for the specific set and way, so that a tag lookup returns the prior miss bit.
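The prior miss bit can be pictured as one extra status bit kept beside each tag, so that the number of outstanding misses for a set is simply the number of ways whose bit is set. The following sketch assumes a trivial victim choice (the first way without a pending miss) and assumes the tag is written when the miss starts; both are illustrative choices, and the names TagEntry, TagArray, start_miss and complete_fill are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    tag: int = 0
    valid: bool = False
    prior_miss: bool = False   # an outstanding miss will eventually fill this way


class TagArray:
    def __init__(self, num_sets, num_ways):
        self.sets = [[TagEntry() for _ in range(num_ways)]
                     for _ in range(num_sets)]

    def outstanding(self, set_index):
        # A tag lookup returns the prior miss bits, so the count is
        # available from the same read as the tag comparison.
        return sum(entry.prior_miss for entry in self.sets[set_index])

    def start_miss(self, set_index, tag):
        """Reserve a way for a miss; return the way index, or None if all are pending."""
        for way, entry in enumerate(self.sets[set_index]):
            if not entry.prior_miss:
                entry.tag = tag
                entry.valid = False
                entry.prior_miss = True
                return way
        return None

    def complete_fill(self, set_index, way):
        """Install the returned line and clear the prior miss bit."""
        entry = self.sets[set_index][way]
        entry.valid = True
        entry.prior_miss = False
```

Setting the bit corresponds to incrementing the count for the set, and clearing it on the fill corresponds to the decrement.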
In one embodiment of the present invention, the cache memory is a Level Two (L2) cache and the access is received from a Level One (L1) cache.
In one embodiment of the present invention, receiving the access involves receiving the access from a queue located at the L2 cache, wherein the queue contains accesses generated by the L1 cache. In this embodiment, the system uses credit-based flow control to limit sending of accesses from the L1 cache into the queue, so that the queue does not overflow.
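Credit-based flow control of this kind can be sketched as follows, assuming one credit per queue entry and a single sender. The class names L2Queue and L1Sender, and the way credits are returned, are assumptions made for illustration.

```python
from collections import deque


class L2Queue:
    """Fixed-capacity queue at the L2 cache holding accesses from the L1 cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def push(self, access):
        if len(self.entries) >= self.capacity:
            raise OverflowError("L2 queue overflow")
        self.entries.append(access)

    def pop(self):
        return self.entries.popleft()


class L1Sender:
    """Sender that spends one credit per access and waits when it has none."""

    def __init__(self, queue):
        self.queue = queue
        self.credits = queue.capacity   # assumed: one credit per queue entry

    def try_send(self, access):
        if self.credits == 0:
            return False                # no credit: hold the access back
        self.credits -= 1
        self.queue.push(access)
        return True

    def return_credit(self):
        # Called when the L2 side removes an entry from the queue.
        self.credits += 1
```

Because the sender never holds more credits than there are free entries, push cannot overflow the queue in this model.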
In one embodiment of the present invention, the L2 cache receives accesses from a plurality of L1 caches.
In one embodiment of the present invention, the threshold value is less than a number of entries in the cache memory associated with each set. This effectively reserves one or more additional entries for each set to accommodate in-flight accesses that have been generated but not received at the cache memory.
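The effect of choosing a threshold below the number of ways can be checked with simple arithmetic. The sketch below assumes a worst case in which the stall is asserted exactly when a set reaches the threshold and every access already in flight then misses on that same set; the function names and the max_in_flight parameter are hypothetical.

```python
def reserved_entries(num_ways, threshold):
    """Entries per set held back for accesses already in flight."""
    if not 0 < threshold <= num_ways:
        raise ValueError("threshold must be between 1 and num_ways")
    return num_ways - threshold


def fits_in_set(num_ways, threshold, max_in_flight):
    """True if the assumed worst-case outstanding misses still fit in one set."""
    return threshold + max_in_flight <= num_ways
```

For a four-way cache with a threshold of three, one entry per set is reserved, which covers one access that was generated before the stall took effect.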
FIG. 1 illustrates a multiprocessor system in accordance with an embodiment of the present invention.
FIG. 2 illustrates in more detail the multiprocessor system illustrated in FIG. 1 in accordance with an embodiment of the present invention.
FIG. 3 illustrates the structure of an L2 bank in accordance with an embodiment of the present invention.
FIG. 4 illustrates status bits and a tag associated with an L2 cache entry in accordance with an embodiment of the present invention.
FIG. 5 illustrates an exemplary pattern of pending miss operations in accordance with an embodiment of the present invention.
FIG. 6 illustrates a state diagram for pending miss operations in accordance with an embodiment of the present invention.
FIG. 7 is a flow chart illustrating processing of a cache access in accordance with an embodiment of the present invention.
FIG. 8 is a flow chart illustrating processing of a cache line return in accordance with an embodiment of the present invention.