1. Technical Field
The present invention relates generally to memories in computer processing systems and, in particular, to a method and apparatus for history-based movement of shared-data in coherent cache memories.
2. Background Description
With respect to prefetching in coherent cache memories, the prior art uses history information to help prefetch invalidated cache lines into a node/processor before the invalidated cache lines are used again. For example, the Cosmos “coherence message predictor” and the “memory sharing predictor” predict the source and type of coherence messages for a cache line in a multiprocessor (MP) computer system using complex prediction logic. The Cosmos coherence message predictor is described by Hill et al. in “Using Prediction to Accelerate Coherence Protocols”, Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA), Barcelona, Spain, June 27 through July 2, 1998, pp. 179-90. The memory sharing predictor is described by Falsafi et al. in “Memory Sharing Predictor: The Key to a Speculative Coherent DSM”, Proceedings of the 26th International Symposium on Computer Architecture (ISCA), Atlanta, Ga., May 2-4, 1999, pp. 172-83.
The goal of the two preceding predictors is to predict incoming messages that affect memory blocks and to execute those messages in a timely manner. Thus, the two approaches are basically similar in that both rely on the ability to predict likely message sequences and to speculatively execute the messages. However, both approaches suffer from the long learning overhead required to accurately predict “follow-on” messages.
Accordingly, it would be desirable and highly advantageous to have a method and apparatus for moving data in coherent cache memories that does not suffer from a long learning overhead and yet can still predict necessary data movements and move such data to other processors early enough to avoid long-latency cache misses and excessive coherence traffic.
3. Summary of the Invention
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method and apparatus for history-based movement of shared-data in coherent cache memories.
The invention reduces data misses and access latencies in coherent cache memories in a multiprocessor (MP) computer system. The invention employs mechanisms that push data from one node/processor to other nodes/processors in an MP computer system based on memory sharing patterns. The concept is to monitor and keep a history of which processors in the system consume a particular datum or data (hereinafter “data”) produced by another node/processor, and to push the produced data (e.g., cache line(s)) to the consuming nodes/processors. Cache lines may be pushed from one cache to one or more other caches at other nodes/processors. Thus, the invention can reduce the latency associated with lateral interventions.
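By way of illustration only, the following C++ sketch shows the core idea in its simplest form: each cache keeps a small history mapping a cache line to the set of nodes that previously consumed it, and that set identifies where freshly produced copies should be pushed. The type and function names (ConsumerHistory, record_consumer, push_targets) are hypothetical and are not taken from the invention.

```cpp
#include <cstdint>
#include <unordered_map>

using LineAddr = std::uint64_t;
using NodeMask = std::uint32_t;   // one bit per node/processor

struct ConsumerHistory {
    std::unordered_map<LineAddr, NodeMask> consumers;

    // Record that `node` previously consumed the line at `addr`.
    void record_consumer(LineAddr addr, unsigned node) {
        consumers[addr] |= NodeMask{1} << node;
    }

    // Nodes to which a freshly produced copy of `addr` should be pushed.
    NodeMask push_targets(LineAddr addr) const {
        auto it = consumers.find(addr);
        return it == consumers.end() ? NodeMask{0} : it->second;
    }
};
```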
In contrast to the prior art, the invention is not concerned with the sequence in which messages arrive, and does not affect the way messages are executed. The invention simply identifies nodes/processors that are likely to use newly produced data, and attempts to move copies of the data closer to the consuming nodes/processors early enough to reduce cache misses that would otherwise have to be serviced by interventions from other nodes/processors in the system. The invention uses a simple history gathering technique, and does not require a complex prediction mechanism to predict likely messages that must follow in series. The invention employs a data-centric approach that actively and aggressively moves data closer to the consuming processor(s), and incurs only a minimal learning overhead, in contrast to the prior art.
According to a first aspect of the invention, there is provided a method for moving at least one of instructions and operand data throughout a plurality of caches included in a computer system, wherein each of the plurality of caches is included in one of a plurality of processing nodes of the system. The method includes the step of storing a plurality of entries in a table attached to each of the plurality of caches, wherein each of the entries is associated with a plurality of storage elements in one of the plurality of caches and includes information of prior usage of the plurality of storage elements by each of the plurality of processing nodes.
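A minimal sketch of such a table follows, assuming a fixed-size, direct-mapped organization and an eight-node system; HistoryEntry and HistoryTable are illustrative names, not elements defined by the invention.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kNodes     = 8;     // assumed number of processing nodes
constexpr unsigned kTableSize = 1024;  // assumed number of table entries

struct HistoryEntry {
    bool                     valid = false;
    std::uint64_t            tag   = 0;   // identifies the associated storage element(s)
    std::array<bool, kNodes> used_by{};   // prior usage, one flag per processing node
};

struct HistoryTable {
    std::array<HistoryEntry, kTableSize> entries;

    // Map a line address to a table slot (direct-mapped for simplicity).
    HistoryEntry& lookup(std::uint64_t line_addr) {
        return entries[line_addr % kTableSize];
    }
};
```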
According to a second aspect of the invention, there is provided a method for moving at least one of instructions and operand data throughout a plurality of caches included in a computer system, wherein each of the plurality of caches is included in one of a plurality of processing nodes of the system. The method includes the step of storing a plurality of entries in a table attached to each of the plurality of caches, wherein each of the entries is associated with a plurality of storage elements in one of the plurality of caches and includes information of prior usage of the plurality of storage elements by each of the plurality of processing nodes. Upon a miss by a given processing node to a given cache included therein, any given storage elements that caused the miss are transferred to the given cache from one of main memory and another cache. A given entry that is associated with the given storage elements is created in the table.
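Building on the hypothetical HistoryTable sketched above, the miss path might look as follows; the actual transfer from main memory or another cache is abstracted away.

```cpp
void on_cache_miss(HistoryTable& table, std::uint64_t line_addr,
                   unsigned local_node) {
    // ... the missing line is transferred into the local cache from main
    //     memory or from another cache (transfer itself not shown) ...

    // Create a history entry associated with the newly installed line.
    HistoryEntry& e = table.lookup(line_addr);
    e.valid = true;
    e.tag   = line_addr;
    e.used_by.fill(false);
    e.used_by[local_node] = true;   // the local node is the line's first consumer
}
```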
According to a third aspect of the invention, the method further includes the step of displacing at least one existing storage element in the given cache to make room for the given storage elements.
According to a fourth aspect of the invention, the method further includes the step of invalidating an entry associated with the displaced at least one existing storage element, if the entry exists.
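Under the same assumptions, victim handling for the third and fourth aspects might be sketched as follows: when an existing line is displaced to make room for the incoming line, any history entry associated with the victim is invalidated if one exists.

```cpp
void on_line_displaced(HistoryTable& table, std::uint64_t victim_addr) {
    HistoryEntry& e = table.lookup(victim_addr);
    if (e.valid && e.tag == victim_addr) {
        e.valid = false;   // drop the stale usage history for the victim line
    }
}
```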
According to a fifth aspect of the invention, upon a processing node performing a store operation that updates at least one storage element in a cache included in the processing node, the method further includes the step of searching the table for an entry associated with the at least one storage element. If the entry is found, the at least one storage element is requested to be transmitted to any processing nodes identified in the entry based upon the information stored in the entry.
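The store path, again as an illustrative sketch using the hypothetical table above; send_line_to stands in for whatever interconnect primitive carries the push and is not named by the invention.

```cpp
// Assumed interconnect primitive that carries the push (illustrative only).
void send_line_to(unsigned node, std::uint64_t line_addr);

void on_store(HistoryTable& table, std::uint64_t line_addr,
              unsigned local_node) {
    HistoryEntry& e = table.lookup(line_addr);
    if (!e.valid || e.tag != line_addr) {
        return;                              // no history for this line
    }
    for (unsigned node = 0; node < kNodes; ++node) {
        if (node != local_node && e.used_by[node]) {
            send_line_to(node, line_addr);   // push a copy toward a past consumer
        }
    }
}
```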
According to a sixth aspect of the invention, upon a request by a processing node for a storage element in a cache included in another processing node, the method further includes the step of searching the table for an entry corresponding to the requested storage element. If the entry is found, the information stored in the entry is updated to indicate that the requested storage element has been sent to the other processing node. If the entry is not found, a new entry is created in the table for the requested storage element.
According to a seventh aspect of the invention, the creating step includes the step of identifying a usage of the requested storage element by the other processing node.
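The request path covering the sixth and seventh aspects might be sketched as follows, with a table hit updated to record the transfer and a table miss creating a new entry that identifies the requesting node's usage; the types are the same hypothetical ones used above.

```cpp
void on_remote_request(HistoryTable& table, std::uint64_t line_addr,
                       unsigned requesting_node) {
    HistoryEntry& e = table.lookup(line_addr);
    if (!(e.valid && e.tag == line_addr)) {
        // No entry yet: create one for the requested line.
        e.valid = true;
        e.tag   = line_addr;
        e.used_by.fill(false);
    }
    // Record that the requested line has been sent to the requesting node.
    e.used_by[requesting_node] = true;
}
```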
According to an eighth aspect of the invention, upon a receipt of a storage element by a processing node from another processing node that recently updated the storage element, the method further includes the step of determining whether the received storage element is to be stored in the cache included in the processing node, based upon a current content of the cache.
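On the receive side, a pushed line is installed only if the current cache contents make that worthwhile. The acceptance test in the sketch below is a placeholder policy, and the helper functions are assumptions rather than elements of the invention.

```cpp
#include <cstdint>

struct Cache;  // opaque handle to the local cache (illustrative)

// Assumed helpers, standing in for whatever replacement policy the cache uses.
bool would_evict_live_line(const Cache& cache, std::uint64_t line_addr);
void install_line(Cache& cache, std::uint64_t line_addr);

void on_pushed_line(Cache& cache, std::uint64_t line_addr) {
    // Accept the pushed copy only if it does not displace a line that is
    // still in use; otherwise simply drop it. Correctness is unaffected,
    // since the push is purely a latency optimization.
    if (!would_evict_live_line(cache, line_addr)) {
        install_line(cache, line_addr);
    }
}
```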
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.