1. Technical Field
The present invention relates in general to the field of computers, and in particular to accessing computer system memory. Still more particularly, the present invention relates to a method and system for dynamically adjusting speculative retrieval of data stored in system memory.
2. Description of the Related Art
Processors in a multi-processor computer system typically share system memory, which may be organized either as multiple private memories associated with specific processors, or as a centralized memory, in which memory access is the same for all processors. For example, FIG. 1 illustrates a multi-processor computer 100 utilizing a centralized memory system sometimes referred to as a “dance hall,” in which processors 102 are on one “side” of a data bus 116 and system memories 114 are on the other “side” of the data bus 116. When a processor, such as processor 102a, requires data from memory, it first checks its own L1 cache 104a and L2 cache 106a. If the data is not in either local cache, then a request is put out onto data bus 116, which is managed by bus arbiter 110. Cache controllers 108 “snoop” data bus 116 for requests for data that may be in their respective caches 104 or 106.
If no valid data is in any of the caches, then the data is retrieved from one of system memories 114. Each system memory 114 is assigned a particular range of memory addresses and is under the control of a respective memory controller 112. However, before a specific memory controller 112 accesses data from its respective system memory 114, the memory controller 112 waits until a combined response is returned to the data bus 116 by the bus arbiter 110 stating that none of the caches has the requested valid data.
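The lookup order described above can be illustrated with a minimal sketch. The class and function names (`Processor`, `read`) and the use of dictionaries for caches and memories are assumptions made for illustration only; they are not part of the described system, and bus arbitration and coherence states are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    # Hypothetical model: each cache maps an address to its data.
    l1: dict = field(default_factory=dict)
    l2: dict = field(default_factory=dict)


def read(requester, processors, memories, address):
    """Return (data, source) using the lookup order described in the text."""
    # 1. The requesting processor checks its own L1 and L2 caches first.
    if address in requester.l1:
        return requester.l1[address], "local L1"
    if address in requester.l2:
        return requester.l2[address], "local L2"
    # 2. The request goes onto the bus; the other cache controllers snoop it.
    for p in processors:
        if p is requester:
            continue
        for cache in (p.l1, p.l2):
            if address in cache:
                return cache[address], "remote cache"
    # 3. No cache holds the data, so the memory controller that owns the
    #    address range reads it from its system memory.
    for (lo, hi), memory in memories:
        if lo <= address < hi:
            return memory[address], "system memory"
    raise KeyError(address)


# Usage: two processors and two memories, each owning an address range.
p_a = Processor(l1={0x10: "A"})
p_b = Processor(l2={0x20: "B"})
mems = [((0x000, 0x100), {0x30: "C"}), ((0x100, 0x200), {0x130: "D"})]
result_local = read(p_a, [p_a, p_b], mems, 0x10)
result_remote = read(p_a, [p_a, p_b], mems, 0x20)
result_memory = read(p_a, [p_a, p_b], mems, 0x130)
```

In this sketch the memory access in step 3 happens only after every cache has been consulted, which mirrors the requirement that the memory controller wait for the combined response.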
Referring now to FIG. 2, a time line 200 illustrates the sequence of events in which a data request from a cache is performed. At time (1), the bus arbiter 110, in response to a query from one of the processors 102 (shown in FIG. 1), puts a data request on the data bus. At time (2), each cache controller 108 provides a “snoop” shared response, such as “retry,” “busy,” “valid data available,” etc. The bus arbiter “collects” the shared responses, and at time (3) issues an “early combined response,” which is a hint (guess) as to where the valid data is stored. That is, the bus arbiter 110 puts out an early response predicting which cache, if any, has the valid coherent data. At time (4), the bus arbiter 110 issues a “combined response,” which is a final response back to the bus confirming which cache controller 108, if any, has control and access to the requested data (or else that the request will be retried due to a bus collision or other delay).
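As a rough illustration of how the snoop responses collected at time (2) might be reduced to a single final response at time (4), consider the following sketch. The function name `combine`, the `"null"` response string, and the priority rule (a cache reporting valid data wins, otherwise any “retry” or “busy” forces a retry) are assumptions for illustration; the actual arbitration rules are not specified here.

```python
def combine(snoop_responses):
    """Reduce per-controller snoop responses to one combined response.

    snoop_responses maps a cache controller id to its response string.
    Returns (outcome, controller_id), where outcome is "cache", "retry",
    or "memory". The priority order is an assumed rule, not a specification.
    """
    owners = [cid for cid, resp in snoop_responses.items()
              if resp == "valid data available"]
    if owners:
        # Some cache controller has the requested valid data.
        return ("cache", owners[0])
    if any(resp in ("retry", "busy") for resp in snoop_responses.values()):
        # No owner is known yet and at least one controller could not answer.
        return ("retry", None)
    # No cache has the data, so it must come from system memory.
    return ("memory", None)


hit = combine({"108a": "null", "108b": "valid data available"})
retry = combine({"108a": "busy", "108b": "null"})
miss = combine({"108a": "null", "108b": "null"})
```

Only the last outcome, in which no cache has the valid data, requires an access to system memory.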
As systems become more complex, for example as more processors 102 (each with a dedicated cache controller 108) are connected to the data bus 116, the delay between the data request and the final combined response grows in a non-linear manner. That is, doubling the number of processors results in a time delay between the initial data request and the final combined response that is more than twice as long. This is due in part to the super-linear amount of time required for all cache controllers 108 to snoop and respond to the data request, and for the bus arbiter 110 to evaluate all of the cache controller responses and formulate the final combined response for broadcast back to the data bus 116.
In the event that none of the cache memories 104 or 106 has the requested valid data, then the data must be retrieved from one of the system memories 114. In an effort to minimize the total time delay required to retrieve the data from a system memory 114 after a cache “miss,” memory controllers 112 also “snoop” data requests on the data bus 116, and speculatively pre-fetch data from their respective system memory 114 whenever the data request is for data at a memory address used by that system memory 114. That is, if a data request on data bus 116 is for data at an address used by system memory 114a, then memory controller 112a automatically speculatively pre-fetches the data at that address and stores the data in a queue in the memory controller 112a. This brute-force approach is highly inefficient, since many of the data requests are for data stored in cache memories, and thus an access to system memory is not needed. Automatically accessing the system memories 114 in this manner not only ties up valuable queue resources in the memory controller 112, but also consumes excessive power, including battery power, and generates excessive heat.
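The cost of this automatic approach can be illustrated with a small sketch that counts how many speculative pre-fetches turn out to be unnecessary. The function name `count_wasted_prefetches`, the representation of each request as an `(address, cache_hit)` pair, and the sample addresses are hypothetical and chosen only for illustration.

```python
def count_wasted_prefetches(requests, address_range):
    """Count pre-fetches issued by one always-pre-fetch memory controller.

    requests is a sequence of (address, cache_hit) pairs seen on the bus.
    address_range is the (low, high) range owned by the controller's memory,
    with high exclusive. Returns (issued, wasted).
    """
    lo, hi = address_range
    issued = 0
    wasted = 0
    for address, cache_hit in requests:
        if lo <= address < hi:
            # The controller pre-fetches every request in its range.
            issued += 1
            if cache_hit:
                # A cache supplied the data, so the pre-fetch was not needed.
                wasted += 1
    return issued, wasted


# Three requests fall in the controller's range; two of them hit in a cache.
bus_requests = [(0x10, True), (0x20, False), (0x30, True), (0x1000, True)]
issued, wasted = count_wasted_prefetches(bus_requests, (0x000, 0x100))
# issued == 3, wasted == 2
```

In this example two of the three pre-fetches occupy a queue entry and consume power without ever being used.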
Thus, there is a need for a system and method that allows memory controllers to more intelligently, and thus more accurately, predict and pre-fetch data from their associated system memories in response to a data request.