1. Technical Field
The present invention relates generally to memories in computer processing systems and, in particular, to a method and apparatus for memory prefetching based on intra-page usage history.
2. Background Description
With respect to prefetching based on history, the prior art corresponding thereto uses history information to predict the next possible cache line that a processor may access. Some of the prior art is based on parameters like stride distance, i.e., the distance between two consecutive accesses; in this case, the prediction engine adds the stride distance to, or subtracts it from, the current address with the anticipation that the processor will request the resulting address. Other approaches predict the next possible address based on some trend seen earlier in other regions of memory. Yet others, like the Cosmos coherence message predictor, predict the source and type of coherence message for a cache line in a multiprocessor (MP) system using complex prediction logic. The goal of the Cosmos predictor is to predict incoming messages that affect memory blocks and execute those messages in time. The Cosmos predictor is described by Hill et al., in "Using Prediction to Accelerate Coherence Protocols", Proceedings of the 25th Annual International Symposium on Computer Architecture (ISCA), Barcelona, Spain, Jun. 27 through Jul. 2, 1998, pp. 179-90. Generally, most prior art approaches are proposed with time critical operations in mind, where a misprediction can be expensive in both clock cycles and bus saturation. However, most of these approaches suffer from a long learning time overhead which is necessary for improved prediction accuracy.
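By way of illustration only, the stride-based prediction described above may be sketched as follows. The function name and the example addresses are hypothetical and are not drawn from any of the cited prior art.

```python
def predict_next_address(previous_address: int, current_address: int) -> int:
    """Predict the next address by repeating the most recent stride."""
    # Stride distance: the signed distance between two consecutive accesses.
    stride = current_address - previous_address
    # A positive stride is added; a negative stride amounts to a subtraction.
    return current_address + stride


# Ascending accesses 0x1000, 0x1040 suggest 0x1080 as the next address.
next_up = predict_next_address(0x1000, 0x1040)
# Descending accesses 0x1040, 0x1000 suggest 0x0FC0 as the next address.
next_down = predict_next_address(0x1040, 0x1000)
```

In this sketch, at least two prior accesses must be observed before any prediction can be made.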
Accordingly, it would be desirable and highly advantageous to have a method and apparatus for memory prefetching which does not suffer from a long learning time overhead and which is capable of delivering multiple data and/or instruction objects well before such objects are actually used.
The problems stated above, as well as other related problems of the prior art, are solved by the present invention, a method and apparatus for memory prefetching based on intra-page usage history. The invention employs data-centric techniques to reduce data/instruction misses in cache memories in a computer processing system. The invention is based upon a mechanism for data/instruction prefetching which, in turn, is based upon prior intra-page usage. The concept is to monitor and keep a history of which cache lines (data/instructions) within a page are used, and attempt to pull multiple cache lines when the previous usage of a page appears to repeat. Cache lines may be pulled directly from main memory or from higher level caches, and even from other nodes/processors in the case of a multiprocessor (MP) system when the cache lines are resident there.
As noted above, the prior art in memory prefetching based on history has mostly been limited to the prediction and prefetching of the next possible sequential cache line(s) after a cache miss, or of multiple cache lines that are identified by a sequential pattern. In contrast, the invention may prefetch arbitrary lines within a page; such lines do not need to exhibit any pattern, only a history of having been used in the past.
Moreover, the invention may use only one previous instance of a page being accessed (rather than two or more, as required by stride-based approaches of the prior art). Thus, the invention has a minimal initial learning overhead with respect to the prior art. Also, the invention employs filtering of prefetch requests, which aids in removing redundant and useless prefetches so as to avoid excessive prefetch bus traffic.
According to a first aspect of the invention, there is provided a method for fetching at least one of instructions and operand data from a second memory into a first memory of a computer system having at least one processor. The method includes the step of storing a plurality of entries in a table associated with the first memory, wherein each entry is associated with a memory page that includes a plurality of storage elements in the second memory, and includes information of prior access by the at least one processor to each of the plurality of storage elements. Upon a miss to the first memory from the at least one processor based upon a request, the table is searched for a given entry associated with a given page that includes a target of the request. If the given entry is found, then at least one prefetch request is generated to fetch at least one storage element included in the given page from the second memory to the first memory, based upon given information comprised in the given entry. Prior to satisfying the at least one prefetch request, the at least one prefetch request is analyzed with respect to other prefetch requests and fetch requests, if any, to determine which of the at least one prefetch request and the other prefetch requests are to be satisfied.
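By way of illustration only, the table search and prefetch generation of the first aspect may be sketched as follows. The sketch assumes a 4096-byte page, a 128-byte storage element, a dictionary keyed by page number, and one usage bit per storage element; none of these choices, nor the function name, is specified in this summary.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 128   # assumed storage element (cache line) size in bytes
LINES_PER_PAGE = PAGE_SIZE // LINE_SIZE


def generate_prefetches(history_table: dict, miss_address: int) -> list:
    """On a miss, return the addresses of previously used lines in the same page."""
    page_number = miss_address // PAGE_SIZE
    missed_line = (miss_address % PAGE_SIZE) // LINE_SIZE
    usage_bits = history_table.get(page_number)
    if usage_bits is None:
        # No entry for this page: there is no history to prefetch from.
        return []
    page_base = page_number * PAGE_SIZE
    # The missed line itself is served by the ordinary fetch, so it is skipped.
    return [
        page_base + line * LINE_SIZE
        for line in range(LINES_PER_PAGE)
        if (usage_bits >> line) & 1 and line != missed_line
    ]


# Page 2 previously used lines 0, 1 and 3; the new miss falls in line 1.
requests = generate_prefetches({2: 0b1011}, 2 * PAGE_SIZE + 130)
```

The resulting requests are only candidates; under the first aspect they are still analyzed against other prefetch and fetch requests before any of them is satisfied.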
According to a second aspect of the invention, the method further includes the step of satisfying at least some of the at least one prefetch request and the other prefetch requests, based on a result of the analyzing step.
According to a third aspect of the invention, upon a hit to the first memory from the at least one processor based upon the request, the table is searched for a particular entry corresponding to a particular page that includes the target of the request. If the particular entry is found, particular information included in the particular entry is updated with usage information relative to any particular storage elements that contain the target.
According to a fourth aspect of the invention, the method further includes the step of creating a new entry for the given page, if the given entry is not found.
According to a fifth aspect of the invention, the creating step includes the step of flagging any given storage elements in the given page that contain the target.
According to a sixth aspect of the invention, the flagging step includes the step of setting a bit corresponding to each of the given storage elements to a predefined value.
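By way of illustration only, the entry update on a hit and the entry creation on a miss described in the third through sixth aspects may be sketched as follows. The page size, storage element size, dictionary-based table, function names, and the use of 1 as the predefined bit value are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 128   # assumed storage element (cache line) size in bytes


def line_bit(address: int) -> int:
    """Return a mask with only the bit of the line containing the address set."""
    return 1 << ((address % PAGE_SIZE) // LINE_SIZE)


def update_on_hit(history_table: dict, address: int) -> None:
    """On a hit, record usage only if an entry for the page already exists."""
    page_number = address // PAGE_SIZE
    if page_number in history_table:
        history_table[page_number] |= line_bit(address)


def create_on_miss(history_table: dict, address: int) -> None:
    """On a miss with no entry found, create one and flag the target line."""
    page_number = address // PAGE_SIZE
    if page_number not in history_table:
        history_table[page_number] = line_bit(address)


table = {}
update_on_hit(table, 256)   # no entry yet, so nothing is recorded
create_on_miss(table, 256)  # creates the entry for page 0 with line 2 flagged
update_on_hit(table, 0)     # the entry exists, so line 0 is flagged as well
```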
According to a seventh aspect of the invention, the analyzing step includes the step of identifying any redundant prefetch and fetch requests from among the at least one prefetch request, the other prefetch requests, and the fetch requests.
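By way of illustration only, the identification of redundant requests in the seventh aspect may be sketched as follows. The function name and the representation of requests as plain line addresses are assumptions made for the example.

```python
def filter_redundant(new_prefetches: list, pending_prefetches: list,
                     pending_fetches: list) -> list:
    """Drop new prefetch requests that duplicate an already requested address."""
    already_requested = set(pending_prefetches) | set(pending_fetches)
    kept = []
    for address in new_prefetches:
        if address not in already_requested:
            kept.append(address)
            # Remember the address so later duplicates in the same batch are dropped.
            already_requested.add(address)
    return kept


# 0x180 is already being prefetched, 0x200 is already being fetched,
# and 0x100 appears twice in the new batch.
remaining = filter_redundant([0x100, 0x180, 0x100, 0x200], [0x180], [0x200])
```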
According to an eighth aspect of the invention, the analyzing step includes the step of prioritizing an order of the at least one prefetch request, the other prefetch requests, and the fetch requests with respect to other operations being handled by the second memory.
According to a ninth aspect of the invention, the generating step further includes the step of limiting how many prefetch requests are at least one of generated and issued, based upon predefined criteria.
According to a tenth aspect of the invention, the limiting step includes the step of constraining the generating step to generate the at least one prefetch request only if a target of the at least one prefetch request has been previously stored in the second memory.
According to an eleventh aspect of the invention, the limiting step includes the step of issuing the at least one prefetch request only if a target of the at least one prefetch request has been previously stored in the second memory.
According to a twelfth aspect of the invention, the method further includes the step of indicating whether a particular storage element has been previously stored in the second memory, based on a value associated with a particular bit in the table corresponding to the particular storage element.
According to a thirteenth aspect of the invention, the limiting step includes the step of maintaining a count of storage elements that are mapped to the given entry in the table and are currently present in the second memory.
According to a fourteenth aspect of the invention, the method further includes the step of incrementing the count for the given entry, upon the miss.
According to a fifteenth aspect of the invention, the method further includes the step of decrementing the count for the given entry, when a given storage element included in the given entry is evicted from the given entry.
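By way of illustration only, the count maintenance of the thirteenth through fifteenth aspects may be sketched as follows. The class and method names are hypothetical, and the guard against a negative count is an added assumption. How the count is used to limit prefetch requests is not specified in this summary, so the sketch only maintains the count.

```python
class PageEntryCount:
    """Tracks how many storage elements mapped to one table entry are present."""

    def __init__(self) -> None:
        self.count = 0

    def on_miss(self) -> None:
        # A miss brings a storage element of this page in, so the count goes up.
        self.count += 1

    def on_eviction(self) -> None:
        # An eviction removes a storage element; the count never drops below zero.
        if self.count > 0:
            self.count -= 1


entry = PageEntryCount()
entry.on_miss()
entry.on_miss()
entry.on_eviction()
```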
According to a sixteenth aspect of the invention, the method further includes the step of determining a confidence value for the at least one prefetch request.
According to a seventeenth aspect of the invention, the determining step includes the step of reporting whether the at least one prefetch request was at least one of accessed, used and canceled.
According to an eighteenth aspect of the invention, the limiting step includes the step of constraining the generating step to generate the at least one prefetch request only if a target of the at least one prefetch request has a prefetch use measure greater than a predefined threshold.
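By way of illustration only, the threshold test of the eighteenth aspect may be sketched as follows. This summary does not define the prefetch use measure; the sketch assumes it to be the fraction of earlier prefetches of a target that were subsequently used, and treats a target with no prior prefetches as having a measure of zero. Both choices, and the function names, are assumptions made for the example.

```python
def prefetch_use_measure(times_used: int, times_prefetched: int) -> float:
    """Assumed measure: fraction of earlier prefetches of a target that were used."""
    if times_prefetched == 0:
        # No prefetch history for this target; zero is an arbitrary choice here.
        return 0.0
    return times_used / times_prefetched


def should_generate(times_used: int, times_prefetched: int, threshold: float) -> bool:
    """Generate a prefetch request only if the measure exceeds the threshold."""
    return prefetch_use_measure(times_used, times_prefetched) > threshold


# Three of four earlier prefetches were used, against a threshold of 0.5.
decision = should_generate(3, 4, 0.5)
```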
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.