1. Field of the Invention
This invention relates to prefetching data, and particularly to methods, systems and computer program products for concomitant pair prefetching.
2. Description of Background
Prefetching is an effective way of hiding memory latency: future cache misses are anticipated and the corresponding data is brought into the cache before the pipeline requests it. Each prefetch algorithm targets a specific reference stream pattern. The most popular pattern is the stride pattern, which often appears in scientific applications. Non-scientific applications, however, often lack stride patterns in their reference streams and therefore offer little opportunity for stride-based prefetch logic to improve performance. In particular, recent commercial applications exhibit few stride patterns in their memory reference streams, so stride-based prefetching used alone gives little hope of improving their performance.
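The stride detection described above can be sketched in software as follows. This is a hypothetical model, not logic taken from the invention: the class name, the two-delta confirmation rule, and the one-line-ahead prefetch distance are all illustrative assumptions.

```python
class StrideDetector:
    """Illustrative sketch: confirm a stride when two consecutive
    address deltas match, then predict the next address ahead."""

    def __init__(self):
        self.last_addr = None    # most recently observed address
        self.last_stride = None  # delta between the last two addresses

    def observe(self, addr):
        """Record one reference; return a prefetch address once the
        stride is confirmed, otherwise None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                # Same non-zero delta twice in a row: stride confirmed,
                # so prefetch one stride ahead of the current address.
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction
```

On the strided stream 100, 108, 116, 124 the detector stays silent until the delta of 8 repeats, then predicts 124 and 132; on a commercial-style irregular stream it would rarely fire, which is the limitation the paragraph above describes.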
Meanwhile, correlation-based prefetching algorithms have a higher chance of detecting concomitant patterns in recent commercial applications. Previous academic papers and patents on correlation-based prefetching have focused on how to build pairs of leaders and followers and how to increase prefetch coverage. At present, the largest obstacle to these correlation-based approaches is the tremendous number of inaccurate prefetch requests they generate. Current solutions add confirmation logic as a feedback mechanism in an effort to reduce inaccurate prefetch requests and the bus bandwidth they waste.
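The leader/follower pairing described above can be sketched as a table that records each miss address together with the miss that followed it. This is a minimal illustration under assumed details (a bounded dictionary, pairing of strictly consecutive misses); it is not the pairing scheme claimed by the invention.

```python
class PairPrefetcher:
    """Illustrative correlation (pair) table: leader miss address maps
    to the follower miss address observed immediately after it."""

    def __init__(self, capacity=1024):
        self.pairs = {}          # leader address -> follower address
        self.capacity = capacity # bound on table size (assumption)
        self.prev_miss = None    # previous miss, i.e. candidate leader

    def on_miss(self, addr):
        """Learn the (previous miss, current miss) pair, then return a
        prefetch address if this miss is a known leader."""
        if self.prev_miss is not None and len(self.pairs) < self.capacity:
            self.pairs[self.prev_miss] = addr
        self.prev_miss = addr
        return self.pairs.get(addr)
```

Note that this naive table happily records pairs that merely happened to be adjacent once, which is exactly the source of the inaccurate prefetch requests, and why table size bounds and confirmation feedback matter.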
However, the keys to successfully implementing correlation-based prefetch logic in a commercial processor are avoiding gigantic tables that must hold all possible pairs and minimizing inaccurate prefetch requests. Depending on the location of the prefetch logic, the timing of state updates and prefetch address lookups can also be critical to a successful implementation.
In scientific computations involving sparse matrices, it is common to compute the indices of desired elements of one array, store them in another array, and access the elements indirectly, as illustrated by the following pseudo-code: access A[B[i]], i=0,n. Let @X and ΔX respectively denote the starting address and element size of any vector X, and let M[y] denote the contents of the memory location at address y. The address sequence generated by the above program segment is then {@B+i*ΔB, @A+M[@B+i*ΔB]*ΔA}, i=0,n. It would be desirable to generate intelligent prefetches for these accesses. Current techniques detect constant strided accesses of the type {@B+i*ΔB}, i=0,n, where ΔB is a constant, or chained accesses of the type {@Ri, @Ri+Δ}, where @R(i+1)=M[@Ri+Δ], i=0,n. Here, a record is accessed at @Ri, then a link field in it at offset Δ is accessed, and then the next record, pointed to by the contents of the link M[@Ri+Δ], is accessed. A third category is concomitant accesses {@A, @B}, where the pair of addresses are unrelated but co-occur, so that whenever the first is accessed, the second can be prefetched. Currently there is no way to prefetch the indirect array accesses.
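The address sequence {@B+i*ΔB, @A+M[@B+i*ΔB]*ΔA} can be made concrete with a small worked example. The base addresses, element sizes, and index values below are arbitrary assumptions chosen for illustration; B_values stands in for the memory contents M[...] at the index-load addresses.

```python
def indirect_address_sequence(at_A, dA, at_B, dB, B_values):
    """Return (index-load address, data-load address) pairs for the
    access pattern A[B[i]], i=0..n, given base addresses @A, @B and
    element sizes dA, dB."""
    seq = []
    for i, b in enumerate(B_values):
        idx_addr = at_B + i * dB    # @B + i*dB: constant-strided index load
        data_addr = at_A + b * dA   # @A + M[@B+i*dB]*dA: indirect data load
        seq.append((idx_addr, data_addr))
    return seq
```

With @A=1000, ΔA=8, @B=2000, ΔB=4 and index values 5, 2, 9, the first components form a stride-4 stream that existing stride prefetchers can cover, while the second components (1040, 1016, 1072) follow no stride at all, which is why these indirect accesses defeat current techniques.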