The present invention relates generally to prefetching in a computer processor, and more specifically, to pre-computation slice (p-slice) merging for prefetching in a computer processor.
During execution on a processor, an application may fetch data from a relatively large, slow main memory into a smaller, faster cache memory that is local to the processor in order to perform operations using the data. The time required to fetch the data (i.e., the data access latency) may dominate the application execution time. Data prefetching uses hardware, software, or a combination of the two to hide this latency by predicting the data that an application will need and fetching that data ahead of time into the desired level of the cache hierarchy. A prefetcher may track regular data access patterns (e.g., streaming, stride, or constant) that are observed during application execution, and prefetch future data references based on the prediction that a pattern will recur. However, a prefetcher may not be successful in tracking or prefetching for irregular data access patterns.
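By way of illustration only, the pattern-tracking behavior described above can be sketched in software as a toy stride detector. The class name StridePredictor, the helper count_useful_prefetches, and the two address traces are hypothetical and chosen for this example; an actual prefetcher would typically be implemented in hardware and would track many access streams at once. The sketch predicts the next address only after the same stride has been observed twice in a row, so it may succeed on a regular trace and fail on an irregular one.

```python
from typing import List, Optional


class StridePredictor:
    """Toy stride detector (hypothetical): predicts the next address
    once the same stride has been seen on two consecutive accesses."""

    def __init__(self) -> None:
        self.last_addr: Optional[int] = None
        self.last_stride: Optional[int] = None

    def observe(self, addr: int) -> Optional[int]:
        """Record an access and return a predicted next address,
        or None when no stable stride has been established."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Predict only when the current stride repeats the previous one.
            if stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


def count_useful_prefetches(addresses: List[int]) -> int:
    """Count accesses whose address was predicted by the preceding observation."""
    predictor = StridePredictor()
    predicted: Optional[int] = None
    hits = 0
    for addr in addresses:
        if predicted == addr:
            hits += 1
        predicted = predictor.observe(addr)
    return hits


# Regular trace: eight accesses with a fixed 64-byte stride.
regular = [0x1000 + 64 * i for i in range(8)]
# Irregular trace: made-up addresses with no repeating stride,
# standing in for pointer-chasing style accesses.
irregular = [0x1000, 0x9A40, 0x2310, 0x7F08, 0x1238, 0x5550, 0x0C20, 0x8888]

print(count_useful_prefetches(regular))    # 5
print(count_useful_prefetches(irregular))  # 0
```

In this sketch the first three accesses of the regular trace are needed to establish the stride, after which each of the remaining five accesses is predicted; the irregular trace never yields a repeated stride, so no prediction is made.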
Speculative pre-computation slices, or p-slices, are used to perform prefetching for delinquent instructions, i.e., instructions that have irregular data access patterns and that may incur cache misses. For a given delinquent instruction, a p-slice is extracted as a backward slice made up of all instructions that directly or indirectly produce the source operands of the delinquent instruction. By scheduling a p-slice as a concurrent software thread alongside the main thread of the application, the data required by the main thread can be prefetched by the execution of the p-slice. The pre-computation thread executing the p-slice must periodically check the progress of the main thread to ensure that the p-slice is not out of sync with the main thread.
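As a minimal sketch of the backward-slice extraction described above, the following example walks a short, straight-line instruction sequence backward from a delinquent load and collects every instruction that directly or indirectly produces one of its source operands. The Instr record, the backward_slice function, and the five-instruction program are hypothetical and exist only for illustration; the sketch assumes register dependencies only and ignores control flow and dependencies through memory, which a practical slicer would also have to handle.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    dest: str               # register written ("" if none)
    srcs: Tuple[str, ...]   # registers read
    text: str               # human-readable form


def backward_slice(program: List[Instr], delinquent_idx: int) -> List[int]:
    """Return, in program order, the indices of the instructions that
    directly or indirectly produce the source operands of
    program[delinquent_idx]. Assumes straight-line code and register
    dependencies only."""
    needed: Set[str] = set(program[delinquent_idx].srcs)
    slice_idx: List[int] = []
    for i in range(delinquent_idx - 1, -1, -1):
        ins = program[i]
        if ins.dest and ins.dest in needed:
            slice_idx.append(i)
            # This instruction satisfies the need for its destination;
            # its own sources now become needed instead.
            needed.discard(ins.dest)
            needed.update(ins.srcs)
    slice_idx.reverse()
    return slice_idx


program = [
    Instr("r1", ("r0",), "r1 = load [r0]"),      # 0: follow a pointer
    Instr("r5", ("r5",), "r5 = add r5, 1"),      # 1: unrelated counter
    Instr("r2", ("r1",), "r2 = add r1, 8"),      # 2: compute field address
    Instr("r6", ("r5", "r5"), "r6 = mul r5, r5"),  # 3: unrelated arithmetic
    Instr("r3", ("r2",), "r3 = load [r2]"),      # 4: delinquent load
]

for i in backward_slice(program, 4):
    print(program[i].text)
# r1 = load [r0]
# r2 = add r1, 8
```

Only instructions 0 and 2 feed the address of the delinquent load, so the extracted slice omits the unrelated counter and multiply; this is what allows a pre-computation thread to run ahead of the main thread with less work per iteration.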