Data prefetching, or early fetching of data into a cache, is a feature implemented in a processor to increase the probability that requested data is available in a timely manner and thereby maintain high processing efficiency. When the data is available at a first cache level, the number of cycles during which the processor stalls may be reduced. For example, a processor may stall while waiting for data to return from cache levels or memory that are more distant (with respect to the processor).
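The effect described above can be illustrated with a software analogue of what a hardware prefetcher does automatically. The sketch below, an assumption for illustration only, uses the GCC/Clang `__builtin_prefetch` hint to request data a fixed distance ahead of use during an array traversal, so that each later load is more likely to hit in a near cache level; the function name and the prefetch distance are illustrative choices, not part of the original description.

```c
#include <stddef.h>

/* Illustrative sketch: a software prefetch hint analogous to hardware
 * data prefetching. __builtin_prefetch (GCC/Clang) asks the processor
 * to pull a cache line toward the nearest cache level ahead of its use,
 * reducing the chance that the later load stalls. */
long sum_with_prefetch(const long *data, size_t n)
{
    const size_t ahead = 16; /* prefetch distance; an illustrative choice */
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            /* Hint: fetch data[i + ahead] for reading (arg 2 = 0),
             * with low temporal locality (arg 3 = 1). */
            __builtin_prefetch(&data[i + ahead], 0, 1);
        }
        sum += data[i];
    }
    return sum;
}
```

The hint is advisory: the processor may ignore it, and the function's result is identical with or without the prefetch, which is why such hints are safe to sprinkle into hot loops.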
Many data prefetchers in modern state-of-the-art processors work within page boundaries: every prefetch request that crosses a page boundary is dropped by the data prefetcher. This is because, each time an access crosses a page boundary, the processor must guarantee that it can obtain a translation from the virtual address to the physical address. The translation lookaside buffer (TLB) may not always hold the needed translation, and the data prefetcher cannot access the TLB to obtain it. As a result, a data prefetcher may be very aggressive in requesting data in advance of the next addresses, but once a candidate request crosses a page boundary, the prefetcher cannot generate it because it lacks the physical translation. As the accuracy of data prefetchers increases, this inability to move beyond page boundaries can result in processor latency and performance setbacks.
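The drop rule described above can be sketched as a simple address check: a prefetcher that only holds the translation for the current page must suppress any candidate prefetch whose target lies in a different virtual page. The sketch below assumes a common 4 KiB page size; the function names and the page size are illustrative assumptions, not details from the original description.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed 4 KiB page, for illustration */

/* Two virtual addresses fall in the same page iff they share a
 * virtual page number (address divided by the page size). */
static bool same_page(uint64_t a, uint64_t b)
{
    return (a / PAGE_SIZE) == (b / PAGE_SIZE);
}

/* Sketch of the page-boundary drop rule: the candidate prefetch may
 * be issued only if its target shares a page with the demand access
 * whose translation the prefetcher already has; otherwise it must be
 * dropped, because no physical translation is available. */
bool may_issue_prefetch(uint64_t demand_addr, uint64_t candidate_addr)
{
    return same_page(demand_addr, candidate_addr);
}
```

Under this rule, a prefetch for the last line of the current page is issued, while an equally confident prediction just one byte past the page boundary is discarded, which is the lost opportunity the passage describes.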