In computer architecture applications, processors often use caches and other memory local to the processor to access data during execution. Processors execute instructions more efficiently when, for example, the data accessed by a processor is stored locally in a cache. Prefetchers are used to predictively access and store data in anticipation of potential requests for data and/or program data stored in the memory. A prefetcher stores blocks of memory locally in a smaller, lower-latency memory buffer using a replacement policy that governs which data is to be discarded when new data arrives. If the discarded data has been requested by the cache system but has not yet been sent to the processor requesting the data, then new prefetches that are allocated to those locations are forced to stall (e.g., wait) until the data is returned to the cache, in order to maintain cache coherency. The problem is compounded when multiple caches (often having differing line sizes and timing requirements) are used. Thus, an improvement in techniques for reducing stalls associated with the generation of prefetch requests for a cache is desirable.
The problems noted above are solved in large part by a prefetch unit that minimizes latency of memory in a system having multiple layers of memory. The disclosed prefetch unit can service multiple memory requestors (such as a processor, and memory controllers of level-one (L1) and level-two (L2) caches), even when the caches have different line sizes. Accordingly, the disclosed prefetch unit can prefetch data from a greater range of memory, which provides a higher level of performance (such as reduced latencies and reduced space and power requirements).
As disclosed herein, a prefetch unit generates prefetch addresses in response to an initial received memory read request, an address associated with the initial received memory read request, a line length of the requestor of the initial received memory read request, and a request type width of the initial received memory read request. Prefetch operations are generated using the generated prefetch addresses, wherein each generated prefetch address is stored in a prefetch buffer slot that is selected by a prefetch FIFO (First In First Out) counter. When a subsequent memory read request, received after the initial received memory read request, hits in the prefetcher, the prefetched data is returned to the requestor.
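The flow summarized above can be illustrated with a minimal Python sketch. It is a toy software model under stated assumptions, not the disclosed hardware: the class name `PrefetchUnit`, the parameters `width_lines` (standing in for the request type width, expressed in lines) and `count` (how many prefetch addresses to generate per miss), and the rule of prefetching the lines that immediately follow the request are all hypothetical choices made for illustration.

```python
from typing import List, Optional


class PrefetchUnit:
    """Toy model of a prefetch buffer whose slots are allocated in FIFO order."""

    def __init__(self, num_slots: int = 4) -> None:
        # Each slot holds the line-aligned address of one prefetched line.
        self.slots: List[Optional[int]] = [None] * num_slots
        # Hypothetical FIFO counter: selects the slot for the next allocation.
        self.fifo_counter = 0

    def generate_addresses(self, address: int, line_length: int,
                           width_lines: int, count: int) -> List[int]:
        """Derive prefetch addresses from the request address, the requestor's
        line length, and the request width (assumed here to be in lines)."""
        base = address - (address % line_length)   # align to the requestor's line
        first = base + width_lines * line_length   # first line past the request
        return [first + i * line_length for i in range(count)]

    def allocate(self, prefetch_address: int) -> int:
        """Store an address in the slot chosen by the FIFO counter, then advance."""
        slot = self.fifo_counter
        self.slots[slot] = prefetch_address
        self.fifo_counter = (self.fifo_counter + 1) % len(self.slots)
        return slot

    def read(self, address: int, line_length: int,
             width_lines: int = 1, count: int = 2) -> bool:
        """Return True on a prefetch-buffer hit; on a miss, issue prefetches."""
        base = address - (address % line_length)
        if base in self.slots:
            return True  # a later request is served from prefetched data
        for prefetch_address in self.generate_addresses(
                address, line_length, width_lines, count):
            self.allocate(prefetch_address)
        return False


unit = PrefetchUnit(num_slots=4)
first_hit = unit.read(0x1000, line_length=64)    # initial request: miss
second_hit = unit.read(0x1044, line_length=64)   # falls in a prefetched line
third_hit = unit.read(0x2000, line_length=128)   # requestor with longer lines
```

In this sketch the initial request misses and fills two slots, the second request hits because its line was prefetched, and the third request (from a requestor assumed to use 128-byte lines) fills the remaining slots so that the counter wraps back to slot 0. A hit does not trigger further prefetches here; that simplification is also an assumption of the sketch.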