In computer architecture applications, processors often use caches and other memory local to the processor to access data during execution. The processors execute instructions more efficiently when, for example, data accessed by a processor is stored locally in a cache. Prefetchers are used to predictively access and store data in view of potential requests for data and/or program data stored in the memory. A prefetcher stores blocks of memory locally in a smaller, lower-latency memory buffer using a replacement policy that governs which data is to be discarded when new data arrives. If the discarded data has been requested by the cache system but has not yet been sent to the processor requesting the data, then new prefetches that are allocated to those locations are forced to stall (e.g., wait) until the data is returned to the cache to maintain cache coherency. Thus, an improvement in techniques for reducing stalls associated with generation of prefetch requests for a cache is desirable.
The problems noted above are solved in large part by a prefetch unit that decouples data return from prefetch generation to eliminate a race condition that occurs between old data waiting to be sent to the cache and data from the new prefetches returning (from main memory, for example). Without such decoupling, the data being returned from a new prefetch for the slot that contains the waiting-to-be-sent data could potentially overwrite the waiting-to-be-sent data and thus compromise the cache's coherency.
As disclosed herein, a prefetch unit includes a program prefetch address generator that receives memory read requests and, in response to addresses associated with the memory read requests, generates prefetch addresses and stores the prefetch addresses in slots of the prefetch unit buffer. Each slot includes a buffer for storing a prefetch address, two data buffers (in a parallel double-buffered configuration) for storing data that is prefetched using the prefetch address of the slot, and a data buffer selector for alternating the functionality of the two data buffers. A first buffer is used to hold data that is returned in response to a received memory request, and a second buffer is used to hold data from a subsequent prefetch operation having a subsequent prefetch address, such that the data in the first buffer is not overwritten even when the data in the first buffer is still in the process of being read out.
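The slot organization described above can be illustrated with a minimal software sketch. The class name `PrefetchSlot`, its fields, and the `allocate` and `fill` methods are hypothetical names chosen for illustration only, and the sketch assumes the selector simply alternates on each allocation; it models the behavior, not the hardware of the prefetch unit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrefetchSlot:
    """Hypothetical model of one slot: an address buffer, two data buffers, a selector."""
    address: Optional[int] = None
    data: List[Optional[bytes]] = field(default_factory=lambda: [None, None])
    selector: int = 0  # index of the data buffer that receives the next prefetch return

    def allocate(self, prefetch_address: int) -> int:
        """Record a new prefetch address and return the buffer index reserved for its data."""
        self.address = prefetch_address
        target = self.selector
        self.selector ^= 1  # alternate the roles of the two data buffers (assumed policy)
        return target

    def fill(self, buffer_index: int, line: bytes) -> None:
        """Store data returned from memory into the buffer reserved at allocation."""
        self.data[buffer_index] = line


slot = PrefetchSlot()
first = slot.allocate(0x1000)
slot.fill(first, b"old line")    # data still waiting to be read out
second = slot.allocate(0x2000)   # a subsequent prefetch reuses the same slot
slot.fill(second, b"new line")   # lands in the other buffer, so nothing is overwritten
```

In this sketch the data for the first prefetch remains intact in its buffer after the second prefetch returns, which is the property the double-buffered configuration is intended to provide.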
Accordingly, the prefetch unit uses a double data buffer to decouple the storage of data-to-be-sent (from the prefetch unit to the processor or higher-level cache, for example) from the storage of data that is received as a result of prefetch generation. Thus, two locations exist for each slot in the prefetch unit, and a status indicator is used to indicate in which location particular data resides. A first-selected buffer is used to store data for prefetches that are being generated for a slot. The prefetched data is subsequently sent (to the processor or higher-level cache). To minimize processor stalls (for example), a prefetch is newly generated that also allocates data for the slot. A second-selected buffer is used to store the new data retrieved as a result of the newly generated prefetch. A metadata indicator bit is toggled (set and reset) to designate which buffer of the double buffer is the first-selected buffer and which buffer of the double buffer is the second-selected buffer. A read status register stores an additional address bit to ensure that the correct buffer is read out in response to a memory request for data stored by the prefetch unit.
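The interplay of the metadata indicator bit and the read status register can be sketched as follows. All names here (`ReadStatus`, `PrefetchBuffer`, `hit`, `read_out`) are hypothetical, and the sketch assumes that the indicator bit is toggled at the moment a memory request hits the slot; the actual timing in the prefetch unit may differ.

```python
from dataclasses import dataclass


@dataclass
class ReadStatus:
    """Hypothetical read status entry: a slot number plus one extra bit naming the buffer."""
    slot_index: int
    buffer_bit: int


class PrefetchBuffer:
    def __init__(self, num_slots: int) -> None:
        # Two data locations per slot (the double buffer).
        self.data = [[None, None] for _ in range(num_slots)]
        # Per-slot metadata indicator bit: which buffer is currently first-selected.
        self.indicator = [0] * num_slots

    def fill(self, slot_index: int, line: str) -> None:
        """Store returned prefetch data in the currently selected buffer of the slot."""
        self.data[slot_index][self.indicator[slot_index]] = line

    def hit(self, slot_index: int) -> ReadStatus:
        """A memory request hits the slot: latch which buffer holds the requested data,
        then toggle the indicator so a newly generated prefetch uses the other buffer."""
        status = ReadStatus(slot_index, self.indicator[slot_index])
        self.indicator[slot_index] ^= 1
        return status

    def read_out(self, status: ReadStatus) -> str:
        """Read the buffer named by the latched bit, not by the current indicator."""
        return self.data[status.slot_index][status.buffer_bit]


pb = PrefetchBuffer(num_slots=4)
pb.fill(2, "line A")
pending = pb.hit(2)            # the read of "line A" is now pending
pb.fill(2, "line B")           # new prefetch data returns before the read-out completes
result = pb.read_out(pending)  # still returns the data that was hit
```

Because the pending read carries its own buffer bit, the later toggle of the indicator does not redirect the read-out to the newly prefetched data.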