In computer architecture applications, processors often use caches and other memory local to the processor to access data during execution. A processor executes instructions more efficiently when, for example, the data it accesses is stored locally in a cache. Prefetchers predictively access and store data and/or program data from memory in anticipation of potential requests for that data. A prefetch unit (also known as a “prefetcher”) prefetches blocks of memory and stores them locally in a smaller, lower-latency memory buffer that is managed by a replacement policy. The replacement policy governs which cache lines of data are discarded when new data arrives. If the discarded cache lines have been requested by the cache system but have not yet been sent to the processor requesting the data, then new prefetches allocated to those locations are forced to stall (e.g., wait) until the data is returned to the cache, in order to maintain cache coherency. The problem is compounded when multiple caches (often having differing line sizes and timing requirements) are used. Thus, an improved technique for reducing stalls associated with generating prefetch requests for a cache is desirable.
The problems noted above are solved in large part by a prefetch unit that prefetches cache lines for higher-level memory caches, where each cache has a line size (or width) that differs from the line width of another local cache. The disclosed prefetch unit uses a slot/sub-slot architecture to service multiple memory requestors, such as level-one (L1) and level-two (L2) caches, even when the caches have mutually different line sizes. Each slot of the prefetch unit is arranged to include sub-slots, where the sub-slots (for example) include data and status bits for an upper and a lower half-line, and both half-lines are associated with a single tag address. Accordingly, the disclosed prefetch unit can prefetch memory for caches having mutually different line sizes, which provides a higher level of performance (such as reduced latencies and reduced space and power requirements).
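The slot/sub-slot arrangement can be pictured as a small data structure. The following Python sketch is illustrative only and is not the disclosed hardware: the line widths (128-byte full line, 64-byte half-line), the status-bit names `valid` and `pending`, and the helpers `tag_of` and `half_lines_for` are all hypothetical. It shows one slot holding a single tag address shared by a lower and an upper half-line, so a narrower-line requestor touches one sub-slot while a wider-line requestor uses both.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical widths: the text does not specify actual line sizes.
FULL_LINE_BYTES = 128                    # assumed wider line (e.g., an L2-style requestor)
HALF_LINE_BYTES = FULL_LINE_BYTES // 2   # assumed narrower line (e.g., an L1-style requestor)


@dataclass
class SubSlot:
    """One half-line: data plus hypothetical status bits."""
    data: Optional[bytes] = None
    valid: bool = False    # assumed bit: data has arrived from memory
    pending: bool = False  # assumed bit: request issued, data not yet returned


@dataclass
class Slot:
    """A single tag address shared by a lower and an upper half-line."""
    tag: Optional[int] = None
    sub_slots: List[SubSlot] = field(default_factory=lambda: [SubSlot(), SubSlot()])

    def half_lines_for(self, addr: int, width: int) -> List[SubSlot]:
        """Return the sub-slot(s) covering a request of `width` bytes at `addr`."""
        if width == FULL_LINE_BYTES:
            return self.sub_slots                  # a full-line request uses both halves
        if width != HALF_LINE_BYTES:
            raise ValueError("unsupported line width")
        index = (addr // HALF_LINE_BYTES) % 2      # 0 = lower half, 1 = upper half
        return [self.sub_slots[index]]


def tag_of(addr: int) -> int:
    """Both half-lines of a slot share the tag of the full-line-aligned address."""
    return addr // FULL_LINE_BYTES
```

For example, with `slot = Slot(tag=tag_of(0x1040))`, a 64-byte request at `0x1040` maps to the upper sub-slot, a 64-byte request at `0x1000` maps to the lower sub-slot, and a 128-byte request at `0x1000` covers both, all under the same tag.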
As disclosed herein, a prefetch unit generates a prefetch address in response to an address associated with a memory read request received from a first or a second cache. The prefetch unit includes a prefetch buffer that is arranged to store the prefetch address in an address buffer of a selected slot of the prefetch buffer, where each slot of the prefetch unit includes a buffer for storing a prefetch address and two sub-slots. Each sub-slot includes a data buffer for storing data that is prefetched using the prefetch address stored in the slot, and one of the two sub-slots of the slot is selected in response to a portion of the generated prefetch address. When a subsequent memory read request, received after the initial memory read request, hits in the prefetch unit, the prefetched data is returned to the requestor.
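The request-to-hit flow can be sketched as follows. This is a minimal behavioral model under stated assumptions, not the disclosed implementation: the prefetch address is assumed to be the next sequential half-line after the request, slots are assumed to be chosen round-robin, the sub-slot is assumed to be selected by the single address bit just above the half-line offset, and `PrefetchUnit`, `on_read_request`, `lookup`, and `read_memory` are hypothetical names. The slot's address buffer stores the prefetch address, one of its two sub-slots receives the data, and a later read to that half-line hits.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

LINE_BYTES = 128               # assumed full-line width covered by one slot
HALF_BYTES = LINE_BYTES // 2   # assumed half-line width held by one sub-slot


@dataclass
class SubSlot:
    data: Optional[bytes] = None   # data buffer for prefetched half-line data


@dataclass
class Slot:
    address: Optional[int] = None  # address buffer: stores the prefetch address
    sub_slots: List[SubSlot] = field(default_factory=lambda: [SubSlot(), SubSlot()])


def sub_slot_index(addr: int) -> int:
    """Select a sub-slot from one address bit (just above the half-line offset)."""
    return (addr // HALF_BYTES) & 1


class PrefetchUnit:
    def __init__(self, num_slots: int, read_memory: Callable[[int, int], bytes]):
        self.slots = [Slot() for _ in range(num_slots)]
        self.read_memory = read_memory   # hypothetical backing-memory model
        self.next_slot = 0               # assumed round-robin slot selection

    def on_read_request(self, addr: int) -> None:
        """Generate a prefetch address from a read request and fill a selected slot."""
        # Assumption: prefetch the next sequential half-line after the request.
        prefetch_addr = (addr // HALF_BYTES + 1) * HALF_BYTES
        slot = self.slots[self.next_slot]
        self.next_slot = (self.next_slot + 1) % len(self.slots)
        slot.address = prefetch_addr
        slot.sub_slots = [SubSlot(), SubSlot()]   # discard any stale contents
        index = sub_slot_index(prefetch_addr)     # portion of the prefetch address
        slot.sub_slots[index].data = self.read_memory(prefetch_addr, HALF_BYTES)

    def lookup(self, addr: int) -> Optional[bytes]:
        """Return prefetched data for a subsequent read on a hit, or None on a miss."""
        for slot in self.slots:
            if slot.address is None:
                continue
            same_line = slot.address // LINE_BYTES == addr // LINE_BYTES
            sub = slot.sub_slots[sub_slot_index(addr)]
            if same_line and sub.data is not None:
                return sub.data
        return None
```

As a usage example, after `on_read_request(0x1000)` the first slot holds prefetch address `0x1040` with data in its upper sub-slot, so `lookup(0x1040)` hits while `lookup(0x1000)` misses because the lower half-line was not prefetched. Modeling completion as instantaneous is a simplification; a real prefetch unit would also track requests whose data has not yet returned.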