In computer architecture applications, processors often use caches and other memory local to the processor to store data during execution. Processors execute instructions more efficiently when, for example, the data a processor accesses is stored locally in a cache. Conversely, execution is delayed when the referenced data is not stored or retained in a cache or other localized memory, as often occurs when memory requests from multiple streams are encountered. CPUs (central processing units) often use the data in a stream only once, but frequently access multiple streams in parallel. As addressed in the instant disclosure, conventional cache data replacement policies “push streams out” (e.g., overwrite cached data for a stream) if the number of cache ways is not sufficient to retain all streams of data at the same time. Thus, improved techniques for reducing the latency incurred when referenced data is not stored or retained in a cache are desirable.
The problems noted above are solved in large part by a prefetching system that receives a memory read request having an associated address. As disclosed herein, an array has slots for storing the most significant portion of predicted addresses. In response to a determination that the most significant portion of the associated address is not present within the slots of the array, a prefetch FIFO (First In-First Out) counter is modified to point to a next slot of the array. A new predicted address is generated in response to the received most significant portion of the associated address and is placed in that next slot. The prefetch FIFO counter cycles through the slots of the array before wrapping around to the first slot of the array.
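The slot-allocation behavior summarized above can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the disclosed hardware: the split point between the most and least significant portions (`LSB_BITS = 12`), the cache-line size (`LINE_BYTES = 64`), the slot count, and the prediction rule (the next sequential cache line after the request) are all hypothetical choices made for illustration, as are the names `PrefetchFifo` and `on_read`.

```python
from typing import List, Optional

LINE_BYTES = 64  # hypothetical cache-line size in bytes
LSB_BITS = 12    # hypothetical split: bits below this form the least significant portion


class PrefetchFifo:
    """Hypothetical model of an array of predicted-address MSB portions indexed by a FIFO counter."""

    def __init__(self, num_slots: int = 4) -> None:
        self.slots: List[Optional[int]] = [None] * num_slots
        # Start on the last slot so the first allocation advances to slot 0.
        self.counter = num_slots - 1

    @staticmethod
    def msb(address: int) -> int:
        """Return the most significant portion of an address (assumed split at LSB_BITS)."""
        return address >> LSB_BITS

    def on_read(self, address: int) -> bool:
        """Handle a memory read request; return True if its MSB portion is already in the array."""
        high = self.msb(address)
        if high in self.slots:
            return True
        # Not present: advance the FIFO counter, wrapping to the first slot after the last one.
        self.counter = (self.counter + 1) % len(self.slots)
        # Hypothetical prediction rule: the next sequential cache line after the request.
        predicted = address + LINE_BYTES
        self.slots[self.counter] = self.msb(predicted)
        return False


if __name__ == "__main__":
    fifo = PrefetchFifo(num_slots=2)
    for addr in (0x1000, 0x1080, 0x5000, 0x9000):
        hit = fifo.on_read(addr)
        print(hex(addr), "hit" if hit else "miss", fifo.slots, fifo.counter)
```

With two slots, the fourth request (to a third distinct region) wraps the counter back to slot 0 and overwrites the oldest entry, which is the FIFO replacement order the summary describes; a real prefetcher would also issue a prefetch for each predicted address, which this model omits.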