In computer architecture applications, processors often use caches and other memory local to the processor to store data during execution. A processor executes instructions more efficiently when, for example, the data it accesses is stored locally in a cache. Conversely, latency increases when the referenced data is not stored or retained in a cache or localized memory, as often occurs when memory requests from multiple streams are encountered. CPUs (central processing units) often use the data in a stream only once, but often access multiple streams in parallel. As addressed in the instant disclosure, conventional cache data replacement policies “push streams out” (e.g., overwrite cached data for a stream) if the number of cache ways is not sufficient to retain all streams of data at the same time. Thus, an improvement in techniques for lowering latency when referenced data is not stored or retained in a cache is desirable.
The problems noted above are solved in large part by a prefetch filter, as disclosed herein, that receives a memory read request having an associated address for accessing data that is stored in a line of memory. An address window is determined that has an address range encompassing an address space that is twice as large as the line of memory. In response to a determination of which half of the address window includes the requested line of memory, a prefetch direction is set to a first direction or to an opposite direction.
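The direction determination described above can be illustrated with a minimal sketch. The 32-byte line size, the alignment of the window to a multiple of its own size, and the mapping of the lower half to an increasing direction (and the upper half to a decreasing direction) are assumptions made for illustration only; the disclosure states only that the direction depends on which half of the window includes the requested line. The name `prefetch_direction` is hypothetical.

```python
LINE_BYTES = 32                # assumed line size; not specified in the text
WINDOW_BYTES = 2 * LINE_BYTES  # the window spans twice the line size

INCREASING = +1
DECREASING = -1


def prefetch_direction(address):
    """Return (window_base, direction) for a memory read address.

    Assumption: the window is aligned to WINDOW_BYTES, and a request that
    falls in the lower half predicts an increasing stream while a request
    in the upper half predicts a decreasing stream.
    """
    window_base = address - (address % WINDOW_BYTES)
    in_upper_half = (address - window_base) >= LINE_BYTES
    direction = DECREASING if in_upper_half else INCREASING
    return window_base, direction
```

Under these assumptions, an address in the lower line of a window yields the first direction and an address in the upper line yields the opposite direction, so that in either case the predicted next line is the other half of the same window.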
The prefetch filter can include an array of slots, each for storing a portion of a next predicted address, and can determine a memory stream in response to a hit on the array by a subsequent memory request. A FIFO (first-in, first-out) counter of the prefetch filter cycles through the slots of the array before wrapping around to the first slot of the array for storing a next predicted address portion. An address associated with the determined memory stream and a direction of the determined memory stream are passed to a data prefetch buffer. Filtering out random memory accesses and providing indications of two sequential accesses (and the direction thereof) improves the utilization of the prefetches made by the data prefetch buffer.
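The slot array and FIFO counter described above can be modeled in software as follows. This is a behavioral sketch under stated assumptions, not the disclosed hardware: the class name `PrefetchFilter`, the four-slot default, the 32-byte line size, the storage of a full line index rather than an address portion, and the choice to leave a slot in place after a hit are all hypothetical. The half-window direction rule (even line index predicts increasing, odd predicts decreasing) is likewise assumed.

```python
class PrefetchFilter:
    """Behavioral model of a prefetch filter with FIFO slot allocation."""

    def __init__(self, num_slots=4, line_bytes=32):
        self.line_bytes = line_bytes
        # Each slot holds (predicted_line_index, direction) or None if empty.
        self.slots = [None] * num_slots
        self.fifo = 0  # index of the next slot to allocate

    def access(self, address):
        """Process one read request.

        Returns (line_address, direction) when the request hits a stored
        prediction, i.e. a stream is detected, and None otherwise.
        """
        line = address // self.line_bytes
        for entry in self.slots:
            if entry is not None and entry[0] == line:
                # Hit: two sequential accesses were seen. The address and
                # direction would be passed to the data prefetch buffer.
                # Assumption: the slot is left as-is after a hit.
                return line * self.line_bytes, entry[1]

        # Miss: predict the other half of the two-line window (assumed rule).
        direction = +1 if line % 2 == 0 else -1
        self.slots[self.fifo] = (line + direction, direction)
        # Advance the FIFO counter, wrapping around to the first slot.
        self.fifo = (self.fifo + 1) % len(self.slots)
        return None
```

In this model, an isolated (random) access only consumes a slot and is eventually overwritten when the counter wraps, whereas a second access to the predicted neighboring line produces a hit that reports the stream address and direction.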