Processors operate at much higher speeds than the memory devices that store the data on which the processors operate. To bridge this gap, many systems implement hierarchical caching, or a hierarchical memory subsystem. In a hierarchical system, smaller, faster caches are connected to the processor, and the processor accesses data from them. The smaller, faster caches in turn access data from larger, slower caches. There may be several levels of caching. It will be understood that the cache devices could also be referred to as memory devices.
Prefetching data from slower memory into faster caches before the data is requested by an operation executed by the processor is a common technique to minimize request response latency. However, because the cache requests the data before the processor does, there is a risk that the data will be brought into the cache, only to be evicted from the cache without being used by the processor. There are two aspects of prefetching that can be controlled to manage the risk associated with prefetching the “wrong” data: prefetch accuracy, which relates to what lines of data to fetch; and timeliness, which relates to when to access the lines of data.
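The notion of prefetch accuracy can be illustrated with a minimal sketch. The following is not an implementation described herein; it assumes a hypothetical next-line prefetcher in front of a small cache with least-recently-used replacement, and the names `simulate_next_line_prefetch`, `accesses`, and `cache_lines` are chosen only for illustration. The sketch counts how many prefetched lines are used before eviction and how many are evicted unused.

```python
from collections import OrderedDict


def simulate_next_line_prefetch(accesses, cache_lines):
    """Replay line addresses through a small LRU cache with a hypothetical
    next-line prefetcher and report how many prefetched lines were used."""
    # Maps line address -> True while the line is a prefetch not yet used.
    cache = OrderedDict()
    issued = 0          # prefetches sent to the slower memory
    used = 0            # prefetched lines later hit by a demand access
    evicted_unused = 0  # prefetched lines evicted before any demand access

    def insert(line, prefetched):
        nonlocal evicted_unused
        if len(cache) >= cache_lines:
            # Evict the least recently used line.
            _, was_unused_prefetch = cache.popitem(last=False)
            if was_unused_prefetch:
                evicted_unused += 1
        cache[line] = prefetched

    for line in accesses:
        if line in cache:
            if cache[line]:
                used += 1
                cache[line] = False  # the prefetch has now been consumed
            cache.move_to_end(line)
        else:
            insert(line, prefetched=False)  # demand miss

        # Assumed policy: always prefetch the next sequential line.
        next_line = line + 1
        if next_line not in cache:
            insert(next_line, prefetched=True)
            issued += 1

    accuracy = used / issued if issued else 0.0
    return {
        "issued": issued,
        "used": used,
        "evicted_unused": evicted_unused,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    print(simulate_next_line_prefetch(list(range(8)), cache_lines=4))
    print(simulate_next_line_prefetch([0, 10, 20, 30, 40, 50], cache_lines=4))
```

Under these assumptions, a sequential access stream yields an accuracy of 0.875, whereas a stream with a stride of ten lines yields an accuracy of 0.0, with prefetched lines evicted unused and every such prefetch consuming memory bandwidth to no benefit.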
Low prefetch accuracy results in wasted memory bandwidth due to fetching unwanted data from memory. Low prefetch accuracy and/or untimely prefetching can result in cache pollution when wrongly prefetched data evicts useful data already present in the cache. Traditional prefetch mechanisms use a “pull” model, where the requesting cache pulls the data from the higher level(s) of the memory hierarchy. For example, the last level cache (LLC) can send a prefetch request to the memory controller, causing the memory controller to send the data back. With a pull model, the requestor is unaware of the load on the higher-level memory (the bandwidth provider), and so lacks the information needed to determine whether or not it should make a prefetch request. Mechanisms to throttle the requestor do exist, but they typically require complex messaging between the requestor and the higher-level memory, which consumes valuable transfer bandwidth.
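The effect of a requestor that is unaware of the load on the bandwidth provider can be sketched as follows. This is a simplified illustration under stated assumptions, not a model of any particular hardware: the `MemoryController` class, the `run_pull_model` function, and the parameters `prefetch_degree` and `requests_per_cycle` are hypothetical, the controller is assumed to serve requests strictly in arrival order, and `requests_per_cycle` is assumed to be at least one.

```python
from collections import deque


class MemoryController:
    """Hypothetical bandwidth provider serving a fixed number of requests per cycle."""

    def __init__(self, requests_per_cycle):
        self.requests_per_cycle = requests_per_cycle
        self.queue = deque()

    def request(self, line, kind):
        # Pull model: the controller accepts whatever the requestor sends,
        # and the requestor never sees how long the queue has become.
        self.queue.append((line, kind))

    def tick(self):
        """Serve up to `requests_per_cycle` queued requests in arrival order."""
        count = min(self.requests_per_cycle, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


def run_pull_model(demand_lines, prefetch_degree, requests_per_cycle):
    """Issue one demand miss per cycle, each followed by `prefetch_degree`
    prefetches, and return the cycles needed to serve every demand request."""
    controller = MemoryController(requests_per_cycle)
    pending_demands = len(demand_lines)
    remaining = iter(demand_lines)
    cycle = 0
    while pending_demands:
        line = next(remaining, None)
        if line is not None:
            controller.request(line, "demand")
            for offset in range(1, prefetch_degree + 1):
                controller.request(line + offset, "prefetch")
        for _, kind in controller.tick():
            if kind == "demand":
                pending_demands -= 1
        cycle += 1
    return cycle


if __name__ == "__main__":
    lines = [0, 100, 200, 300]
    print(run_pull_model(lines, prefetch_degree=0, requests_per_cycle=1))
    print(run_pull_model(lines, prefetch_degree=3, requests_per_cycle=1))
```

In this sketch, the four demand requests complete in 4 cycles without prefetching and in 13 cycles with three prefetches per miss, because the requestor keeps issuing prefetches that queue ahead of later demand requests and has no view of the controller's backlog.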
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as a discussion of other potential embodiments or implementations of the inventive concepts presented herein.