1. Field of the Invention
The present invention relates to computer storage management, and, more particularly, to victim prefetching in a cache hierarchy.
2. Description of the Related Art
Memory latency in modern computers is not decreasing at a rate commensurate with increasing processor speeds. As a result, the processor often sits idle while data is fetched from memory, and the benefit of faster processor speeds is not fully realized.
Approaches to mitigating memory latency include multithreading and prefetching. The term “multithreading” refers to the ability of a processor to execute two programs almost simultaneously. This permits the processor to work on one program, or “thread,” while the other waits for data to be fetched. The term “prefetching,” as used herein, traditionally refers to retrieving data that is expected to be used in the future. Each approach increases complexity as well as off-chip traffic. In the case of multithreading, the increased traffic is due to decreased per-thread cache capacity, which yields higher miss rates. Prefetching increases traffic by fetching data that is not referenced before castout. As used herein, the term “castout” refers to the process by which data is removed from the cache to make room for data that is fetched.
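The traffic cost of prefetching described above can be illustrated with a toy model. The following is a minimal sketch, assuming a small fully associative cache with least-recently-used (LRU) replacement; the class `LRUCache` and its counters `fetches` and `useless_prefetches` are hypothetical names chosen for illustration and do not describe any particular hardware. A prefetched line that is cast out before it is ever referenced adds one fetch of off-chip traffic with no benefit.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical toy cache with LRU replacement (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # line address -> True if the line has been referenced since it was filled
        self.lines = OrderedDict()
        self.fetches = 0             # lines brought in from memory (off-chip traffic)
        self.useless_prefetches = 0  # prefetched lines cast out before any reference

    def _fill(self, addr, referenced):
        if len(self.lines) >= self.capacity:
            # Castout: remove the least recently used line to make room.
            _, was_referenced = self.lines.popitem(last=False)
            if not was_referenced:
                self.useless_prefetches += 1
        self.lines[addr] = referenced
        self.fetches += 1

    def access(self, addr):
        """Demand reference; returns True on a hit, False on a miss."""
        if addr in self.lines:
            self.lines[addr] = True
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        self._fill(addr, referenced=True)
        return False

    def prefetch(self, addr):
        """Speculatively fill a line that has not yet been referenced."""
        if addr not in self.lines:
            self._fill(addr, referenced=False)


cache = LRUCache(capacity=2)
cache.prefetch(100)  # speculative fill, never used
cache.access(1)      # demand miss
cache.access(2)      # demand miss; line 100 is cast out unreferenced
print(cache.fetches, cache.useless_prefetches)
```

Here three lines are fetched where two demand misses alone would have required only two fetches; the extra fetch is the wasted traffic attributed above to prefetching.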
A number of approaches to prefetching are described in the prior art and/or are implemented in computer systems. These generally include the fetching of data based on currently observed access behavior such as strides (where accesses are to a series of addresses separated by a fixed increment), or via previously recorded access patterns, or by direct software instructions. Each of these approaches has some advantages. However, they either require increased program complexity (in the case of software instructions), have difficulty with nonregular access patterns (in the case of, for example, stride-based prefetching), or introduce unnecessary complexity (in the case of, for example, recorded access patterns). Further, these earlier approaches do not utilize information readily available to the processor, namely the identity of recently discarded cache lines coupled with their status and location in the storage hierarchy.
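The limitation of stride-based prefetching with nonregular access patterns can be seen in a minimal stride detector. The sketch below is a hypothetical, simplified design (the class `StridePrefetcher` and its `degree` parameter are illustrative assumptions, not a specific prior-art mechanism): it issues prefetches only after the same nonzero stride has been observed on two consecutive accesses.

```python
class StridePrefetcher:
    """Hypothetical minimal stride detector (illustration only)."""

    def __init__(self, degree=1):
        self.degree = degree      # number of addresses ahead to prefetch
        self.last_addr = None
        self.last_stride = None

    def observe(self, addr):
        """Record a demand access and return the addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Prefetch only once the same nonzero stride repeats.
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


# A regular stream with a fixed increment of 64 is detected after two strides.
pf = StridePrefetcher(degree=2)
regular = [pf.observe(a) for a in (0, 64, 128, 192)]

# A nonregular stream never repeats a stride, so nothing is prefetched.
pf = StridePrefetcher(degree=2)
irregular = [pf.observe(a) for a in (0, 64, 72, 500, 8)]

print(regular)
print(irregular)
```

For the regular stream the detector predicts `[192, 256]` and then `[256, 320]`, while for the nonregular stream every call returns an empty list, so such accesses receive no benefit from this form of prefetching.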