1. Field of the Invention
The present invention relates generally to the field of computer system design and programming and, more particularly, to learning and cache management in software-defined contexts.
2. Description of the Related Art
Memory latency in modern computers is not decreasing at a rate commensurate with increasing processor speeds. As a result, the processor sits idle while the system fetches the required data from memory, and the faster processor speeds are not fully exploited. This problem is sometimes referred to as the memory wall problem.
However, the memory wall problem is not exclusive to memory accesses and may arise in a number of information retrieval scenarios, as contemplated by those skilled in the art. For example, when a line requested by a local processor has been modified by a remote processor, that line must be communicated from the remote processor to the local processor. This communication may take a significant amount of time. Thus, we refer to the memory wall problem more generally as the “data access” wall problem.
One approach to mitigating data access latency is prefetching. The term “prefetching,” as used herein, traditionally refers to retrieving data expected to be used in the future. Prefetching results in increased complexity, as well as increased off-chip traffic. The additional traffic arises when prefetched data is cast out before it is ever referenced. As used herein, the term “castout” refers to the process by which data is removed from the cache to make room for data that is fetched.
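By way of illustration only, the traffic cost of prefetching described above can be sketched with a small simulation. The sketch assumes a fully associative cache with least-recently-used castout and a simple next-line prefetcher; the names `PrefetchingCache` and `run`, the cache sizes, and the traces are hypothetical and are not taken from any particular hardware design.

```python
from collections import OrderedDict


class PrefetchingCache:
    """Toy fully associative LRU cache that counts useless prefetches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # line -> True once referenced after fetch
        self.demand_misses = 0
        self.fetches = 0             # all lines brought in (demand + prefetch)
        self.useless_prefetches = 0  # prefetched lines cast out unreferenced

    def _install(self, line, referenced):
        if len(self.lines) >= self.capacity:
            # Castout: remove the least recently used line to make room.
            _, was_referenced = self.lines.popitem(last=False)
            if not was_referenced:
                self.useless_prefetches += 1
        self.lines[line] = referenced
        self.fetches += 1

    def access(self, line):
        """Demand reference issued by the program."""
        if line in self.lines:
            self.lines[line] = True
            self.lines.move_to_end(line)
        else:
            self.demand_misses += 1
            self._install(line, referenced=True)

    def prefetch(self, line):
        """Speculative fetch; the line is not yet marked as referenced."""
        if line not in self.lines:
            self._install(line, referenced=False)


def run(trace, capacity, next_line_prefetch):
    """Replay a trace, optionally prefetching line + 1 after each access."""
    cache = PrefetchingCache(capacity)
    for line in trace:
        cache.access(line)
        if next_line_prefetch:
            cache.prefetch(line + 1)
    return cache


if __name__ == "__main__":
    for name, trace, cap in [("sequential", list(range(8)), 4),
                             ("strided", [0, 10, 20, 30, 40, 50], 2)]:
        for pf in (False, True):
            c = run(trace, cap, pf)
            print(name, "prefetch" if pf else "no prefetch",
                  c.demand_misses, c.fetches, c.useless_prefetches)
```

In this sketch the sequential trace benefits from the prefetcher, whereas on the strided trace every prefetched line is cast out without being referenced, doubling the number of fetches without removing any demand miss.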
Prefetching may be implemented, for example, using only hardware. With such hardware-only prefetch mechanisms, the details of the prefetching mechanism are largely hidden from an application programmer. However, hardware-only prefetch mechanisms are limited because prefetching is usually forced to be based solely on historical access information (e.g., a directory extension) and/or limited predictions based on the detection of simple patterns such as address strides. The directory extension is described in Franaszek et al., “Victim Prefetching in a Cache Hierarchy,” Ser. No. 10/989,997, the disclosure of which is incorporated herein in its entirety.
A brief description of the techniques of stride detection and directory extension will now be provided. Software code frequently processes batches of data by scanning through them in ascending or descending address order. The resulting data access patterns can be detected and subsequently predicted by simple circuitry, which is often included in computing processors. This technique can be very successful in improving the performance of software code that exhibits such regular scans, but it cannot predict other patterns that lack this simple regular structure.
An example of a technique designed to deal with such irregular patterns is a directory extension, which is described next. A victim cache stores lines evicted from a primary cache, and is known to improve the performance of direct-mapped caches significantly. With a directory extension, the concept of the victim cache is further extended such that only the identities of the victims are stored, rather than the actual lines of the cache. The victims' information is stored page-wise in a cache called the “directory extension” because it identifies which lines are in, for example, level 3 (L3) of the cache but not in level 2 (L2) for a set of recently accessed pages. If an entry for the given page exists, a miss in L2 immediately triggers prefetching from the L3 into the L2 in accordance with the information provided by the directory extension.
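By way of illustration only, the two techniques described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the circuitry of any actual processor or the design of the cited application: the class names, the rule that a stride must repeat once before a prediction is made, the value of `LINES_PER_PAGE`, the least-recently-used replacement of page entries, and the clearing of an entry once it has been used are all hypothetical choices made for the example.

```python
from collections import OrderedDict

LINES_PER_PAGE = 4  # assumed page size in lines, for illustration only


class StrideDetector:
    """Predicts the next address once the same nonzero stride repeats."""

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def observe(self, addr):
        """Record an access; return a predicted next address or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


class DirectoryExtension:
    """Stores only the identities of L2 victims, grouped by page."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.entries = OrderedDict()  # page -> set of victim line numbers

    def record_victim(self, line):
        """Note that a line was evicted from L2 (and remains in L3)."""
        page = line // LINES_PER_PAGE
        self.entries.setdefault(page, set()).add(line)
        self.entries.move_to_end(page)
        if len(self.entries) > self.max_pages:
            self.entries.popitem(last=False)  # forget least recent page

    def on_l2_miss(self, line):
        """Return the lines to prefetch from L3 into L2 for this page."""
        page = line // LINES_PER_PAGE
        victims = self.entries.pop(page, set())
        victims.discard(line)  # the missing line is fetched on demand
        return sorted(victims)


if __name__ == "__main__":
    sd = StrideDetector()
    print([sd.observe(a) for a in [100, 108, 116, 124]])
    # prints [None, None, 124, 132]

    de = DirectoryExtension(max_pages=2)
    for victim in (5, 6, 9):
        de.record_victim(victim)
    print(de.on_l2_miss(4))  # prints [5, 6]
```

The stride detector produces no prediction for an irregular sequence such as 1, 7, 3, 20, whereas the directory extension sketch makes no assumption of regularity and relies purely on which lines of a page were previously evicted.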
Alternatively, prefetching may be implemented using software. Software-directed prefetching is based on the concept that the (data and/or instruction) access patterns observed by hardware-only prefetching mechanisms are produced by software. Thus, in principle, appropriate analysis at the compiler level (e.g., online in the setting of continuous optimization of code) or application programmer-provided hints may provide higher quality prefetch instructions than in the hardware-only case. However, a compiler may be successful only for a limited class of software code. Further, programmer-provided hints presuppose that the programmer has sufficient time and experience to provide good software hints, which is not always the case.
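By way of illustration only, the idea that software knows its own future accesses can be sketched as follows. The sketch assumes an indirect access pattern (`data[indices[i]]`) that has no fixed stride, an unbounded cache, and hints that complete instantly; the function name `count_misses` and the `lookahead` parameter are hypothetical. In compiled code such a hint would typically be expressed through a compiler intrinsic, for example `__builtin_prefetch` in GCC and Clang.

```python
def count_misses(indices, lookahead):
    """Count demand misses while walking data[indices[i]].

    With lookahead > 0, a software hint for indices[i + lookahead] is
    issued before element i is touched. Timing is ignored: a hinted line
    is assumed to have arrived by the time it is needed.
    """
    cached = set()  # lines already brought in (unbounded, for simplicity)
    misses = 0
    for i, idx in enumerate(indices):
        if lookahead and i + lookahead < len(indices):
            cached.add(indices[i + lookahead])  # hint: fetch ahead of use
        if idx not in cached:
            misses += 1
            cached.add(idx)
    return misses


if __name__ == "__main__":
    irregular = [7, 2, 9, 4, 11, 0]  # no constant stride to detect
    print(count_misses(irregular, lookahead=0))  # prints 6
    print(count_misses(irregular, lookahead=2))  # prints 2
```

Choosing the lookahead distance is exactly the kind of decision that requires programmer or compiler insight: in this toy model any positive distance helps, but in practice a distance that is too short hides little latency and one that is too long risks castout before use.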