The invention relates to memory management and, in particular, to performing generalized prefetching via memory block, or page, tags in a cache memory system.
In processing systems such as computers, the data to be utilized by a processor is stored in a memory (e.g., main memory, lower level memory) and control logic manages the transfer of data between the memory and the processor in response to requests issued by the processor. The data stored in the main memory generally includes both instructions to be executed by the processor and data to be operated on by the processor. For simplicity, both instructions and true data are referred to collectively herein as “data” unless the context requires otherwise. The time taken by a main memory access is relatively long in relation to the operating speeds of modern processors. To address this, a cache memory with a shorter access time is generally interposed between the main memory and the processor, and the control logic manages the storage of data retrieved from the main memory in the cache and the supply of data from the cache to the processor.
A typical cache is organized into multiple “lines”, each line providing storage for a line of data from the main memory which may be many bytes in length. When the processor issues a request for data contained in a particular line in a page, or block, of memory, the control logic determines whether that line is stored in the cache. If the line is stored in cache (i.e., there is a cache hit), the data is retrieved from the cache. If the line is not stored in cache (i.e., there is a cache miss), the data must be retrieved from the main memory and the processor is stalled while this operation takes place. Since a cache access is much faster than a lower level memory access, it is clearly desirable to manage the system so as to achieve a high ratio of cache hits to cache misses.
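The hit and miss behavior described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization, a 64-byte line, and 256 lines; none of these parameters comes from the text, and the class name `DirectMappedCache` is hypothetical.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64    # bytes per cache line (assumed for illustration)
NUM_LINES = 256   # number of line slots in the cache (assumed)


@dataclass
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only which line occupies each slot."""
    slots: dict = field(default_factory=dict)  # slot index -> memory line number
    hits: int = 0
    misses: int = 0

    def access(self, address: int) -> bool:
        """Return True on a cache hit; on a miss, fill the slot and return False."""
        line = address // LINE_SIZE   # memory line that contains this byte
        index = line % NUM_LINES      # the single slot this line may occupy
        if self.slots.get(index) == line:
            self.hits += 1
            return True
        self.misses += 1              # a real processor would stall here
        self.slots[index] = line      # line is brought in from main memory
        return False


cache = DirectMappedCache()
# Addresses 0 and 8 share a line; address 64 * NUM_LINES maps to the same slot
# as address 0 and therefore displaces it.
results = [cache.access(a) for a in (0, 8, 64, 0, 64 * NUM_LINES, 0)]
print(results, cache.hits, cache.misses)
```

The final access to address 0 misses because the conflicting line displaced it, which shows how the hit ratio depends on what the cache currently holds.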
Memory latency is becoming an increasingly important factor in computer system performance. One consequence is that misses in the slowest on-chip cache are becoming more expensive in terms of performance. One approach to mitigating this problem is to increase the size of the cache. A larger cache may improve performance, but cache memory is expensive in comparison to the slower, lower level memory. It is therefore important to use cache memory space as efficiently as possible.
One way to improve the efficiency of a cache memory system and to decrease memory latency time is to attempt to anticipate processor requests and retrieve lines of data from the memory in advance. This technique is known as prefetching. Prefetching can be performed by noting dynamic properties of the reference data stream such as sequential and/or strided accesses. Alternatively, prefetching can be performed on the basis of stored information. This stored information might be related to patterns of access within or between memory blocks or pages, or to hints produced by the compiler and/or programmer.
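As one illustration of prefetching driven by dynamic properties of the reference stream, the sketch below detects a constant stride between consecutive line accesses and proposes the next line. It is a simplified, hypothetical detector (the class `StridePrefetcher` and its single-stream state are assumptions), not a mechanism taken from the text.

```python
from typing import Optional


class StridePrefetcher:
    """Propose a prefetch once the same non-zero stride is seen twice in a row."""

    def __init__(self) -> None:
        self.last_line: Optional[int] = None    # most recently accessed line
        self.last_stride: Optional[int] = None  # stride between the last two accesses

    def observe(self, line: int) -> Optional[int]:
        """Record an accessed line; return a line to prefetch, or None."""
        candidate = None
        if self.last_line is not None:
            stride = line - self.last_line
            # A repeated non-zero stride suggests a sequential or strided pattern.
            if stride != 0 and stride == self.last_stride:
                candidate = line + stride
            self.last_stride = stride
        self.last_line = line
        return candidate


pf = StridePrefetcher()
predictions = [pf.observe(line) for line in (10, 12, 14, 16, 21)]
print(predictions)
```

A stride of 1 corresponds to the sequential case; the access to line 21 breaks the pattern, so no prefetch is proposed for it.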
To assist prefetching, an apparatus may store block-dependent information in main memory. This block-dependent information may be referred to as a block tag, or simply a tag. Block tags may be prepared and maintained by hardware and/or software for a variety of purposes, including aiding a processor in its decisions to prefetch appropriate data from memory. A distinctive feature of this scheme is that it enables long-term learning of computer behavior, unlike, say, schemes that rely on a data structure stored inside a processor core, which is necessarily much smaller in capacity.
Given a performance goal, for example reducing the cache miss rate through prefetching, an important issue is determining what statistical information should be extracted and stored in a tag, and how to represent it compactly yet usefully. In the same vein, methods for managing and interpreting tags and for generating appropriate system commands are of prime interest. Another important issue is how this information is used and managed when a system contains multiple processors.
The idea that knowledge of past accesses to a block, or page, in memory may be useful for preparing good prefetch candidates is well known in the art. See, for instance, “Adaptive Variation of the Transfer Unit in a Storage Hierarchy” by P. A. Franaszek and B. T. Bennett, IBM Journal of Research and Development, Vol. 22, No. 4, July 1978. In addition, U.S. Pat. No. 6,535,961 describes a mechanism that detects bursts of access to a memory block together with the memory reference that started the burst (the “nominating line”). During this burst, memory access activity for the memory block is recorded in a spatial footprint that is associated with the nominating cache line. These spatial footprints are kept in an “active macro block table.” When a block becomes inactive, the corresponding spatial footprint is evicted and stored in a “spatial footprint table.” The information in the spatial footprint table is then used to issue prefetch commands.
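The two-table arrangement described above can be sketched loosely as follows. This is only an illustration of the general idea, not the mechanism of the cited patent: the 16-line block size, the explicit `deactivate` call that stands in for burst-end detection, and the keying of stored footprints by (block, nominating offset) are all assumptions.

```python
LINES_PER_BLOCK = 16  # assumed block size in lines


class FootprintPredictor:
    """Toy model of an active table plus a table of saved spatial footprints."""

    def __init__(self) -> None:
        self.active = {}   # block -> (nominating offset, footprint bitmap)
        self.history = {}  # (block, nominating offset) -> saved footprint bitmap

    def access(self, line: int) -> list:
        """Record an access; return lines to prefetch if it starts a new burst."""
        block, offset = divmod(line, LINES_PER_BLOCK)
        if block in self.active:
            nominating, footprint = self.active[block]
            self.active[block] = (nominating, footprint | (1 << offset))
            return []
        # First access to an inactive block: this is the nominating line.
        self.active[block] = (offset, 1 << offset)
        saved = self.history.get((block, offset), 0)
        base = block * LINES_PER_BLOCK
        return [base + i for i in range(LINES_PER_BLOCK)
                if ((saved >> i) & 1) and i != offset]

    def deactivate(self, block: int) -> None:
        """End the burst (assumed trigger) and save the footprint for later use."""
        nominating, footprint = self.active.pop(block)
        self.history[(block, nominating)] = footprint


p = FootprintPredictor()
first_burst = [p.access(line) for line in (32, 35, 41)]  # nothing learned yet
p.deactivate(2)                                          # block 2 goes inactive
second_burst = p.access(32)                              # same nominating line
print(first_burst, second_burst)
```

On the second burst, the saved footprint yields the lines touched during the first burst, excluding the nominating line itself.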
U.S. Pat. No. 6,678,795 discloses the use of a related idea to prepare prefetch candidates. An invention similar in spirit is described in U.S. Pat. No. 6,134,643 and in an article by Y. Haifeng and K. Gerson entitled “DRAM-Page Based Prediction and Prefetching”, 2000 IEEE International Conference on Computer Design: VLSI in Computers and Processors, Sep. 17-20, 2000, p. 267. The patent and article describe generating prefetches using the information stored in a “prediction table cache”, a data structure that maintains, for each block, the most recent “N” line accesses to it (each block comprises N lines) using N log2 N bits per block entry. Further, an article by A. Thomas and K. Gershon entitled “Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems”, 2nd IEEE Symposium on High Performance Computer Architecture HPCA 96, Feb. 3-7, 1996, p. 254, teaches a system that stores, for each memory block, the addresses of up to some number (e.g., four) of blocks that have been referenced in the vicinity of the original block, and uses this information to generate prefetches.
The prior art described in the preceding paragraphs has shortcomings related to the quality and amount of information that must be stored. A simplistic method that uses N bits to describe the accesses to a page may become polluted with irrelevant information. Maintaining the identity of the M most recently referenced lines may require M to be so large that the storage burden (e.g., in the system page tables) becomes excessive.
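To make the storage trade-off concrete, the following sketch compares the per-block cost of the representations mentioned above. It assumes each line identifier is encoded as an index within its block using ceil(log2 N) bits; the values N = 64 and M = 4 are hypothetical and chosen only for illustration.

```python
def bits_per_line_id(n_lines: int) -> int:
    """Bits needed to name one of n_lines lines, i.e. ceil(log2(n_lines))."""
    return (n_lines - 1).bit_length()


def bitmap_bits(n_lines: int) -> int:
    """One bit per line: records which lines were touched, not their order."""
    return n_lines


def recent_lines_bits(n_lines: int, m: int) -> int:
    """Identities of the M most recently referenced lines."""
    return m * bits_per_line_id(n_lines)


def full_history_bits(n_lines: int) -> int:
    """The N most recent line accesses, i.e. N * log2(N) bits per block entry."""
    return n_lines * bits_per_line_id(n_lines)


N, M = 64, 4  # hypothetical block size (in lines) and history depth
costs = {
    "bitmap": bitmap_bits(N),
    "recent_m": recent_lines_bits(N, M),
    "full_history": full_history_bits(N),
}
print(costs)
```

Under these assumptions the M-line list is the smallest of the three, but its cost grows linearly with M, which is the burden noted above when M must be large.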