1. Technical Field
This invention relates to systems and methods for caching data, and in particular to systems and methods for managing temporal and non-temporal data in a single cache.
2. Background Art
Processor speeds have been increasing faster than those of the memory systems that supply them with data. One widely-used strategy for addressing the discrepancy between processor and memory speeds is to organize the memory system as a hierarchy of storage structures. This hierarchy typically includes a main memory and one or more caches. A small cache, e.g. an L0 cache, is located close to the processor""s core pipeline to provide the processor with fast access to selected data from a working data set. Larger caches (L1, L2, . . .) accommodate increasingly larger portions of the working data set but require longer times to access the data. Data requests are satisfied from the lowest level of the memory hierarchy that holds the requested information.
Caches are kept small to increase their access speed, and the sizes of xe2x80x9con-chipxe2x80x9d caches, e.g. L0 and L1, are further limited to reduce their impact on the processor die area. As a result, storage space in caches is at a premium, and the data stored in a cache must be intelligently selected to fully realize the benefits of caching. Strategies typically focus on increasing the probability that frequently requested data is available from one of the low level caches.
Caches exploit the notions of spatial and temporal locality to store data that is likely to be requested by a program closer to the processor""s pipeline. Spatial locality refers to the tendency of a program to access data within a given region of memory, before moving on to a different region of memory. If data at a given memory address is accessed, the data at an adjacent memory address is also likely to be accessed. Caches exploit spatial locality by storing blocks of data from adjacent memory locations in the same cache entry or cache line.
Temporal locality refers to the probability that a piece of data accessed by a program will be accessed again soon. The time scale against which xe2x80x9csoonxe2x80x9d is determined is the time the data spends in a cache (xe2x80x9clifetimexe2x80x9d) before it is evicted to make room for new data. Data that is likely to be accessed repeatedly during its lifetime is characterized as temporal data, while data that is likely to be accessed infrequently during its lifetime is characterized as non-temporal data.
The temporal and spatial character of data may be illustrated by a matrix multiplication, Axc2x7B=C, where A, B, and C are given by:                                           [                                                                                a                    11                                                                                        a                    12                                                                    ⋯                                                                      a                                          1                      ,                      n                                                                                                                                        a                    21                                                                                        a                    22                                                                    ⋯                                                                      a                                          2                      ⁢                      n                                                                                                                    ⋮                                                  ⋮                                                  ⋯                                                  ⋮                                                                                                  a                    k1                                                                                        a                    k2                                                                    ⋯                                                                      a                    kn                                                                        ]                    ⁢                      xe2x80x83                    [                                                                      b                  1                                                                                                      b                  2                                                                                    ⋮                                                                                      b                  n                                                              ]                =                  [                                                                      c                  1                                                                                                      c                  2                                                                                    ⋮                                                                                      c                  n                                                              ]                                    Eq        .                  xe2x80x83                ⁢                  (          I          )                    
Here, aij is the element in the jth column of the ith row of matrix A, bj is the jth element in column vector B, and ci is the ith element in resulting column vector C. Each element of C is related to the elements of A and B as follows:                                                         ∑                              j                =                1                            n                        ⁢                          xe2x80x83                        ⁢                          (                                                a                  ij                                ⁢                                  b                  j                                            )                                =                      c            i                          ,                            Eq        .                  xe2x80x83                ⁢                  (          II          )                    
where n is the number of columns in A.
To evaluate an element ci, the processor accesses sequentially the elements of row i of A (ail . . . ain) and multiplies each by a corresponding element of B (bl . . . bn). The row elements of A are thus spatially local, and where space permits, they are stored in the same cache line. However, the row elements of A are not temporally local, since they are only accessed once to evaluate Eq. (I).
The elements of B are also spatially local since they are accessed sequentially as the processor steps through the corresponding row elements of A, per Eq. (II). Consequently, they are stored in the same cache line, space permitting. The temporal character of B""s elements depends in part on the cache size. Where the cache is large enough to accommodate the elements of A and B, the elements of B are temporally local as well, since they are accessed for each row of matrix A. For larger matrices and vectors in smaller caches, some of the vector elements may have to be evicted to accommodate the matrix elements. However, this can often be avoided through various blocking techniques.
It is clearly desirable to allocate cache space to temporal data, e.g. vector B, since it is accessed repeatedly. On the other hand, a certain amount of cache space should be allocated to non-temporal data, e.g. matrix A, to avoid long latency memory accesses when the data is required. Such data may be cached in anticipation of its use through prefetching and advanced load operations. However, where these operations evict temporal data to accommodate non-temporal data, they can reduce the performance of the cache.
Employing separate storage structures for temporal and non-temporal data prevents non-temporal data from displacing temporal data in a cache. However, use of a separate storage structure has several significant drawbacks. The storage structure requires additional die area, which is a limited resource in modem processors, and limiting the size of the cache structure reduces the lifetime of the data it stores. This reduction is particularly significant for non-temporal data, which is accessed infrequently as it is. In addition, coherence must be maintained between the second storage structure and the other storage structures of the memory system. The necessary circuitry consumes additional die area, complicates the design of the cache system, and is likely to degrade the cache access time.
The present invention is a system and method for managing temporal and non-temporal data in the same cache structure. Non-temporal data is allocated space in the cache structure using a replacement scheme that is biased to favor retention of temporal data.
In accordance with the present invention, the temporal/non-temporal character of data targeted by a cache access is determined, and a cache entry for the data is identified. A replacement priority indicator associated with the identified cache entry is updated according to the temporal/non-temporal character of the data.
For one embodiment of the invention, the replacement priority indicator is updated to reflect the cache access when the targeted data is temporal, and it is preserved in its current state when the targeted data is non-temporal. For example, the replacement priority indictor may identify a least-recently-used (LRU) cache entry from among candidate cache entries. On a cache miss that specifies non-temporal data, the LRU designation is preserved. On a cache miss that specifies temporal data, the LRU designation is modified to a most recently used (MRU) designation.