1. Technical Field
The present invention relates in general to multilevel cache hierarchies in data processing systems and in particular to cache directory, controller, and snoop logic for multilevel cache hierarchies. Still more particularly, the present invention relates to vertical integration of cache directory, controller, and snoop logic for multilevel cache hierarchies in data processing systems.
2. Description of the Related Art
Most contemporary data processing system architectures include multiple levels of cache memory within the storage hierarchy. Caches are employed in data processing systems to provide faster access to frequently used data than is possible from system memory, thereby improving overall performance. When utilized, multiple cache levels are typically employed in progressively larger sizes, with a tradeoff of progressively longer access latencies. Smaller, faster caches are employed at levels within the storage hierarchy closer to the processor or processors, while larger, slower caches are employed at levels closer to system memory. Caches at any level may be private (reserved for a local processor) or shared (accessible to multiple processors), although typically caches at levels closer to the processors are private.
Level one (L1) caches, those logically closest to the processor, are typically implemented as an integral part of the processor and may be bifurcated into separate instruction and data caches. Lower level caches are generally implemented as separate devices, although a level two (L2) cache may be formed within the same silicon die as a processor. At all levels, however, caches generally include a number of common components: a cache directory, cache controller logic, logic implementing the cache replacement policy, and, in multiprocessor systems, snoop logic for detecting system bus operations which affect data within the cache. A block diagram of a typical cache configuration is depicted in FIG. 4. An L2 cache 402 includes directory (DIR) logic 404, a least-recently-used (LRU) replacement unit 406, a cache controller (C.C.) 408, snoop logic 410, and cache memory 412. Where a multilevel cache hierarchy is implemented with other caches logically in line with a cache such as L2 cache 402, the designs of specific cache components are generally reused for the other cache levels. For example, a level three (L3) cache 414 may be implemented by duplicating the design of L2 directory logic 404 for L3 directory logic 416, duplicating the design of L2 LRU unit 406 for L3 LRU unit 418, duplicating the design of L2 cache controller 408 for L3 cache controller 420, and duplicating the design of L2 snoop logic 410 for L3 snoop logic 422. A larger cache memory 424 may be implemented for L3 cache 414.
The duplication of cache components requires a great deal of logic and a correspondingly large amount of silicon. Duplicate cache controllers 408 and 420 and duplicate snoop logic 410 and 422 in particular increase the amount of logic required, since these components include a number of queues. For example, cache controllers 408 and 420 may include ten queues each, while snoop logic 410 and 422 may include four queues each. Most of the logic density in a typical cache component design stems from the need for queues, with a machine provided for each queue. Reducing the number of queues reduces the logic density, but also reduces performance.
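The queue counts in the example above can be tallied with a short sketch. The per-component figures (ten controller queues, four snoop queues) are taken from the example; the function name and the assumption that every level carries an identical copy of both components are illustrative only.

```python
# Per-component queue counts from the example in the text.
CONTROLLER_QUEUES = 10  # queues in each cache controller (408, 420)
SNOOP_QUEUES = 4        # queues in each snoop logic unit (410, 422)


def duplicated_queue_count(levels: int) -> int:
    """Total queues when every in-line cache level duplicates both components.

    Hypothetical helper: assumes each level has exactly one controller and
    one snoop unit, each with the example queue counts above.
    """
    return levels * (CONTROLLER_QUEUES + SNOOP_QUEUES)


# An L2 plus a duplicated L3 carries 2 * (10 + 4) queues in total.
print(duplicated_queue_count(2))
```

Since a machine is provided for each queue, the total grows linearly with the number of duplicated levels.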
Vertical cache implementations with duplicated logic also carry an associated latency. Data for a given instruction is first looked up in L2 cache 402; if the lookup misses in L2 cache 402, the operation is then presented to L3 cache 414 to determine whether the required data is there. Absent additional levels in the cache hierarchy, a miss in L3 cache 414 results in a bus operation to access the data within system memory. Each effort to locate the required data within the storage hierarchy has an associated latency, and these latencies accumulate with each miss.
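The accumulation of latency in a serially searched hierarchy can be illustrated with a minimal sketch. The cycle counts below are hypothetical placeholders chosen only for illustration; no specific latencies are given for L2 cache 402, L3 cache 414, or system memory.

```python
# Hypothetical latencies in cycles; these values are assumptions.
L2_LOOKUP = 5
L3_LOOKUP = 15
MEMORY_ACCESS = 100


def serial_access_latency(hit_l2: bool, hit_l3: bool) -> int:
    """Latency when the L3 is consulted only after an L2 miss."""
    latency = L2_LOOKUP            # the L2 directory is always checked first
    if hit_l2:
        return latency
    latency += L3_LOOKUP           # the L3 lookup starts only after the L2 miss
    if hit_l3:
        return latency
    return latency + MEMORY_ACCESS  # both levels missed: go to system memory


print(serial_access_latency(True, False))   # L2 hit
print(serial_access_latency(False, True))   # L2 miss, L3 hit
print(serial_access_latency(False, False))  # miss at both levels
```

Under these assumed values, each additional miss adds the full latency of the next level to the total.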
A third problem with duplication of cache controller and snoop logic components for logically in line caches is that copying L2 designs for the L3 would require inclusivity unless the L3 is specially modified to avoid inclusivity. Most vertical cache configurations are inclusive, with the lower level, larger cache including the same data found in the higher level, smaller cache. This is less efficient than configurations in which a cache entry need not be found within both caches in the vertical hierarchy.
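The efficiency cost of inclusivity can be seen in a simple capacity calculation. The sketch below is illustrative only: the cache sizes are hypothetical, the function name is an assumption, and the lower cache is assumed to be at least as large as the upper cache.

```python
KIB = 1024


def max_distinct_bytes(upper_bytes: int, lower_bytes: int, inclusive: bool) -> int:
    """Upper bound on distinct data held across two in-line caches.

    Assumes lower_bytes >= upper_bytes (the lower level is the larger cache).
    """
    if inclusive:
        # Every upper-cache entry is duplicated in the lower cache, so the
        # upper cache contributes no additional distinct data.
        return lower_bytes
    # Without inclusivity, the two caches may hold entirely different entries.
    return upper_bytes + lower_bytes


# Hypothetical sizes: a 512 KiB upper cache and a 4 MiB lower cache.
print(max_distinct_bytes(512 * KIB, 4096 * KIB, inclusive=True) // KIB)
print(max_distinct_bytes(512 * KIB, 4096 * KIB, inclusive=False) // KIB)
```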
It would be desirable, therefore, to provide a vertical cache hierarchy configuration which reduces access latency to lower cache levels within the storage hierarchy and to system memory. It would further be advantageous if the multilevel cache hierarchy configuration reduces the logic density required for cache controllers and snoop logic for logically in line caches, and does not require inclusivity between vertical levels of a cache hierarchy.
It is therefore one object of the present invention to provide an improved multilevel cache hierarchy for data processing systems.
It is another object of the present invention to provide improved cache directory, controller, and snoop logic for multilevel cache hierarchies in data processing systems.
It is yet another object of the present invention to provide vertically integrated cache directory, controller, and snoop logic for multilevel cache hierarchies in data processing systems.
The foregoing objects are achieved as is now described. Logically in line caches within a multilevel cache hierarchy are jointly controlled by a single cache controller. By combining the cache controller and snoop logic for different levels within the cache hierarchy, separate queues are not required for each level. Fewer total sets of queues result in consumption of less silicon area, higher frequency operation, and improved performance. During a cache access, cache directories are looked up in parallel. Data is retrieved from an upper cache if hit, or from the lower cache if the upper cache misses and the lower cache hits. Because the directories are looked up in parallel, access latency to the lower level cache is reduced, as is access latency to system memory if all cache levels miss. LRU units may be updated in parallel based on cache directory hits, providing a more precise least-recently-used replacement policy for the total cache space. Alternatively, the lower cache LRU unit may be updated based on cache memory accesses rather than cache directory hits, or the cache hierarchy may be provided with user-selectable modes of operation for both LRU unit update schemes. The merged vertical cache controller mechanism does not require the lower cache memory to be inclusive of the upper cache memory, thus improving cache efficiency since the same entry need not be stored in both. A novel deallocation scheme and update protocol may be implemented in conjunction with the merged vertical cache controller mechanism for further performance improvements.
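The selection of a data source after a parallel directory lookup may be sketched as follows. This is a software illustration under stated assumptions, not a description of the actual logic: the class name, the use of dictionaries as stand-ins for the directories and cache memories, and the returned labels are all hypothetical, and the two lookups that hardware would perform concurrently are simply evaluated back to back.

```python
from dataclasses import dataclass, field


@dataclass
class MergedCacheController:
    """Hypothetical single controller for two logically in line caches."""
    upper: dict = field(default_factory=dict)  # address -> data (e.g. an L2)
    lower: dict = field(default_factory=dict)  # address -> data (e.g. an L3)

    def access(self, address):
        # Both directories are consulted for every access; in hardware the
        # two lookups would proceed in parallel rather than one after a miss.
        upper_hit = address in self.upper
        lower_hit = address in self.lower
        if upper_hit:
            return "upper", self.upper[address]
        if lower_hit:
            return "lower", self.lower[address]
        # Both directories missed, so the request goes to system memory.
        return "memory", None


# The lower cache need not contain the upper cache's entries (non-inclusive).
ctrl = MergedCacheController(upper={0x10: "a"}, lower={0x20: "b"})
print(ctrl.access(0x10))
print(ctrl.access(0x20))
print(ctrl.access(0x30))
```

Because both hit results are available at the same time, a miss at every level is known after a single lookup step rather than after one step per level.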
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.