1. Technical Field
The present invention relates in general to data storage, and in particular, to a cache memory having a non-uniform cache architecture (NUCA).
2. Description of the Related Art
A conventional multiprocessor computer system includes multiple processing units all coupled to a system interconnect. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and is generally accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy for temporarily storing instructions and data retrieved from the system memory.
In current large-scale computer systems, it is common to use “deep” cache hierarchies, with small and fast L1 caches implemented, for example, in Static Random Access Memory (SRAM) and with multiple larger and slower lower-level caches implemented, for example, in Embedded Dynamic Random Access Memory (EDRAM). Conventional “deep” cache hierarchies are characterized by significant cache management overhead (e.g., to manage coherency across all levels of the hierarchy), high-latency access to lower levels of the cache hierarchy, and storage inefficiency in that a single multi-level cache hierarchy may hold multiple copies of the same cache line.
In an attempt to improve upon conventional “deep” cache hierarchies, a number of Non-Uniform Cache Architectures (NUCAs) have been proposed. In general, a NUCA flattens the conventional multi-level cache hierarchy by using fewer cache hierarchy levels, with a large number of banks of the same memory technology (e.g., SRAM, EDRAM, etc.) in each level of the cache hierarchy. As a consequence of the physical structure of such cache architectures, entries in different banks of the same cache memory have non-uniform access times dependent on physical position, giving rise to the term NUCA.
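The dependence of access time on physical position can be illustrated with a minimal sketch. The 4x4 bank grid, the controller position at one corner, and the cycle counts below are all assumptions chosen for illustration; none of them is taken from any particular NUCA design.

```python
# Illustrative latency model for a banked cache (all constants are assumed).
BANK_ACCESS_CYCLES = 3   # assumed time to read one bank
HOP_CYCLES = 1           # assumed wire delay per bank-to-bank hop
GRID_ROWS, GRID_COLS = 4, 4


def bank_latency(row: int, col: int) -> int:
    """Round-trip latency to the bank at (row, col).

    The cache controller is assumed to sit at grid position (0, 0), and the
    request and the response each traverse the Manhattan distance to the bank.
    """
    hops = row + col
    return BANK_ACCESS_CYCLES + 2 * HOP_CYCLES * hops


# Latency of every bank, indexed as latencies[row][col].
latencies = [[bank_latency(r, c) for c in range(GRID_COLS)]
             for r in range(GRID_ROWS)]
```

Under these assumed numbers, the bank nearest the controller and the bank farthest from it differ in latency by a factor of five even though both use the same memory technology.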
Various new cache management policies have been proposed for NUCA caches, including static NUCA (S-NUCA) and dynamic NUCA (D-NUCA). In an S-NUCA cache, data are statically allocated to the cache banks (e.g., based upon index bits of memory addresses) and remain in the allocated banks until deallocated. In contrast, a D-NUCA cache permits data to reside in different banks and employs a migration mechanism to move data among the banks to reduce wire delay effects. For example, in a D-NUCA cache employing generational promotion, the storage locations or entries comprising each congruence class are ranked by access latency. Upon each access, a cache line in a congruence class is promoted to the next lower-latency entry of that congruence class, and it is demoted to a higher-latency entry as other cache lines in the congruence class are accessed.
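The two policies can be contrasted in a short sketch. The bank count, the line size, the swap-based promotion, and the policy of installing a missed line in the highest-latency entry are assumptions made for illustration; the names `snuca_bank` and `DnucaCongruenceClass` are hypothetical and do not come from any specific design.

```python
from typing import List, Optional

NUM_BANKS = 8      # assumed number of banks
LINE_BYTES = 128   # assumed cache line size in bytes


def snuca_bank(address: int) -> int:
    """S-NUCA: the bank is fixed by index bits just above the line offset."""
    return (address // LINE_BYTES) % NUM_BANKS


class DnucaCongruenceClass:
    """D-NUCA congruence class with generational promotion.

    Entries are ordered by access latency: index 0 is the lowest-latency
    entry and the last index is the highest-latency entry.
    """

    def __init__(self, ways: int) -> None:
        self.entries: List[Optional[int]] = [None] * ways

    def access(self, tag: int) -> bool:
        """Access a line by tag; return True on a hit, False on a miss."""
        if tag in self.entries:
            i = self.entries.index(tag)
            if i > 0:
                # Swap with the next lower-latency entry: the accessed line
                # is promoted and the line it displaces is demoted one step.
                self.entries[i - 1], self.entries[i] = (
                    self.entries[i], self.entries[i - 1])
            return True
        # Miss: install in the highest-latency entry, replacing its
        # occupant (an assumed replacement policy).
        self.entries[-1] = tag
        return False
```

In this sketch, a given address always maps to the same bank under `snuca_bank`, whereas a line in a `DnucaCongruenceClass` moves one entry closer to the lowest-latency position on each hit and drifts back as other lines in the same congruence class are hit.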