Cache memories have been used to improve processor performance while maintaining reasonable system costs. A cache memory is a very fast buffer comprising an array of local storage cells used by one or more processors to hold copies of frequently requested data. A typical cache memory system comprises a hierarchy of memory structures, which usually includes a local (L1), on-chip cache that represents the first level in the hierarchy. A secondary (L2) cache is often associated with the processor for providing an intermediate level of cache memory between the processor and main memory. Main memory, also commonly referred to as system or bulk memory, lies at the bottom (i.e., slowest, largest) level of the memory hierarchy.
In a conventional computer system, a processor is coupled to a system bus that provides access to main memory. An additional backside bus may be utilized to couple the processor to an L2 cache memory. Other system architectures may couple the L2 cache memory to the system bus via its own dedicated bus. Most often, L2 cache memory comprises a static random access memory (SRAM) that includes a data array, a cache directory, and cache management logic. The cache directory usually includes a tag array, tag status bits, and least recently used (LRU) bits. (Each directory entry is called a “tag”.) The tag RAM contains the main memory addresses of the code and data stored in the data cache RAM, plus additional status bits used by the cache management logic.
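The role of the cache directory can be illustrated with a minimal sketch. It assumes a small set-associative organization; the line size, set count, associativity, and all names (`DirectoryEntry`, `CacheSet`, `CacheDirectory`, `split_address`) are hypothetical choices made for illustration and are not taken from any particular cache design. A single valid bit stands in for the tag status bits, and a per-set ordering stands in for the LRU bits.

```python
from dataclasses import dataclass, field

# Assumed parameters, chosen only to keep the example small.
LINE_BYTES = 64   # bytes per cache line
NUM_SETS = 4      # number of sets in the directory
WAYS = 2          # entries (tags) per set


@dataclass
class DirectoryEntry:
    tag: int = 0
    valid: bool = False  # one example of a tag status bit


@dataclass
class CacheSet:
    entries: list = field(
        default_factory=lambda: [DirectoryEntry() for _ in range(WAYS)])
    # Stand-in for the LRU bits: way numbers, least recently used first.
    lru_order: list = field(default_factory=lambda: list(range(WAYS)))


def split_address(addr):
    """Split a main memory address into (tag, set index)."""
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS


class CacheDirectory:
    def __init__(self):
        self.sets = [CacheSet() for _ in range(NUM_SETS)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the LRU way and return False."""
        tag, index = split_address(addr)
        cache_set = self.sets[index]
        for way, entry in enumerate(cache_set.entries):
            if entry.valid and entry.tag == tag:
                self._touch(cache_set, way)
                return True
        victim = cache_set.lru_order[0]  # least recently used way
        cache_set.entries[victim] = DirectoryEntry(tag=tag, valid=True)
        self._touch(cache_set, victim)
        return False

    @staticmethod
    def _touch(cache_set, way):
        """Mark a way as most recently used."""
        cache_set.lru_order.remove(way)
        cache_set.lru_order.append(way)
```

In this sketch the directory tracks only addresses and status; the data array that would hold the cached lines themselves is omitted.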
Recent advances in semiconductor processing technology have made possible the fabrication of large L2 cache memories on the same die as the processor core. As device and circuit features continue to shrink as the technology improves, researchers have begun proposing designs that integrate a very large (e.g., multiple megabytes) third level (L3) cache memory on the same die as the processor core for improved data processing performance. While such a high level of integration is desirable from the standpoint of achieving high-speed performance, there are still difficulties that must be overcome.
Large on-die cache memories are typically subdivided into multiple cache memory banks, which are then coupled to a wide (e.g., 32 bytes, or 256 bits, wide) data bus. In a very large cache memory comprising multiple banks, one problem that arises is the large resistive-capacitive (RC) signal delay associated with the long bus lines when driven at a high clock rate (e.g., 1 GHz). Further, various banks of the cache may be wired differently and employ different access technologies.
One type of cache is referred to as Uniform Cache Access (UCA), or Uniform Cache Architecture. UCA caches are multi-bank caches that enforce equal latency to all banks. UCA requires that all banks either be wired with traces of equal length or have appropriate delay elements inserted along the traces. Although UCA ensures equal latency to all banks, it forces every bank to operate at the worst-case latency, because the uniform latency is set by the latency to the farthest bank.
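The cost of enforcing uniform latency can be sketched as follows. The per-bank wire latencies are hypothetical values in clock cycles, and the function names are illustrative only.

```python
# Hypothetical wire latency to each bank, in clock cycles (nearest first).
BANK_WIRE_CYCLES = [2, 4, 6, 9]


def uca_latency(bank_wire_cycles):
    """Under UCA, every bank responds with the latency of the farthest bank."""
    return max(bank_wire_cycles)


def uca_padding(bank_wire_cycles):
    """Delay cycles that must be added per bank to match the farthest bank."""
    worst = max(bank_wire_cycles)
    return [worst - cycles for cycles in bank_wire_cycles]
```

Under these assumed values, `uca_latency(BANK_WIRE_CYCLES)` is 9 cycles for every bank, and `uca_padding(BANK_WIRE_CYCLES)` is `[7, 5, 3, 0]`, so the nearest bank spends most of its access time waiting on inserted delay.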
Another type of cache is referred to as Non-Uniform Cache Access (NUCA), or alternatively Non-Uniform Cache Architecture. In NUCA caches, the latency to a bank generally depends on the bank's proximity to the device making the request, which frequently is a processor. NUCA allows the banks closest to the processor to respond the fastest, while the banks farthest from the processor respond the slowest. NUCA caches are traditionally large in size and consume relatively large amounts of power. Current power-saving techniques do not cater to NUCA architectures.
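The dependence of latency on proximity can be illustrated with a simple model. It assumes, purely for illustration, a fixed bank access time plus a wire delay that grows linearly with the distance between the requester and the bank; the constants and function names are hypothetical, and real designs need not follow a linear model.

```python
BANK_ACCESS_CYCLES = 3  # assumed time to read a bank once the request arrives
CYCLES_PER_HOP = 2      # assumed wire delay per unit of distance


def nuca_latency(distance_hops):
    """Latency to a bank located distance_hops away from the requester."""
    return BANK_ACCESS_CYCLES + CYCLES_PER_HOP * distance_hops


def average_latency(latencies, access_fractions):
    """Average latency, weighting each bank by its share of accesses."""
    return sum(lat * frac for lat, frac in zip(latencies, access_fractions))


# Four banks at increasing distance from the processor.
bank_latencies = [nuca_latency(d) for d in [0, 1, 2, 3]]
```

With these assumed constants the four banks respond in 3, 5, 7, and 9 cycles. If accesses were spread evenly across the banks, the average would be 6 cycles; the more accesses that fall on the nearer banks, the lower the average becomes.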