The present invention relates to the field of computer systems, and in particular, to an energy optimized cache memory architecture exploiting spatial locality.
Improvements in technology scaling continue to bring new power and energy challenges in computer systems, as the amount of power consumed per transistor does not scale down as quickly as the total density of transistors increases. In such systems, a significant amount of energy is consumed by the memory hierarchy, the design of which has long focused on improving memory latency and bandwidth by narrowing the gap between processor speeds and memory speeds.
Cache memories, or caches, play a critical role in reducing system energy. A typical cache memory is a fast access memory that stores data reflecting selected locations in a corresponding main memory of the computer system. Caches are usually composed of Static Random Access Memory (“SRAM”) cells. Typically, the data stored in caches is organized into data sets which are commonly referred to as cache lines or cache blocks. Caches usually include storage areas for a set of tags, one corresponding to each block. Such tags typically include address tags that identify an area of the main memory that maps to the corresponding block. In addition, such cache tags usually provide status information for the corresponding block.
Although caches consume significant power, they can also save system power by filtering, and thereby reducing, costly off-chip accesses to main memory. Consequently, effectively utilizing caches is important not only for system performance, but also for system energy.
Cache compression is a known technique for increasing the effective cache capacity by compressing and compacting data, which reduces cache misses. Cache compression can also improve cache power by reading and writing less data for each cache access. Cache compression techniques range from those targeting limited data patterns, such as dynamic zero compression and significance compression, to alternatives targeting more complex patterns. The “C-PACK” (Cache Packer) algorithm, for example, as described in “C-pack: a high-performance microprocessor cache compression algorithm,” IEEE Transactions on VLSI Systems, 2010 by X. Chen, L. Yang, R. Dick, L. Shang and H. Lekatsas, the contents of which are hereby expressly incorporated by reference, applies a pattern-based partial dictionary match compression technique with fixed packing, and uses a pair matching technique to locate cache blocks with sufficient unused space for newly allocated blocks, thereby offering a compression technique with lower hardware overhead. In general, cache compression can improve system energy if its energy overheads due to compressing and packing cache blocks are lower than the energy it saves by reducing accesses to the next level of memory in the memory hierarchy, such as to main memory.
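By way of illustration only, the break-even condition stated above may be expressed as a simple accounting sketch. The function name, its parameters, and the energy figures are hypothetical and are not taken from the cited reference or from any measured system; the sketch assumes that the overhead is the sum of per-operation compression and decompression energy plus a lump packing cost.

```python
def net_energy_saved(avoided_accesses, energy_per_next_level_access,
                     compress_ops, energy_per_compress,
                     decompress_ops, energy_per_decompress,
                     packing_energy=0):
    """Return the net energy saved by compression (hypothetical model).

    A positive result means the energy saved by avoiding accesses to the
    next level of memory exceeds the overhead of compressing,
    decompressing, and packing cache blocks.
    """
    saved = avoided_accesses * energy_per_next_level_access
    overhead = (compress_ops * energy_per_compress
                + decompress_ops * energy_per_decompress
                + packing_energy)
    return saved - overhead


# Illustrative numbers in arbitrary energy units (assumptions only).
cheap_misses = net_energy_saved(100, 10, 1000, 1, 1000, 1)
costly_misses = net_energy_saved(100, 50, 1000, 1, 1000, 1, packing_energy=500)
print(cheap_misses > 0, costly_misses > 0)
```

Under these assumed figures, the same compression activity is a net loss when each avoided access is cheap and a net gain when each avoided access is costly, which is why the overheads must be weighed against the cost of accesses to the next level of memory.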
However, existing cache compression techniques are limited in their effectiveness at optimizing system energy because they offer low compressibility and incur high energy overheads. Conventional compressed caches typically have three main drawbacks. First, to fit more cache blocks, conventional compressed caches typically double the tag array size, and as such, can typically at most double the effective cache capacity. Second, packing more cache blocks often results in higher energy overheads. Variable packing techniques, which compress cache blocks into variable sizes, improve compressibility, but incur higher energy overheads. These techniques must frequently reclaim the space occupied by invalid cache blocks to create contiguous free space, a process called compaction or repacking, and as such, they significantly increase the number of accessed cache blocks. Thus, they negate the potential energy benefits of the compression. Third, conventional compressed caches limit the compression ratio. Several proposals, including those targeting energy efficiency, use fixed-packing techniques that fit at most two compressed cache blocks in the space of one uncompressed block. In addition, all of the existing cache compression proposals compress small blocks, for example, 64 Bytes, and thus do not allow the higher compression ratios made possible by compressing larger blocks of data.
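The capacity limit of fixed packing described above may be illustrated with a minimal sketch. The function names and the example sizes are hypothetical. The variable-packing function is an idealized upper bound that assumes unlimited tags and no fragmentation, so it ignores the compaction overhead discussed above; it is not a model of any particular prior proposal.

```python
BLOCK_BYTES = 64  # uncompressed block size, as in the example in the text


def fixed_packing_blocks(compressed_sizes, physical_lines):
    """Blocks stored when a physical line holds either one block, or two
    blocks that each compress to at most half a line."""
    half = BLOCK_BYTES // 2
    small = sum(1 for size in compressed_sizes if size <= half)
    large = len(compressed_sizes) - small
    # Pair up small blocks first, since a pair stores two blocks per line.
    paired_lines = min(physical_lines, small // 2)
    stored = 2 * paired_lines
    # Remaining lines each hold one leftover block (small or large).
    leftover_blocks = (small - 2 * paired_lines) + large
    stored += min(physical_lines - paired_lines, leftover_blocks)
    return stored


def ideal_variable_packing_blocks(compressed_sizes, physical_lines):
    """Idealized bound: blocks are packed back to back, smallest first,
    with unlimited tags and no fragmentation (assumptions)."""
    budget = physical_lines * BLOCK_BYTES
    stored = 0
    for size in sorted(compressed_sizes):
        if size > budget:
            break
        budget -= size
        stored += 1
    return stored


# Sixteen blocks that each compress to 8 bytes, stored in two physical lines.
sizes = [8] * 16
print(fixed_packing_blocks(sizes, 2), ideal_variable_packing_blocks(sizes, 2))
```

In this assumed example, fixed packing stores only four of the sixteen highly compressible blocks, because each physical line is capped at two blocks regardless of how small the compressed blocks are, whereas the idealized byte-granular bound would accommodate all sixteen.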