1. Field of the Invention
The present invention is related to cache memories and more particularly to storing and accessing data in a cache memory for reduced energy consumption.
2. Background Description
Random access memories (RAMs) are well known in the art. A typical RAM has a memory array wherein every location is addressable and freely accessible by providing the correct corresponding address. Dynamic RAMs (DRAMs) are dense RAMs with a very small memory cell. High-performance Static RAMs (SRAMs) are somewhat less dense (and generally more expensive per bit) than DRAMs, but expend more power in each access to achieve speed, i.e., they provide better access times than DRAMs at the cost of higher power. Content addressable memories (CAMs), which also are well known in the art, relate memory locations to detectable values (i.e., location content) and have two modes of operation. In a storage mode of operation, the CAM accepts data for particular locations (e.g., reading/writing to CAM locations), similar to loading a RAM or loading data in a register file. In a second, content addressable or search mode, CAM storage locations are identified and selected by what the locations contain. A particular identifying value, typically called a Comparand, is provided, and the array is searched for a match by comparing array contents to the Comparand.
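The two CAM modes of operation described above may be illustrated by the following minimal software sketch. The class name SimpleCAM and its methods are hypothetical and are used for illustration only. An actual CAM compares every location against the Comparand in parallel in hardware, which the sketch models with a simple scan.

```python
class SimpleCAM:
    """Toy software model of a content addressable memory (illustrative only)."""

    def __init__(self, num_entries):
        # None marks a location that has not been loaded yet.
        self.entries = [None] * num_entries

    def write(self, location, value):
        # Storage mode: load a value into an addressed location, as in a RAM.
        self.entries[location] = value

    def read(self, location):
        # Storage mode: read back the value held at an addressed location.
        return self.entries[location]

    def search(self, comparand):
        # Search mode: return every location whose content equals the Comparand.
        # Hardware performs these comparisons simultaneously; a scan stands in here.
        return [i for i, v in enumerate(self.entries)
                if v is not None and v == comparand]


cam = SimpleCAM(4)
cam.write(0, 0x1A)
cam.write(2, 0x3C)
matches = cam.search(0x3C)   # locations whose content matches the Comparand
no_match = cam.search(0x99)  # empty list when nothing matches
```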
In a typical data processing system, the bulk of the memory is DRAM in main memory with faster SRAM in cache memory, closer to the processor or microprocessor. Caching is known as an effective technique for increasing microprocessor performance. Typical cache memories are organized with data stored in blocks, and with data and tag information held in a cache line for each cached data block. Each data block is identified by one of n tags, where each tag may be a virtual index into the cache. The tag normally includes the upper bits of a virtual address in combination with an address space identifier that is unique to a particular process. Locating a block in cache requires searching cache line data for the virtual address, i.e., the tag, which may be located in one and only one cache location. Unfortunately, because of this search, caching is also a major contributor to microprocessor system energy consumption.
Because finding a virtual address in RAM requires checking cache lines sequentially until the virtual address is located, CAMs work well for cache memory applications, especially for finding a particular tag associated with a selected virtual memory address. In particular, an n-way associative cache memory does n tag and data checks in CAM in parallel and, provided the selected block is in cache, quickly locates the tag for the selected block and ignores the rest.
Accordingly, as illustrated in FIG. 1, in what is known as a CAMRAM cache 50, tags 52 are stored in CAM 54 and associated data 56 is stored in a bank store (BS) 58, typically SRAM. In this example, the CAMRAM 50 is an m-bank 60 cache, where m is 4. Each bank 60 is identified by a bank tag 62. If the incoming tag 52 matches one of the n entries in the CAM 54, that match 64 selects a corresponding data block in BS 58, which is made available for access 66, e.g., as output or for a cached store. Otherwise, a miss 68 is returned and the incoming request is directed to data located elsewhere, e.g., in main memory.
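By way of illustration only, a lookup in a banked CAMRAM cache of this kind may be sketched in software as follows. The field widths (OFFSET_BITS, BANK_BITS), the number of entries per bank, and the names Bank, split_address and lookup are assumptions chosen for the sketch and do not appear in FIG. 1. The bank is selected directly from address bits, the tag is compared against the CAM entries of that bank only, and a match selects the corresponding bank store entry.

```python
from dataclasses import dataclass, field

NUM_BANKS = 4          # m = 4 banks, as in the example of FIG. 1
ENTRIES_PER_BANK = 8   # n entries per bank (assumed value)
OFFSET_BITS = 5        # assumed 32-byte blocks
BANK_BITS = 2          # log2(NUM_BANKS)


@dataclass
class Bank:
    # CAM portion: one stored tag per entry (None = invalid entry).
    cam_tags: list = field(default_factory=lambda: [None] * ENTRIES_PER_BANK)
    # Bank store portion: the data block associated with each tag.
    bank_store: list = field(default_factory=lambda: [None] * ENTRIES_PER_BANK)


def split_address(addr):
    """Split an address into (tag, bank id); the block offset is discarded."""
    bank_id = (addr >> OFFSET_BITS) & (NUM_BANKS - 1)
    tag = addr >> (OFFSET_BITS + BANK_BITS)
    return tag, bank_id


def lookup(banks, addr):
    """Return (True, data) on a match, or (False, None) on a miss."""
    tag, bank_id = split_address(addr)
    bank = banks[bank_id]
    # Hardware compares all n CAM entries of the bank at once; a loop models it.
    for entry, stored_tag in enumerate(bank.cam_tags):
        if stored_tag is not None and stored_tag == tag:
            return True, bank.bank_store[entry]
    return False, None  # miss: the request is directed elsewhere, e.g., main memory


banks = [Bank() for _ in range(NUM_BANKS)]
tag, bank_id = split_address(0x1234)
banks[bank_id].cam_tags[3] = tag
banks[bank_id].bank_store[3] = "block A"

hit_result = lookup(banks, 0x1234)
miss_result = lookup(banks, 0x5678)
```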
Standard cache memories store data and tag information in the RAM of a cache line. The hardware finds the data based on the virtual address, reads the data and checks the tag against the value stored in the line. The tag for a virtually indexed cache includes the upper bits of the virtual address and an address space identifier, which is unique to a process. An n-way associative cache memory does n tag and data checks in parallel, throwing out the value of all but one of them. While associativity lowers cache miss rates and improves microprocessor performance, the redundant work it requires has a high energy cost. Direct-mapped caches, with an associativity of 1, only read one tag and one data word/block and have lower hit energy. However, they have much larger miss rates due to conflicts and, since a miss costs more energy than a hit, they tend to have larger total memory access energy. Techniques like way-predicting caches can provide associativity at lower hit energy by only checking one way in an n-way set associative cache, but tend to incur energy and delay penalties to access the way-prediction table on way hits and additional energy and performance penalties if predictions are incorrect. Caches are also often split into subbanks, which handle certain address ranges. Bank addresses are direct mapped using the appropriate virtual address bits.
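The tradeoff between a direct-mapped cache and a set associative cache may be illustrated with the following small simulation, offered only as a sketch. The function name simulate, the least-recently-used replacement policy, and the access trace are assumptions made for the example. Both configurations hold eight blocks; the trace alternates between two blocks that conflict in the direct-mapped cache.

```python
from collections import OrderedDict


def simulate(trace, num_sets, ways):
    """Return (misses, tag_checks) for an LRU cache accessed by block address."""
    # Each set maps a tag to a placeholder; ordering tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    tag_checks = 0
    for block in trace:
        index = block % num_sets
        tag = block // num_sets
        cache_set = sets[index]
        tag_checks += ways  # every way of the selected set is probed on each access
        if tag in cache_set:
            cache_set.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict the least recently used tag
            cache_set[tag] = True
    return misses, tag_checks


trace = [0, 8, 0, 8, 0, 8, 0, 8]                    # two blocks mapping to the same set
direct_mapped = simulate(trace, num_sets=8, ways=1)  # associativity of 1
two_way = simulate(trace, num_sets=4, ways=2)        # same capacity, 2-way
```

In this trace the direct-mapped configuration misses on every access but checks only one tag each time, whereas the 2-way configuration misses only on the first access to each block at the cost of twice as many tag checks.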
CAMRAM caching facilitates higher associativity and can reduce power consumption because of its sequential tag and data access. During a CAMRAM access, the search tag of the incoming address is broadcast to the tag depository, i.e., the CAM. A matching tag (if any) locates the block in cache RAM that is requested for access, i.e., requested for a read operation or cached for storage in a store operation. M. Zhang and K. Asanović, “Highly-Associative Caches for Low-Power Processors,” Kool Chips Workshop, 33rd Int'l Symposium on Microarchitecture, (2000) describes how a 32-way CAM-tag search uses about the same power as a 2-way set associative RAM-tag search. For additional power reduction, CAM-tag caches are often subbanked with a multi-phased access. Typically, the CAM-tag compare is the first access phase, where each CAM cell compares its stored value in place with an arriving address. If there is a match in the first phase, the actual data read or write to cache occurs in the next phase.
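The difference between a parallel tag and data access and the sequential, two-phase CAM-tag access may be illustrated by counting array activity per access, as in the following sketch. The function names and the returned counters are hypothetical; the sketch counts operations only and makes no claim about the energy of any individual operation.

```python
def parallel_ram_tag_access(ways):
    """Conventional n-way access: all tags and all data ways are read together."""
    return {"tag_checks": ways, "data_reads": ways}


def sequential_cam_tag_access(entries, hit):
    """Two-phase CAM-tag access.

    Phase 1: the arriving tag is compared against every CAM entry in the bank.
    Phase 2: the data array is accessed only if phase 1 found a match.
    """
    return {"tag_checks": entries, "data_reads": 1 if hit else 0}


ram_2way = parallel_ram_tag_access(ways=2)
cam_32way_hit = sequential_cam_tag_access(entries=32, hit=True)
cam_32way_miss = sequential_cam_tag_access(entries=32, hit=False)
```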
Unfortunately, CAM-tag caches still use a significant amount of power finding the associated data in the first phase because the arriving address is broadcast to all of the CAM bank locations. Typically, more than half of CAMRAM cache power is consumed in the CAM-tag checking phase. Consequently, CAMRAM power is directly related to the number of bank entries, i.e., the larger the bank, the more power required. For an energy-efficient cache design, therefore, the designer must find the proper mix of associativity, size, structure configuration, and partitioning to achieve an acceptable energy consumption level. Achieving such a mix without proper regard for the inherent code and data behavior of targeted workloads has been difficult.
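The dependence of tag-check activity on bank size may be sketched as follows. The function names, the total of 128 entries, and the unit costs are assumptions for illustration and are not measured values; the model simply charges one compare per CAM entry in the selected bank plus one data access.

```python
def cam_compares_per_access(total_entries, num_banks):
    """Number of CAM entries that see the broadcast tag when one bank is selected."""
    if num_banks <= 0 or total_entries % num_banks != 0:
        raise ValueError("total_entries must divide evenly into a positive num_banks")
    return total_entries // num_banks


def relative_access_energy(entries_per_bank, compare_cost=1.0, data_cost=1.0):
    """Relative energy of one hit, in arbitrary units (costs are assumed, not measured)."""
    return entries_per_bank * compare_cost + data_cost


# Compares per access for a fixed 128-entry cache split into 1, 2, 4 or 8 banks.
compares_by_banks = {banks: cam_compares_per_access(128, banks)
                     for banks in (1, 2, 4, 8)}
```

Under these assumptions, halving the bank size halves the number of tag compares per access, which is why the partitioning choice figures so prominently in the energy of the tag-checking phase.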
Thus, there is a need to reduce the number of tag checks per access and further, to reduce cache memory power consumption.