Data processors, such as the microprocessors used in personal computers, rely on working memories (“main memories”) for the execution of the intended programs; such memories are typically constituted by banks of Dynamic RAMs (DRAMs).
Modern data processors, however, operate at clock speeds that exceed the usual DRAM access times. To overcome this problem, and allow the processor to run near its clock speed, cache memory systems are used.
A cache memory (hereinafter also referred to as “cache”, for brevity) is a relatively small but fast memory (typically, a Static RAM, or SRAM), which stores (or “caches”) copies of the data content of the most frequently or most recently accessed main memory locations. As long as most memory accesses are made to cached main memory locations, instead of to the main memory itself, the average latency of memory accesses is closer to the cache latency than to the latency of the main memory.
A typical cache memory system comprises two different memory components. A first memory component, referred to as “data cache”, which is usually the larger of the two, is a memory that stores the copies of the data or the instructions needed by the processor, so that such information need not be retrieved from the main memory. A second memory component, referred to as the “tag cache”, is used to store portions of main memory addresses. The data cache includes a plurality of storage locations called “cache lines”. Usually, the width (in terms of number of bits) of the generic cache line is larger than the width of the generic main memory location. Thus, when a generic main memory location is accessed, a corresponding group of main memory locations is actually cached into the cache memory. The term “cache lines” is also used to identify one of said groups of main memory locations. Each cache line of the data cache is associated with a corresponding location in the tag cache, which contains a “tag”. The tag represents a portion of a main memory address, identifying a respective cache line in the main memory which has been cached into that cache line of the data cache.
When the processor has to read or write a main memory location, it issues a main memory address code, which identifies a cache line in the main memory. The cache memory system checks whether the addressed cache line is present in the cache. This is accomplished by comparing the part of the main memory address code identifying the addressed cache line (i.e., the tag portion of the address) to the tags stored in the tag cache locations. If it is found that the addressed main memory cache line is in the cache, a cache “hit” is decreed. Otherwise, the cache memory system decrees a cache “miss”. In the case of a cache hit, the processor immediately reads/writes the data from/into the proper cache line of the data cache. Otherwise, the main memory is accessed. In the case of a cache miss, the operations take more time, because they require a data transfer from the main memory, which is much slower than the cache memory. The higher the fraction of accesses resulting in cache hits (known as the “hit rate”), the more effective the cache.
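The hit/miss decision described above can be illustrated with a minimal sketch, assuming a direct-mapped cache and hypothetical sizes (16-byte lines, 8 cache lines); the address is split into a tag, a cache line index, and an offset within the line, and the tag is compared against the stored tag for that line:

```python
LINE_SIZE = 16   # bytes per cache line (hypothetical example value)
NUM_LINES = 8    # cache lines in the data cache (hypothetical example value)

def split_address(addr):
    """Split a main memory address into (tag, index, offset)."""
    offset = addr % LINE_SIZE
    index = (addr // LINE_SIZE) % NUM_LINES
    tag = addr // (LINE_SIZE * NUM_LINES)
    return tag, index, offset

def lookup(tag_cache, addr):
    """Compare the tag portion of the address with the stored tag:
    return True on a cache hit, False on a cache miss."""
    tag, index, _ = split_address(addr)
    return tag_cache[index] == tag

tag_cache = [None] * NUM_LINES
tag, index, _ = split_address(0x1234)
tag_cache[index] = tag              # cache the line containing 0x1234

print(lookup(tag_cache, 0x1234))    # True  (hit)
print(lookup(tag_cache, 0x5678))    # False (miss)
```

In a real tag cache the comparison is done in hardware, in parallel with (or immediately before) the data cache access; the sketch only shows the logical decision.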
In the case of a cache miss, most caches allocate a new entry, which comprises the tag just missed and a copy of the corresponding data from the main memory. If the cache is already full, one of the existing cache entries in the data cache (and the corresponding entry in the tag cache) needs to be removed. The modality that is used to choose the cache line to be replaced is called the “replacement policy”. As known in the art, one popular replacement policy (Least Recently Used, LRU) replaces the least recently used entry.
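The LRU replacement policy mentioned above can be sketched in software as follows; the capacity and the `load_data` callback standing in for a main memory fetch are illustrative assumptions, not part of the source:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement sketch: when the cache is full, the
    least recently used entry is evicted to make room for a new one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()        # tag -> cached data

    def access(self, tag, load_data):
        if tag in self.entries:
            self.entries.move_to_end(tag)   # mark as most recently used
            return self.entries[tag], True  # hit
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the LRU entry
        self.entries[tag] = load_data(tag)     # fetch from "main memory"
        return self.entries[tag], False        # miss

cache = LRUCache(capacity=2)
fetch = lambda tag: f"data@{tag}"
cache.access(0, fetch)      # miss: cache holds {0}
cache.access(1, fetch)      # miss: cache holds {0, 1}
cache.access(0, fetch)      # hit: 0 becomes most recently used
cache.access(2, fetch)      # miss: evicts tag 1, the LRU entry
print(1 in cache.entries)   # False
```

Hardware implementations approximate this bookkeeping with per-set age bits rather than an ordered list, but the eviction choice is the same.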
When the data stored in a cache line of the cache memory is changed in response to a write access requested by the processor, the data stored in the main memory needs to be updated. The moment in time when the updating operation is performed depends on the so-called “write policy” of the memory system. A known write policy, called “write-through”, provides that every write to the cache causes an immediate write to the main memory. Alternatively, according to the so-called “write-back” policy, writes to the cache are not immediately propagated to the main memory. The cache keeps track of which cache lines have been modified, and the data in these cache lines is written back to the main memory when that cache line has to be used for caching a different main memory cache line. For this reason, a miss in a write-back cache will often require two memory accesses to service.
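The write-back behavior, and in particular why a miss on a modified line costs two main memory accesses, can be sketched with a hypothetical single-line cache tracking a “dirty” flag:

```python
class WriteBackLine:
    """Sketch of the write-back policy for one cache line: writes mark
    the line dirty; main memory is updated only when the line is
    replaced, so a miss on a dirty line costs two memory accesses."""

    def __init__(self):
        self.tag = None
        self.data = None
        self.dirty = False

    def write(self, memory, tag, data):
        accesses = self.replace(memory, tag)
        self.data = data
        self.dirty = True                  # main memory is now stale
        return accesses

    def replace(self, memory, tag):
        """Load a new line; write the old one back first if dirty.
        Return the number of main memory accesses performed."""
        accesses = 0
        if self.tag == tag:
            return accesses                # hit: no memory traffic
        if self.dirty:
            memory[self.tag] = self.data   # write back the modified line
            accesses += 1
        self.data = memory.get(tag)        # fetch the new line
        self.tag, self.dirty = tag, False
        accesses += 1
        return accesses

memory = {0: "a", 1: "b"}
line = WriteBackLine()
line.write(memory, 0, "A")         # miss: one access (fetch only)
print(line.replace(memory, 1))     # 2: write-back plus fetch
print(memory[0])                   # "A": the modified data was written back
```

A write-through cache would instead set `memory[tag] = data` inside `write` on every call, trading extra memory traffic for never needing the write-back step.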
A very important factor in determining the effectiveness of a cache relates to how the cache is mapped onto the main memory. Three different approaches to allocate the data in the cache according to the corresponding main memory addresses are known in the art.
The simplest approach to mapping the cache onto the main memory, called “direct mapping”, calls for determining how many cache lines there are in the data cache memory, and dividing the main memory into the same number of cache lines. Therefore, when a generic one of said main memory cache lines is cached, it fills a predetermined one of said data cache lines.
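A minimal sketch of direct mapping, with a hypothetical data cache of 8 lines: the index of the data cache line assigned to a main memory cache line is fixed, so two main memory lines whose indices collide can never reside in the cache at the same time:

```python
NUM_LINES = 8   # data cache lines (hypothetical example value)

def cache_line_index(memory_line):
    """The single predetermined data cache line that may hold the
    given main memory cache line under direct mapping."""
    return memory_line % NUM_LINES

print(cache_line_index(3))    # 3
print(cache_line_index(11))   # 3: collides with main memory line 3
```

This rigidity is what motivates the associative mappings described next: a workload alternating between two colliding lines misses on every access even though the rest of the cache may be empty.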
Instead of establishing a rigid correspondence between cache lines of the data cache memory and main memory cache lines, it is possible to design the cache so that any cache line can store the data contents of any main memory cache line. This approach is called “fully associative mapping”.
A compromise between the direct mapping and the fully associative mapping is the so-called “N-way set associative mapping”. In this case, the cache is divided into sets, each set containing a number N of cache lines (each one corresponding to a “way”). Typically, N may be equal to 2, 4 or 8. The main memory address space is divided into corresponding sets, and the generic main memory cache line can be cached into any one of the N cache lines of the corresponding set (determined on the basis of the main memory address). In other words, within each cache line set the cache is associative.
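The set selection and the associative search within a set can be sketched as follows, assuming hypothetical values N = 2 and 4 sets; the main memory line number selects the set, and the tag is compared against all N ways of that set:

```python
N_WAYS = 2     # N (hypothetical example value)
NUM_SETS = 4   # number of sets (hypothetical example value)

def find_way(tag_cache, memory_line):
    """Search all N ways of the set selected by the main memory line
    number; return the way holding the line, or None on a miss."""
    set_index = memory_line % NUM_SETS
    tag = memory_line // NUM_SETS
    for way in range(N_WAYS):
        if tag_cache[set_index][way] == tag:
            return way
    return None

tag_cache = [[None] * N_WAYS for _ in range(NUM_SETS)]
# Cache main memory lines 1 and 5: both map to set 1,
# but can coexist because they occupy different ways.
tag_cache[1][0] = 1 // NUM_SETS    # tag of line 1
tag_cache[1][1] = 5 // NUM_SETS    # tag of line 5

print(find_way(tag_cache, 1))   # 0
print(find_way(tag_cache, 5))   # 1
print(find_way(tag_cache, 9))   # None (miss)
```

With NUM_SETS = 1 the same code degenerates to a fully associative cache, and with N_WAYS = 1 to a direct-mapped one, which is why the N-way scheme is described as a compromise between the two.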
Cache memory systems may be embedded in, e.g., microprocessor integrated circuits (so-called embedded processor ICs).
The power consumption of a cache memory system represents an important fraction of the overall power consumption of an embedded processor IC. Nowadays, cache memory systems are implemented by providing, within an IC chip, different static RAM devices (SRAMs), corresponding to the tag cache and to the data cache for each way. This solution, although improving hit rate and memory access times, requires replicating, for each SRAM, all the circuits necessary for the functioning of the memory (input/output buffers, decoders, read circuits, write circuits and so on), thus involving a larger semiconductor area occupation and a higher power consumption. Moreover, the RAMs specifically designed for high-performance cache memory systems (adapted to access data and tags in a same clock cycle) exhibit an elevated static power dissipation.
In view of the state of the art outlined in the foregoing, the Applicant has faced the general problem of how to implement a cache memory system in an efficient way, assuring a low power consumption.