1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to cache memory subsystems within a microprocessor.
2. Description of the Related Art
Typical computer systems may contain one or more microprocessors which may be connected to one or more system memories. The processors may execute code and operate on data that is stored within the system memories. It is noted that as used herein, the term “processor” is synonymous with the term microprocessor. To facilitate the fetching and storing of instructions and data, a processor typically employs some type of memory system. In addition, to expedite accesses to the system memory, one or more cache memories may be included in the memory system. For example, some microprocessors may be implemented with one or more levels of cache memory. In a typical microprocessor, a level one (L1) cache and a level two (L2) cache may be used, while some newer processors may also use a level three (L3) cache. In many legacy processors, the L1 cache may reside on-chip and the L2 cache may reside off-chip. However, to further improve memory access times, many newer processors may use an on-chip L2 cache.
Generally speaking, the L2 cache may be larger and slower than the L1 cache. In addition, the L2 cache is often implemented as a unified cache, while the L1 cache may be implemented as a separate instruction cache and data cache. The L1 data cache is used to hold the data most recently read or written by the software running on the microprocessor. The L1 instruction cache is similar to the L1 data cache except that it holds the most recently executed instructions. It is noted that for convenience the L1 instruction cache and the L1 data cache may be referred to simply as the L1 cache, as appropriate. The L2 cache may be used to hold instructions and data that do not fit in the L1 cache. The L2 cache may be exclusive (i.e., it stores information that is not in the L1 cache) or it may be inclusive (i.e., it stores a copy of the information that is in the L1 cache).
Memory systems typically use some type of cache coherence mechanism to ensure that accurate data is supplied to a requester. The cache coherence mechanism typically uses the size of the data transferred in a single request as the unit of coherence. The unit of coherence is commonly referred to as a cache line. In some processors, for example, a given cache line may be 64 bytes, while some other processors employ a cache line of 32 bytes. In yet other processors, other numbers of bytes may be included in a single cache line. If a request misses in the L1 and L2 caches, an entire cache line of multiple words is transferred from main memory to the L2 and L1 caches, even though only one word may have been requested. Similarly, if a request for a word misses in the L1 cache but hits in the L2 cache, the entire L2 cache line including the requested word is transferred from the L2 cache to the L1 cache. Thus, a request for a unit of data smaller than a cache line may cause an entire cache line to be transferred between the L2 cache and the L1 cache. Such transfers typically require multiple cycles to complete.
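To illustrate the granularity described above, the following minimal sketch shows how a single-word request maps to the whole cache line that is moved between levels. The names `line_base` and `words_transferred` are hypothetical, the line size is assumed to be a power of two, and the 8-byte word size is an assumption chosen only for illustration.

```python
LINE_SIZE = 64  # bytes per cache line (one of the sizes mentioned above)
WORD_SIZE = 8   # bytes per word (assumed for illustration)


def line_base(addr: int, line_size: int = LINE_SIZE) -> int:
    """Return the address of the first byte of the cache line containing addr.

    Assumes line_size is a power of two, so the low-order offset bits can be masked off.
    """
    return addr & ~(line_size - 1)


def words_transferred(addr: int, line_size: int = LINE_SIZE,
                      word_size: int = WORD_SIZE) -> list[int]:
    """Word addresses moved when a single-word request at addr misses.

    Even though only one word was requested, every word in its line is transferred.
    """
    base = line_base(addr, line_size)
    return list(range(base, base + line_size, word_size))


if __name__ == "__main__":
    req = 0x1234
    moved = words_transferred(req)
    print(f"request 0x{req:x} -> line 0x{line_base(req):x}, {len(moved)} words moved")
```

With a 64-byte line and 8-byte words, one requested word causes eight words to move; with a 32-byte line, four words would move.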
During a read or write to cacheable memory, the L1 cache is first checked to see if the requested information (e.g., instruction or data) is available. If the information is available, a hit occurs; if it is not, a miss occurs, and the L2 cache may then be checked. When a request misses in the L1 cache but hits in the L2 cache, the information may be transferred from the L2 cache to the L1 cache. As described above, the amount of information transferred between the L2 and L1 caches is typically a cache line. In addition, depending on the space available in the L1 cache, a cache line may be evicted from the L1 cache to make room for the new cache line and may subsequently be stored in the L2 cache. In some conventional processors, no other accesses to either the L1 cache or the L2 cache may be processed during this cache line “swap.”
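The lookup sequence and cache line "swap" described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not a description of any particular processor: the class `ExclusiveTwoLevelCache` is hypothetical, both levels are assumed fully associative with LRU replacement, the L2 is assumed exclusive, a line fetched from memory is assumed to fill only the L1, and only line addresses are tracked (no data or dirty state).

```python
from collections import OrderedDict


class ExclusiveTwoLevelCache:
    """Toy model of an L1 cache backed by an exclusive L2 cache.

    Assumptions (hypothetical): fully associative levels, LRU replacement,
    and only line addresses are tracked.
    """

    def __init__(self, l1_lines: int, l2_lines: int, line_size: int = 64):
        self.l1 = OrderedDict()  # line address -> None, ordered LRU first, MRU last
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.line_size = line_size

    def _insert_l2(self, line: int) -> None:
        self.l2[line] = None
        if len(self.l2) > self.l2_lines:
            # The L2 victim leaves the hierarchy; a real design would
            # write it back to memory if it were dirty.
            self.l2.popitem(last=False)

    def _fill_l1(self, line: int) -> None:
        self.l1[line] = None
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)  # evict the LRU line from L1...
            self._insert_l2(victim)                  # ...and store it in L2 (the "swap")

    def access(self, addr: int) -> str:
        line = addr - addr % self.line_size  # whole line is the unit of transfer
        if line in self.l1:
            self.l1.move_to_end(line)        # L1 hit: update LRU order only
            return "L1 hit"
        if line in self.l2:
            del self.l2[line]                # exclusive: line leaves L2 when moved to L1
            self._fill_l1(line)
            return "L2 hit"
        self._fill_l1(line)                  # miss in both: fetch line from memory into L1
        return "miss"


if __name__ == "__main__":
    cache = ExclusiveTwoLevelCache(l1_lines=1, l2_lines=2)
    for a in (0, 64, 0, 10):
        print(hex(a), cache.access(a))
```

With a one-line L1, the third access to address 0 hits in the L2 and swaps lines: line 0 moves up to the L1 while line 64 is evicted down to the L2, so the two levels never hold the same line.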
As described above, the L2 cache is generally much larger than the L1 cache. In many microprocessors, the L2 cache may account for as much as 70 percent or more of the microprocessor die size, and the trend for future microprocessors is toward even larger L2 caches. In many processors, the L2 cache may be implemented as a fully pipelined cache. As such, the L2 cache may use a clocking architecture that includes a clock distribution network spanning the entire cache array area. This type of clocking scheme may consume a great deal of power and generate a corresponding amount of heat, which may limit the scalability of future microprocessors. The power consumption may be attributed to clocked structures within each memory cell. Even simple memory cells having a single transistor may consume power due to the charging and discharging of the input gate capacitance. In addition, as component sizes (e.g., transistor sizes) decrease, the resistor-capacitor (RC) time constant delays at interconnection points may exceed the transistor delays. Thus, the clock tree distribution network of an L2 cache may become increasingly difficult to route, which may also affect the scalability of future microprocessors.
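The link between charging and discharging gate capacitance and power consumption may be summarized by the standard first-order model of CMOS dynamic power. This formula is general background and is not taken from the description above; the symbols are introduced here only for illustration: $\alpha$ is the activity factor (the fraction of cycles in which a node switches), $C$ is the switched capacitance, $V_{DD}$ is the supply voltage, and $f$ is the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Because a clock net switches every cycle ($\alpha \approx 1$), a clock distribution network spanning the entire L2 array contributes switched capacitance roughly in proportion to the area it covers, so a larger L2 cache tends to increase clocking power at a given voltage and frequency.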