1. Field of the Invention
This invention generally relates to processor cache memory and, more particularly, to a system and method for sharing L2 cache memories between processors without using snooping logic.
2. Description of the Related Art
As noted in Wikipedia, a cache is a memory used by the central processing unit (CPU), or processor, of a computer to reduce the average time to access memory. The cache is a smaller, faster memory that stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses is closer to the cache latency than to the latency of main memory.
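Why the average latency tracks the cache latency can be illustrated with the standard textbook average memory access time (AMAT) model, AMAT = hit time + miss rate × miss penalty. The model and the cycle counts below are illustrative assumptions, not figures from this disclosure; the function name `average_access_time` is hypothetical.

```python
def average_access_time(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time (AMAT) for a single cache level, in cycles.

    AMAT = hit_time + miss_rate * miss_penalty
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 2-cycle cache hit, 200-cycle main-memory miss penalty.
for miss_rate in (0.01, 0.05, 0.50):
    amat = average_access_time(2, miss_rate, 200)
    print(f"miss rate {miss_rate:.0%}: {amat:.1f} cycles")
```

With a low miss rate the result stays near the 2-cycle hit time; only when most accesses miss does it approach the main-memory latency.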
When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory.
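The check-the-cache-first behavior can be sketched as a tiny software model. The fully associative organization, LRU replacement, write-allocate and write-back policies, and the class name `WriteBackCache` are all illustrative assumptions; the text above does not specify them.

```python
from collections import OrderedDict


class WriteBackCache:
    """Tiny fully associative cache with LRU replacement and a write-back policy.

    Capacity, replacement policy, and write policy are illustrative assumptions.
    """

    def __init__(self, main_memory: dict[int, int], capacity: int = 4):
        self.main_memory = main_memory
        self.capacity = capacity
        self.lines: OrderedDict[int, int] = OrderedDict()  # address -> cached value
        self.dirty: set[int] = set()  # lines modified since they were filled
        self.hits = 0
        self.misses = 0

    def _lookup(self, addr: int) -> None:
        """Check the cache first; on a miss, fill the line from main memory."""
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return
        self.misses += 1
        if len(self.lines) >= self.capacity:
            victim, value = self.lines.popitem(last=False)  # evict least recently used
            if victim in self.dirty:  # only modified lines are written back
                self.main_memory[victim] = value
                self.dirty.discard(victim)
        self.lines[addr] = self.main_memory.get(addr, 0)

    def read(self, addr: int) -> int:
        self._lookup(addr)
        return self.lines[addr]

    def write(self, addr: int, value: int) -> None:
        self._lookup(addr)  # write-allocate: bring the line in, then update it
        self.lines[addr] = value
        self.dirty.add(addr)
```

In this model a write touches only the cached copy; main memory is updated later, when the dirty line is evicted, which is what lets a cache hit avoid the slower main-memory access.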
Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data. The data cache is usually organized as a hierarchy of cache levels (L1, L2, etc.).
Larger caches have better hit rates but longer latency. To address this tradeoff, many computers use multiple levels of cache, with small fast caches backed up by larger slower caches. Multi-level caches generally operate by checking the smallest Level 1 (L1) cache first; if it hits, the processor proceeds at high speed. If the smaller cache misses, the next larger cache (L2) is checked, and so on, before external memory is checked.
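The level-by-level search described above can be modeled as a loop over the hierarchy, accumulating the cost of each level checked. The latencies are hypothetical, line fills on a miss are omitted for brevity, and the names `CacheLevel` and `lookup` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    name: str
    latency: int  # cycles to check this level (hypothetical)
    contents: set[int] = field(default_factory=set)  # addresses currently held


def lookup(levels: list[CacheLevel], addr: int, memory_latency: int) -> tuple[str, int]:
    """Check each level from smallest/fastest to largest/slowest.

    Returns the name of the level that supplied the data and the total cycles
    spent. Missed levels' latencies accumulate because they are checked in turn.
    """
    cycles = 0
    for level in levels:
        cycles += level.latency
        if addr in level.contents:
            return level.name, cycles
    return "main memory", cycles + memory_latency


levels = [CacheLevel("L1", 4, {0x100}), CacheLevel("L2", 12, {0x100, 0x200})]
print(lookup(levels, 0x100, 200))  # ('L1', 4)
print(lookup(levels, 0x200, 200))  # ('L2', 16)
print(lookup(levels, 0x300, 200))  # ('main memory', 216)
```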
Conventional system-on-chip (SoC) devices with multiple processors have combined instruction and data caches at the L2 level. At the L1 level, it is not uncommon to have distinct L1 instruction and data caches to maximize memory access performance. However, this segmentation concept does not extend to the L2 level. In a multi-processor SoC, the L2 caches cannot be shared between processors. Thus, if a processor is shut down, its associated L2 cache is also shut down, which wastes memory.
FIG. 9 is a schematic diagram of a multi-cache system using a processor local bus (PLB) to conduct snoop requests (prior art). In a conventional system, upon an L1 miss, the local L2 cache is queried. If it has the line, the L2 cache sends it to the L1 cache. If the L2 cache does not have the line, a snoop is generated. The snoop request travels down to the PLB and is propagated to the other L2 caches. The results (hit or miss) then come back via the PLB. If there is no match, main memory is accessed to retrieve the data. Generating a snoop and receiving the responses via the PLB takes many clock cycles, so on a complete L2 miss there is a large delay before the read of the data from external memory can begin.
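The prior-art flow of FIG. 9 can be sketched as a simple latency model. The cycle counts in `BusLatencies` are hypothetical, and the model assumes all snoop responses return over the PLB together, so one round trip is charged regardless of how many remote L2 caches exist; the loop over remote caches only selects which one supplies the line. The names `BusLatencies` and `snoop_read` are illustrative.

```python
from typing import NamedTuple


class BusLatencies(NamedTuple):
    """Hypothetical cycle counts; real values depend on the SoC."""
    local_l2: int = 12              # query the processor's own L2
    plb_snoop_round_trip: int = 40  # snoop out over the PLB and collect responses
    main_memory: int = 200          # external memory read


def snoop_read(addr: int, local_l2: set[int], remote_l2s: dict[str, set[int]],
               lat: BusLatencies = BusLatencies()) -> tuple[str, int]:
    """Model the prior-art flow after an L1 miss; returns (data source, cycles)."""
    cycles = lat.local_l2
    if addr in local_l2:
        return "local L2", cycles  # local hit: the line goes straight to L1
    # Local miss: the snoop is propagated to the other L2 caches over the PLB,
    # and the hit/miss responses must come back over the PLB.
    cycles += lat.plb_snoop_round_trip
    for name, contents in remote_l2s.items():
        if addr in contents:
            return name, cycles
    # Complete L2 miss: the external memory read only starts now.
    return "main memory", cycles + lat.main_memory
```

Under these assumed numbers, a complete L2 miss spends 52 cycles (local query plus snoop round trip) before the 200-cycle external read even begins, which is the delay the text identifies.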
It would be advantageous if the L2 caches of a multi-processor SoC could be dynamically shared based upon processor power states.
It would be advantageous if the latency in searching non-local L2 caches could be minimized.