Improvements in semiconductor processing technology have resulted in gains in computer processor performance. Not only has semiconductor feature size been reduced to allow higher component density on a die, but decreases in semiconductor defects have also made larger die sizes more cost effective. Together, these advances have made it possible to integrate multiple processors and multiple levels of a cache hierarchy on a single integrated chip.
Processor cycle time and memory access time are two important performance measures that together determine overall processor performance. Processor clock frequency has been improving faster than memory access time. As a result, each memory access costs relatively more processor cycles, and the relatively long memory access time increasingly limits processor performance. In response to this ever-widening gap between processor cycle time and memory access time, processor engineers have proposed many different cache organizations for multiprocessor systems. Typically, each processor core on a multiprocessor chip today has its own level-one cache. The level-one cache is the level of the cache hierarchy most closely coupled to a processing unit of the processor, and it is typically the fastest and/or smallest cache level coupled to the processor. Depending on the size of the individual processors and the amount of cache required, a level-two cache may be integrated on-chip or located off-chip. The level-two cache is coupled to the level-one cache and, in multiprocessor systems, is often shared by more than one processor.
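The processor-cycle/memory-access time gap can be made concrete with simple arithmetic: the number of cycles a processor waits for one memory access is the access latency multiplied by the clock frequency. The sketch below is illustrative only. It takes the extreme case in which memory latency does not improve at all while clock frequency rises, and the function name `miss_penalty_cycles` and the 60 ns latency figure are hypothetical values chosen for illustration, not data from this document.

```python
def miss_penalty_cycles(clock_ghz: float, memory_latency_ns: float) -> float:
    """Cycles spent waiting on one memory access.

    latency (ns) x frequency (cycles per ns) = cycles.
    """
    return memory_latency_ns * clock_ghz


# Hypothetical figure: memory latency held fixed while clock frequency rises.
MEMORY_LATENCY_NS = 60.0

for clock_ghz in (0.5, 1.0, 2.0, 4.0):
    cycles = miss_penalty_cycles(clock_ghz, MEMORY_LATENCY_NS)
    print(f"{clock_ghz:.1f} GHz -> {cycles:.0f} cycles per memory access")
```

Under these assumptions, each doubling of the clock frequency doubles the number of cycles lost per memory access. This is the effect that an on-chip cache hierarchy is meant to hide.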
FIG. 1A and FIG. 1B illustrate prior multiprocessor systems. In FIG. 1A, processors 102 and 104, each with its own level-one cache, are connected to shared level-two cache 108 through shared bus 106. A common level-two cache can be used effectively by only a limited number of processors. As more processors share a single level-two cache, the cache becomes harder to design. Because cache line conflicts increase with more processors, a higher degree of set associativity is required to maintain cache performance. The bandwidth needed to transfer data to and from the level-two cache also increases. High-bandwidth connections are especially difficult to build if long global wires are required to connect multiple processors to the level-two cache, and these long global wires can limit the maximum frequency of the entire chip. In FIG. 1B, each processor (110 and 112) is connected to a corresponding level-two cache (114 and 116, respectively). The level-two caches are connected to each other through shared bus 118. Connecting only one processor to each level-two cache simplifies the level-two cache design requirements, but the performance benefits of a single level-two cache shared by multiple processors are lost. In the systems of both FIG. 1A and FIG. 1B, the shared bus can saturate as more processors transfer data over it. Therefore, there is a need for cache organizations that offer better performance and scalability in multiprocessor chips.
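The saturation argument for shared bus 106 (FIG. 1A) and shared bus 118 (FIG. 1B) can be illustrated with a back-of-the-envelope model. The bus saturates once the combined traffic of all attached processors exceeds its bandwidth. This is a minimal sketch under stated assumptions. Each processor is assumed to generate a fixed, uniform amount of bus traffic, and arbitration overhead, coherence traffic, and burstiness are ignored. The function names and the bandwidth figures are hypothetical.

```python
import math


def max_processors_before_saturation(bus_bandwidth_gbs: float,
                                     per_processor_demand_gbs: float) -> int:
    """Largest processor count whose combined traffic fits on the shared bus.

    Assumes every processor places the same constant demand on the bus.
    """
    if per_processor_demand_gbs <= 0:
        raise ValueError("per-processor demand must be positive")
    return math.floor(bus_bandwidth_gbs / per_processor_demand_gbs)


def bus_utilization(num_processors: int,
                    per_processor_demand_gbs: float,
                    bus_bandwidth_gbs: float) -> float:
    """Fraction of bus bandwidth demanded; values above 1.0 mean saturation."""
    return num_processors * per_processor_demand_gbs / bus_bandwidth_gbs


# Hypothetical figures: a 16 GB/s shared bus, 3 GB/s of traffic per processor.
BUS_GBS = 16.0
DEMAND_GBS = 3.0

print("max processors:", max_processors_before_saturation(BUS_GBS, DEMAND_GBS))
for n in (1, 2, 4, 8):
    u = bus_utilization(n, DEMAND_GBS, BUS_GBS)
    state = "saturated" if u > 1.0 else "ok"
    print(f"{n} processors -> utilization {u:.2f} ({state})")
```

In this simplified model, the processor count a shared bus can support is fixed by the ratio of bus bandwidth to per-processor demand. Adding processors beyond that point only adds queuing delay, which is the scalability limit described for both prior systems.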