Computing environments are often configured as computer systems with a relatively large, relatively slow main memory. Typically, the memory system comprises multiple dynamic random access memory (DRAM) modules. This large memory system provides storage for a large number of instructions and/or a large amount of data for use by the processing units of the computing environment, and provides faster access to the instructions and/or data than may be achieved from disk storage, for example. However, the access times of modern DRAMs are significantly longer than the clock cycle length of modern processing units. The memory access time for a set of bytes being transferred to a processing unit may therefore be relatively long. Accordingly, the memory system is not a high bandwidth system, and the performance of the processing units may suffer due to a lack of available memory bandwidth. In order to allow high bandwidth memory access, and thereby increase instruction execution efficiency (and ultimately processing unit performance), computer systems typically employ multiple caches, both external and internal to the processing unit, to store the most recently accessed data and instructions. A relatively small number of clock cycles is typically required to access data stored in a cache, as opposed to a relatively large number of clock cycles to access data in main memory.
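By way of illustration only, the benefit of a cache may be estimated with the standard average memory access time relation (hit time plus miss rate multiplied by miss penalty). The following is a minimal sketch for a single cache level; the function name and all cycle counts and miss rates are hypothetical figures chosen for the example and are not measurements of any particular system.

```python
def average_access_cycles(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average memory access time, in clock cycles, for one cache level.

    hit_cycles:          cycles needed to access the cache (assumed value)
    miss_rate:           fraction of accesses that miss the cache (0.0 to 1.0)
    miss_penalty_cycles: extra cycles needed to reach main memory on a miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


# Hypothetical figures: a 4-cycle cache in front of a 200-cycle main memory.
# With no cache, every access pays the full main-memory latency.
without_cache = average_access_cycles(0, 1.0, 200)

# With a cache that misses on an assumed 5% of accesses.
with_cache = average_access_cycles(4, 0.05, 200)

print(f"without cache: {without_cache:.1f} cycles per access")
print(f"with cache:    {with_cache:.1f} cycles per access")
```

Under these assumed figures, the average access cost falls from 200 cycles to roughly 14 cycles, which is the effect the caches described above are intended to achieve.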
Because memory speed is improving at a slower rate than processor speed, each successive computer system generation depends more heavily on the cache subsystem. Further, for certain workloads, large shared caches deliver better results than large private caches. For other workloads, private caches are preferred.
Previously, computer system designs have been optimized for either sharing or replication. Today's mainframe computing environments typically implement a shared L2 cache and arrange all the processors sharing it in a single multichip module package. This approach is expensive, but optimizes caching for multicontext workloads. As mainframe computing environments move to different workloads, the shared cache provides less aggregate cache for very parallel workloads, notably industry standard benchmarks. Conversely, UNIX-based computer systems have typically implemented private caching. With a move to virtualization and workload management of multicontext workloads, the replication required by private caching disadvantageously reduces relative capacity and total cache capacity.
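As a rough illustration of how replication reduces capacity in a private caching arrangement, the following sketch compares the amount of distinct data held by several private caches against a single shared cache of the same total size. The model is a simplifying assumption made for this example: a fixed fraction of every private cache is taken to hold lines that are duplicated in all of the other private caches. The function names, cache sizes, and replicated fraction are hypothetical.

```python
def distinct_capacity_private(num_caches, cache_size_kb, replicated_fraction):
    """Distinct data (KB) held across private caches under the assumed model.

    A fraction of each private cache holds lines duplicated in every other
    private cache, so that portion is counted only once.
    """
    if not 0.0 <= replicated_fraction <= 1.0:
        raise ValueError("replicated_fraction must be between 0 and 1")
    replicated_kb = cache_size_kb * replicated_fraction
    private_only_kb = cache_size_kb * (1.0 - replicated_fraction)
    return replicated_kb + num_caches * private_only_kb


def distinct_capacity_shared(total_size_kb):
    """A shared cache keeps a single copy of each line (assumed model)."""
    return total_size_kb


# Hypothetical configuration: four 1024 KB private caches, half of each
# holding replicated lines, versus one 4096 KB shared cache.
private_kb = distinct_capacity_private(4, 1024, 0.5)
shared_kb = distinct_capacity_shared(4 * 1024)

print(f"private caches hold {private_kb:.0f} KB of distinct data")
print(f"shared cache holds  {shared_kb:.0f} KB of distinct data")
```

In this assumed configuration the private caches hold 2560 KB of distinct data against 4096 KB for the shared cache. The sketch does not model the access latency or contention differences that may favor private caches for other workloads.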