FIG. 1 shows the architecture of a standard multi-core processor design 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104; and 5) an I/O hub 105. Each of the processing cores 101_1 to 101_N contains one or more instruction execution pipelines for executing program code instructions. The interconnection network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as to the other components 103, 104, 105, 106. The last level caching system 103 serves as a last layer of cache in the processor before instructions and/or data are sent to or requested from system memory 108. The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and “I/O” devices (e.g., non-volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Other functional blocks of significance (phase locked loop (PLL) circuitry, power management circuitry, etc.) are not depicted in FIG. 1 for convenience.
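For illustration only, the connectivity described above can be sketched as a small adjacency map. This is a minimal model under stated assumptions, not a description of any actual circuit: the function name build_interconnect, the string labels, and the core count of 4 are hypothetical, and the graphics processor 107 is left out because the text does not state how it is attached.

```python
def build_interconnect(num_cores):
    """Model interconnection network 102 as an adjacency map (illustrative only).

    Each core 101_i is linked to every other core and to components
    103, 104, 105 and 106, as described for FIG. 1.
    """
    cores = [f"core 101_{i}" for i in range(1, num_cores + 1)]
    others = [
        "last level cache 103",
        "memory controller 104",
        "I/O hub 105",
        "port 106",
    ]
    links = {}
    for core in cores:
        # A core reaches all other cores plus each non-core component.
        links[core] = {c for c in cores if c != core} | set(others)
    return links


# Hypothetical example with N = 4 cores.
links = build_interconnect(4)
```

In this sketch every core has the same set of non-core neighbors, which mirrors the statement that the interconnection network 102 links each core to the components 103, 104, 105, 106.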
A common processor caching hierarchy includes both L1 and L2 caches 109, 110 within each core, and the last level cache 103 acting as an L3 or higher level cache. Here, the L1 caching level 109 is tightly coupled to the core's instruction execution pipeline(s). For example, for each instruction execution pipeline within a core, an L1 data cache is tightly coupled to a memory access functional unit within the pipeline's execution stage and an L1 instruction cache is tightly coupled to the pipeline's instruction fetch stage. An L2 cache 110 is resident within the core as the next lower caching level beneath the core's L1 caching level.
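The order in which such a hierarchy is consulted can be sketched as follows. This is a simplified software model, not the processor's implementation: the names CacheLevel and read are hypothetical, capacity limits and eviction are omitted, and the policy of filling every upper level on a miss is an assumption that the text above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    """One caching level (e.g., L1 109, L2 110, or last level cache 103)."""
    name: str
    lines: dict = field(default_factory=dict)  # address -> value, no eviction


def read(address, levels, system_memory):
    """Search levels from highest (L1) to lowest, then system memory 108.

    Returns (value, name of the level that supplied it).
    Assumed policy: on a hit in a lower level or in memory, the value is
    copied into every level that was searched before it.
    """
    for i, level in enumerate(levels):
        if address in level.lines:
            value = level.lines[address]
            for upper in levels[:i]:
                upper.lines[address] = value
            return value, level.name
    # Missed in every cache level: fetch from system memory and fill all levels.
    value = system_memory[address]
    for level in levels:
        level.lines[address] = value
    return value, "system memory"


# Hypothetical usage: the first read misses everywhere, the second hits in L1.
hierarchy = [CacheLevel("L1"), CacheLevel("L2"), CacheLevel("LLC")]
memory = {0x40: 7}
first = read(0x40, hierarchy, memory)
second = read(0x40, hierarchy, memory)
```

The list order encodes the hierarchy: L1 is searched first because it is coupled most tightly to the pipeline, and the last level cache is searched last before the request goes to system memory.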