FIG. 1 shows the architecture of an exemplary multi-core processor 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104 and an I/O hub 105. Each of the processing cores contain one or more instruction execution pipelines for executing program code instructions. The interconnect network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as the other components 103, 104, 105. The last level caching system 103 serves as a last layer of cache in the processor before instructions and/or data are evicted to system memory 108. Each core typically has one or more of its own internal caching levels.
The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and “I/O” devices (e.g., non volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Power management circuitry (not shown) manages the performance and power states of the processor as a whole (“package level”) as well as aspects of the performance and power states of the individual units within the processor such as the individual cores 101_1 to 101_N, graphics processor 107, etc. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in FIG. 1 for convenience.
As is understood in the art, each core typically includes at least one instruction execution pipeline. An instruction execution pipeline is a special type of circuit designed to handle the processing of program code in stages. According to a typical instruction execution pipeline design, an instruction fetch stage fetches instructions, an instruction decode stage decodes the instruction, a data fetch stage fetches data called out by the instruction, an execution stage containing different types of functional units actually performs the operation called out by the instruction on the data fetched by the data fetch stage (typically one functional unit will execute an instruction but a single functional unit can be designed to execute different types of instructions). A write back stage commits an instruction's results to register space coupled to the pipeline. This same register space is frequently accessed by the data fetch stage to fetch instructions as well.