FIG. 1 shows the architecture of an exemplary multi-core processor 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103; 4) a memory controller 104; and 5) an I/O hub 105. Each of the processing cores contains one or more instruction execution pipelines for executing program code instructions. The interconnection network 102 serves to interconnect the cores 101_1 to 101_N to one another as well as to the other components 103, 104, 105. The last level caching system 103 serves as a last layer of cache in the processor before instructions and/or data are evicted to system memory 108. Each core typically has one or more of its own internal caching levels.
The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and “I/O” devices (e.g., non-volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Power management circuitry (not shown) manages the performance and power states of the processor as a whole (“package level”) as well as aspects of the performance and power states of the individual units within the processor such as the individual cores 101_1 to 101_N, graphics processor 107, etc. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in FIG. 1 for convenience.
FIG. 2 shows an exemplary embodiment 200 of one of the processing cores of FIG. 1. As observed in FIG. 2, each core includes two instruction execution pipelines 250, 260. Each instruction execution pipeline 250, 260 includes its own respective: i) instruction fetch stage 201; ii) data fetch stage 202; iii) instruction execution stage 203; and, iv) write back stage 204.
The instruction fetch stage 201 fetches “next” instructions in an instruction sequence from a cache or, if the desired instructions are not within the cache, from system memory. Instructions typically specify operand data and an operation to be performed on the operand data. The data fetch stage 202 fetches the operand data from local operand register space, a data cache or system memory. The instruction execution stage 203 contains a set of functional units, any one of which may be called upon to perform the particular operation called out by an instruction on the operand data that is specified by the instruction and fetched by the data fetch stage 202. The write back stage 204 “commits” the result of the execution, typically by writing the result into local register space coupled to the respective pipeline.
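By way of illustration only, the four stages described above may be sketched in software as follows. The sketch is a simplified model and not a description of any actual hardware: the instruction tuple format, the register names, the `FUNCTIONAL_UNITS` table and the `run_pipeline` function are all assumptions introduced for the example, and each instruction is processed to completion before the next one begins.

```python
import operator

# Hypothetical functional units keyed by opcode (illustrative assumption).
FUNCTIONAL_UNITS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_pipeline(program, registers):
    """Pass each instruction through the four stages, one at a time, in order."""
    pc = 0
    while pc < len(program):
        # Instruction fetch stage: obtain the "next" instruction in the sequence.
        opcode, dest, src_a, src_b = program[pc]
        # Data fetch stage: read the operand data from local register space.
        a, b = registers[src_a], registers[src_b]
        # Instruction execution stage: the functional unit for this opcode operates.
        result = FUNCTIONAL_UNITS[opcode](a, b)
        # Write back stage: commit the result into register space.
        registers[dest] = result
        pc += 1
    return registers


regs = run_pipeline(
    [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2")],
    {"r0": 2, "r1": 3, "r2": 0, "r3": 0},
)
print(regs["r2"], regs["r3"])
```

In this model the second instruction reads the result committed by the first, so the two instructions must complete in program order.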
In order to avoid the unnecessary delay of an instruction that does not have any dependencies on earlier “in flight” instructions, many modern instruction execution pipelines have enhanced data fetch and write back stages to effect “out-of-order” execution. Here, the respective data fetch stage 202 of pipelines 250, 260 is enhanced to include data dependency logic 205 to recognize when an instruction does not have a dependency on an earlier in flight instruction, and, permit its issuance to the instruction execution stage 203 “ahead of”, e.g., an earlier instruction whose data has not yet been fetched.
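The dependency check performed by logic such as data dependency logic 205 may be approximated, purely as a sketch, by the following code. The dictionary fields (`name`, `dest`, `srcs`, `data_fetched`) and the `select_issuable` function are hypothetical names chosen for the example. Only true (read-after-write) dependencies are modeled, and every instruction in the window is assumed not to have issued yet.

```python
def select_issuable(window):
    """Return names of instructions that may issue now.

    `window` lists not-yet-issued instructions in program order. An instruction
    may issue if its own operand data has been fetched and none of its sources
    is the destination of an earlier instruction still in the window.
    """
    issuable = []
    pending_dests = set()  # destinations of earlier, unfinished instructions
    for instr in window:
        depends = any(src in pending_dests for src in instr["srcs"])
        if instr["data_fetched"] and not depends:
            issuable.append(instr["name"])
        pending_dests.add(instr["dest"])
    return issuable


window = [
    # I0 is still waiting on its operand data (e.g., a cache miss).
    {"name": "I0", "dest": "r1", "srcs": ["r7"], "data_fetched": False},
    # I1 consumes the result of I0 and therefore must wait.
    {"name": "I1", "dest": "r2", "srcs": ["r1", "r3"], "data_fetched": True},
    # I2 has no dependency on I0 or I1 and may issue ahead of them.
    {"name": "I2", "dest": "r4", "srcs": ["r5", "r6"], "data_fetched": True},
]
print(select_issuable(window))
```

Here the later instruction I2 is selected for issue ahead of the earlier instruction I0, whose data has not yet been fetched.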
Moreover, the write-back stage 204 is enhanced to include a re-order buffer 206 that re-orders the results of out-of-order executed instructions into their correct order, and, delays their commitment to the physical register file at least until a correctly ordered consecutive sequence of instruction execution results has retired. In order to further support out-of-order execution, results held in the re-order buffer 206 can be fed back to the data fetch stage 202 so that later instructions that depend on the results can also issue to the instruction execution stage 203.
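The in-order commitment behavior of a re-order buffer may be illustrated by the following minimal sketch. The `ReorderBuffer` class and its `allocate`, `complete` and `commit` methods are assumptions made for the example and do not correspond to any particular implementation. Results may arrive in any order, but a result is written to the register file only once every earlier instruction has also produced its result.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Toy re-order buffer: out-of-order completion, in-order commit."""

    def __init__(self):
        self.entries = OrderedDict()  # sequence number -> (dest, value) or None
        self.register_file = {}

    def allocate(self, seq):
        # Entries are allocated in program order when instructions issue.
        self.entries[seq] = None

    def complete(self, seq, dest, value):
        # Execution results may arrive in any order.
        self.entries[seq] = (dest, value)

    def commit(self):
        """Commit the longest run of completed entries starting at the oldest."""
        committed = []
        while self.entries:
            seq, result = next(iter(self.entries.items()))
            if result is None:
                break  # the oldest instruction has not finished; hold everything
            dest, value = result
            self.register_file[dest] = value
            del self.entries[seq]
            committed.append(seq)
        return committed


rob = ReorderBuffer()
for seq in (0, 1, 2):
    rob.allocate(seq)
rob.complete(2, "r3", 30)   # youngest instruction finishes first
rob.complete(1, "r2", 20)
held = rob.commit()         # nothing commits while instruction 0 is outstanding
rob.complete(0, "r1", 10)
released = rob.commit()     # all three now commit in program order
print(held, released)
```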
The enhanced instruction execution pipeline is also observed to include instruction speculation logic 207 within the instruction fetch stage 201. Instruction sequences branch out into different paths depending on a condition such as the value of a variable. The speculation logic 207 studies the upcoming instruction sequence, guesses at what conditional branch direction or jump the instruction sequence will take (it guesses because the condition that determines the branch direction or jump may not have been executed or committed yet) and begins to fetch the instruction sequence that flows from that direction or jump. The speculative instructions are then processed by the remaining stages of the execution pipeline.
Here, the re-order buffer 206 of the write back stage 204 will delay the commitment of the results of the speculatively executed instructions until there is confirmation that the original guess made by the speculation logic 207 was correct. Once confirmation is made that the guess was correct, the results are committed to the architectural register file. If it turns out the guess was wrong, results in the re-order buffer 206 for the speculative instructions are discarded (“flushed”) as is the state of any in flight speculative instructions within the pipeline 200. The pipeline 200 then re-executes from the branch/jump with the correct sequence of instructions.
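The commit-or-flush decision described above may be sketched as follows. The `resolve_branch` function and its parameters are hypothetical and serve only to illustrate the principle: speculative results are held apart from the architectural register file until the branch direction is known, and are either committed or discarded at that point.

```python
def resolve_branch(predicted_taken, actual_taken, speculative_results, register_file):
    """Commit held speculative results if the guess was right, else flush them."""
    if predicted_taken == actual_taken:
        # Guess confirmed: results become architecturally visible.
        register_file.update(speculative_results)
        outcome = "committed"
    else:
        # Guess wrong: architectural state is left untouched; execution would
        # restart from the branch along the correct path.
        outcome = "flushed"
    # In either case the held speculative results are no longer pending.
    speculative_results.clear()
    return outcome


arch_regs = {"r1": 1}
wrong = resolve_branch(True, False, {"r1": 99}, arch_regs)
after_wrong = dict(arch_regs)   # register state after the misprediction
right = resolve_branch(True, True, {"r1": 42}, arch_regs)
print(wrong, after_wrong, right, arch_regs)
```

In the mispredicted case the architectural register file retains its earlier contents, which is the property that allows the correct sequence of instructions to be re-executed from the branch.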