FIG. 1 shows the architecture of an exemplary multi-core processor 100. As observed in FIG. 1, the processor includes: 1) multiple processing cores 101_1 to 101_N; 2) an interconnection network 102; 3) a last level caching system 103;4) a memory controller 104 and an I/O hub 105. Each of the processing cores contain one or more instruction execution pipelines for executing program code instructions. The interconnect network 102 serves to interconnect each of the cores 101_1 to 101_N to each other as well as the other components 103, 104, 105. The last level caching system 103 serves as a last layer of cache in the processor before instructions and/or data are evicted to system memory 108.
The memory controller 104 reads/writes data and instructions from/to system memory 108. The I/O hub 105 manages communication between the processor and “I/O” devices (e.g., non volatile storage devices and/or network interfaces). Port 106 stems from the interconnection network 102 to link multiple processors so that systems having more than N cores can be realized. Graphics processor 107 performs graphics computations. Power management circuitry (not shown) manages the performance and power states of the processor as a whole (“package level”) as well as aspects of the performance and power states of the individual units within the processor such as the individual cores 101_1 to 101_N, graphics processor 107, etc. Other functional blocks of significance (e.g., phase locked loop (PLL) circuitry) are not depicted in FIG. 1 for convenience.
FIG. 2 shows an exemplary embodiment 200 of one of the processing cores of FIG. 1. As observed in FIG. 2, each core includes two instruction execution pipelines 250, 260. Each instruction execution pipeline 250, 260 includes its own respective: i) instruction fetch stage 201; ii) data fetch stage 202; iii) instruction execution stage 203; and, iv) write back stage 204. The instruction fetch stage 201 fetches “next” instructions in an instruction sequence from a cache, or, system memory (if the desired instructions are not within the cache). Instructions typically specify operand data and an operation to be performed on the operand data. The data fetch stage 202 fetches the operand data from local operand register space, a data cache or system memory. The instruction execution stage 203 contains a set of functional units, any one of which is called upon to perform the particular operation called out by any one instruction on the operand data that is specified by the instruction and fetched by the data fetch stage 202. The write back stage 204 “commits” the result of the execution, typically by writing the result into local register space coupled to the respective pipeline.
In order to avoid the unnecessary delay of an instruction that does not have any dependencies on earlier “in flight” instructions, many modern instruction execution pipelines have enhanced data fetch and write back stages to effect “out-of-order” execution. Here, the respective data fetch stage 202 of pipelines 250, 260 is enhanced to include data dependency logic 205 to recognize when an instruction does not have a dependency on an earlier in flight instruction, and, permit its issuance to the instruction execution stage 203 “ahead of”, e.g., an earlier instruction whose data has not yet been fetched.
Moreover, the write-back stage 204 is enhanced to include a re-order buffer 206 that re-orders the results of out-of-order executed instructions into their correct order, and, delays their retirement to the physical register file until a correctly ordered consecutive sequence of instruction execution results have retired.
The enhanced instruction execution pipeline is also observed to include instruction speculation logic 207 within the instruction fetch stage 201. The speculation logic 207 guesses at what conditional branch direction or jump the instruction sequence will take and begins to fetch the instruction sequence that flows from that direction or jump. The speculative instructions are then processed by the remaining stages of the execution pipeline.