Instruction pipelining involves splitting a data processor into a series of stages called a pipeline. The stages of the pipeline process multiple instructions of an instruction stream concurrently. For example, a fetch stage may fetch instructions, while an execution stage following the fetch stage simultaneously executes other previously fetched instructions. Because the pipeline stages operate simultaneously, processor resources are used more efficiently.
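The overlap between stages can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one cycle and nothing stalls; the function names and the two-stage default are hypothetical and chosen only for illustration.

```python
def sequential_cycles(num_instructions, num_stages):
    """Cycles needed if each instruction finishes every stage before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Cycles needed if stages overlap: fill the pipeline once, then retire one per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def pipeline_schedule(instructions, stages=("fetch", "execute")):
    """Return one dict per cycle mapping each stage to the instruction it holds (or None)."""
    total = pipelined_cycles(len(instructions), len(stages))
    schedule = []
    for cycle in range(total):
        row = {}
        for position, stage in enumerate(stages):
            # The instruction in stage `position` entered the pipeline `position` cycles ago.
            index = cycle - position
            row[stage] = instructions[index] if 0 <= index < len(instructions) else None
        schedule.append(row)
    return schedule
```

Under these assumptions, `pipeline_schedule(["i0", "i1", "i2"])[1]` shows the fetch stage holding `i1` while the execute stage holds `i0`, which is the concurrency described above.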
It is common for the instruction stream to include load instructions which, when executed, retrieve data from memory. To execute a load instruction, an instruction execution stage typically retrieves data from a data cache and an address tag from a table. The execution stage checks the retrieved tag to determine whether the retrieved data is valid. If the tag indicates that the data is valid (a cache hit), the load instruction is complete. If the tag indicates that the data is not valid (a cache miss), the execution stage retrieves the valid data from another level of memory (e.g., main memory or disk memory). A cache miss generally requires more time and processor resources than a cache hit since the execution stage must perform additional operations after the cache miss to retrieve the valid data.
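The tag check described above can be sketched as follows. This is an illustrative model only: it assumes a direct-mapped cache with one word per line and a dictionary standing in for the next level of memory, none of which is specified in the text. The class and attribute names are hypothetical.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: a data array plus a parallel tag table (assumed layout)."""

    def __init__(self, num_lines, backing_memory):
        self.num_lines = num_lines
        self.data = [None] * num_lines   # the data cache
        self.tags = [None] * num_lines   # the tag table
        self.memory = backing_memory     # dict: address -> value (next memory level)
        self.hits = 0
        self.misses = 0

    def load(self, address):
        """Return (value, was_hit) for a load from `address`."""
        index = address % self.num_lines
        tag = address // self.num_lines
        # Data and tag are read together; the tag decides whether the data is valid.
        data = self.data[index]
        stored_tag = self.tags[index]
        if stored_tag == tag:
            self.hits += 1
            return data, True
        # Cache miss: fetch the valid data from the next level and refill the line.
        self.misses += 1
        data = self.memory[address]
        self.data[index] = data
        self.tags[index] = tag
        return data, False
```

In this sketch, two addresses that map to the same line (for example 0 and 4 with four lines) evict each other, so a later load of the first address misses again.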
The above-described load instruction may belong to a program and reside in memory with other program instructions. An instruction occurring after the load instruction often requires the data retrieved by the load instruction. Such an instruction is called a dependent instruction because it depends on the data retrieved by a previous instruction. To guarantee correct execution of a dependent instruction, the processor must have valid data available before executing it.
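To make the notion of a dependent instruction concrete, the sketch below finds the instructions that directly read a load's destination register. The `Instr` record, the register names, and the `dependents_of` helper are hypothetical; the text does not prescribe any particular instruction encoding.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, for readability only
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def dependents_of(program, producer_index):
    """Indices of later instructions that directly read the producer's destination.

    Scanning stops once the register is overwritten, because later readers
    then depend on the new value rather than on the producer.
    """
    register = program[producer_index].dest
    result = []
    if register is None:
        return result
    for i in range(producer_index + 1, len(program)):
        instr = program[i]
        if register in instr.srcs:
            result.append(i)
        if instr.dest == register:
            break
    return result
```

Only direct dependents are reported here; an instruction that consumes the result of a dependent instruction is transitively dependent on the load but is not listed.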
Some in-order processors (processors which execute instructions in program order) delay execution of instructions that follow load instructions in order to guarantee correct execution of any dependent instructions. In particular, such processors issue a load instruction (i.e., provide the load instruction to an execution stage for execution), and then delay issuance of the next instruction in program order until the load instruction retrieves valid data. If the load instruction retrieves valid data from the data cache (a cache hit), the issue stage issues the next instruction immediately. If the load instruction retrieves invalid data from the data cache (a cache miss), the processor continues to delay issuance of the next instruction until the valid data is retrieved.
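The delaying behavior can be modeled with a short sketch that computes the cycle in which each instruction issues. The latencies (`hit_latency`, `miss_penalty`), the one-instruction-per-cycle issue rate, and the tuple encoding of the program are assumptions made for illustration, not values taken from the text.

```python
def issue_cycles_with_stall(program, hit_latency=2, miss_penalty=10):
    """Return the issue cycle of each instruction for a processor that stalls after loads.

    `program` is a list of (kind, hit) tuples; `hit` is True or False for a
    "load" and is ignored for any other kind of instruction.
    """
    cycles = []
    next_issue = 0
    for kind, hit in program:
        cycles.append(next_issue)
        if kind == "load":
            # Issue of the next instruction is delayed until valid data is known to be present.
            wait = hit_latency if hit else hit_latency + miss_penalty
            next_issue += wait
        else:
            next_issue += 1
    return cycles
```

With the default latencies, every load costs at least one idle issue cycle even when it hits, which is the inefficiency this scheme accepts in exchange for simplicity.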
Other in-order processors issue instructions speculatively after a load instruction, before it is known whether the load instruction results in a cache hit or a cache miss. In particular, while the load instruction executes, such processors speculatively issue the next instruction in program order. If the load instruction results in a cache hit, the speculatively issued instruction executes as soon as the valid data is available. On the other hand, if the load instruction results in a cache miss, the speculatively issued instruction executes using the invalid data retrieved during the cache miss. Since this may cause the speculatively issued instruction to execute incorrectly, the result of its execution is ignored. After the processor performs additional operations to retrieve valid data, the processor reissues the speculatively issued instruction, i.e., it replays the instructions following the load instruction in program order. Since valid data has now been retrieved, the reissued instruction executes correctly.
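A minimal sketch of speculative issue with replay follows. It uses the same hypothetical program encoding as a list of (kind, hit) tuples, and it assumes one issue per cycle, that a miss is detected `hit_latency` cycles after the load issues, and that valid data arrives `miss_penalty` cycles later. These parameters are illustrative assumptions.

```python
def speculative_issue_log(program, hit_latency=2, miss_penalty=10):
    """Return (cycle, index, outcome) events; outcome is "ok" or "squashed".

    A "squashed" event marks an instruction that was issued speculatively behind
    a load that missed; its result is ignored and it is issued again later.
    """
    log = []
    cycle = 0
    i = 0
    n = len(program)
    while i < n:
        kind, hit = program[i]
        log.append((cycle, i, "ok"))
        if kind == "load" and not hit:
            resolve = cycle + hit_latency      # cycle in which the miss is detected
            j, c = i + 1, cycle + 1
            # Instructions issued before the miss is detected used invalid data.
            while j < n and c < resolve:
                log.append((c, j, "squashed"))
                j += 1
                c += 1
            cycle = resolve + miss_penalty     # valid data is now available
            i += 1                             # replay starts right after the load
        else:
            cycle += 1
            i += 1
    return log
```

In this model a load that hits costs no idle issue cycles, while a load that misses causes the instructions in its shadow to be issued twice: once speculatively and once during replay.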
An in-order processor that delays instructions following a load instruction in program order sometimes uses processor resources inefficiently. In particular, the issue stage of the processor remains idle (i.e., delaying issuance of further instructions) while the execution stage determines whether data retrieved by the load instruction is valid. Furthermore, the execution stage remains idle (i.e., execution cycles go unused) while the issue stage issues instructions after valid data has been retrieved.
An in-order processor that speculatively issues instructions typically can use processor resources more efficiently than the in-order processor that delays instructions. In particular, the issue stage remains active by speculatively issuing the instructions while the execution stage determines whether data retrieved is valid. Additionally, in the case of a cache hit, the execution stage does not need to wait for instructions to be issued. Rather, the speculatively issued instructions are available for execution, and execution cycles continue to be utilized. Furthermore, the mechanism for handling a cache miss is relatively simple. In particular, when a cache miss occurs, the execution stage retrieves valid data from another level of memory, and the issue stage simply replays the instructions following the load instruction in program order.
Out-of-order processors attempt to use processor resources more efficiently than in-order processors by further minimizing unused processor cycles. In particular, when one or more processor cycles are about to go unused, an out-of-order processor can issue and execute one or more instructions out of program order so that processor cycles are not wasted.
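The out-of-order idea can be sketched with a simple scheduler that, in each cycle, issues the oldest instruction whose source registers are ready. This is a simplified illustration, not a description of any particular processor: the tuple encoding, the fixed latencies, and the assumptions that each register is written at most once and is read only after its producer in program order are all hypothetical.

```python
def out_of_order_schedule(program):
    """Return (cycle, name) pairs in issue order.

    `program` is a list of (name, dest, srcs, latency) tuples. Assumes each
    register is written at most once and is read only by later instructions.
    """
    INF = float("inf")
    # A register with a pending producer is not ready until that producer issues and completes.
    ready_at = {dest: INF for _, dest, _, _ in program if dest is not None}
    waiting = list(range(len(program)))   # kept in program order (oldest first)
    order = []
    cycle = 0
    while waiting:
        chosen = None
        for idx in waiting:
            srcs = program[idx][2]
            # Registers with no producer in the program are treated as ready from cycle 0.
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                chosen = idx
                break
        if chosen is not None:
            name, dest, _, latency = program[chosen]
            if dest is not None:
                ready_at[dest] = cycle + latency
            order.append((cycle, name))
            waiting.remove(chosen)
        cycle += 1   # if nothing was ready, this cycle goes unused
    return order
```

For a program in which an `add` needs the result of a three-cycle `load`, independent instructions that follow the `add` in program order are issued in the cycles the `add` would otherwise leave empty.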