As is known, many modern computing systems and other devices utilize processors having pipelined architectures to increase instruction throughput. In theory, scalar pipelined processors can execute one instruction per machine cycle (and more in super-scalar architectures) when executing a well-ordered, sequential instruction stream. This is accomplished even though an instruction itself may implicate or require a number of separate micro-instructions to be carried out. Pipelined processors operate by breaking up the execution of an instruction into several stages (e.g., fetch, decode, ALU operations, etc.) that each require one machine cycle to complete, so that a single instruction requires several machine cycles from start to finish. Overall latency is reduced in pipelined processors by initiating the processing of a second instruction before the actual execution of the first instruction is completed. Consequently, multiple instructions can be in various stages of processing at any given time. Thus, the overall instruction execution latency of the system (which, in general, can be thought of as the delay between the time a sequence of instructions is initiated and the time it has finished executing) can be significantly reduced.
The above architecture works well when program execution follows a sequential flow path. In other words, it is premised on a sequential model of program execution, in which each instruction in a program is usually the one in memory immediately following the one just executed. A critical requirement and feature of programs, however, is the ability to “branch” or re-direct program execution flow to another set of instructions. Using branch instructions, a conditional transfer of control can be made to some other path in the executing program different from the current one. This path may or may not coincide with the set of instructions immediately following the instruction that was just executed.
Stated another way, typical prior computer processors implement in-order instruction execution pipelines. An in-order processor usually fetches an instruction stream from a memory, and issues and executes each instruction in the instruction stream according to a program order. Typically, such an in-order processor determines the program order as the instructions are executed. A program counter (or instruction pointer), which specifies the next instruction in the instruction stream to be executed, is continuously updated with the execution of each instruction. An instruction stream typically contains certain instructions that cause discontinuities in the program order. For example, branch (or jump) instructions, call instructions, return instructions, and interrupts may cause the processor to redirect the program counter to a discontinuous location in the memory defined by a target address. Such instructions that cause discontinuities in the program order are hereinafter referred to as out-of-order instructions.
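By way of illustration only, the program-counter behavior described above can be sketched in Python. The two-field instruction format, the "jump" and "op" mnemonics, and the function name run are hypothetical and are used solely for this example; they do not correspond to any particular processor.

```python
def run(program, max_steps=100):
    """Return the sequence of program-counter values visited.

    Each instruction is a hypothetical (opcode, target) pair. A "jump"
    redirects the program counter to its target address; any other
    opcode advances the program counter to the next sequential location.
    """
    pc = 0
    trace = []
    while pc < len(program) and len(trace) < max_steps:
        trace.append(pc)
        opcode, target = program[pc]
        if opcode == "jump":
            pc = target      # discontinuity in the program order
        else:
            pc += 1          # sequential flow
    return trace


# The jump at address 1 skips address 2.
program = [("op", None), ("jump", 3), ("op", None), ("op", None)]
trace = run(program)
print(trace)
```

In this sketch, the instruction at address 1 plays the role of an out-of-order instruction: the instruction executed after it is not the one that immediately follows it in memory.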
As is known, in in-order scalar processors, it is typically desired to have one instruction executed per clock cycle. In super-scalar processors, it is desired to have more than one instruction executed per clock cycle, due to the parallel-pipelined configuration of the super-scalar architecture. Although any given instruction requires more than one clock cycle to fully execute, an effective execution rate of one instruction per clock cycle can be achieved by pipelining the stages of instruction execution (e.g., fetch, decode, execute, memory access, write back, etc.) and operating on instructions within the pipeline in immediate succession.
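As a simple worked illustration of this effect, the following Python sketch compares cycle counts with and without pipelining. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names and the example figures (100 instructions, 5 stages) are assumptions chosen for the example only.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Cycles needed when each instruction finishes before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles needed in an ideal pipeline with no stalls.

    The first instruction takes n_stages cycles to pass through the
    pipeline; each later instruction completes one cycle after the
    instruction ahead of it.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n, k = 100, 5
print(unpipelined_cycles(n, k))  # 500
print(pipelined_cycles(n, k))    # 104
```

Under these assumptions, the pipelined count approaches one cycle per instruction as the instruction stream grows longer, even though each individual instruction still occupies the pipeline for all of its stages.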
There are, however, certain exceptions to the execution of instructions in immediate succession. One such exception occurs when operating on out-of-order instructions. As mentioned above, out-of-order instructions may include branch instructions, interrupts, etc. In certain processor architectures, when an out-of-order instruction follows an instruction requiring a memory access, the fetch of the out-of-order instruction is delayed until the memory access of the preceding instruction is complete. One reason for this is that the memory access of the preceding instruction may result in an error condition (such as a data fault). In such situations, some processor architectures will vector to a predefined exception-handling routine in response to the data-fault condition. In addition, the data fault usually results in the processor saving its present state (e.g., saving state to a set of status registers).
Often, the ensuing out-of-order instruction alters the state of the processor. In this regard, the out-of-order instruction may change the mode of the processor (e.g., from an application mode to a system mode), or may change whether interrupts are masked or unmasked, etc. If a processor state change occurs before a data-fault condition occurs, then when the exception-handling routine for the data-fault condition executes, it may return to an improper operating mode, causing a crash or error in the intended execution of the underlying code. For this reason, prior-art processors typically delay the fetch of out-of-order instructions that follow instructions that require memory accesses. This, however, results in excessive delay and overall performance degradation, since the memory access of the preceding instruction usually completes without error (making the delay, in those instances, needless).
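The cost of this conservative approach can be illustrated with a rough Python model. The sketch is an assumption-laden simplification and not a description of any actual processor: the Instr record, its field names, the conservative_stalls function, and the fixed two-cycle delay per occurrence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    mem_access: bool = False    # instruction requires a memory access
    out_of_order: bool = False  # branch, call, return, etc.
    faults: bool = False        # memory access ends in a data fault


def conservative_stalls(stream, stall_cycles=2):
    """Return (total, needless) stall cycles under the conservative policy.

    Every out-of-order instruction that directly follows a memory-access
    instruction has its fetch delayed by stall_cycles (an assumed fixed
    cost). The delay is counted as needless when the preceding memory
    access did not fault.
    """
    total = 0
    needless = 0
    for prev, curr in zip(stream, stream[1:]):
        if prev.mem_access and curr.out_of_order:
            total += stall_cycles
            if not prev.faults:
                needless += stall_cycles
    return total, needless


stream = [
    Instr(mem_access=True),               # completes without error
    Instr(out_of_order=True),             # delayed, needlessly
    Instr(),
    Instr(mem_access=True, faults=True),  # data fault
    Instr(out_of_order=True),             # delayed, with good reason
]
total, needless = conservative_stalls(stream)
print(total, needless)  # 4 2
```

In this example, half of the stall cycles are spent guarding against a fault that never occurs. Because memory accesses usually complete without error, the needless share in a realistic instruction stream would be expected to be considerably larger.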
Therefore, there is a desire to provide an in-order execution, pipelined processor that more efficiently handles the execution of out-of-order instructions.