Computer processors contain arithmetic, logic, and control circuitry that interpret and execute instructions from a computer program. In the pursuit of improving processor performance, designers have sought two main goals: making operations faster and executing more operations in parallel. Making operations faster can be approached in several ways. For example, transistors can be made to switch faster and thus propagate signals faster by improving semiconductor processes; execution-unit latency can be reduced by increasing the number of transistors in the design; and the levels of logic required by the design to implement a given function can be minimized to increase speed. To execute more operations in parallel, designers mainly rely on pipelining, superscalar techniques, or a combination of the two. Pipelined processors overlap instructions in time on common execution resources. Superscalar processors overlap instructions in space on separate resources.
Pipeline stalls are a main inhibitor of performance in parallel processing. Stalls arise from data dependencies, changes in program flow, and hardware resource conflicts. At times, pipeline stalls can be avoided by rearranging the order of execution for a set of instructions. Compilers can be used to statically reschedule instructions; however, incomplete knowledge of run-time information reduces the effectiveness of static rescheduling. In-order processors, i.e., processors that issue, execute, complete, and retire instructions in strict program order, have to rely entirely on static rescheduling and thus are prone to pipeline stalls.
As a result, designers use out-of-order processors and seek to implement dynamic instruction rescheduling. The simplest out-of-order processors issue instructions in order but allow them to execute and complete out of order. Even these simple out-of-order processors require complex hardware to reorder results before the corresponding instructions are retired. A strict result order is not required from a data-flow perspective; however, such ordering is necessary to maintain precise exceptions and to recover from mispredicted speculative execution.
A well-known method of reordering is through the use of a reorder buffer, i.e., a buffer that maintains results until they are written to the register file in program order. Designers also use other types of reordering hardware, such as history buffers and future files. History buffers record source-operand history so that the processor can backtrack to a precise architectural state. Future files store the current state and the architectural state in separate register files, allowing the processor to be restored to a precise check-point state.
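The reorder buffer behavior described above can be illustrated with a minimal software sketch. It is not taken from any particular processor; the class name `ReorderBuffer`, the method names, and the tags `i0` and `i1` are hypothetical and chosen only for illustration. The sketch shows results completing out of order while the register file is updated strictly in program order.

```python
from collections import deque


class ReorderBuffer:
    """Toy model: results complete in any order but retire in program order."""

    def __init__(self):
        self.entries = deque()   # oldest instruction sits at the left
        self.register_file = {}  # architectural state, updated only at retirement

    def issue(self, tag, dest_reg):
        # Allocate an entry at issue time, in program order.
        self.entries.append(
            {"tag": tag, "dest": dest_reg, "value": None, "done": False}
        )

    def complete(self, tag, value):
        # An execution unit reports a result, possibly out of order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self):
        # Write results to the register file only from the head of the buffer.
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.register_file[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


rob = ReorderBuffer()
rob.issue("i0", "r1")
rob.issue("i1", "r2")
rob.complete("i1", 7)   # the younger instruction finishes first
first = rob.retire()    # i0 is still pending, so nothing can retire yet
rob.complete("i0", 3)
second = rob.retire()   # both entries now retire, oldest first
```

Because the register file changes only at retirement, an exception raised by `i0` in this model would leave no trace of the result of `i1`, which is the property that makes exceptions precise.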
Branch prediction and speculative execution are additional techniques used to reduce pipeline stalls. In a pipelined processor, the outcomes of conditional branches are often determined after fetching subsequent instructions. Thus, if the correct direction of the unresolved branch can be predicted, the instruction queue can be kept full of instructions that have a high probability of being used. In some processors, instructions are actually executed speculatively beyond unresolved conditional branches. This technique completely avoids pipeline stalls when the branch proceeds in the predicted direction. On the other hand, if the branch direction is mispredicted, the pipeline must be flushed, instruction fetch redirected, and the pipeline refilled.
Referring to FIG. 1, a typical computer system 10 includes a prefetch, branch prediction, and dispatch unit (PDU) 12, an integer execution unit (IEU) 14, a floating-point unit (FPU) 16, a memory interface unit (MIU) 18, an external cache (E-Cache) unit (ECU) 20, a load/store unit (LSU) 22, and a memory management unit (MMU) 24.
PDU 12 fetches instructions before they are actually needed in the pipeline, so the execution units constantly have instructions to execute. Instructions can be prefetched from all levels of the memory hierarchy, including the instruction cache, the external cache, and the main memory. In order to prefetch across conditional branches, a dynamic branch prediction scheme is implemented in hardware. The outcome of a branch is predicted based on a two-bit history of the branch. A “next field” associated with every four instructions in the instruction cache (I-Cache) points to the next I-Cache line to be fetched. The use of the “next field” makes it possible to follow taken branches and provides essentially the same instruction bandwidth as is achieved while running sequential code. Prefetched instructions are stored in the instruction buffer until they are sent to the rest of the pipeline.
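One common way to realize prediction from a two-bit history is a two-bit saturating counter, sketched below. The text above does not specify the state machine used by PDU 12, so the counter encoding, the initial state, and the names `TwoBitPredictor` and `count_mispredictions` are assumptions made for illustration only.

```python
class TwoBitPredictor:
    """Two-bit saturating counter (assumed encoding, not taken from the source).

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None):
    """Run the predictor over a sequence of actual outcomes and count misses."""
    predictor = predictor or TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch taken nine times and then not taken, executed twice.
loop_branch = [True] * 9 + [False]
misses = count_mispredictions(loop_branch * 2)
```

With two bits of history, a single loop exit moves the counter only one step, so the branch is still predicted taken when the loop is entered again.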
When prefetching instructions, the results of conditional trap instructions are difficult to predict: it is very difficult to know in advance whether a trap instruction will cause a transition into program code containing trap handling instructions or will instead proceed with normal program flow. Therefore, in prior art systems, PDU 12 halts prefetching when it encounters a trap instruction and does not resume until the trap instruction is actually executed. As a result, the instruction pipelines eventually drain, and no instructions are executed until the trap instruction is resolved. Overall instruction execution efficiency is thus lower than it would be if the outcome of the error handling mechanism could be predicted. It would therefore be beneficial to provide a method for executing an error handling mechanism in such a way as to enable branch prediction and thereby increase instruction execution efficiency.
In one aspect, a method for managing program flow in a computer system having a processor with a prefetch mechanism and an instruction pipeline includes providing a set of program instructions having a conditional branch instruction and a system fault-causing instruction; prefetching at least one instruction into the instruction pipeline, the instruction including at least a conditional branch instruction; predicting the outcome of the conditional branch instruction; and prefetching instructions into the instruction queue based upon the result of the predicting step. The branch instruction is configured to direct program flow into or beyond the system fault-causing instruction depending on the result of a predetermined condition.
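The difference between halting at a conditional trap and predicting a conditional branch placed ahead of a fault-causing instruction can be illustrated with a toy prefetch model. This is a sketch under stated assumptions, not the claimed implementation: the opcode names (`cond_trap`, `branch`, `fault`, and the rest), the tuple encoding of instructions, and the function `prefetch` are all hypothetical.

```python
def prefetch(program, predict_taken, limit=8):
    """Return the instruction addresses prefetched along the predicted path.

    program is a list of (opcode, branch_target) tuples; the encoding is
    an assumption made for this sketch.
    """
    fetched = []
    pc = 0
    while pc < len(program) and len(fetched) < limit:
        op, target = program[pc]
        fetched.append(pc)
        if op == "cond_trap":
            break  # prior-art behavior: stop until the trap is resolved
        if op == "fault":
            break  # assumed: flow leaves the normal path at a fault
        if op == "branch" and predict_taken(pc):
            pc = target
        else:
            pc += 1
    return fetched


# Prior-art style: the error check is a conditional trap.
with_trap = [("cmp", None), ("cond_trap", None), ("add", None), ("store", None)]

# Alternative: a conditional branch steers flow into or beyond a
# fault-causing instruction, so the branch predictor can be used.
with_branch = [
    ("cmp", None),
    ("branch", 3),    # taken when no error is detected: skip the fault
    ("fault", None),
    ("add", None),
    ("store", None),
]

stalled = prefetch(with_trap, lambda pc: True)
predicted = prefetch(with_branch, lambda pc: True)
```

In this model the trap version stops fetching at address 1, while the branch version continues past the fault-causing instruction whenever the branch is predicted taken.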