This invention relates to the pipeline processing of instructions in a processing unit, and more particularly to a method and apparatus for recovering from fault conditions and restarting the pipeline.
As microprocessing technology has advanced, methods of improving throughput have been sought. In addition to increasing the brute force of processors by increasing the clock speed, techniques to optimize the processor's handling of activities have been pursued. One result has been the development of pipeline processing. Pipeline processing is one way to reduce the effective number of cycles required to execute an instruction by overlapping the execution of multiple instructions. Because the processing of a single instruction involves multiple actions, each instruction can be broken up into several discrete portions. Each portion can then be handled by a different stage of a processor.
A single instruction is pipelined through the stages until the processing of the instruction is complete. At any given clock cycle, one portion of the instruction is performed by a specific stage of the processor. As the other stages are not being used for the instruction during that same clock cycle, other instructions may use the other stages. Accordingly, as an instruction advances from stage to stage, additional instructions enter the pipeline and get pipelined through. Thus, multiple instructions are processed during a single clock cycle.
An instruction pipeline can potentially reduce the amount of time required per instruction by a factor equal to the depth of the pipeline. Fulfilling this potential requires that the pipeline always be filled with useful instructions and that nothing delay the advance of instructions in the pipeline. Such requirements impose certain demands on the processing architecture. For example, when serially executing an instruction stream in which each instruction may require a different number of clock cycles, there may be competition for the processor resources. Referring to FIG. 1A, a serial execution of six variable-length instructions is compared to a theoretical pipeline execution of the same instructions. The six instructions include a simple four-cycle instruction A, followed by two complex eight-cycle instructions B and C, followed by a more complex twelve-cycle instruction D, followed by a simple four-cycle instruction E, and a complex eight-cycle instruction F. As shown, 44 cycles are needed to process the six instructions serially, for an average of 7.33 cycles per instruction.
Referring to FIG. 1B (Pipeline execution), the instruction portions with the letter E indicate cycles where multiple instructions require the use of the same resource. Competition for these resources blocks the progression of the instruction through the pipeline and causes delay cycles to be introduced for many of the instructions (as indicated by the blank blocks) until the resource becomes available. As depicted, 29 cycles are needed for the pipeline execution for an average of 4.83 cycles per instruction. Thus, the pipeline technique shortens the average number of cycles/instruction, although the gains are greatly reduced by the delay cycles added. In practice, moreover, the negative effects of variable execution times are much worse than shown in the example.
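By way of illustration only, the cycle counts recited above can be checked with a short calculation. The following Python sketch is not part of the described apparatus. The per-instruction cycle counts and the 29-cycle pipelined total are taken from the description of FIGS. 1A and 1B, while the names INSTRUCTION_CYCLES, PIPELINED_TOTAL_CYCLES, serial_total_cycles and average_cycles_per_instruction are hypothetical names introduced here.

```python
# Cycle counts per instruction, as recited for FIG. 1A.
INSTRUCTION_CYCLES = {"A": 4, "B": 8, "C": 8, "D": 12, "E": 4, "F": 8}

# Total cycles recited for the pipelined execution of FIG. 1B.
PIPELINED_TOTAL_CYCLES = 29


def serial_total_cycles(cycles_by_instruction):
    """Serial execution: each instruction finishes before the next one starts."""
    return sum(cycles_by_instruction.values())


def average_cycles_per_instruction(total_cycles, instruction_count):
    """Average number of cycles consumed per instruction."""
    return total_cycles / instruction_count


if __name__ == "__main__":
    count = len(INSTRUCTION_CYCLES)
    serial_total = serial_total_cycles(INSTRUCTION_CYCLES)
    serial_avg = average_cycles_per_instruction(serial_total, count)
    pipelined_avg = average_cycles_per_instruction(PIPELINED_TOTAL_CYCLES, count)
    print("serial total cycles:", serial_total)
    print("serial average:", round(serial_avg, 2))
    print("pipelined average:", round(pipelined_avg, 2))
    print("speedup:", round(serial_total / PIPELINED_TOTAL_CYCLES, 2))
```

Under these figures the speedup is roughly 1.5, well short of the factor equal to the pipeline depth that an always-full, never-delayed pipeline could approach.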
An objective of RISC systems has been to define an instruction set in which execution of all, or most, instructions requires a uniform number of cycles. Even such RISC architectures, however, require effective management of events such as branches, exceptions and interrupts that can completely disrupt the flow of instructions.
Referring to FIG. 2, an instruction execution sequence is shown for a RISC-type R2000 processor instruction. The instruction includes five primary portions: instruction fetch (IF); read operands from registers while decoding instruction (RD); perform operation on instruction operands (ALU); access memory (MEM); and write back results to register file (WB). Referring to FIG. 3, the R2000 instruction pipeline is shown as a five-stage pipeline, one stage per instruction portion recited above. According to the uniform instruction-length design, a competition for resources occurs only if a sign extend is needed (so that an additional ALU operation is needed) or if it is necessary to wait for a multi-cycle co-processor operation.
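By way of illustration only, the overlap of instructions in such a five-stage pipeline can be sketched as follows. The stage names are those recited above. The sketch assumes an ideal pipeline with no stalls, delay cycles or exceptions, in which one instruction enters per cycle. The names STAGES, ideal_schedule and ideal_total_cycles are hypothetical and are introduced only for this example.

```python
# The five instruction portions recited above, one pipeline stage each.
STAGES = ["IF", "RD", "ALU", "MEM", "WB"]


def ideal_schedule(instruction_count, stages=STAGES):
    """Map each cycle to the (instruction index, stage) pairs active in it.

    Assumes an ideal pipeline: instruction i enters the first stage in
    cycle i and advances exactly one stage per cycle, with no stalls.
    """
    schedule = {}
    for instr in range(instruction_count):
        for offset, stage in enumerate(stages):
            schedule.setdefault(instr + offset, []).append((instr, stage))
    return schedule


def ideal_total_cycles(instruction_count, depth=len(STAGES)):
    """Cycles until the last instruction leaves the last stage (ideal case)."""
    if instruction_count == 0:
        return 0
    return instruction_count + depth - 1


if __name__ == "__main__":
    for cycle, active in sorted(ideal_schedule(6).items()):
        row = ", ".join(f"I{instr}:{stage}" for instr, stage in active)
        print(f"cycle {cycle}: {row}")
```

In this idealized case, once the pipeline is full, every stage is busy in every cycle and one instruction completes per cycle. The events discussed in this section (resource competition, latencies and exceptions) are what break that assumption.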
Inherent in the pipeline structure are latencies for load, jump and branch instructions. Load instructions have a delay, or latency, of one cycle before the data is available for a subsequent instruction. Jump and branch instructions also have a delay, or latency, of one cycle while the instruction is fetched and the target address is determined. Such latencies are defined herein as processing interdependencies. One way to resolve such an interdependency is to stall or delay the pipeline, as is done in conventional pipeline processors. The R2000, by contrast, continues execution despite the interdependency and relies on software to avoid placing, immediately behind a load, jump or branch instruction, an instruction that needs the information before the information is ready. For example, the assembler can organize the instructions so that a useful instruction that does not need that information follows. If it is not possible to do so, a NOP instruction is inserted.
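By way of illustration only, the software approach to the one-cycle load latency can be sketched as a pass over an instruction list. This is a simplified, hypothetical model and not the actual R2000 assembler: it handles only loads, it does not attempt to reorder instructions to find a useful one for the slot, and it simply inserts a NOP when the instruction immediately following a load reads the loaded register. The Instr type, the "lw" and "nop" mnemonics as used here, and the function fill_load_delay_slots are assumptions made for this example.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical minimal instruction record used only for this sketch."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop")


def fill_load_delay_slots(program: List[Instr]) -> List[Instr]:
    """Insert a NOP after any load whose result is read by the next instruction.

    Assumes a load latency of exactly one cycle, as described in the text.
    """
    result: List[Instr] = []
    for index, instr in enumerate(program):
        result.append(instr)
        following = program[index + 1] if index + 1 < len(program) else None
        if (
            instr.op == "lw"
            and following is not None
            and instr.dest in following.srcs
        ):
            # The loaded data is not ready yet, so pad the slot with a NOP.
            result.append(NOP)
    return result


if __name__ == "__main__":
    program = [
        Instr("lw", "r1", ("r2",)),        # load r1 from the address in r2
        Instr("add", "r3", ("r1", "r4")),  # reads r1 immediately after the load
        Instr("sub", "r5", ("r6", "r7")),  # independent of the load
    ]
    for instr in fill_load_delay_slots(program):
        print(instr.op, instr.dest, instr.srcs)
```

A fuller tool would first try to move an independent instruction, such as the subtraction in the usage example, into the slot, and would fall back to a NOP only when no such instruction is available.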
Other interferences to the smooth flow of the pipeline are exceptions (e.g., bus error, reset, interrupt, reserved instruction, system call, overflow). When an exception is detected, the R2000 interrupts the normal execution flow, aborts the instruction causing the exception, and aborts all those instructions in the pipeline which have already started execution. A jump to the designated exception handler routine also occurs. After the exception is processed, the processor returns to the instruction causing the abort or, if that instruction also was aborted, to the preceding instruction.
In summary, previous pipeline processors have introduced stall cycles into the pipeline to wait for competing resources, relied on software (e.g., assemblers) to avoid latent delays from load, jump and branch instructions, and aborted the pipeline in response to exceptions.
The introduction of stall cycles into the pipeline to stall execution of all instructions in the pipeline except the instruction using the needed resource slows the pipeline more than necessary. Such stalls cause instructions that are not competing for the resource to be stalled. Accordingly, a more effective pipeline method is needed to further enhance the pipeline execution flow.
The reliance upon software to avoid latent delays adds an undesirable burden to such software. Accordingly, a more effective solution to handling latencies by the processor itself is needed.