Many of today's microprocessors incorporate structures known as instruction pipelines. Instruction pipelines increase the efficiency of a processor by enabling it to process a plurality of instructions simultaneously. Instruction pipelines can be thought of as instruction assembly lines. As Instruction_0 enters the first stage of the pipeline, Instruction_1 is simultaneously processed in the second stage of the pipeline, Instruction_2 is simultaneously processed in the third stage of the pipeline, and so on. Periodically, a new instruction is clocked into an instruction pipeline, and each instruction being processed in the pipeline is either passed to the next stage of the pipeline or output from the pipeline.
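The assembly-line behavior described above can be illustrated with a small software model. The following is a minimal sketch, not a hardware description: it assumes a single in-order pipeline with no stalls, and the function name `simulate_pipeline` is hypothetical. On each clock, the instruction in the last stage is output, every other instruction advances one stage, and the next pending instruction is clocked into the first stage.

```python
from collections import deque
from typing import List, Optional, Tuple

def simulate_pipeline(instructions: List[str],
                      num_stages: int) -> Tuple[List[List[Optional[str]]], List[str]]:
    """Hypothetical model of an in-order, stall-free instruction pipeline.

    Returns (snapshots, outputs): snapshots[i][s] is the instruction held in
    stage s (0 = first stage) after clock i, or None if the stage is empty;
    outputs lists instructions in the order they leave the last stage.
    """
    stages: List[Optional[str]] = [None] * num_stages
    pending = deque(instructions)
    outputs: List[str] = []
    snapshots: List[List[Optional[str]]] = []
    # Keep clocking until every instruction has left the last stage.
    while pending or any(s is not None for s in stages):
        # The instruction in the last stage is output from the pipeline.
        if stages[-1] is not None:
            outputs.append(stages[-1])
        # Every other instruction is passed to the next stage...
        stages = [None] + stages[:-1]
        # ...and a new instruction is clocked into the first stage.
        if pending:
            stages[0] = pending.popleft()
        snapshots.append(list(stages))
    return snapshots, outputs

snaps, done = simulate_pipeline(["A", "B", "C"], 3)
print(snaps[2])  # ['C', 'B', 'A']: every stage holds an instruction
print(done)      # ['A', 'B', 'C']
```

After the third clock the three-stage pipeline is full, with the newest instruction in the first stage and the oldest in the last, which is the state the paragraph above describes.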
To maximize instruction execution efficiency, it is desirable to keep instruction pipelines full as often as possible (with an instruction being processed in each stage of the pipeline), so that each periodic clocking of an instruction pipeline produces a useful output. However, a pipeline will sometimes generate an exception, or will need more time to determine whether an exception is about to occur. In either case, the pipeline must stall the progression of data through its stages until the exception is resolved. Because many of today's microprocessors incorporate not just one instruction pipeline but multiple parallel instruction pipelines, a stall in one of the parallel pipelines will often necessitate a stall of some or all of the other pipelines. For example, when a microprocessor executes instructions in program order, or executes groups of instructions between predetermined program stops (each group having to be executed in order), a stall initiated by a stage Y of a first pipeline often dictates the stall of every pipeline stage that is orthogonal to or upstream from stage Y.
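The stall rule for in-order execution can be expressed as a simple set computation. This is a minimal sketch under stated assumptions: stages are indexed from 0 (the most upstream stage); a stage is taken to be "orthogonal" to stage Y if it sits at the same index Y in another parallel pipeline, and "upstream" if its index is less than Y; and the function name `stages_to_stall` is hypothetical. Under these assumptions the set of stalled stages does not depend on which pipeline initiated the stall, so that pipeline is not a parameter.

```python
from typing import Set, Tuple

def stages_to_stall(num_pipelines: int, num_stages: int,
                    stalling_stage: int) -> Set[Tuple[int, int]]:
    """Return the (pipeline, stage) pairs that must hold their data.

    Assumption: a stall initiated at stage Y of any pipeline stalls stage Y
    of every parallel pipeline (orthogonal stages) and every stage with an
    index below Y (upstream stages). Stages downstream of Y keep advancing.
    """
    if not 0 <= stalling_stage < num_stages:
        raise ValueError("stalling_stage must be in [0, num_stages)")
    return {(p, s)
            for p in range(num_pipelines)
            for s in range(stalling_stage + 1)}

# Two parallel 4-stage pipelines; stage 1 of either pipeline stalls.
print(sorted(stages_to_stall(2, 4, 1)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this example, stages 2 and 3 of both pipelines are free to continue, while stages 0 and 1 of both pipelines must hold their data until the stall is released.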
Unfortunately, existing means for stalling pipeline data often have a negative impact on a pipeline's performance. For example, most stall means use a number of latches to store stalled data. In a speed-critical pipeline stage, however, the need to latch data as it propagates through the stage introduces costly and undesirable delay.
Furthermore, if a stall is generated late in a stage, data must often be stalled in the stage using recirculating latches rather than clocked latches. Recirculating latches not only cause a stage to incur a latch propagation delay, but can also cause it to incur wire delay, capacitive delay, and the like. This is especially true when the stage that requires recirculating latches is a data-heavy stage.
For example, the multiply array of a floating-point multiply-accumulate (FMAC) unit often spans two stages of a pipeline. As a result, stalling data in the first stage of the multiply array requires storing numerous partial products. In addition, routing a stall enable line over such a multiply array further increases the wiring density of the array, which in turn increases capacitance and related delays.
What is needed are new methods and apparatus for stalling the data of speed-critical pipeline stages.