1. Field of the Invention
The present invention relates to processors and computers. More specifically, the present invention relates to stall detection and control systems in a pipelined processor.
2. Description of the Related Art
Processors have long used pipelining to improve operating speed. A pipelined processor executes multiple instructions in parallel in an overlapped manner so that each stage of multiple stages in the pipeline completes a part of an instruction. Throughput of the pipeline relates to the number of instructions that can be processed in a given time interval.
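As a rough illustration of how overlapped execution raises throughput, the following sketch compares cycle counts with and without pipelining. It rests on idealized assumptions that are not part of the description above: every stage takes exactly one clock cycle and no stalls occur. The function names are hypothetical.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages
    before the next instruction starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every
    cycle (assumption: one cycle per stage, no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to complete; each
    # later instruction completes one cycle after its predecessor.
    return num_stages + num_instructions - 1


def ideal_speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return (unpipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))
```

Under these assumptions, 100 instructions in a 5-stage pipeline complete in 104 cycles rather than 500, and the speedup approaches the number of stages as the instruction count grows.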
Various conditions reduce the throughput of the pipeline by preventing a next instruction in an instruction stream from executing during a clock cycle. These conditions reduce the performance of the processor below the potential gains of pipelining. Structural conditions arise from resource conflicts in which functional units of the processor cannot support a combination of instructions in simultaneous overlapped execution. Data dependency conditions arise when an instruction depends on the results of an earlier instruction that, because of the overlapping of instructions in the pipeline, has not yet made the results available. Control conditions arise from pipelining of branches and other instructions that change the program counter of the processor.
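The data dependency condition can be made concrete with a minimal sketch. It models only the case in which a later instruction reads a register that an earlier, still in-flight instruction writes. The `Instr` record, its `dest` and `srcs` fields, the register names, and the function `has_data_dependency` are all hypothetical and serve for illustration only.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this illustration."""
    name: str
    dest: Optional[str]      # register written, or None if no result
    srcs: Tuple[str, ...]    # registers read


def has_data_dependency(earlier: Instr, later: Instr) -> bool:
    """Return True when `later` reads the register that `earlier` writes.

    If `earlier` has not yet made its result available, `later`
    cannot execute correctly and is a candidate for stalling.
    """
    return earlier.dest is not None and earlier.dest in later.srcs
```

For example, a load that writes `r1` followed immediately by an add that reads `r1` is flagged as dependent, whereas the same pair in the reverse order is not.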
The conditions occasionally necessitate stalling of the pipeline. A stall in a pipelined processor typically is handled by allowing some instructions to proceed while other instructions are delayed. When an instruction is stalled, the typical response is to also stall all instructions that follow the stalled instruction in the pipeline. Instructions that entered the pipeline earlier than the stalled instruction are not stalled, but no new instructions are fetched during the stall.
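The typical stall response described above can be sketched as a single clock-cycle step of a simplified pipeline model. The model is an assumption made for illustration: the pipeline is a list in which index 0 is the fetch stage and the last index is the final stage, each entry is an instruction label or `None` for an empty slot, and the function name `step` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple

Slot = Optional[str]  # an instruction label, or None for an empty slot


def step(pipeline: List[Slot], next_instr: Slot,
         stall_stage: Optional[int] = None) -> Tuple[List[Slot], Slot, bool]:
    """Advance the pipeline by one clock cycle.

    pipeline[0] is the fetch stage and pipeline[-1] is the final stage.
    Returns (new_pipeline, completed_instruction, fetched_flag).
    """
    if stall_stage is None:
        # No stall: every instruction moves one stage forward and a
        # new instruction is fetched.
        return [next_instr] + pipeline[:-1], pipeline[-1], True

    if stall_stage == len(pipeline) - 1:
        # The final stage itself is stalled: nothing moves or completes.
        return list(pipeline), None, False

    # The stalled instruction and all instructions behind it hold.
    held = pipeline[:stall_stage + 1]
    # Instructions ahead of the stalled one advance; an empty slot
    # enters the stage just past the stalled instruction.
    advanced = [None] + pipeline[stall_stage + 1:-1]
    # No new instruction is fetched during the stall.
    return held + advanced, pipeline[-1], False
```

With a five-stage pipeline holding `["I5", "I4", "I3", "I2", "I1"]` and a stall at stage 1, the step leaves `I5` and `I4` in place, moves `I3` and `I2` forward behind an empty slot, completes `I1`, and fetches nothing.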
The handling of stalls includes two events: (1) detection of the condition causing the stall, and (2) propagation of the stall signal throughout the processor. Both events create a critical timing path in modern fast processors. Stall logic circuits that detect stall conditions generally involve much circuitry, consuming a large amount of area on an integrated circuit die. The stall logic circuits receive signals from functional units located at several locations, often including distant locations, on an integrated circuit chip. The stall interconnections and circuitry therefore include long interconnection wires and extensive control logic, resulting in slow execution times of the processor.
The control logic typically attempts to stall the front end of the pipeline. In some processors, the control logic attempts to stall the entire pipeline. Since the various stages of the pipeline are associated with all aspects of instruction functionality, including instruction fetching, decoding, execution of an entire range of instructions, trapping, writeback, and the like, stall signals are routed across essentially the entire functional layout of the integrated circuit. Therefore, the stall signals are propagated along long wires that are many millimeters in length. The stall signals typically must be propagated over these long distances within a single clock cycle. Accordingly, the stall logic is often a critical timing path and cycle time limitation in high-performance processors.
A processor and operating technique are needed that improve execution performance of stall operations.