An instruction execution unit (e.g., a floating point unit) may employ an in-order processing pipeline for processing and completing (e.g., executing) instructions. The floating point unit may include a control/status register, which stores control bits that indicate how instructions are to be processed. The floating point unit may receive a first instruction that alters one or more of the control bits, thereby controlling execution of subsequent instructions. However, the value of the one or more altered control bits may not be known until execution of the first instruction completes (e.g., when the first instruction reaches the end of the pipeline). Therefore, execution of a subsequent instruction that requires the one or more altered control bits may not be permitted to start until execution of the first instruction completes. Consequently, the floating point unit may delay (e.g., by employing stalls or pipeline bubbles) the start of the subsequent instruction until that time. Such stalls are referred to as dependency stalls because the subsequent instruction depends on control bits to be updated (e.g., modified or altered) by a previous instruction. Dependency stalls may result in a large performance penalty for deep in-order execution processing pipelines (e.g., pipelines with a large number of stages), because the delay grows with the number of stages the first instruction must traverse before it completes.
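The dependency stall described above may be illustrated with a small timing model. The following Python sketch is illustrative only and does not represent any particular floating point unit. The names `Instr` and `schedule`, the one-instruction-per-cycle issue rate, and the rule that altered control bits become visible `depth` cycles after the altering instruction enters the pipeline are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    name: str
    writes_ctrl: bool = False  # instruction alters one or more control bits
    reads_ctrl: bool = False   # instruction requires the control bits to start


def schedule(instrs, depth):
    """Return (issue_cycles, total_stall_cycles) for an in-order pipeline.

    Assumptions: at most one instruction enters the pipeline per cycle, and
    an instruction that enters at cycle c completes at cycle c + depth.
    """
    issue_cycles = []
    next_free = 0   # earliest cycle at which the next instruction may enter
    ctrl_ready = 0  # cycle at which the most recent control-bit update is known
    stalls = 0
    for ins in instrs:
        cycle = next_free
        # Dependency stall: wait until the altered control bits are known.
        if ins.reads_ctrl and ctrl_ready > cycle:
            stalls += ctrl_ready - cycle
            cycle = ctrl_ready
        issue_cycles.append(cycle)
        if ins.writes_ctrl:
            ctrl_ready = cycle + depth
        next_free = cycle + 1  # in-order: later instructions cannot pass this one
    return issue_cycles, stalls


program = [
    Instr("set_ctrl", writes_ctrl=True),
    Instr("fadd", reads_ctrl=True),
    Instr("fmul", reads_ctrl=True),
]

shallow = schedule(program, depth=5)
deep = schedule(program, depth=10)
print(shallow)
print(deep)
```

Under these assumptions, the instruction that immediately follows the control-bit update is delayed by `depth - 1` cycles, so the penalty increases with pipeline depth, consistent with the observation above regarding deep pipelines.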
To avoid delaying the start of an instruction, one solution splits the control/status register described above into a control register and a status register, and uses register-renaming and out-of-order processing techniques for instruction processing. However, such a solution requires a large amount of hardware and is therefore very complex.
Accordingly, improved methods and apparatus are desired for in-order execution pipelined processing.