Instruction pipelining is a technique used in processors (e.g., microprocessors, microcontrollers) to allow for parallel processing of instructions. For example, one instruction is associated with a first stage of an instruction pipeline and another instruction is associated with a second stage of the instruction pipeline. The instruction pipeline allows for “breaking” of the timing associated with a large data path, and provides parallelism in executing the instructions at an increased clock frequency.
The instruction pipeline offers optimum performance only when the constituent stages are perfectly balanced. A balanced pipeline implies that the processing associated with each constituent stage takes a completion time equal to that of every other constituent stage of the instruction pipeline. However, there are scenarios where a programmer/user is not able to perfectly balance the instruction pipeline, e.g., hard macro(s) such as memory/memories being in the data path of the pipeline, or arithmetic units such as multipliers, adders, bit shifters and dividers of an Arithmetic Logic Unit (ALU) being in a same constituent stage of the pipeline. Here, the maximum frequency at which the unbalanced pipeline is clocked is determined by the constituent stage offering the maximum delay.
Assuming no stalls in an unbalanced instruction pipeline, the maximum frequency, fmax, at which the unbalanced instruction pipeline is clocked is expressed in example Equation (1) as:
$$f_{\max} = \frac{1}{d}, \qquad (1)$$
where d is the maximum delay offered by a constituent stage.
Assuming the time taken for executing N instructions to be (N+ns) cycles (ns being the number of constituent stages of the unbalanced instruction pipeline), the effective throughput, E, is expressed in example Equation (2) as:
$$E = f_{\max} \cdot \frac{N}{N + n_s} \sim f_{\max} \qquad (2)$$
The throughput, E as seen in Equation (2), is the number of instructions per second. Increased throughput is associated with a higher fmax, which implies a lower maximum delay offered by the constituent stage of the unbalanced instruction pipeline.
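As a minimal numerical sketch of Equations (1) and (2), the following Python snippet computes fmax and E for a hypothetical four-stage pipeline. The function names and the stage delays are illustrative assumptions, not values from any particular processor.

```python
def max_frequency_hz(stage_delays_s):
    """Equation (1): f_max = 1 / d, where d is the largest stage delay."""
    return 1.0 / max(stage_delays_s)


def effective_throughput(stage_delays_s, n_instructions):
    """Equation (2): E = f_max * N / (N + n_s), in instructions per second."""
    n_stages = len(stage_delays_s)
    f_max = max_frequency_hz(stage_delays_s)
    return f_max * n_instructions / (n_instructions + n_stages)


# Hypothetical unbalanced pipeline; per-stage delays in seconds (assumed values).
delays = [1.0e-9, 1.5e-9, 2.0e-9, 1.0e-9]

f_max = max_frequency_hz(delays)               # limited by the 2 ns stage
e_small = effective_throughput(delays, 4)      # N equals n_s, so E is f_max / 2
e_large = effective_throughput(delays, 10**6)  # N much larger than n_s, so E approaches f_max
```

For small N the pipeline fill time dominates and E is well below fmax; as N grows relative to ns, E approaches fmax, which is the approximation indicated in Equation (2).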
The pipeline can instead be clocked at a frequency higher than that computed from the maximum delay. When usage of the timing path involving the maximum delay is detected, the pipeline is stalled for a number of cycles equivalent to the delay offered by that timing path. This is known as pipeline stalling.
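The cycle cost of pipeline stalling can be sketched with a simple count. The model below is an assumption made for illustration: each instruction that uses the maximum-delay timing path occupies its stage for the number of clock periods needed to cover that delay, so it adds that number minus one stall cycles on top of the (N+ns) cycles of the unstalled pipeline. All names and values are hypothetical.

```python
def stall_cycles_per_use(max_delay_ps, clock_period_ps):
    """Extra cycles needed when the slow path does not fit in one clock period."""
    cycles_needed = -(-max_delay_ps // clock_period_ps)  # integer ceiling division
    return cycles_needed - 1


def total_cycles(uses_slow_path, n_stages, max_delay_ps, clock_period_ps):
    """Cycles to run the stream: N + n_s, plus stalls for each slow-path use."""
    n_instructions = len(uses_slow_path)
    n_slow = sum(uses_slow_path)  # True counts as 1
    stalls = n_slow * stall_cycles_per_use(max_delay_ps, clock_period_ps)
    return n_instructions + n_stages + stalls


# Hypothetical stream of 8 instructions; True marks use of the max-delay path.
stream = [False, True, False, False, True, False, False, False]

# Clocked at the max-delay period (2000 ps): no stalls are needed.
slow_clock_cycles = total_cycles(stream, 4, 2000, 2000)
# Clocked twice as fast (1000 ps): each slow-path use costs one stall cycle.
fast_clock_cycles = total_cycles(stream, 4, 2000, 1000)

slow_clock_time_ps = slow_clock_cycles * 2000
fast_clock_time_ps = fast_clock_cycles * 1000
```

In this assumed stream the faster clock needs more cycles but finishes sooner, because only two of the eight instructions exercise the maximum-delay path.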
With the above approach, the chosen frequency might not be optimal if the timing path involving the maximum delay is not used frequently, which leads to unnecessary dynamic power dissipation. Hence, there is a need to arrive at an optimum frequency for a given rate of usage of the timing path involving the maximum delay.
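The dependence of the best clock frequency on the usage rate can be illustrated with a rough model. It is a sketch under stated assumptions, not a method taken from this description: a long instruction stream is assumed (so pipeline fill time is ignored), a fraction `slow_path_rate` of instructions uses the maximum-delay path, and each such use costs the stall cycles needed to cover that delay. When two candidate periods give equal throughput, the slower clock is preferred, on the common assumption that dynamic power grows with clock frequency. All names and values are hypothetical.

```python
def throughput_ips(clock_period_ps, max_delay_ps, slow_path_rate):
    """Approximate instructions per second for a long stream with stalling."""
    stalls = -(-max_delay_ps // clock_period_ps) - 1  # ceil(delay / period) - 1
    cycles_per_instruction = 1.0 + slow_path_rate * stalls
    frequency_hz = 1e12 / clock_period_ps
    return frequency_hz / cycles_per_instruction


def best_period_ps(candidate_periods_ps, max_delay_ps, slow_path_rate):
    """Pick the period with the highest throughput; ties go to the slower clock."""
    return max(
        candidate_periods_ps,
        key=lambda p: (throughput_ips(p, max_delay_ps, slow_path_rate), p),
    )


MAX_DELAY_PS = 2000                  # assumed delay of the slowest timing path
CANDIDATES_PS = [2000, 1500, 1000]   # assumed clock periods to choose from

for rate in (0.05, 0.5, 1.0):
    chosen = best_period_ps(CANDIDATES_PS, MAX_DELAY_PS, rate)
    print(f"slow-path rate {rate}: best period {chosen} ps")
```

Under these assumptions, a rarely used slow path favors the fastest clock, whereas a slow path used by every instruction gives no throughput gain from the faster clock, so the model falls back to the period set by the maximum delay.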