A common method of improving the performance of a central processing unit (CPU), whether the CPU is a microcontroller, a digital signal processor or a coprocessor, is to split the execution of combinational logic into several steps in a pipeline structure. Referring to FIG. 1, a block diagram of a conventional pipeline 10 is shown. The pipeline 10 may have multiple steps 12a-12d having combinational logic blocks 14a-14d separated by sample registers 16a-16d. Typically, each step 12a-12d is executed in a single clock cycle. The steps 12a-12d divide complex logic having a large propagation delay into small steps, each small step having a short propagation delay. By concatenating the short steps 12a-12d in the pipeline 10 and performing all of the steps 12a-12d at every clock cycle, each step working on different data, the overall performance is measured as one execution set completed at every clock cycle. A gain in performance is achieved by accelerating the clock from a low frequency appropriate to perform the entire complex logic in a single cycle to a higher frequency that still accommodates the slowest step 12a-12d in the pipeline 10. A side effect is that a latency of N cycles is created for a pipeline of depth N.
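The frequency gain and the latency side effect described above can be illustrated with a minimal sketch. The per-step propagation delays and the function name `pipeline_timing` are hypothetical values and names chosen for illustration only; they do not come from FIG. 1, and register setup and clock-to-output overhead are assumed to be zero unless supplied.

```python
def pipeline_timing(step_delays_ns, register_overhead_ns=0.0):
    """Compare one long combinational path with the same logic pipelined.

    step_delays_ns: propagation delay of each step (hypothetical values).
    register_overhead_ns: assumed sample-register overhead per clock period.
    Returns (unpipelined_mhz, pipelined_mhz, latency_cycles).
    """
    # Without a pipeline, the clock period must cover the entire logic.
    unpipelined_period = sum(step_delays_ns) + register_overhead_ns
    # With a pipeline, the clock period only has to cover the slowest step.
    pipelined_period = max(step_delays_ns) + register_overhead_ns
    # A result emerges N cycles after it enters a pipeline of depth N.
    latency_cycles = len(step_delays_ns)
    return (1000.0 / unpipelined_period,
            1000.0 / pipelined_period,
            latency_cycles)


# Four hypothetical steps, loosely analogous to steps 12a-12d.
delays_ns = [2.0, 2.5, 2.0, 1.5]
f_single_mhz, f_pipe_mhz, latency = pipeline_timing(delays_ns)
print(f_single_mhz, f_pipe_mhz, latency)
```

Under these assumed delays, the clock is bounded by the 2.5 ns step rather than by the 8.0 ns total, while the first result still needs four cycles to emerge.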
Extending the depth of the pipeline 10 is limited by three factors. First, deeper pipelines (more stages) increase the overall complexity, which in turn increases risk and logic area. Second, splitting an operation that could be done in a single cycle into several cycles involves some power consumption penalty. Third, a latency-caused performance penalty occurs at each change-of-flow in a software program propagating through the pipeline, thus reducing the performance gain achieved by the clock frequency increase.
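The third factor can be sketched with a simple, idealized model. The model below is an assumption made for illustration, not a description of any particular processor: the clock frequency is taken to scale linearly with depth (perfectly balanced steps, no register overhead), and each change-of-flow is assumed to waste `depth - 1` cycles while the pipeline refills. The function name and the change-of-flow rate are hypothetical.

```python
def effective_speedup(depth, flow_change_rate, flush_cycles=None):
    """Estimate speedup over an unpipelined design under an idealized model.

    depth: number of pipeline steps (N).
    flow_change_rate: assumed fraction of instructions that change the flow.
    flush_cycles: cycles lost per change-of-flow; defaults to depth - 1,
                  which is an assumption of this sketch.
    """
    if flush_cycles is None:
        flush_cycles = depth - 1
    # Average cycles per instruction: one cycle plus the expected refill cost.
    cycles_per_instruction = 1.0 + flow_change_rate * flush_cycles
    # Ideal clock gain is `depth`; the refill cost eats into that gain.
    return depth / cycles_per_instruction


# Hypothetical rate: one instruction in five changes the program flow.
for n in (1, 2, 4, 8, 16):
    print(n, round(effective_speedup(n, 0.2), 2))
```

In this model the speedup for a change-of-flow rate p can never exceed 1/p, however deep the pipeline, which is one way to see why the latency penalty caps the benefit of adding steps.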
Many conventional pipelined systems operate in several modes, some of which are intended to save power. In some cases, parts of the logic are stopped or even disconnected from the power supplies to reduce power consumption. In other low power modes, all of the sub-blocks continue to operate but under a reduced load. In such cases, the frequency of operation is reduced so that the system consumes less power.