When a process is divided into steps, and the steps are performed simultaneously on different data, the frequency at which the processing system can operate depends on the length of time needed to complete its slowest step. The same process performed with a shorter “slowest step” can operate at a higher frequency. Designing shorter steps in order to achieve higher speed generally requires creating more steps. When the lengths of the steps are imperfectly balanced, more steps require more time to generate the resulting output from any particular input.
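The relationship between the slowest step, the operating frequency, and the time to produce an output can be illustrated with a minimal sketch. The function name `pipeline_timing` and the delay values are hypothetical and chosen only for illustration; the sketch assumes every step occupies one full clock period and ignores any overhead of the storage between steps.

```python
def pipeline_timing(step_delays_ns):
    """Return (max frequency in MHz, input-to-output time in ns).

    Assumption: each step takes one full clock period, and the clock
    period is set by the slowest step.
    """
    period_ns = max(step_delays_ns)            # slowest step sets the clock
    frequency_mhz = 1000.0 / period_ns         # 1 / period, converted to MHz
    total_ns = len(step_delays_ns) * period_ns # one period per step
    return frequency_mhz, total_ns


# A 10 ns process performed as a single step.
single = pipeline_timing([10.0])

# The same process divided into three imperfectly balanced steps.
divided = pipeline_timing([4.0, 3.0, 3.0])

print(single)   # (100.0, 10.0)
print(divided)  # (250.0, 12.0)
```

In this example, dividing the process raises the frequency from 100 MHz to 250 MHz, but because the steps are not equal in length, the time from input to output grows from 10 ns to 12 ns.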
Such processes are often referred to as pipelines. Storage devices between steps are referred to as pipeline stages. The number of steps in a processing system is referred to as the pipeline depth. Within a system-on-chip (SoC), data moves through a pipeline no faster than one stage per clock cycle. Therefore, the number of stages determines the number of clock cycles needed for each input datum to be fully processed. Such time is often referred to as latency. The longer latency due to pipeline stages is undesirable, but the faster clock frequency due to pipeline stages is desirable. Designing an optimal number of pipeline stages requires a trade-off between clock frequency and cycles of latency.
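The trade-off between clock frequency and latency can be sketched by sweeping the pipeline depth. This is an illustration under stated assumptions, not a design method: the function `sweep_depths` is hypothetical, the logic is assumed to divide evenly across steps, and a fixed per-step overhead (`stage_overhead_ns`, an assumed stand-in for storage delay and imbalance) is added to each clock period.

```python
def sweep_depths(total_logic_ns, stage_overhead_ns, max_depth):
    """List (depth, frequency in MHz, latency in ns) for depths 1..max_depth.

    Assumptions: logic divides evenly across steps, and every step pays
    the same fixed overhead. Both are simplifications for illustration.
    """
    rows = []
    for depth in range(1, max_depth + 1):
        period_ns = total_logic_ns / depth + stage_overhead_ns
        frequency_mhz = 1000.0 / period_ns
        latency_ns = depth * period_ns  # one clock cycle per step
        rows.append((depth, frequency_mhz, latency_ns))
    return rows


for depth, mhz, latency in sweep_depths(8.0, 0.5, 4):
    print(f"depth={depth}  f={mhz:.1f} MHz  latency={latency:.1f} ns")
```

With these example numbers, each added step raises the achievable frequency and also lengthens the latency, both in clock cycles and in nanoseconds.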
In the design of a data processing chip, it is not immediately clear how to apply pipelining optimally. The conventional method of designing a working pipeline is an iterative process of experimentation, which is time consuming and rarely results in an optimal design.
The problem is further complicated when the physical layout of the chip is considered. The length of time for data to propagate from a point of production to a point of consumption depends on the distance and the average propagation rate through the wires between those points. Conventionally, the physical layout is considered at a later stage in the chip design process than the decisions about pipelining. To avoid unexpected problems in achieving the desired clock frequency during physical design, pipeline stages are placed at smaller increments within the processing logic, thereby leaving extra time for data to propagate between points if those points happen to be distant in the physical design. This over-design of pipeline stages costs area, power consumption, and especially latency. Therefore, what is needed is a system and method to automatically determine optional pipeline stages in order to meet clock frequency constraints.
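The effect of physical distance on the number of pipeline stages can be illustrated with a short sketch. The function `stages_for_path`, its parameters, and the numeric values are hypothetical; the sketch assumes a uniform wire delay per millimeter (the inverse of the average propagation rate) and assumes that the combined logic and wire delay can be divided freely at stage boundaries.

```python
import math


def stages_for_path(distance_mm, wire_ns_per_mm, logic_ns, period_ns):
    """Return the number of pipeline stages needed between two points.

    Assumptions: wire delay grows linearly with distance, and the total
    delay can be split anywhere along the path.
    """
    wire_ns = distance_mm * wire_ns_per_mm
    total_ns = wire_ns + logic_ns
    cycles = max(1, math.ceil(total_ns / period_ns))
    return cycles - 1  # n cycles of delay need n - 1 intermediate stages


# Pipelining chosen before layout, assuming the points could be 3 mm apart.
pessimistic = stages_for_path(3.0, 0.5, 0.8, 1.0)

# The same path once layout places the points 0.5 mm apart.
actual = stages_for_path(0.5, 0.5, 0.5, 1.0)

print(pessimistic, actual)  # 2 0
```

In this example, the stages chosen for the pessimistic distance would not be needed once the actual placement is known, which is the kind of over-design that costs area, power, and latency.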