As the amount of data to be processed and transported grows, there is an increasing need for a processing system that can provide significantly faster performance than existing processing systems, and more particularly, one that can do so for a given chip size and transistor/process density.
One way to improve the performance of a processing system is to use a multi-stage pipeline structure. In a multi-stage pipeline, each stage may pass data to the next stage. In general, all data from a previous stage is passed on to the subsequent stage(s), even if the subsequent stage(s) do not need all of it. Passing the data between stages may require a bus with a high aggregate bandwidth, which in turn may require either individual links with higher bandwidth or more links (wires) between the stages. In addition, each stage may need to temporarily store the data before forwarding it to the next stage, so a large buffer (e.g., one built from flip-flops) may be needed at each stage. Passing data between the stages may therefore significantly increase the overhead and circuit complexity of each stage, which may in turn affect the speed and latency of each stage and of the overall pipeline.
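The overhead described above can be illustrated with a rough back-of-the-envelope sketch. The model below is an assumption made for illustration only, not a description of any particular design: it assumes each wire carries one bit per clock cycle, each buffered bit costs one flip-flop, and every stage latches the full data word. The names `Stage`, `bits_needed`, and `forwarding_overhead`, and all numbers in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    bits_needed: int  # hypothetical: bits of the incoming word this stage actually uses


def forwarding_overhead(stages, data_bits, clock_hz):
    """Estimate the cost of passing the full data word across every stage boundary.

    Assumptions (illustrative only): one bit per wire per clock cycle,
    one flip-flop per buffered bit, and every stage buffers the full word.
    """
    boundaries = len(stages) - 1
    return {
        # Wires needed if every boundary carries the whole word in parallel.
        "wires_total": data_bits * boundaries,
        # Flip-flops needed if every stage buffers the whole word.
        "buffer_bits_total": data_bits * len(stages),
        # Bandwidth each stage-to-stage link must sustain, in bits per second.
        "link_bandwidth_bps": data_bits * clock_hz,
        # Bits delivered to downstream stages that those stages never use.
        "unused_bits_forwarded": sum(data_bits - s.bits_needed for s in stages[1:]),
    }


# Hypothetical four-stage pipeline with a 512-bit data word at 1 GHz.
stages = [
    Stage("parse", 512),
    Stage("lookup", 128),
    Stage("modify", 64),
    Stage("emit", 32),
]
cost = forwarding_overhead(stages, data_bits=512, clock_hz=1_000_000_000)
for key, value in cost.items():
    print(f"{key}: {value}")
```

Under these assumptions, the wire count and buffer size grow linearly with both the data width and the number of stages, which is one way to see why forwarding all data at every stage may become costly.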