With increasing demand for computing power in rapidly growing electronics industries such as communications networks, various parallel processing systems have been constructed to meet this demand. A conventional parallel processing system often employs multiple processing elements or engines in order to obtain results more quickly. A typical conventional parallel processing system arranges its processors in a tandem layout, in which all engines or processors compete for the same shared resources, such as memory. Consequently, a typical parallel processing system suffers from routing congestion, because all engines and shared resources are interconnected with each other.
A conventional approach to resolving routing congestion in a parallel processing system is to add routing channels, which increases die size. In addition, because of the heavy load on the signals, the operating speed of the design must be reduced.
The conventional approach to addressing the increased signal loading is to add pipeline stages. Pipelining shortens the signal traces between flip-flops, which reduces the load on each signal and allows the operating speed of the design to increase. Pipelining, however, requires additional logic gates and flip-flops, which can further increase die size as well as power consumption.