As technological improvements allow increasing numbers of processing elements to be placed on a single die, the interconnection requirements for those processing elements continue to increase, along with the clock rates at which they operate. With these higher clock rates and the increased loading delays that result from higher circuit densities, multiple stages may be required to transfer data along the communications bus. Placing the stages in a ring topology allows for a very large total bandwidth in a multiprocessing environment by keeping the distance between processors small.