1. Field of the Invention
The invention pertains generally to computers. In particular, the invention pertains to parallel processing.
2. Background of the Invention
Some conventional processors include multiple parallel processing units to increase overall processing throughput. A stream of sequential instructions to be executed is segmented into groups, and each group is sent to one of the parallel processing units for further processing. The outputs of the processing units are then combined into a single sequence in the same order that would have been achieved if a single processing unit had executed all instructions in their original order. Since this technique allows multiple instructions to be executed simultaneously, the overall instruction throughput may be several times higher than it would be if all instructions were executed sequentially in a single processing unit.
Because the instructions in the different processing units may execute at different rates, the outputs of the different processing units may become available in an order that differs from the original instruction sequence. A mechanism merges the outputs from the various processing units into a single stream that reflects the original instruction sequence. This merging of multiple out-of-order results into the original order is frequently referred to as instruction reordering or as preserving program-order semantics. Conventional methods for merging the multiple parallel instruction streams use a central component to track the instructions as they flow through the parallel processing units. The design complexity of this approach typically grows substantially as parallelism increases. Increased design complexity in turn leads to other problems, which may include larger circuit implementation area, increased power consumption, longer development time, and more difficult validation of the functionality.
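For illustration only, the conventional centralized merging described above may be modeled in software along the following lines. This is a minimal sketch under stated assumptions, not a description of any particular processor: the class name CentralTracker, the use of sequential integer tags, and the completion order in run_demo are all hypothetical and chosen only to show results arriving out of order and being released in the original order.

```python
from dataclasses import dataclass, field


@dataclass
class CentralTracker:
    """Hypothetical central component (an assumption for illustration).

    It tags each instruction with a sequence number at dispatch and
    releases results only in that original order.
    """
    next_tag: int = 0        # tag to assign to the next dispatched instruction
    next_release: int = 0    # tag of the oldest result not yet released
    pending: dict = field(default_factory=dict)  # tag -> completed result

    def dispatch(self):
        """Assign the next sequence tag to an instruction."""
        tag = self.next_tag
        self.next_tag += 1
        return tag

    def complete(self, tag, result):
        """Record a finished result and return every result that can now
        be released without violating the original instruction order."""
        self.pending[tag] = result
        released = []
        while self.next_release in self.pending:
            released.append(self.pending.pop(self.next_release))
            self.next_release += 1
        return released


def run_demo():
    tracker = CentralTracker()
    program = ["i0", "i1", "i2", "i3"]
    tags = [tracker.dispatch() for _ in program]

    # Assumed completion order across the parallel units: i2, i0, i3, i1.
    completion_order = [tags[2], tags[0], tags[3], tags[1]]

    merged = []
    for tag in completion_order:
        merged.extend(tracker.complete(tag, f"result({program[tag]})"))
    return merged
```

In this sketch every completion from every processing unit passes through the single tracker, which suggests one reason a central component may become harder to design as the number of parallel units grows.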