A defining characteristic of a multiprocessor computer is its ability to execute a number of different software routines, or "tasks", concurrently. A multiprocessor computer has at least two processing elements. Typically, a program is broken down into one or more tasks, and each task is executed by a respective processing element. Although most common computers use a single processor, many multiprocessor computers are available today. The number of processing elements in these computers varies from two to 64,000.
As a result of executing a task, a processing element within a multiprocessor computer will generate one or more outputs. These outputs may be produced simultaneously with the outputs of other processing elements. In many circumstances, it is necessary to coordinate the generation of outputs among the processing elements. For example, where a data dependency exists between two or more tasks, i.e., a successively executing task relies upon the outputs of one or more processing elements, it is necessary to synchronize the arrival of the data with the execution of the successive task.
One method currently used to synchronize the arrival of output data is polling. Using a polling method, a task waiting for data individually polls each processing element to determine whether or not a respective output is available. Although this method allows the task to process the outputs in a coordinated fashion, it incurs a substantial amount of execution overhead. The task must repeatedly execute a polling routine, which adds cost and complexity to the computer.
Another method used to coordinate the arrival of outputs is an interrupt processing scheme. Using this method, each processing element, upon generating an output, interrupts the successive task which is to receive the output. There are two drawbacks to the interrupt method. First, it adds substantial processing overhead to the computer. Second, it adds considerable design complexity to the computer, which in turn increases its cost.
A third technique taught by the prior art for coordinating the arrival of outputs of processing elements is to simply allow a successive task to wait until all the processing elements have generated their outputs. After the processing elements have finished generating outputs, the successive task receives the outputs and resumes execution. Such a technique is simple to implement; however, it negatively affects the throughput of the computer because the successive task is idle for large amounts of time while waiting for data.
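The polling and wait-for-all techniques described above can be illustrated with a minimal sketch. It is only an analogy under stated assumptions: Python threads stand in for processing elements, and the names `produce`, `consume_by_polling`, and `consume_by_waiting` are hypothetical and do not come from any prior-art system. The poll counter makes the repeated execution of the polling routine visible, while the waiting variant shows the successive task sitting idle until every output exists.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def produce(delay, value):
    """Hypothetical task: simulate work on a processing element, then emit an output."""
    time.sleep(delay)
    return value


def consume_by_polling(futures):
    """Poll each producer in turn until every output has been collected.

    Returns the outputs keyed by producer index, plus the number of polls
    spent, which grows with how long the producers take.
    """
    results = {}
    polls = 0
    while len(results) < len(futures):
        for index, future in enumerate(futures):
            if index in results:
                continue  # this output was already collected
            polls += 1
            if future.done():
                results[index] = future.result()
    return results, polls


def consume_by_waiting(futures):
    """Stay idle until all producers have finished, then collect every output."""
    wait(futures)
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(produce, 0.01, "a"), pool.submit(produce, 0.02, "b")]
        outputs, polls = consume_by_polling(futures)
    print(outputs, "collected after", polls, "polls")
```

In the polling variant the waiting task consumes processor time on every unsuccessful check; in the waiting variant it consumes none but cannot start on the first output until the slowest producer has finished.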
Thus, there exists a need for an improved method of coordinating the arrival of data outputs from processing elements in a multiprocessor system. Such a method should be simple to implement and should improve the overall throughput of the multiprocessor system.