This invention relates to computing machines.
A von Neumann sequential-architecture computer operates on only one token of information at a time, and that token can be no larger than the word size of the machine. Operations are performed by gathering the contributing pieces of information from memory into the processor, combining them according to instructions, and placing the result back into memory. A single desired operation may require the execution of several instructions. Operating on multiple tokens at the same time should therefore allow the set of operations in a program to be executed much faster.
To execute multiple operations at the same time, concurrent processing systems use multiple processors and often multiple memories. Many multi-processor configurations are in use, including parallel, pipelined, and adaptive structures. Except in special cases such as array processing, it has proven extremely difficult to assign operations on tokens to processors and memories in such a way that needed information can be obtained efficiently within the constraints of the system's architecture. Further, it is difficult to assign balanced workloads to the processors so as to maintain a high level of concurrency.
A good assignment of operations and tokens to processors and memories is one in which the dependencies among the tokens match the communication channels between the processors and memories. On multi-processor systems, this means that clusters of interdependent pieces of information are assigned to each processor/memory unit. Dependencies across cluster boundaries require communication between the high-level modules of the multi-processor. Such inter-module communication is much less efficient than handling dependencies within one unit, so it is important that the boundaries of the processor assignments fall on cluster boundaries and that the clusters themselves be formed for minimum interdependence. Additionally, to use all the processors efficiently, an equal amount of work should be assigned to each processor. It is these constraints that are so difficult to satisfy in programming concurrent systems based on multiple processors.
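
The two constraints described above can be made concrete with a small illustrative sketch. It is not part of the invention. It assumes that each token costs one unit of work and that every dependency crossing a processor boundary costs one inter-module communication. The function name `score_assignment` and the six-token example are hypothetical, chosen only for illustration.

```python
def score_assignment(dependencies, assignment, processors):
    """Score a token-to-processor assignment (illustrative sketch only).

    dependencies: iterable of (token_a, token_b) pairs
    assignment:   dict mapping each token to a processor id
    processors:   list of all processor ids, including idle ones
    Returns (cut, imbalance):
      cut       = number of dependencies that cross processor boundaries
      imbalance = heaviest load minus lightest load, one unit per token
    """
    # A dependency whose endpoints sit on different processors
    # requires inter-module communication.
    cut = sum(1 for a, b in dependencies if assignment[a] != assignment[b])

    # Assumption: every token represents one unit of work.
    load = {p: 0 for p in processors}
    for proc in assignment.values():
        load[proc] += 1
    imbalance = max(load.values()) - min(load.values())
    return cut, imbalance


# Two clusters, {a, b, c} and {d, e, f}, joined by the single dependency c-d.
deps = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"),
        ("c", "d")]

aligned = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}   # follows clusters
split = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}     # ignores clusters
single = {t: 0 for t in "abcdef"}                            # one processor only

print(score_assignment(deps, aligned, [0, 1]))  # (1, 0)
print(score_assignment(deps, split, [0, 1]))    # (5, 0)
print(score_assignment(deps, single, [0, 1]))   # (0, 6)
```

Under these assumptions, the assignment whose boundaries fall on the cluster boundaries has one crossing dependency and equal loads. The assignment that ignores the clusters has equal loads but five crossing dependencies. Placing every token on one processor removes all crossings but leaves the second processor idle. Satisfying both constraints at once is the difficulty described above.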