Computer systems often include more than one processor to increase performance. Massively parallel computing structures (also referred to as “ultra-scale computers” or “supercomputers”) interconnect large numbers (tens of thousands) of nodes, each of which includes one or more processors. The nodes are often connected by a network with a tree, torus, or mesh topology that supports message passing. One example of a supercomputer is the IBM System Blue Gene® Solution available from International Business Machines Corporation of Armonk, N.Y.
On supercomputers, a parallel program is typically divided into processes, which execute on various nodes and communicate with each other via message passing. The cost of communication between nodes varies with the distance between the nodes involved and with other factors, such as the availability of buffers, the number of available paths through the network, and network contention. An important challenge in supercomputer design is to map the processes of the parallel program to the nodes optimally, so as to minimize the total execution time of the program, which is a function of both communication time and computation time. Because the communication pattern of the parallel program changes over time as the processes execute, the optimal mapping also changes. Thus, to keep the execution time of the parallel program low, supercomputers use process migration algorithms that detect non-optimal communication between nodes and respond by moving processes between nodes. Unfortunately, current process migration algorithms require significant amounts of temporary storage, which is expensive, and they are difficult to scale to the large numbers of nodes used in new supercomputers.
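To illustrate why the mapping of processes to nodes matters, the following minimal sketch scores a mapping on a torus by summing, over each communicating pair of processes, the message volume multiplied by the hop distance between their nodes. This is a deliberately simplified cost model assumed for illustration only: it ignores buffer availability, path availability, and contention, and it is not the migration algorithm described here. The names torus_hops, mapping_cost, and the traffic and mapping dictionaries are hypothetical.

```python
# Hypothetical, simplified cost model: communication cost of a mapping
# = sum over process pairs of (message volume x torus hop distance).
# Contention, buffers, and path availability are intentionally ignored.

def torus_hops(a, b, dims):
    """Minimum hop count between node coordinates a and b on a torus
    whose dimensions have wraparound links (sizes given by dims)."""
    total = 0
    for x, y, n in zip(a, b, dims):
        d = abs(x - y)
        total += min(d, n - d)  # go the short way around each ring
    return total


def mapping_cost(traffic, mapping, dims):
    """traffic: {(src_process, dst_process): bytes exchanged} (assumed known)
    mapping: {process: node coordinate tuple}"""
    return sum(
        volume * torus_hops(mapping[p], mapping[q], dims)
        for (p, q), volume in traffic.items()
    )


dims = (4, 4, 4)                      # a hypothetical 4x4x4 torus
traffic = {(0, 1): 100, (1, 2): 50}   # hypothetical message volumes

spread = {0: (0, 0, 0), 1: (2, 2, 2), 2: (0, 0, 0)}
compact = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0)}

print(mapping_cost(traffic, spread, dims))   # 900
print(mapping_cost(traffic, compact, dims))  # 150
```

Under this toy model, moving process 1 from the far node to a neighbor of processes 0 and 2 cuts the cost from 900 to 150, which is the kind of improvement a process migration algorithm tries to find as communication patterns shift.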
Thus, what is needed is a process migration algorithm that performs well, scales to large numbers of processes, and does not require temporary storage.