Parallel computer applications often use message passing to communicate between processors. Message passing utilities such as the Message Passing Interface (MPI) support two types of communication: point-to-point and collective. In point-to-point messaging, a processor sends a message to another processor that is ready to receive it. In a collective communication operation, by contrast, many processors participate together in the same operation. Examples of collective operations include broadcast, barrier, and all-to-all.
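The distinction between the two types of communication can be illustrated with a minimal sketch. The sketch below is not MPI and is not taken from any particular runtime; it simulates processors as threads in a single process, and the names `Mailboxes`, `broadcast`, and `run_demo` are hypothetical and introduced only for illustration. The point-to-point step involves one sender and one receiver, whereas the broadcast is entered by every simulated processor and is built here, as one simple assumption, from a linear sequence of point-to-point sends issued by the root.

```python
import queue
import threading


class Mailboxes:
    """Toy in-process stand-in for a message passing layer (not MPI)."""

    def __init__(self, size):
        self.size = size
        # One FIFO inbox per (destination, source) pair, so a receive
        # matches messages from one specific sender in the order sent.
        self._inbox = {
            (dst, src): queue.Queue()
            for dst in range(size)
            for src in range(size)
        }

    def send(self, src, dst, msg):
        """Point-to-point send from rank src to rank dst."""
        self._inbox[(dst, src)].put(msg)

    def recv(self, dst, src):
        """Point-to-point receive at rank dst of a message from rank src."""
        return self._inbox[(dst, src)].get(timeout=5)


def broadcast(net, rank, root, value=None):
    """Collective: every rank calls this, and every rank returns root's value."""
    if rank == root:
        # Linear broadcast: the root sends to each other rank in turn.
        for dst in range(net.size):
            if dst != root:
                net.send(root, dst, value)
        return value
    return net.recv(rank, root)


def run_demo(size=4):
    """Run one point-to-point exchange and one broadcast (requires size >= 2)."""
    net = Mailboxes(size)
    p2p_received = {}
    bcast_results = [None] * size

    def worker(rank):
        # Point-to-point: only ranks 0 and 1 are involved.
        if rank == 0:
            net.send(0, 1, "hello")
        elif rank == 1:
            p2p_received[rank] = net.recv(1, 0)
        # Collective: all ranks participate in the same operation.
        payload = "config" if rank == 0 else None
        bcast_results[rank] = broadcast(net, rank, root=0, value=payload)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return p2p_received, bcast_results


if __name__ == "__main__":
    print(run_demo(4))
```

In this sketch only rank 1 receives the point-to-point message, while every rank ends the broadcast holding the root's value. A production implementation would typically replace the linear loop with a pattern tuned to the machine, which is the per-operation optimization effort discussed next.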
Each collective communication operation needs to be optimized to maximize performance. Known methodologies implement the collective communication operations through separate calls and in a separate software stack. Typical implementations are specific to particular hardware or are part of a specific language or runtime. Such implementation methods result in high development and maintenance overheads. In addition, in known methodologies, this type of implementation is repeated for every new version of a parallel computer. Moreover, for different parallel computers or different versions of a parallel computer, several different parallel programming paradigms need to be supported, and each of them may define its own collective primitives. Each of these paradigms requires separate implementations and optimized runtimes.
It is therefore desirable to have a framework that isolates the fundamental components of collective communication, so as to minimize the development effort across different parallel programming languages and supercomputer architectures.