Data can be transferred through a number of mechanisms, including a network interface or shared memory. When shared memory is an option, transferring data through it often provides better performance for parallel applications in distributed computing systems than transferring data through network interfaces. Different types of communications can benefit from shared memory data transfer, including individual communications and collective communications.
As one example, to transfer data between tasks of a collective communication via shared memory, a shared memory buffer is statically attached by all tasks participating in the communication. Data is copied into the shared memory buffer by one or more source tasks and then copied out of the buffer by one or more destination tasks. This type of shared memory data transfer requires two copies: one from the one or more source tasks to the shared buffer, and another from the shared buffer to the one or more destination tasks. This has negative performance implications, especially for large messages, which may be transferred in multiple portions, each of which is copied twice.
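The two-copy pattern described above may be illustrated by the following minimal sketch. It is an illustration only, not an implementation taken from any particular system: the function name `two_copy_transfer` and the parameter `chunk_size` are hypothetical, and for simplicity both the source and the destination attachment are simulated within a single process using Python's standard `multiprocessing.shared_memory` module.

```python
from multiprocessing import shared_memory


def two_copy_transfer(payload: bytes, chunk_size: int = 4096):
    """Move payload through a fixed-size shared buffer, counting copies.

    Hypothetical helper: one process plays both the source task and the
    destination task so that the example stays self-contained.
    """
    # The "source" side creates the shared buffer.
    shm = shared_memory.SharedMemory(create=True, size=chunk_size)
    # The "destination" side attaches to the same buffer by name.
    peer = shared_memory.SharedMemory(name=shm.name)

    received = bytearray()
    copies = 0
    try:
        # A message larger than the buffer is moved one portion at a time.
        for offset in range(0, len(payload), chunk_size):
            chunk = payload[offset:offset + chunk_size]
            n = len(chunk)

            # Copy 1: source data -> shared buffer.
            shm.buf[:n] = chunk
            copies += 1

            # Copy 2: shared buffer -> destination's private storage.
            received += bytes(peer.buf[:n])
            copies += 1
    finally:
        peer.close()
        shm.close()
        shm.unlink()  # release the shared segment

    return bytes(received), copies


if __name__ == "__main__":
    message = bytes(range(256)) * 40  # 10,240 bytes
    out, copies = two_copy_transfer(message, chunk_size=4096)
    print(out == message, copies)
```

In this sketch, a 10,240-byte message moved through a 4,096-byte buffer is split into three portions, and each portion is copied twice, for six copies in total. The copy count therefore grows with the message size, which is the overhead noted above for large messages.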
Based on the foregoing, a need exists for an enhanced capability to transfer data for collective communications. In particular, a need exists for a capability that minimizes the copying of data during a data transfer.