High performance computing systems generally include multiple processor nodes, or processing elements, that work in parallel on data stored in a shared global memory to increase processing performance. The global memory may be a distributed memory that is configured as a partitioned global address space (PGAS), with each partition residing in a local memory of one of the processing elements. Communications between a processing element and the PGAS, for example a read or write of a buffer segment, appear to be one-sided at the user/application level but typically involve one or more bi-directional exchanges between a sender node and a receiver node at the network level to maintain correct buffer address offsets. These exchanges, and the synchronization delays associated with them, may adversely affect performance.
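The cost of such an exchange can be illustrated with a minimal sketch in Python. It is a toy model rather than an implementation of any particular system: the names `Node`, `Network`, and `put_with_handshake` are hypothetical, and the three messages per write are an assumption of this model, chosen only to show one request/reply round trip followed by the data transfer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processing element holding one partition of the global address space."""
    rank: int
    memory: bytearray = field(default_factory=lambda: bytearray(64))
    next_free: int = 0  # offset of the next unused byte in this partition


class Network:
    """Counts messages so the cost of each write can be inspected."""

    def __init__(self):
        self.messages = 0

    def send(self):
        self.messages += 1


def put_with_handshake(net, receiver, payload):
    """Write that looks one-sided to the caller but needs a round trip.

    Assumed protocol (hypothetical): the sender asks the receiver for a
    buffer offset, waits for the reply, and only then ships the data.
    """
    net.send()  # sender -> receiver: request an offset
    offset = receiver.next_free
    receiver.next_free += len(payload)
    net.send()  # receiver -> sender: reply carrying the offset
    net.send()  # sender -> receiver: data written at that offset
    receiver.memory[offset:offset + len(payload)] = payload
    return offset


# Two writes of a four-byte buffer segment to the partition on rank 1.
net = Network()
target = Node(rank=1)
first = put_with_handshake(net, target, b"abcd")
second = put_with_handshake(net, target, b"efgh")
```

In this model the caller issues a single call per write, yet each write costs a request and a reply before any data moves, and the sender cannot proceed until the reply arrives. That wait is the synchronization delay referred to above.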
Although the following Detailed Description proceeds with reference to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.