Enhanced computer system performance can typically be achieved by harnessing the processing power of multiple individual processing units. One common multi-processor (MP) architecture is the symmetric multi-processor (SMP) architecture, in which multiple processing units (or elements) are supported by a multi-level cache hierarchy. In the SMP architecture, processing elements share a common pool of resources (e.g., a system memory and an input/output (I/O) subsystem) that are often coupled to a shared system interconnect.
Partitioned Global Address Space (PGAS) is a parallel programming model that assumes a global memory address space that is logically partitioned, with a portion of that space local to each process or thread. In global address space programming models (e.g., the SHMEM library), synchronization between processing elements is accomplished through synchronization variables located within the global address space. However, this synchronization mechanism relies on inefficient initiator-managed protocols, in which an initiating processing element remotely manages each step of synchronizing data accesses with a receiver processing element through structures located at the receiver. The result is high latency and low throughput.
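The initiator-managed pattern described above can be illustrated with a minimal sketch in C. It is an illustration under stated assumptions, not the SHMEM API: two POSIX threads stand in for processing elements, a static array stands in for the partitioned global address space, and the names `partition_t`, `global_space`, `sync_flag`, and `run_exchange` are hypothetical. The initiator writes a payload into the receiver's partition and then sets a synchronization variable that also resides at the receiver, while the receiver polls that variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_PES    2   /* hypothetical: two processing elements (PEs) */
#define PART_WORDS 4   /* hypothetical: words per local partition */

/* One partition of the simulated global address space, local to one PE. */
typedef struct {
    long       data[PART_WORDS]; /* payload area owned by this PE */
    atomic_int sync_flag;        /* synchronization variable in the same space */
} partition_t;

/* Simulated global address space: one partition per PE. */
static partition_t global_space[NUM_PES];

typedef struct {
    int  my_pe;     /* index of the partition local to this PE */
    long payload;   /* value the initiator sends */
    long observed;  /* value the receiver reads after synchronizing */
} pe_arg_t;

/* Initiator (PE 0): performs every step remotely, at the receiver. */
static void *initiator(void *p)
{
    pe_arg_t *a = p;
    partition_t *remote = &global_space[1];   /* receiver's partition */

    remote->data[0] = a->payload;             /* step 1: remote data write */
    /* step 2: remote flag update; release ordering publishes the data */
    atomic_store_explicit(&remote->sync_flag, 1, memory_order_release);
    return NULL;
}

/* Receiver (PE 1): passively polls the flag in its own partition. */
static void *receiver(void *p)
{
    pe_arg_t *a = p;
    partition_t *local = &global_space[a->my_pe];

    while (atomic_load_explicit(&local->sync_flag, memory_order_acquire) == 0) {
        /* spin until the initiator raises the flag */
    }
    a->observed = local->data[0];
    return NULL;
}

/* Runs one exchange and returns the value the receiver observed. */
long run_exchange(long payload)
{
    pthread_t t_init, t_recv;
    pe_arg_t init_arg = { 0, payload, 0 };
    pe_arg_t recv_arg = { 1, 0, 0 };
    int rc;

    atomic_store(&global_space[1].sync_flag, 0);  /* reset before each run */

    rc = pthread_create(&t_recv, NULL, receiver, &recv_arg);
    assert(rc == 0);
    rc = pthread_create(&t_init, NULL, initiator, &init_arg);
    assert(rc == 0);

    pthread_join(t_init, NULL);
    pthread_join(t_recv, NULL);
    return recv_arg.observed;
}
```

In this sketch, every synchronization step is a separate operation issued by the initiator against the receiver's partition. On a single machine those operations are cheap, but in a distributed system each would be a remote access, which is the source of the latency and throughput cost noted above. The sketch can be compiled with a C11 compiler and `-pthread`.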