1. Field
The disclosure relates generally to an improved data processing system, and more specifically, to a computer implemented method, system, and computer usable program code for increasing performance of a shared-memory parallel program on a distributed memory machine by increasing network communication performance and cache performance.
2. Description of the Related Art
Shared-memory parallel programs with fine-grained parallelism and irregular memory access patterns remain challenging to implement efficiently on current architectures. Recent studies have proposed techniques to reduce the gap between computer program and computer architecture on shared-memory platforms. Achieving high performance for irregular shared-memory parallel programs is even harder on distributed memory machines, where the adverse impact of irregular memory accesses is magnified when memory access requests are served by remote nodes of the distributed memory system.
As a result, although many theoretically fast programs may exist in the literature, few experimental results are known. The partitioned global address space (PGAS) programming paradigm appears to make irregular programs easier to write. Yet no convincing evidence exists of high-performance PGAS implementations for workloads that are memory intensive and have irregular memory access patterns.