1. Technical Field
The present invention generally relates to data processing systems and in particular to distributed data processing systems. Still more particularly, the present invention relates to communication among tasks executing in distributed data processing systems.
2. Description of the Related Art
It is well known in the computer arts that greater computer system performance can be achieved by harnessing the processing power of multiple individual processing units. Multi-processor (MP) computer systems can be designed with a number of different topologies, of which various ones may be better suited for particular applications depending upon the performance requirements and software environment of each application. As the size of processing systems scales upward with demands for more processing power and less localized clustering of hardware, processing architecture has advanced from: (a) symmetric multi-processor (SMP) architecture, in which multiple processing units, each supported by a multi-level cache hierarchy, share a common pool of resources, such as a system memory and input/output (I/O) subsystem, which are often coupled to a shared system interconnect; to (b) non-uniform memory access (NUMA) architecture, which includes a switch or other global interconnect to which multiple nodes, each of which can be implemented as a small-scale SMP system, are connected; and then to (c) parallel computing architecture, in which multiple processor nodes are interconnected to each other via a system interconnect or fabric, and the multiple processor nodes are then utilized to execute specific tasks, which may be individual/independent tasks or parts of a large job that is made up of multiple tasks. Even more recently, the parallel computing architecture has been further enhanced to enable tasks associated with a single job to share parts of their effective address space (within a global address space (GAS) paradigm) across physical or logical partitions or nodes.
One drawback of computing systems that include multiple parallel processing nodes distributed over large geographical networks is that the threads of each task within a job are limited to communicating via the Message Passing Interface (MPI) collectives model. Under this model, specific commands are provided that force each thread to share information via the network with every other thread executing within the job, one thread at a time, and to receive a result/answer from every other thread executing within the job. Thus, threadA (a) talks with threadB and receives an answer from threadB, (b) talks with threadC and receives an answer from threadC, and so on, until threadA receives an answer from threadN, where N is an integer representing the total number of other threads executing within the job. This use of MPI collectives is bandwidth intensive, and each message issued by a task incurs substantial latency to complete on the network (across the multiple nodes assigned to the job).
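The bandwidth cost of this one-thread-at-a-time exchange can be illustrated with a minimal Python sketch. It is not an MPI implementation and does not reproduce any particular library's behavior; it only counts messages under the assumption that each exchange consists of exactly one request and one answer. The function name `pairwise_exchange_message_count` and the parameter `num_threads` (the N other threads plus the thread itself, i.e., N + 1) are hypothetical names introduced for illustration.

```python
def pairwise_exchange_message_count(num_threads: int) -> int:
    """Count messages when every thread talks to every other thread in turn.

    Assumption (illustrative only): each exchange is one request from the
    sending thread plus one answer from the receiving thread.
    """
    messages = 0
    for sender in range(num_threads):
        for receiver in range(num_threads):
            if receiver == sender:
                continue  # a thread does not exchange with itself
            messages += 2  # one request, one answer
    return messages


if __name__ == "__main__":
    # Message count grows roughly with the square of the thread count.
    for n in (2, 4, 8, 16):
        print(n, pairwise_exchange_message_count(n))
```

Under these assumptions the count is 2 × T × (T − 1) for T threads, so doubling the number of threads roughly quadruples the network traffic, which is consistent with the bandwidth and latency concern described above.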