The present invention is generally directed to memory sharing in distributed data processing systems including a plurality of processing nodes. More particularly, the present invention is directed to improving intra-nodal communications in a manner which avoids unnecessary data copying and which also provides address space extension, when and where needed, in a manner which is transparent to users in both 32-bit environments (where there are typically a very small number of segments available to users) and 64-bit environments.
For parallel applications executing on a distributed memory machine like the RS/6000 SP (IBM pSeries machines), tasks running as part of a parallel or distributed application communicate using some form of reliable message transport such as the publicly defined Message Passing Interface (MPI) or the Low Level Application Programming Interface (LAPI). The tasks of an application can be distributed across various nodes of the system, where a node is defined as a single operating system image. However, in certain cases some or all of the tasks may reside on the same node. The placement of the tasks of a parallel application is usually abstracted from (that is, specified through) the application communication transport interface (for example, on IBM SP systems this is accomplished via the LoadLeveler (LL) and Parallel Operating Environment (POE) products). The underlying reliable message transport (like LAPI or MPI) detects whether or not the task to which communication is requested is running on the same node, in which case it switches to an internal, shared memory transport modality called intra-node communication. In the case of LAPI, the origin task (the task initiating the communication operation) and the target task (the task which is the target of a communication issued by the origin task) are either on the same node or on different nodes; in the latter case, messages are sent across the network, which is referred to as inter-node communication. Switching to the shared memory transport for intra-node communication improves overall system performance in two fundamental ways: (1) it increases inter-node communication performance, since network congestion related to intra-node communications is reduced; and (2) it increases intra-node communication performance by avoiding having to stage data through the network and by taking advantage of operating system hooks to avoid the extra data copies that such staging incurs.
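The transport selection described above can be illustrated with a minimal sketch. It is not the LAPI or MPI implementation; the names `Task`, `Transport`, `select_transport`, and the node identifiers are hypothetical and chosen only for illustration. The sketch assumes that each task records an identifier for the operating system image (node) on which it runs, and that the transport compares the identifiers of the origin and target tasks.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    SHARED_MEMORY = "intra-node"  # both tasks run under one OS image
    NETWORK = "inter-node"        # the message must cross the network


@dataclass(frozen=True)
class Task:
    task_id: int
    node_id: str  # hypothetical identifier of the OS image hosting the task


def select_transport(origin: Task, target: Task) -> Transport:
    """Choose the transport path for a message from origin to target."""
    if origin.node_id == target.node_id:
        # Same node: use the shared memory path and stay off the network.
        return Transport.SHARED_MEMORY
    # Different nodes: the message is sent across the network.
    return Transport.NETWORK


# Hypothetical placement: tasks 0 and 1 share a node, task 2 runs elsewhere.
tasks = [Task(0, "node-a"), Task(1, "node-a"), Task(2, "node-b")]
```

In this sketch, a message from task 0 to task 1 would take the shared memory path, while a message from task 0 to task 2 would be sent across the network.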
The present invention provides a mechanism for improving intra-node communications, particularly (though not exclusively) as implemented in the LAPI environment, which provides an efficient one-sided programming model, within the intra-node environment, which is a shared memory environment. The basic concepts of LAPI for inter-node communication are described more particularly in U.S. Pat. No. 6,038,604 and in U.S. Pat. No. 6,035,335.