Some modern computer programs are capable of executing as one or more processes distributed across one or more communicatively linked nodes. A “node” refers to a physical data processing system such as a computer. The nodes are communicatively linked by a network. Each process has an independent virtual memory address space called a partition. The collection of all partitions for the computer program is called the “Partitioned Global Address Space” (PGAS). The computer program can be referred to as a PGAS program. Each partition typically stores different data.
From time to time, one process may require information from another process. The process in need of information, called the requesting process, issues a read request using a network transfer application programming interface (API) to copy or read bytes from the partition of another process called the target process. The requested bytes are sent to the requesting process in a response message and, once received, are copied into the partition of the requesting process. A similar series of events occurs to write bytes from the partition of one process to the partition of another process.
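The read and write sequence described above can be illustrated with a minimal sketch. It is only a single-machine model under stated assumptions: each partition is represented by a Python bytearray, and the names partitions, handle_read_request, get, and put are hypothetical and do not correspond to any particular network transfer API.

```python
# Hypothetical model: each process owns one partition, represented as a bytearray.
partitions = {
    0: bytearray(8),                    # requesting process, empty receive buffer
    1: bytearray(b"remote data"),       # target process
}


def handle_read_request(target: int, offset: int, length: int) -> bytes:
    """Target side: package the requested bytes as a response message."""
    return bytes(partitions[target][offset:offset + length])


def get(requester: int, target: int, src_offset: int, dst_offset: int, length: int) -> None:
    """Read request: copy bytes from the target partition into the requester's partition."""
    # The function call stands in for the network round trip (assumption).
    response = handle_read_request(target, src_offset, length)
    partitions[requester][dst_offset:dst_offset + length] = response


def put(requester: int, target: int, src_offset: int, dst_offset: int, length: int) -> None:
    """Write request: copy bytes from the requester's partition into the target partition."""
    payload = bytes(partitions[requester][src_offset:src_offset + length])
    partitions[target][dst_offset:dst_offset + length] = payload


# Process 0 reads the first six bytes of process 1's partition.
get(requester=0, target=1, src_offset=0, dst_offset=0, length=6)
print(bytes(partitions[0][:6]))
```

In an actual PGAS program the two partitions would reside in different address spaces, typically on different nodes, and the copy would travel over the network rather than through a shared dictionary.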
Network read and write operations are also called get and put operations. Network read and write operations are several orders of magnitude slower than locally performed read and write operations. A locally performed read or write operation is one performed by a process using the partition belonging to that process, e.g., using local memory. For this reason, PGAS programs with performance requirements are designed to minimize network operations in favor of local memory operations.
Often, a PGAS program distributes data structures across all partitions so that each process is able to work on local data. When necessary, data is exchanged between processes using network read and write operations. One issue with this approach is the necessity of keeping track of the location of data within each partition. One solution for keeping track of data is to maintain, for each distributed data structure, a list of virtual base addresses in which each entry points to the memory block used for that data structure in one partition. One such list is maintained for each partition. As an example, consider a system with 128 processes per node and approximately 2^20 total processes. On a 64-bit machine, where each address occupies 8 bytes, the lists would take at least 8×128×2^20=2^30 bytes, or 1 gigabyte, per node per distributed data structure. This is expensive both in terms of storage and in terms of the time needed to initialize the list of virtual memory addresses.
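The storage estimate in the preceding example can be checked with a short calculation. The function and constant names below are hypothetical and chosen only for illustration; the figures are those given in the example.

```python
POINTER_BYTES = 8            # one virtual base address on a 64-bit machine
PROCESSES_PER_NODE = 128
TOTAL_PROCESSES = 2 ** 20


def address_list_bytes_per_node(pointer_bytes: int,
                                processes_per_node: int,
                                total_processes: int) -> int:
    """Bytes needed on one node for one distributed data structure.

    Each process on the node keeps its own list holding one base address
    for every process in the program.
    """
    return pointer_bytes * processes_per_node * total_processes


per_node = address_list_bytes_per_node(POINTER_BYTES, PROCESSES_PER_NODE, TOTAL_PROCESSES)
print(per_node, per_node == 2 ** 30)
```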
One approach for reducing the amount of memory needed for tracking the location of data structures has been to utilize a single handle per data structure. The handle is identical on all partitions with access to the data structure. Within each partition, a table, or shared variable directory (SVD), maps the handle to the allocation for that handle on the same partition. Accordingly, for the same system of 128 processes per node and 2^20 total processes on a 64-bit machine, the location information can be stored with only 8×128 bytes, or 1 kilobyte, per node per distributed data structure.
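A minimal sketch of the handle-based approach follows. The class name SharedVariableDirectory, its methods, the handle value, and the example addresses are all hypothetical; the sketch only shows that one handle resolves to a different local base address in each partition, and that each process stores one entry per data structure.

```python
class SharedVariableDirectory:
    """Per-partition table mapping a handle to a local virtual base address (illustrative)."""

    def __init__(self) -> None:
        self._entries: dict[int, int] = {}

    def register(self, handle: int, base_address: int) -> None:
        """Record where this partition allocated the data structure named by `handle`."""
        self._entries[handle] = base_address

    def translate(self, handle: int) -> int:
        """Return the local virtual base address for `handle`."""
        return self._entries[handle]


HANDLE = 7  # hypothetical handle, identical on every partition

svd_a = SharedVariableDirectory()
svd_b = SharedVariableDirectory()
svd_a.register(HANDLE, 0x7F0000001000)   # allocation in partition A (made-up address)
svd_b.register(HANDLE, 0x7F0000428000)   # allocation in partition B (made-up address)

# The same handle yields a different base address in each partition.
print(hex(svd_a.translate(HANDLE)), hex(svd_b.translate(HANDLE)))

# One 8-byte entry per process, 128 processes per node.
svd_bytes_per_node = 8 * 128
print(svd_bytes_per_node)
```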
To utilize such an approach, the handle specified in a network read and/or write operation must be translated to a virtual base address. The translation is performed by the target process. Performing the translation, however, interrupts the target process from performing its assigned task(s), thereby reducing performance. In some systems, network hardware is able to accelerate read and write operations when the virtual memory address in the target partition is known by the requesting process. This acceleration is referred to as remote direct memory access (RDMA). In general, RDMA refers to the ability of one data processing system to directly access memory of another data processing system without involving the operating system of either data processing system. One of the benefits of RDMA is that the target process is not interrupted to perform address translation. In the general case, however, RDMA requests cannot be made in a PGAS implementation that uses an SVD because the remote virtual memory address for the distributed data structure in the target process being accessed is not known by the requesting process. Instead, the target process must be asked to perform the translation, thereby rendering RDMA unavailable.
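The difference between the two access paths can be sketched as follows. This is a simplified single-machine model, not an RDMA implementation: the Process class, its interruptions counter, and the functions serve_handle_read and rdma_read are hypothetical names, and memory is modeled as a dictionary from virtual address to byte value.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Illustrative target process with a partition and an SVD."""
    memory: dict = field(default_factory=dict)   # virtual address -> byte value
    svd: dict = field(default_factory=dict)      # handle -> virtual base address
    interruptions: int = 0                       # times the process had to stop and translate

    def serve_handle_read(self, handle: int, offset: int) -> int:
        """Handle-based read: the target itself translates the handle."""
        self.interruptions += 1
        return self.memory[self.svd[handle] + offset]


def rdma_read(target: Process, address: int) -> int:
    """Models network hardware reading target memory directly.

    The requester must already know the remote virtual address; the target
    process does no work, so its interruption count is unchanged.
    """
    return target.memory[address]


target = Process(memory={0x1000: 11, 0x1001: 22}, svd={7: 0x1000})

via_handle = target.serve_handle_read(handle=7, offset=1)   # target is interrupted
via_rdma = rdma_read(target, 0x1001)                        # target is not interrupted
print(via_handle, via_rdma, target.interruptions)
```

Both paths return the same byte, but only the handle-based path increments the interruption count, which mirrors the performance cost described above.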