A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The invention disclosed broadly relates to the field of high-speed computers, and more particularly relates to the transfer of noncontiguous data blocks during one-sided communication between two or more computational nodes in distributed parallel computing machines.
2. Description of the Related Art
Highly parallel distributed multiprocessor systems, such as the IBM RISC System/6000 Scalable POWERparallel (SP) systems, provide high reliability and availability. In their simplest form, these systems can be viewed as a plurality of uniprocessor and multiprocessor computer systems coupled together through a local area network (LAN) to function as one coordinated system.
Data transfer between nodes of highly parallel distributed multiprocessor systems is necessary to enable truly scalable and efficient computing. Such data transfer falls broadly into two groups: contiguous and noncontiguous. Contiguous data is data stored in adjacent locations in a computer memory. In contrast, noncontiguous data is data that is not stored in a single collection of adjacent locations in a computer memory device. It is well known that the transfer of noncontiguous data requires more pipeline and supporting processor overhead than the transfer of contiguous data. The transfer of noncontiguous data blocks is also referred to as a transfer of I/O vectors.
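To make the distinction concrete, the sketch below models a set of data blocks as (start address, length) pairs, a representation assumed here for illustration, and checks whether the blocks form one contiguous region. The `IOVec` alias and the `is_contiguous` function are hypothetical names, not part of any API described in this document.

```python
from typing import List, Tuple

# Hypothetical model of an I/O vector: one (start_address, length) pair per block.
IOVec = List[Tuple[int, int]]

def is_contiguous(blocks: IOVec) -> bool:
    """Return True if every block begins exactly where the previous one ends."""
    for (addr, length), (next_addr, _) in zip(blocks, blocks[1:]):
        if addr + length != next_addr:
            return False
    return True

print(is_contiguous([(0, 16), (16, 8), (24, 32)]))  # True: one adjacent run
print(is_contiguous([(0, 16), (32, 8), (64, 32)]))  # False: gaps between blocks
```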
Typically, there are two types of I/O vectors: (i) general I/O vectors, where each data block (or vector) can have a different length, and (ii) strided I/O vectors, where each data block (or vector) has a uniform length. Referring now to FIG. 1, there is shown a general I/O vector transfer. Shown are four data blocks 100 in general I/O vector 110. It is important to note that the starting addresses of the data blocks need not be symmetrically spaced as shown. Each of the four data blocks has a starting address a0, a1, a2, a3 and a length l0, l1, l2, l3. The transfer of the I/O vector 110 with four data blocks 100 from an origin task 106 to a target task 108 is represented.
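A minimal sketch of a general I/O vector transfer, assuming both tasks' memories are modeled as Python byte buffers in a single process and each vector is a list of hypothetical (address, length) pairs. Block i of the origin vector is copied into block i of the target vector; lengths may differ from block to block but must match pairwise. Only the data movement is simulated, not the network.

```python
from typing import List, Tuple

IOVec = List[Tuple[int, int]]  # hypothetical: (start_address, length) per block

def transfer_general(origin_mem: bytes, origin_vec: IOVec,
                     target_mem: bytearray, target_vec: IOVec) -> None:
    """Copy block i of the origin vector into block i of the target vector."""
    if len(origin_vec) != len(target_vec):
        raise ValueError("origin and target must describe the same number of blocks")
    for (src, n), (dst, m) in zip(origin_vec, target_vec):
        if n != m:
            raise ValueError("corresponding blocks must have the same length")
        target_mem[dst:dst + n] = origin_mem[src:src + n]

origin = bytes(range(64))
target = bytearray(64)
# Four blocks with different lengths (l0..l3) and unevenly spaced starts (a0..a3).
transfer_general(origin, [(0, 4), (10, 2), (20, 6), (40, 3)],
                 target, [(1, 4), (8, 2), (30, 6), (50, 3)])
print(list(target[1:5]), list(target[50:53]))  # [0, 1, 2, 3] [40, 41, 42]
```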
Turning now to FIG. 2, there is shown a block diagram of a strided I/O vector transfer. Three data blocks 200 (or vectors) are shown. Notice that the length, or block size 204, of each data block 200 is uniform. Moreover, the stride size 202, the distance in bytes between the beginning of one block (or vector) and the beginning of the next, is uniform. Represented is the transfer of an I/O vector 210 with data blocks 200 from a source or origin task 206 to a target task 208 with the same block size and stride size. In a vector transfer generally, a number N of vectors on the source are transferred to a corresponding number of vectors on the target (three in this example), where the length 204 of each vector transferred is the same as the length of the corresponding vector on the target task 208. During a strided I/O vector transfer, the following parameters are specified: the block size, the stride size, the number of vectors or blocks, and the starting addresses of the first block on the source and on the target.
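Because a strided transfer is fully described by the five parameters listed above, it can be expanded into the per-block (source address, target address, length) triples of a general transfer. The sketch below assumes, as in FIG. 2, that source and target share the same block size and stride; `expand_strided` is a hypothetical helper, not an API from this document.

```python
from typing import List, Tuple

def expand_strided(block_size: int, stride: int, count: int,
                   src_start: int, dst_start: int) -> List[Tuple[int, int, int]]:
    """Expand a strided descriptor into (src_addr, dst_addr, length) triples."""
    # Assumption: a stride shorter than the block size would make blocks overlap.
    if stride < block_size:
        raise ValueError("stride must be at least the block size")
    return [(src_start + i * stride, dst_start + i * stride, block_size)
            for i in range(count)]

for triple in expand_strided(block_size=8, stride=32, count=3,
                             src_start=4096, dst_start=32768):
    print(triple)
# (4096, 32768, 8)
# (4128, 32800, 8)
# (4160, 32832, 8)
```

Only five integers describe the whole strided transfer, whereas the equivalent general description grows with the number of blocks.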
The teaching of a centralized multiprocessor system, such as the system disclosed in U.S. Pat. No. 5,640,534 issued on Jun. 18, 1997, assigned to Cray Research, with named inventors Douglas R. Beard et al., for a "Method and Apparatus for Chaining Vector Instructions," does not address the problem of vector transfer on highly parallel distributed multiprocessor systems such as the IBM SP. More specifically, the teachings of centralized multiprocessor systems do not address the transfer of vector data during one-sided communication between two or more computational nodes of a highly parallel distributed multiprocessor system (where each node can itself comprise two or more processors). One-sided communication is communication in which the receiver is not expecting or waiting to receive the vector data. Such vector data transfer is not efficient on these systems, and a need exists for optimized noncontiguous data transfer on distributed multiprocessor machines like the IBM SP. These systems allow users to write application programs that run on one or more processing nodes and transfer vector data in a one-sided communication style. These application programs make use of a library of APIs (Application Programming Interfaces). An API is a functional interface that allows an application program written in a high-level language such as C/C++ or Fortran to use the specified I/O vector data transfer functions without understanding the underlying details. Therefore, a need exists for a method and a system to provide I/O vector data transfer during one-sided communication in a highly parallel distributed multiprocessor system.
If noncontiguous I/O vector data transfer capability is not available on a distributed multiprocessor machine, an application requiring noncontiguous I/O vector data transfer incurs one of two overheads: (i) pipelining or (ii) copying. To transfer noncontiguous data, the application program must issue a series of API data transfers, one for each contiguous block. However, issuing successive API data transfers results in LAN pipelining overhead. Alternatively, the application program can be designed to copy all the noncontiguous vector data into a contiguous data buffer before initiating a data transfer. This approach results in copy overhead. Those skilled in the art will recognize that efficient noncontiguous data transfer must avoid both the pipelining costs and the copy costs. An efficient trade-off is required between reducing the number of data packets transferred over the network and reducing the copy overhead. Accordingly, a need exists to overcome these problems by providing efficient transfer of noncontiguous data during one-sided communication.
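The two fallback costs can be illustrated with a simple counting model. The sketch below is hypothetical: it treats each per-block API call as one message (assuming every block fits in one packet), and it models the copy approach as gathering all blocks into a staging buffer that is then split into packets of an assumed size. It illustrates the trade-off rather than measuring real LAN costs.

```python
import math
from typing import Dict, List, Tuple

IOVec = List[Tuple[int, int]]  # hypothetical: (start_address, length) per block

def per_block_transfers(vec: IOVec) -> Dict[str, int]:
    """Option (i): one API transfer per block; no copying, but one message each."""
    return {"messages": len(vec), "bytes_copied": 0}

def pack_then_transfer(mem: bytes, vec: IOVec, packet_size: int) -> Dict[str, int]:
    """Option (ii): gather every block into one contiguous buffer, then send it."""
    staging = bytearray()
    for addr, length in vec:
        staging += mem[addr:addr + length]  # this extra copy is the overhead
    return {"messages": math.ceil(len(staging) / packet_size),
            "bytes_copied": len(staging)}

mem = bytes(4096)
vec = [(i * 256, 100) for i in range(10)]  # ten 100-byte blocks, 256 bytes apart
print(per_block_transfers(vec))            # 10 messages, nothing copied
print(pack_then_transfer(mem, vec, 1024))  # 1 message, 1000 bytes copied
```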
Still another problem with noncontiguous data transfer during one-sided communication in a highly parallel distributed multiprocessor system is that noncontiguous data must be packaged efficiently into fixed-size packets. Packaging noncontiguous data reduces the number of data packets that must be sent across the network. Typically, only minimal state information about the I/O vector data should be maintained during the node-to-node transfer over the LAN. A spillover state is created during the packing of data into packets when data that does not fit into a packet of the predefined size is held in a spillover state for a subsequent packet. The creation and maintenance of a spillover state when packing data into packets is inefficient and should be avoided. The spillover state becomes especially difficult to handle if the packet with spillover data is to be retransmitted. Therefore, a need exists for a method and apparatus to provide efficient noncontiguous data transfer in one-sided communication while maintaining minimum state information without producing a spillover state.
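The following sketch, using assumed packet and vector sizes, shows how a naive packer that fills every packet to capacity splits vectors across packet boundaries. Each entry records (vector index, offset, byte count); any vector that appears in more than one packet carries spillover state that the sender would have to track and recreate on retransmission. The function and type names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Chunk = Tuple[int, int, int]  # hypothetical: (vector_index, offset, nbytes)

def pack_with_spillover(lengths: List[int], packet_size: int) -> List[List[Chunk]]:
    """Fill each packet to capacity, splitting a vector whenever it does not fit."""
    packets: List[List[Chunk]] = []
    current: List[Chunk] = []
    free = packet_size
    for i, length in enumerate(lengths):
        offset = 0
        while offset < length:
            take = min(free, length - offset)
            current.append((i, offset, take))
            offset += take
            free -= take
            if free == 0:  # packet full: close it; the remainder spills over
                packets.append(current)
                current, free = [], packet_size
    if current:
        packets.append(current)
    return packets

def split_vectors(packets: List[List[Chunk]]) -> List[int]:
    """Return indices of vectors that span more than one packet (spillover)."""
    where: Dict[int, Set[int]] = {}
    for p, packet in enumerate(packets):
        for v, _, _ in packet:
            where.setdefault(v, set()).add(p)
    return sorted(v for v, ps in where.items() if len(ps) > 1)

packets = pack_with_spillover([300, 500, 400], packet_size=512)
print(len(packets))            # 3
print(split_vectors(packets))  # [1, 2]: vectors 1 and 2 straddle packet boundaries
```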
Still another problem with noncontiguous data transfer during one-sided communication in a highly parallel distributed multiprocessor system is that, in a get operation, a request to transfer data from a target node to a source node must send a description of the source data layout to the target. The description of the source data layout is the list of the address and length of each vector, together with the number of vectors in the transmission. Sending this description to the target process adds control information that must travel to the target and back to the source. Accordingly, a need exists to transfer noncontiguous data while avoiding sending a description of the source data layout to the target.
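To show how this control information grows with the number of vectors, the sketch below serializes a source layout as a vector count followed by (address, length) pairs. The wire format (little-endian unsigned 64-bit fields packed with Python's standard `struct` module) is an assumption for illustration, not the format of any particular system. Under it, the description costs 8 + 16N bytes for N vectors.

```python
import struct
from typing import List, Tuple

def encode_layout(vec: List[Tuple[int, int]]) -> bytes:
    """Serialize a vector count followed by (address, length) pairs.

    Hypothetical wire format: little-endian unsigned 64-bit integers.
    """
    out = struct.pack("<Q", len(vec))
    for addr, length in vec:
        out += struct.pack("<QQ", addr, length)
    return out

def decode_layout(buf: bytes) -> List[Tuple[int, int]]:
    """Inverse of encode_layout under the same assumed format."""
    (count,) = struct.unpack_from("<Q", buf, 0)
    return [struct.unpack_from("<QQ", buf, 8 + 16 * i) for i in range(count)]

layout = [(0x1000, 64), (0x2000, 128), (0x3400, 32)]
wire = encode_layout(layout)
print(len(wire))  # 56 = 8 + 16 * 3 bytes of control data for 224 bytes of payload
```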
Yet another problem with noncontiguous data transfer during one-sided communication in a highly parallel distributed multiprocessor system is that any method for reducing the inefficiency of packing data vectors into data packets must not be so time-consuming that it offsets the time saved by reducing the number of packets sent. Accordingly, a need exists for a method and apparatus that provides noncontiguous data transfer during one-sided communication at a cost lower than the time saved by reducing the number of packets sent.
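The requirement above can be phrased as a simple break-even test. The sketch assumes a linear cost model in which every packet costs a fixed time and grouping costs a fixed overhead; both the model and the figures in the example are hypothetical.

```python
def grouping_pays_off(packets_before: int, packets_after: int,
                      per_packet_cost_us: float, grouping_cost_us: float) -> bool:
    """True when the time spent grouping is less than the time saved by
    sending fewer packets (a simple linear cost model, assumed here)."""
    saved_us = (packets_before - packets_after) * per_packet_cost_us
    return grouping_cost_us < saved_us

# Hypothetical figures: 100 vectors grouped into 20 packets.
print(grouping_pays_off(100, 20, per_packet_cost_us=5.0, grouping_cost_us=40.0))  # True
print(grouping_pays_off(100, 20, per_packet_cost_us=0.4, grouping_cost_us=40.0))  # False
```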
Briefly, in accordance with the present invention, a method is disclosed for grouping I/O vectors to be transferred across a distributed computing environment comprising a plurality of processing nodes coupled together over a network. The method reduces the total number of packets transmitted over the network between two nodes. The method includes grouping two or more I/O vectors into a single message, consisting of one packet with a predetermined maximum size, provided the sum of the sizes of the vectors is small enough to fit into a single packet. The grouping method finds an efficient collection of vectors to form groups that fit inside a single packet. If two or more of the vectors can be combined so that the resulting packet size does not exceed the predetermined maximum size, the vectors are grouped accordingly. Vectors whose size is greater than the predetermined maximum packet size are sent as separate messages. The result is a method to efficiently transfer strided vectors such that the total number of packets sent is minimized while the amount of state information that needs to be maintained remains the same.
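A minimal sketch of vector grouping, assuming the packet capacity applies to payload only (no header overhead). The text does not specify how the efficient collection of vectors is found, so this sketch substitutes the standard first-fit decreasing bin-packing heuristic; it is not presented as the patented method. Small vectors are never split across packets, so no spillover state arises, and vectors larger than the maximum packet size are returned as singleton groups to be sent as separate messages. The function name `group_vectors` is hypothetical.

```python
from typing import List

def group_vectors(sizes: List[int], max_packet: int) -> List[List[int]]:
    """Group vector indices so that each group's total size fits one packet.

    Stand-in heuristic: first-fit decreasing bin packing. Oversized vectors
    (size > max_packet) are returned as singleton groups, sent alone.
    """
    groups: List[List[int]] = []
    loads: List[int] = []
    oversized = [i for i, s in enumerate(sizes) if s > max_packet]
    small = sorted((i for i, s in enumerate(sizes) if s <= max_packet),
                   key=lambda i: sizes[i], reverse=True)
    for i in small:
        for g, load in enumerate(loads):
            if load + sizes[i] <= max_packet:  # first existing group with room
                groups[g].append(i)
                loads[g] += sizes[i]
                break
        else:  # no group had room: open a new packet
            groups.append([i])
            loads.append(sizes[i])
    return groups + [[i] for i in oversized]

print(group_vectors([300, 700, 200, 1500, 100, 500], max_packet=1024))
# [[1, 0], [5, 2, 4], [3]]
```

In the example, six vectors map to three messages: groups of 1000 and 800 bytes that each fit in a 1024-byte packet, plus the 1500-byte vector sent on its own.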
In accordance with another embodiment of the present invention, a computer readable medium is disclosed corresponding to the above method.