This invention relates to a data transfer method between plural element processors connected through an interconnecting network and a computer system suitable therefor.
A conventional parallel computer system often has a structure in which plural element processors, each of which includes a local memory and an instruction processor, are connected by an interconnecting network. A parallel computer system of this type is generally called a parallel computer system of the distributed memory type. Each element processor exchanges data stored in its local memory with other element processors by way of the interconnecting network, and the element processors execute processing in parallel with each other.
In a parallel computer system of the distributed memory type, data transfer is generally achieved by using the programming model called message passing. In the message passing model, a sending (SEND) procedure and a receiving (RECEIVE) procedure are described explicitly in the parallel program (hereinafter called a user process) executed in each element processor. When the send procedure (SEND) is executed, the element processor of the sending side transfers a message which contains the send data designated by that procedure. When a receive procedure (RECEIVE) is executed, the element processor of the receiving side receives the message. The instruction processor in each element processor analyzes these communication procedures included in the user process under execution, and advances the processing by transferring data to the interconnecting network or receiving data therefrom. The element processor of the sending side designates the number of the destination element processor when it transfers the message. Several specific data transfer methods have been proposed by which each element processor actually processes data send requests and data receive requests from such a user process. From a practical viewpoint, it is preferable that two kinds of extra processing be small: the processing executed from the issue of a send request by the user process of the sending side until the start of transfer of the user data (send overhead), and the processing executed from the issue of a receive request by the user process of the receiving side until the data are handed over to that user process (receive overhead).
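By way of illustration only, the SEND/RECEIVE procedures of the message passing model can be sketched as a toy simulation. This is a sketch under stated assumptions: the names `Network`, `ElementProcessor` and `Message` are hypothetical, the interconnecting network is modeled as one FIFO queue per element processor, and no real message passing API is implied.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Message:
    src: int        # number of the sending element processor
    dst: int        # number of the destination element processor
    payload: bytes  # send data designated by the SEND procedure

@dataclass
class ElementProcessor:
    number: int
    inbox: deque = field(default_factory=deque)  # messages that have arrived

class Network:
    """Hypothetical stand-in for the interconnecting network."""

    def __init__(self, count: int):
        self.eps = [ElementProcessor(i) for i in range(count)]

    def send(self, src: int, dst: int, data: bytes) -> None:
        # SEND: the sending side names the destination EP number explicitly.
        self.eps[dst].inbox.append(Message(src, dst, bytes(data)))

    def receive(self, me: int) -> Message:
        # RECEIVE: the receiving side takes the oldest message delivered to it.
        inbox = self.eps[me].inbox
        if not inbox:
            raise RuntimeError(f"no message pending for EP {me}")
        return inbox.popleft()

net = Network(4)
net.send(src=0, dst=2, data=b"hello")
msg = net.receive(2)
print(msg.src, msg.payload)  # 0 b'hello'
```

A real system performs these steps in the instruction processor and the network hardware; the queue merely makes the pairing of SEND and RECEIVE visible.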
Recently, the following data transfer method has been used in many parallel computer systems in order to reduce the send overhead and the receive overhead. The send/receive circuit of the element processor of the sending side directly reads the user data to be sent from a region provided for a user process in the local memory, generates a message which contains the user data, and transfers the message. The send/receive circuit in the element processor of the receiving side directly writes the user data within the message into a region provided for a user process of the receiving side within its local memory. There are several specific methods for achieving such data transfer; in the present specification, they are generically called the direct inter-memory data transfer method.
Specific examples of communication methods that implement this approach are the PUT communication method, which transfers send data to another element processor, and the GET communication method, which receives data from another element processor. For instance, refer to "Proceedings of Information Processing Society of Japan Parallel Processing Symposium JSPP '95", pp. 233-240 (May, 1995), or to the manual "HI-UX/MPP Remote DMA User's Guide -C-" published by the present assignee for the parallel computer system SR2201 developed by it. In this manual, the PUT communication method is called the direct remote memory access method.
To be specific, in the PUT communication method, the instruction processor in the element processor of the sending side notifies the send/receive circuit of an address of the local memory of the sending side from which the send data is to be read (send data address), an address of the local memory of the receiving side into which the send data is to be written (receive data address), and some other addresses. The send/receive circuit of the sending side reads the user data from the local memory based on the send data address, generates a message which contains the user data, the receive data address, and the other addresses, and transfers the message. The send/receive circuit of the receiving side writes the user data within the message into the local memory of the element processor of the receiving side according to the receive data address in the message. Thus, in the PUT communication method, the send/receive circuit of the sending side directly accesses the local memory of the sending side according to the send data address, and the send/receive circuit of the receiving side directly accesses the local memory of the receiving side according to the receive data address. Other communication methods which differ from this PUT communication method but execute similar processing will also be referred to below as a PUT communication method or a direct remote memory access method. Whenever the direct inter-memory data transfer method is discussed below, the PUT communication method will be used as its representative.
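The PUT steps above can be illustrated with a minimal sketch, assuming each local memory is a `bytearray` and each send/receive circuit is a plain function. The function names `put_send` and `put_receive` and the dictionary message format are hypothetical; the "other addresses" of the text are omitted.

```python
def put_send(sender_mem: bytearray, send_addr: int, recv_addr: int, length: int) -> dict:
    """Sending-side circuit: read user data directly at the send data address
    and build a message that carries the receive data address."""
    data = bytes(sender_mem[send_addr:send_addr + length])
    return {"recv_addr": recv_addr, "length": length, "data": data}

def put_receive(receiver_mem: bytearray, message: dict) -> None:
    """Receiving-side circuit: write the user data directly at the receive
    data address carried in the message."""
    start = message["recv_addr"]
    receiver_mem[start:start + message["length"]] = message["data"]

mem_a = bytearray(64)   # local memory of the sending element processor
mem_b = bytearray(64)   # local memory of the receiving element processor
mem_a[8:13] = b"ABCDE"  # user data in the sender's user-process region
msg = put_send(mem_a, send_addr=8, recv_addr=40, length=5)
put_receive(mem_b, msg)
print(bytes(mem_b[40:45]))  # b'ABCDE'
```

Because the receive data address travels inside the message, the receiving side needs no further instruction from its own processor to place the data.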
In the direct inter-memory data transfer method, the send/receive circuit which generates or receives a message can directly read the send data from a region for a user process in the local memory of its element processor, or directly write the received data into such a region. Therefore, in the element processor of the sending side, the send data does not need to be copied from the region for the user process into a buffer area controlled by the operating system, and in the element processor of the receiving side, the received data need not be copied from a buffer area controlled by the operating system into the region for the user process either. As a result, this communication method reduces the part of the send overhead and the receive overhead that derives from these copies.
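The saving described above can be shown by counting processor (software) copies on each path. This is a deliberately simplified, hypothetical model: memory regions are `bytearray`s, the OS-controlled buffers are ordinary byte objects, and only the two copies the direct method eliminates are counted.

```python
def buffered_transfer(src: bytearray, dst: bytearray, n: int) -> int:
    """Path through OS-controlled buffers; returns the number of CPU copies."""
    cpu_copies = 0
    os_send_buf = bytes(src[:n])      # sender CPU: user region -> OS send buffer
    cpu_copies += 1
    message = os_send_buf             # circuit reads the OS buffer and sends it
    os_recv_buf = bytearray(message)  # receiving circuit fills an OS receive buffer
    dst[:n] = os_recv_buf             # receiver CPU: OS buffer -> user region
    cpu_copies += 1
    return cpu_copies

def direct_transfer(src: bytearray, dst: bytearray, n: int) -> int:
    """Direct inter-memory path: the circuits access the user regions themselves."""
    message = bytes(src[:n])          # sending circuit reads the user region
    dst[:n] = message                 # receiving circuit writes the user region
    return 0                          # no CPU copies to or from OS buffers

src = bytearray(b"payload!")
a, b = bytearray(8), bytearray(8)
print(buffered_transfer(src, a, 8), direct_transfer(src, b, 8), a == b)  # 2 0 True
```

Both paths deliver identical data; only the number of intermediate copies differs.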
Communication libraries which implement the direct inter-memory data transfer method, such as PUT communication or GET communication, are developed by each parallel computer system maker or research organization as libraries specific to them. It is therefore difficult to port parallel programs written to use such a library to other machines. On the other hand, libraries with a standard interface specification for the message passing model between user programs (message passing libraries) have come into use. A parallel program written to use such a library can run without change on different computers equipped with the library. The representative interface specification of this kind is MPI (Message Passing Interface). Universities in the United States of America and parallel computer system makers organized the Message Passing Interface Forum (MPI Forum), and MPI is the specification decided by the forum as a result of its research. Libraries produced according to this specification are expected to become the mainstream of parallel program development support libraries in the future. Hereinafter, such a library will be called the MPI library.
In order to execute a data send request or a data receive request issued from a user process to the MPI library at high speed, it is effective to execute those requests by using a direct inter-memory data transfer method such as the PUT communication method. Parallel computer system makers and others have therefore developed MPI libraries which execute send requests and receive requests from the user process by using a direct inter-memory data transfer library such as the PUT communication library. For instance, such an MPI library is used in the parallel computer system SR2201 developed by the assignee of the present application. An MPI library of this kind responds to a data send request or a data receive request from the user program by issuing a suitable command to the PUT communication library and requesting it to execute the transfer or receive operation requested by the user process. The PUT communication library then orders the network interface circuit, which includes a message send/receive circuit, to transfer a message. A communication method which processes data send requests or data receive requests issued from the user process to an MPI library by using a direct inter-memory data transfer library such as a PUT communication library may be called below a communication method using these two libraries together. The MPI specification does not define the interface between the MPI library and a direct inter-memory data transfer library such as the PUT communication library. Therefore, MPI libraries which use such a library differ from one computer maker to another. However, the interface between a user program and an MPI library is the same for any MPI library produced by any computer maker.
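The layering just described (user process, MPI library, PUT communication library, network interface circuit) can be sketched as follows. Every class and method name here is hypothetical and is not the real MPI API or any vendor's PUT API; in particular, the assumption that the MPI library already knows the receive address for each destination (`recv_addr_of`) is made only to keep the sketch short.

```python
from dataclasses import dataclass, field

@dataclass
class NetworkInterface:
    log: list = field(default_factory=list)  # commands received, in order

    def start_transfer(self, command: dict) -> None:
        # A real circuit would build and send a message; here it is only recorded.
        self.log.append(command)

class PutLibrary:
    def __init__(self, nic: NetworkInterface):
        self.nic = nic

    def put(self, dst_ep: int, send_addr: int, recv_addr: int, length: int) -> None:
        # Orders the network interface circuit to transfer one message.
        self.nic.start_transfer({"dst_ep": dst_ep, "send_addr": send_addr,
                                 "recv_addr": recv_addr, "length": length})

class MpiLibrary:
    def __init__(self, put_lib: PutLibrary, rank_to_ep: dict, recv_addr_of: dict):
        self.put_lib = put_lib
        self.rank_to_ep = rank_to_ep      # destination process -> element processor number
        self.recv_addr_of = recv_addr_of  # assumed: receive region known per destination

    def send(self, buf_addr: int, count: int, dest: int) -> None:
        # Responds to the user's send request by issuing a suitable PUT command.
        self.put_lib.put(self.rank_to_ep[dest], buf_addr, self.recv_addr_of[dest], count)

nic = NetworkInterface()
mpi = MpiLibrary(PutLibrary(nic), rank_to_ep={1: 5}, recv_addr_of={1: 0x1000})
mpi.send(buf_addr=0x200, count=128, dest=1)
print(nic.log)  # [{'dst_ep': 5, 'send_addr': 512, 'recv_addr': 4096, 'length': 128}]
```

Only the `MpiLibrary.send` signature would be visible to the user program; the `PutLibrary` interface beneath it is the maker-specific part that the MPI specification leaves undefined.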
Therefore, the user program can run on a computer from any maker as long as an MPI library is installed on it, and on a computer whose MPI library uses a direct inter-memory data transfer library such as the PUT communication library, a data send request from the user program is processed at high speed by data transfer through that library.
When the user program uses the PUT communication library directly, without the MPI library, the user program must designate the information necessary for operation of the data send/receive circuit, such as the memory address where the user data to be transferred resides and the length of the user data, as arguments of a call statement to the PUT communication library. When the user program uses the MPI library for data transfer, the user program must designate additional information as arguments of a call statement to the MPI library, in addition to the two arguments designated for the PUT communication library. The additional information includes the information necessary for processing the data send/receive protocol defined by the MPI specification, and consists of plural pieces of information of plural kinds predetermined by the MPI library. Specifically, the additional information is fixed-length data and contains the identifier of the destination process, the process group identifier, and so on. The destination element processor of the message uses the additional information to identify whether the message is one requested by a receive request issued by a user process under execution. The additional information may be called below the MPI additional information.
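As an illustration of fixed-length additional information and of the matching check on the receiving side, the following sketch packs three fields into a 12-byte record. The layout (little-endian unsigned 32-bit destination process identifier, process group identifier, and tag) is an assumption for illustration only; the tag field in particular is not named in the text, and each MPI library defines its own field set and layout.

```python
import struct

# Hypothetical fixed-length layout: dest process id, group id, tag (3 x uint32).
ADDITIONAL_INFO = struct.Struct("<III")

def pack_additional_info(dest_process: int, group_id: int, tag: int) -> bytes:
    return ADDITIONAL_INFO.pack(dest_process, group_id, tag)

def matches_receive(info: bytes, my_process: int, group_id: int, tag: int) -> bool:
    """Receiving side: is this message the one a pending receive request asked for?"""
    return ADDITIONAL_INFO.unpack(info) == (my_process, group_id, tag)

info = pack_additional_info(dest_process=3, group_id=0, tag=7)
print(len(info))                       # 12
print(matches_receive(info, 3, 0, 7))  # True
print(matches_receive(info, 3, 0, 8))  # False
```

Because the record has a fixed length, a receiver can locate and decode it without any extra length field of its own.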
In a conventional parallel computer system which uses a direct inter-memory data transfer library such as a PUT communication library together with an MPI library, the user data and the MPI additional information designated by a send request issued by a user process were transferred as two different messages. That is, in the element processor of the sending side, the MPI library notifies the PUT communication library of the information necessary for transfer of the user data and requests the PUT communication library to generate, from the notified information, transfer control information which contains plural pieces of information. The PUT communication library generates the transfer control information, writes it into the local memory, and requests the network interface circuit to execute data transfer according to the transfer control information. The network interface circuit reads the transfer control information from the local memory, reads the send data from the local memory according to the send data address in the transfer control information, and generates and transfers a message which contains the data. In addition, the network interface circuit writes a send completion flag into the local memory according to the send completion flag address in the transfer control information. The MPI library, the PUT communication library, and the network interface circuit execute the same processing for the MPI additional information as well. In the element processor of the receiving side, the network interface circuit, the PUT communication library, and the MPI library process a receive request issued by the user process of the receiving side based on the two transferred messages.
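The conventional two-message flow can be sketched as below, counting how often transfer control information (TCI) is generated and how often the sending-side local memory is accessed. The class names, field names, and the flag addresses used in the example are hypothetical; writing and reading back the TCI are only counted, not modeled.

```python
from dataclasses import dataclass

@dataclass
class TransferControlInfo:
    send_addr: int  # where the circuit reads the send data
    length: int     # length of the send data
    flag_addr: int  # where the circuit writes the send completion flag

@dataclass
class Stats:
    tci_generated: int = 0
    mem_accesses: int = 0  # sending-side local-memory reads and writes

def put_send(mem: bytearray, send_addr: int, length: int, flag_addr: int, st: Stats) -> bytes:
    tci = TransferControlInfo(send_addr, length, flag_addr)
    st.tci_generated += 1
    st.mem_accesses += 1  # count: PUT library writes the TCI into local memory
    st.mem_accesses += 1  # count: network interface circuit reads the TCI back
    body = bytes(mem[tci.send_addr:tci.send_addr + tci.length])
    st.mem_accesses += 1  # circuit reads the send data
    mem[tci.flag_addr] = 1
    st.mem_accesses += 1  # circuit writes the send completion flag
    return body           # send data of one outgoing message

def conventional_mpi_send(mem: bytearray, user: tuple, info: tuple, flags: tuple, st: Stats) -> list:
    """user and info are (address, length) pairs; flags holds two flag addresses."""
    return [put_send(mem, user[0], user[1], flags[0], st),  # message 1: user data
            put_send(mem, info[0], info[1], flags[1], st)]  # message 2: MPI additional info

mem = bytearray(64)
mem[0:4] = b"DATA"
mem[16:20] = b"INFO"
st = Stats()
msgs = conventional_mpi_send(mem, (0, 4), (16, 4), (60, 61), st)
print(msgs, st)  # [b'DATA', b'INFO'] Stats(tci_generated=2, mem_accesses=8)
```

Every quantity in the final line doubles because every step is repeated once per message.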
When the user data and the MPI additional information are transferred to the same destination processor as two messages, as in the prior art, separate transfer control information is needed for each message, so the transfer control information must be generated twice. As a result, in conventional data transfer using an MPI library, the delay from a user program's request for data transfer until the start of the transfer (called transfer latency) is large.
In addition, send processing and receive processing are executed for each of these two messages in the element processor of the sending side and in the element processor of the receiving side, respectively. Consequently, the frequency of access to the local memory in these element processors increases in proportion to the number of messages. For instance, the element processor of the sending side must read the transfer control information, read the data to be transferred, and write a send completion flag for each message.
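The proportionality described above reduces to simple arithmetic. The per-message cost used here (one TCI read, one read per data region, one flag write) follows the list in the preceding paragraph; applying the same cost model to a single message that carries two data regions is an assumption made only for comparison, and the function name is hypothetical.

```python
def sender_circuit_accesses(messages: int, regions_per_message: int) -> int:
    """Sending-side local-memory accesses made by the network interface circuit."""
    # Per message: read the transfer control information (1), read each data
    # region (regions_per_message), write one send completion flag (1).
    return messages * (1 + regions_per_message + 1)

conventional = sender_circuit_accesses(messages=2, regions_per_message=1)
combined = sender_circuit_accesses(messages=1, regions_per_message=2)  # assumed model
print(conventional, combined)  # 6 4
```

Under this model, the reads of the data itself are unchanged, and the saving comes entirely from the per-message overhead.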
The object of the present invention is to provide a data transfer method which alleviates the above-mentioned problems and transfers data at higher speed, and a computer system suitable therefor.
A more specific object of the present invention is to provide a data transfer method which can transfer, in a single message, both user data and additional information designated by a user process of the sending side, and a computer system suitable therefor.
To achieve these objects, in a computer system according to the present invention, a network interface circuit in each element processor is provided with a memory read circuit connected to the memory in the element processor, and with a message assembly circuit, connected to the memory read circuit, which generates a message to be transferred to the interconnecting network. The memory read circuit reads first data and second data, both to be transferred, from the memory, based on first and second pieces of address information designated by the processor in the element processor. The message assembly circuit generates a message which comprises a header and send data including both the first data and the second data. Therefore, each element processor can transfer plural data at higher speed than the conventional apparatus.
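The division of work between the memory read circuit and the message assembly circuit can be sketched in software as follows. This is a hypothetical stand-in for hardware: the `Region` type, the function names, the placeholder header `b"HDR:"`, and the order "first data, then second data" within the send data are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    addr: int    # first or second piece of address information
    length: int

def memory_read_circuit(mem: bytearray, first: Region, second: Region) -> tuple:
    """Reads the first data and the second data from the two designated regions."""
    return (bytes(mem[first.addr:first.addr + first.length]),
            bytes(mem[second.addr:second.addr + second.length]))

def message_assembly_circuit(header: bytes, first_data: bytes, second_data: bytes) -> bytes:
    """Builds one message: header, then send data = first data + second data."""
    return header + first_data + second_data

mem = bytearray(32)
mem[0:3] = b"USR"
mem[20:24] = b"INFO"
d1, d2 = memory_read_circuit(mem, Region(0, 3), Region(20, 4))
print(message_assembly_circuit(b"HDR:", d1, d2))  # b'HDR:USRINFO'
```

The two regions need not be contiguous in memory; the gather happens in the read circuit, so one message results.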
A data transfer method according to the present invention is achieved by using a network interface circuit which, like the network interface circuit described above, has the function of generating a message which contains plural data as send data.
That is, first and second pieces of address information and other information are generated through cooperation between a message passing library and a direct inter-memory data transfer library provided in an element processor of a sending side, in response to a data send request issued by a user process of the sending side running in said element processor of the sending side;
wherein the data send request designates user data and additional information to be transferred, and data length information for the user data;
wherein the first and second pieces of address information designate two storage regions within the memory of the element processor of a sending side where said user data and said additional information are stored;
wherein the additional information comprises a plurality of pieces of information each of which is of one of a plurality of kinds predetermined by the message passing library of a sending side and relates to transferring of the user data to a user process of a receiving side running in the element processor of a receiving side;
wherein the other information comprises a plurality of pieces of information each of which is of one of a plurality of kinds predetermined by the direct inter-memory data transfer library and relates to transferring of the user data and the additional information to the element processor of a receiving side.
The network interface circuit reads the user data and the additional information from the memory of the element processor of a sending side, based on the first and second pieces of address information.
The network interface circuit further assembles a message whose send data includes the user data and the additional information and whose header includes the other information and the data length information.
The network interface circuit transfers the message to the element processor of a receiving side by way of the interconnecting network.
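The method steps above can be traced end to end in the following sketch. It is not the patented implementation: the header layout (destination element processor number, receive address, and user data length as three little-endian 32-bit fields), the choice of those two fields as the "other information", and all names are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout: destination EP, receive address, user data length.
HEADER = struct.Struct("<III")

@dataclass
class SendRequest:   # data send request from the sending user process
    user_addr: int
    user_len: int    # data length information for the user data
    info_addr: int   # where the additional information is stored
    info_len: int

@dataclass
class OtherInfo:     # assumed kinds of "other information"
    dst_ep: int
    recv_addr: int

def libraries_prepare(req: SendRequest, dst_ep: int, recv_addr: int):
    """Message passing library and direct transfer library acting together."""
    first = (req.user_addr, req.user_len)   # first piece of address information
    second = (req.info_addr, req.info_len)  # second piece of address information
    return first, second, OtherInfo(dst_ep, recv_addr)

def nic_send(mem: bytearray, first: tuple, second: tuple, other: OtherInfo,
             user_len: int, network: list) -> bytes:
    user_data = bytes(mem[first[0]:first[0] + first[1]])    # read the user data
    add_info = bytes(mem[second[0]:second[0] + second[1]])  # read the additional info
    header = HEADER.pack(other.dst_ep, other.recv_addr, user_len)
    message = header + user_data + add_info                 # assemble one message
    network.append(message)                                 # transfer it
    return message

mem = bytearray(64)
mem[0:5] = b"hello"
mem[32:44] = struct.pack("<III", 3, 0, 7)  # example additional information
req = SendRequest(user_addr=0, user_len=5, info_addr=32, info_len=12)
first, second, other = libraries_prepare(req, dst_ep=2, recv_addr=0x100)
network = []
msg = nic_send(mem, first, second, other, req.user_len, network)
print(len(msg))  # 29
```

In this sketch, carrying the user data length in the header is what would let a receiver split the send data into user data and additional information; how the receiving side actually does so is not described in this passage.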
In a more specific embodiment of the present invention, the user process can not only use the above-mentioned data transfer which uses both the message passing library and the direct inter-memory data transfer library, but can also issue a data send request directly to the direct inter-memory data transfer library. In the latter case, the direct inter-memory data transfer library and the network interface circuit execute basically the same processing as described above, except that there is no additional information.
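The point that one assembly path serves both kinds of request can be sketched with an optional second region. The header layout and the function name `assemble` are hypothetical, as in any such illustration; a direct request simply passes `None` for the additional information.

```python
import struct
from typing import Optional

HEADER = struct.Struct("<III")  # hypothetical: destination EP, receive address, user data length

def assemble(mem: bytearray, user: tuple, info: Optional[tuple],
             dst_ep: int, recv_addr: int) -> bytes:
    """Same assembly path for both cases; info is None for a direct request."""
    addr, length = user
    body = bytes(mem[addr:addr + length])
    if info is not None:  # message passing library path: append additional information
        info_addr, info_len = info
        body += bytes(mem[info_addr:info_addr + info_len])
    return HEADER.pack(dst_ep, recv_addr, length) + body

mem = bytearray(32)
mem[0:4] = b"DATA"
mem[8:12] = b"INFO"
via_mpi = assemble(mem, (0, 4), (8, 4), dst_ep=1, recv_addr=64)
direct_put = assemble(mem, (0, 4), None, dst_ep=1, recv_addr=64)
print(len(via_mpi), len(direct_put))  # 20 16
```

The headers of the two messages are identical; only the presence of the trailing additional information differs.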