1. Field of the Invention
The present invention relates broadly to a distributed computing environment (DCE), and particularly to a computer having a remote procedure call (RPC) mechanism or an object request broker (ORB) mechanism. More specifically, it relates to a computer for use in a system which has a system area network (SAN) as the physical communication channel between computers, and in which the data in a storage area of a physical memory of the transmission side computer can be directly transferred to a storage area of a physical memory of the reception side computer by utilizing a remote direct memory access (RDMA) mechanism included in the SAN.
2. Description of the Related Art
The ORB or RPC is a mechanism for invoking methods and functions among computers, namely, nodes, in a distributed computing environment. The ORB is employed in, for example, one of the standards for communication among distributed objects which are collectively called "CORBA (Common Object Request Broker Architecture)". CORBA is an industry standard established by the Object Management Group (OMG), and is widely adopted by vendors such as Sun Microsystems Inc., International Business Machines Corp., Digital Equipment Corp., and Netscape Communications Corp.
The ORB acts between a client and an object in such a manner that the request of the client is conveyed to the object so as to execute an operation, and that, if necessary, the object sends any result back to the client.
FIG. 1 is a diagram showing a prior-art example of the scheme of the ORB or RPC. In the prior art, nodes which use the ORB or RPC are connected to a packet type communication network. On such a network, a request for remote invocation is divided into packets which conform to, for example, UDP (user datagram protocol) or TCP/IP (transmission control protocol/internet protocol), the standard protocols of the Internet, and is then sent. For this reason, the division of the request into packets, the reassembly of the request from the packets, and the handling of hardware interrupts are executed by a TCP/IP processing unit which is included in an operating system (OS). In a case where the RPC employs UDP, similar processing is executed by an RPC library. Dividing the request into packets in this way allows a network interface card (NIC) and a switching mechanism to be simplified. Accordingly, the ORB and RPC can be used on a wide range of networks, from a LAN (local area network) to a WAN (wide area network).
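The division and reassembly described above can be illustrated with a minimal sketch. It is not the TCP/IP implementation of any operating system; the function names `fragment` and `reassemble`, the sequence-number scheme, and the `mtu` parameter are assumptions introduced only for illustration.

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload); hypothetical layout


def fragment(message: bytes, mtu: int) -> List[Packet]:
    """Divide a message into packets whose payload is at most mtu bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [(seq, message[off:off + mtu])
            for seq, off in enumerate(range(0, len(message), mtu))]


def reassemble(packets: List[Packet]) -> bytes:
    """Restore the message; packets may arrive in any order."""
    return b"".join(payload for _, payload in sorted(packets))


request = bytes(range(256)) * 4          # a 1024-byte request message
packets = fragment(request, mtu=100)     # 11 packets for this size
restored = reassemble(list(reversed(packets)))
```

Every packet created here must be handled separately on the reception side, which is the per-packet cost the related-art discussion refers to.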
With the ORB or RPC, data to be sent out to the network are converted into a standard data representation format, for example, the XDR (external data representation) format of the SunRPC or the CDR (common data representation) format of the CORBA in order that functions can be invoked even among the nodes of different internal data representation formats and among different languages.
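As a rough illustration of conversion into a standard representation, the sketch below encodes an integer and a double in big-endian order, which is the byte order XDR prescribes. It is a sketch in the spirit of XDR only, not a complete XDR or CDR encoder, and the function names and the choice of arguments are assumptions.

```python
import struct
from typing import Tuple


def encode_args(count: int, ratio: float) -> bytes:
    """Encode a 32-bit integer and a 64-bit double in big-endian order."""
    return struct.pack(">id", count, ratio)


def decode_args(data: bytes) -> Tuple[int, float]:
    """Convert the standard representation back into native values."""
    count, ratio = struct.unpack(">id", data)
    return count, ratio


wire = encode_args(1, 2.0)
values = decode_args(wire)
```

Both nodes perform this conversion regardless of whether their internal formats already agree.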
In recent years, hardware called the "system area network (SAN)" has come into use instead of the network employing the packets. The SAN has the feature that the node of a transmission side can write data directly into the physical memory of the destination node, and the feature that the reliable transmission and reception of data are guaranteed in hardware processing. A SAN program transmits data by storing the data to-be-transmitted in physical main storage, and giving the network interface card (NIC) a transmission start instruction which designates the location of the transmission data and the location of a reception buffer existing on the physical main storage of the receiving node. Accordingly, high-speed processing is realized.
With the SAN, the reliability of the fiber and wire serving as network media is enhanced by imposing geographical restrictions on the system area, for example, a maximum transmission path length or confinement to "one floor", and the NIC is made intelligent, whereby high-speed data transfer is realized. In the data transfer, the NIC operates as a DMA (direct memory access) controller. More specifically, the CPU (central processing unit) of the transmission node issues a data transfer instruction to the NIC by designating the addresses of the transfer source and destination, whereby a remote direct memory access (RDMA) operation is executed. The RDMA differs from the ordinary DMA in that the identifier of the node is contained in the address. If the identifier of the node indicates the transmission node itself, the transfer goes out of that node and comes back to it, and the NIC operates simply as a DMA controller.
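The addressing described above, in which an address is a pair of node identifier and memory offset, can be modelled in a few lines. This is an in-process simulation under stated assumptions: the classes `Node` and `San` and the method `rdma_write` are hypothetical names, and no real NIC or SAN programming interface is implied.

```python
from typing import Dict, Tuple

Address = Tuple[int, int]  # (node identifier, offset in physical memory)


class Node:
    def __init__(self, node_id: int, size: int) -> None:
        self.node_id = node_id
        self.memory = bytearray(size)  # stands in for physical memory


class San:
    """Hypothetical model of the NICs and links of a system area network."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def attach(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def rdma_write(self, src: Address, dst: Address, length: int) -> None:
        """Copy length bytes from the source address to the destination."""
        (src_id, src_off), (dst_id, dst_off) = src, dst
        src_mem = self.nodes[src_id].memory
        dst_mem = self.nodes[dst_id].memory
        if (min(src_off, dst_off, length) < 0
                or src_off + length > len(src_mem)
                or dst_off + length > len(dst_mem)):
            raise ValueError("transfer lies outside physical memory")
        dst_mem[dst_off:dst_off + length] = bytes(
            src_mem[src_off:src_off + length])


san = San()
sender, receiver = Node(0, 16), Node(1, 16)
san.attach(sender)
san.attach(receiver)
sender.memory[0:5] = b"hello"
san.rdma_write(src=(0, 0), dst=(1, 4), length=5)   # remote transfer
san.rdma_write(src=(0, 0), dst=(0, 8), length=5)   # same node: plain DMA
```

When the destination identifier equals the source identifier, the same call degenerates into a local copy, mirroring the remark that the NIC then acts simply as a DMA controller.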
Heretofore, the SAN has been used for a message passing interface (MPI), for the data stream (a single pipe through which data flows) model of a parallel virtual machine (PVM) or the like, or as a shared memory. Such uses are found chiefly in the field of scientific and technological computations.
A prior-art example of remote call processing in CORBA and the like will now be explained.
FIGS. 2 and 3 are diagrams for explaining the flow of the remote call processing. Referring to FIG. 2, on a transmission side, a proxy function is invoked by a program. A transmission side proxy creates the header and body of a request and converts them into the CDR format, and it sends the request from a socket to a TCP/IP stack. This stack divides the message of the request into packets, and sends out the packets to a network.
On a reception side, the packets are reassembled into the message by a TCP/IP stack. A reception side skeleton specifies an object to-be-invoked and converts the CDR-format data back into arguments. It also searches for a function to-be-invoked. Further, it invokes a thread for executing the function and delivers the values of the arguments necessary for the execution of the function, to the thread.
Referring to FIG. 3, the function (arithmetic operation) is executed by a reception side program. In the presence of a reply message to the request transmission side, for example, the result of the execution of the function, the reception side skeleton creates the header and body of the reply and sends the message from a socket to the reception side TCP/IP stack, which divides the message into packets and sends out the packets to the network.
In the transmission side TCP/IP stack, the packets are reassembled into the message. Subsequently, a transmission-side reply allotting thread specifies the thread which is standing by for the reply and activates it. The transmission side proxy converts the CDR-format data back into arguments, and the transmission side program obtains the result by the use of the arguments.
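The proxy and skeleton roles in this flow can be sketched as follows. The message layout used here (a request identifier, an operation name, and 32-bit integer arguments, all big-endian) is an assumption made for illustration and is not the CORBA request format of FIG. 5; the function names are likewise hypothetical, and the network transfer between the two sides is omitted.

```python
import struct
from typing import Callable, Dict, Sequence, Tuple

HEADER = ">IH"  # hypothetical header: request id, length of operation name


def proxy_marshal(request_id: int, operation: str,
                  args: Sequence[int]) -> bytes:
    """Transmission side proxy: create the header and body of a request."""
    name = operation.encode("ascii")
    header = struct.pack(HEADER, request_id, len(name)) + name
    body = struct.pack(f">{len(args)}i", *args)
    return header + body


def skeleton_dispatch(message: bytes,
                      table: Dict[str, Callable[..., int]]) -> bytes:
    """Reception side skeleton: find the function, run it, build the reply."""
    request_id, name_len = struct.unpack_from(HEADER, message, 0)
    start = struct.calcsize(HEADER)
    name = message[start:start + name_len].decode("ascii")
    body = message[start + name_len:]
    args = struct.unpack(f">{len(body) // 4}i", body)
    result = table[name](*args)
    return struct.pack(">Ii", request_id, result)


def proxy_unmarshal(reply: bytes) -> Tuple[int, int]:
    """Transmission side proxy: convert the reply back into values."""
    request_id, result = struct.unpack(">Ii", reply)
    return request_id, result


functions = {"add": lambda a, b: a + b}
reply = skeleton_dispatch(proxy_marshal(7, "add", [2, 3]), functions)
answer = proxy_unmarshal(reply)
```

The request identifier carried in the reply is what would let a reply allotting thread find the thread standing by for that request.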
FIG. 4 is a diagram for explaining a prior-art example of a method for acquiring a CDR area. The structure of a CDR management area and the pseudo-code of the CDR are illustrated in the figure.
FIG. 5 is a diagram showing the structure of a request message. The "big-endian" and "little-endian" indications for an offset value of 6 (six) in the figure will be explained later.
FIG. 6 is a diagram showing the structure of an object key. The name of a host in which an object exists, a TCP port number which accepts a service, etc. are illustrated in the figure.
As explained above, a computer having the ORB mechanism or RPC mechanism employs the packet mode in the communications between the nodes, and it has therefore involved the problem that a long time is expended on dividing the data into packets and reassembling the packets into the data. Further, a hardware interrupt is required on the reception side at the arrival of every packet. As the transmission speed of the network increases, the packet division/reassembly processing, the interrupt processing, the processing for conversion into a standard data format, etc. therefore account for a larger proportion of the overall delay of the data transfer. This has led to the problem that, even when the network media alone are improved so as to increase their data transfer speeds, the increased speeds cannot be fully exploited for the communications. By way of example, a data transfer speed of only about 300 Mbps can be realized on an Ethernet whose transfer speed is on the order of Gbps.
Moreover, the ORB or RPC is structurally premised on using the TCP or UDP in a lower layer underlying it. Therefore, even when the SAN having appeared in place of the packet switched network is employed, it is used only as the packet transfer network or the data stream model. This has led to the problem that the features of the SAN mentioned before cannot be utilized.
In consideration of the technical background stated above, the present invention has for its object to shorten a delay which is expended on the data transfer between computers each having an ORB or RPC mechanism, in such a way that the remote direct memory access (RDMA) inherent in the SAN is utilized for the data transfer.
According to one aspect of performance of the present invention, a computer having a remote procedure call (RPC) mechanism or an object request broker (ORB) mechanism in a distributed computing environment is constructed comprising a physical memory, a data readout unit which reads out data stored in the physical memory, and a remote direct memory access unit which transfers the data read out by the data readout unit, directly to a physical memory included in a communicating opposite computer connected to the particular computer itself through a network.
In this aspect of performance, the remote direct memory access unit acts between the computers each having the RPC mechanism or ORB mechanism, so as to perform a remote direct memory access (RDMA) operation, that is, an operation in which the data read out of the physical memory of the particular computer itself is directly transferred to the physical memory of the communicating opposite computer without the intervention of CPUs (central processing units) or the main arithmetic units of the respective computers. It is therefore possible to shorten a delay which is expended on the data transfer between the computers.
Heretofore, the CDR has been employed as a common data format in, for example, the CORBA, and data has invariably been converted into the common data format before the transmission thereof. In contrast, according to another aspect of performance of the present invention, in a case where the formats of data representation in the computers on a transmission side and a reception side are the same, it is possible to omit the conversion into the common data format and to transfer the data as it is, in the data representation format of the transmission side computer.
In this regard, each computer, for example, can further comprise a data-representation-format management unit which manages the data representation formats of the individual computers connected to the network. Thus, the conversion between the data representation formats can be omitted in correspondence with the contents of the management unit.
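One possible shape for such a management unit is a table that maps each node to its byte order and is consulted before encoding. This is a minimal sketch under stated assumptions: byte order is taken as the only difference between formats, big-endian stands in for the common format, and `FormatTable` and `encode_ints` are hypothetical names.

```python
import struct
from typing import Dict, Sequence, Tuple

COMMON = ">"  # assumed common format: big-endian


class FormatTable:
    """Hypothetical data-representation-format management unit."""

    def __init__(self, local_order: str) -> None:
        self.local_order = local_order      # "<" or ">"
        self.peers: Dict[int, str] = {}     # node id -> byte order

    def register(self, node_id: int, order: str) -> None:
        self.peers[node_id] = order

    def same_as_local(self, node_id: int) -> bool:
        # An unknown peer is treated as different, so conversion is kept.
        return self.peers.get(node_id) == self.local_order


def encode_ints(values: Sequence[int], table: FormatTable,
                dest: int) -> Tuple[bool, bytes]:
    """Return (uses_common_format, bytes to transfer) for 32-bit integers."""
    uses_common = not table.same_as_local(dest)
    order = COMMON if uses_common else table.local_order
    return uses_common, struct.pack(f"{order}{len(values)}i", *values)


table = FormatTable(local_order="<")
table.register(1, "<")   # same format as the local node
table.register(2, ">")   # different format
direct = encode_ints([1], table, dest=1)
converted = encode_ints([1], table, dest=2)
```

In an actual transfer, the unconverted case would mean that the data is sent from memory untouched; packing in the local byte order merely stands in for that here.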
According to a further aspect of performance of the present invention, in a case where the data representation formats of the particular computer itself and the opposite computer are different, it is possible to convert the transfer data into the data representation format of the opposite computer on the transmission side and to transfer the resulting data. Here too, the computer can further comprise the data-representation-format management unit mentioned above, so as to transfer the data subjected to the representation format conversion in accordance with the managed contents.
According to yet another aspect of performance of the present invention, each computer can further comprise a data format conversion unit by which, when the data representation formats of the particular computer itself and the opposite computer are different, the particular computer converts data transferred from the opposite computer into its own data representation format and then stores the resulting data in its own physical memory area. In this regard, it is also possible that the remote direct memory access unit of the transmission side transfers a message containing the transfer data after affixing the data representation format of the particular computer thereto beforehand, while the data format conversion unit of the reception side converts the transferred data on the basis of the affixed data representation format.
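Receiver-side conversion driven by an affixed format tag might look like the sketch below. The one-byte tag, its values, and the function names `send` and `receive` are assumptions for illustration, and the payload is restricted to 32-bit integers to keep the example short.

```python
import struct
from typing import Sequence

BIG, LITTLE = 0, 1                   # hypothetical tag values
_PREFIX = {BIG: ">", LITTLE: "<"}    # struct byte-order prefixes


def send(values: Sequence[int], sender_order: int) -> bytes:
    """Affix the sender's format, then append the data unconverted."""
    payload = struct.pack(f"{_PREFIX[sender_order]}{len(values)}i", *values)
    return bytes([sender_order]) + payload


def receive(message: bytes, receiver_order: int) -> bytes:
    """Return the payload in the receiver's own representation."""
    sender_order, payload = message[0], message[1:]
    if sender_order == receiver_order:
        return payload               # formats agree: store as received
    count = len(payload) // 4
    values = struct.unpack(f"{_PREFIX[sender_order]}{count}i", payload)
    return struct.pack(f"{_PREFIX[receiver_order]}{count}i", *values)


stored = receive(send([1, 2], LITTLE), BIG)
unchanged = receive(send([1], BIG), BIG)
```

With this arrangement the sender never converts, and the receiver converts only when the tag differs from its own format.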
According to a further aspect of performance of the present invention, the computer can also comprise a data-representation-format notification unit by which, in starting the communication connection between the computers, the data representation format of the particular computer itself is notified to the opposite computer.
According to a still further aspect of performance of the present invention, the computer can also transfer data toward the opposite computer in a state where an area designated by, for example, arguments for a function is wired down on the physical memory without being freed, in order that the data on the physical memory area of the particular computer itself may be prevented from being saved in the secondary storage area thereof for virtual storage during the data transfer.
According to a different aspect of performance of the present invention, the computer can further comprise a data delivery unit which delivers, to a function/method that is to be executed by the particular computer itself, the storage area holding data which has been transferred from the remote direct memory access unit of the opposite computer and stored in a previously-wired-down physical memory area, the data storage area being left intact. Herein, the computer can further comprise a memory-wire-down release unit which, after the transfer of the data from the opposite computer has been completed, releases the wire-down of the storage area for the data on the physical memory area in advance of the delivery of the storage area to the function/method by the data delivery unit.
According to a still different aspect of performance of the present invention, the computer can further comprise an interrupt processing unit which generates a hardware interrupt upon the arrival of a message containing transfer data and sent from the opposite computer, and which itself executes the processing of a function/method that is processed using the transfer data, on condition that the load of the processing of the function/method is light.
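The decision made by such an interrupt processing unit can be sketched as a handler that either runs the function itself or defers it. The class name, the numeric cost estimate, and the threshold are all assumptions; how the load of a function/method is judged to be light is not specified in this passage.

```python
import queue
from typing import Any, Callable, Optional, Tuple


class InterruptHandler:
    """Hypothetical handler invoked when a message arrives."""

    def __init__(self, light_threshold: int) -> None:
        self.light_threshold = light_threshold
        # Work left for an ordinary thread to pick up later.
        self.deferred: "queue.Queue[Tuple[Callable[..., Any], tuple]]" = (
            queue.Queue())

    def on_message(self, func: Callable[..., Any], args: tuple,
                   estimated_cost: int) -> Optional[Any]:
        """Run light work inside the handler; queue heavy work."""
        if estimated_cost <= self.light_threshold:
            return func(*args)          # executed by the handler itself
        self.deferred.put((func, args))
        return None


handler = InterruptHandler(light_threshold=10)
light_result = handler.on_message(sum, ([1, 2],), estimated_cost=1)
heavy_result = handler.on_message(sum, ([1, 2],), estimated_cost=100)
```

Running light functions inside the handler avoids starting and switching to a separate thread for each short request.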
As described above, according to the present invention, the data in the physical memory area of the particular computer itself can be directly transferred toward the physical memory area of the communicating opposite computer through the network which is, for example, a system area network.