1. Field of the Invention
The present invention relates to a communication device, method, and program for transferring data at high speed by an RDMA (Remote Direct Memory Access) method, and particularly relates to a communication device, method, and program for speculatively transferring an RDMA packet without first requesting permission from the reception destination.
2. Description of the Related Art
Conventionally, in a large-scale PC cluster or a parallel distributed server system, an RDMA method, which is a read/write type of communication, has been employed in place of send/receive type communication typified by socket communication wherever high communication performance is required, in order to improve the performance of connections between nodes or between storage and hosts. In send/receive type communication, the transfer source node specifies the memory area of the transfer source, and the transfer destination node specifies the memory area of the transfer destination. In the RDMA method, by contrast, the transmitting side specifies the memory areas of both the transfer source and the transfer destination, so that the intermediate buffers needed for a transfer can be reduced, as can the copy processing between those intermediate buffers and the memory areas of the transfer source and the transfer destination. In particular, when a high-speed network of the 1 gigabyte/second (1 GB/s) class such as InfiniBand or 10-gigabit Ethernet (R) is used, communication performance can be significantly improved because of the reduction in copy processing.
Because the transmitting side alone specifies both memory areas and performs the transmitting process in RDMA communication, the node with which communication is to be performed is not notified of the initiation or termination of the transfer. In practice, therefore, before RDMA data is transmitted, a communication process for obtaining permission to initiate the transfer to the memory area of the transfer destination has to be performed with the transfer destination node. The data transfer procedure of conventional RDMA communication is as follows.
(1) Transmitting Side
When RDMA transmission is enabled, the transmitting side asks the receiving side whether reception is permitted. After receiving the reception permission, it transfers the RDMA data. Finally, when an ack (transfer completion notification) is received, the transfer is complete.
(2) Receiving Side
Once reception is enabled, the receiving side sends the reception permission in response to the inquiry from the transmitting side. Thereafter, when reception of the RDMA data is complete, it returns an ack to the transmitting side.
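The two-sided procedure above can be illustrated with a minimal sketch. It is a toy in-process simulation, not an RDMA implementation: the function name and the message labels (PERMISSION_REQUEST, PERMISSION_GRANT, RDMA_DATA, ACK) are hypothetical names chosen for illustration, and no real network or RDMA library is involved.

```python
def conventional_rdma_transfer(payload: bytes):
    """Simulate the conventional procedure; return (message log, receiver memory).

    All names and message labels here are hypothetical, for illustration only.
    """
    log = []
    receiver_memory = bytearray()

    # (1) Transmitting side: ask whether reception is permitted.
    log.append(("sender->receiver", "PERMISSION_REQUEST"))

    # (2) Receiving side: once reception is enabled, send the permission.
    log.append(("receiver->sender", "PERMISSION_GRANT"))

    # (1) Transmitting side: only after the permission arrives, transfer the data.
    log.append(("sender->receiver", "RDMA_DATA"))
    receiver_memory.extend(payload)

    # (2) Receiving side: reception is complete, so return the ack.
    log.append(("receiver->sender", "ACK"))

    return log, bytes(receiver_memory)


if __name__ == "__main__":
    messages, memory = conventional_rdma_transfer(b"example data")
    for direction, kind in messages:
        print(direction, kind)
```

In this sketch, two messages are exchanged before any data moves, which corresponds to the permission round trip performed ahead of the RDMA data transfer itself.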
However, such conventional RDMA communication has a problem in that, because the communication process for obtaining transfer permission from the transfer destination is required before the RDMA communication can start, the data transfer takes correspondingly longer. This communication process is a particularly serious problem in a network having both high throughput and high latency. For example, when an NIC (network interface card) of the kind used in recent 10-gigabit Ethernet (note that Ethernet is a registered trade name; hereinafter, this notice will be omitted) is mounted, the throughput is 1 gigabyte/second or more, but the latency is large, at about 10 to 20 μs. In such a case, when short data such as 1 kilobyte of data is to be transferred by RDMA, the data transfer itself takes 1 μs or less, whereas the communication process for obtaining the transfer permission, which must be performed beforehand, takes 10 to 20 μs. The latency thus becomes a bottleneck, and the available throughput cannot be fully utilized.
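The arithmetic in the example above can be worked through as a rough sketch. The figures of 1 kilobyte, 1 gigabyte/second, and 10 to 20 μs come from the text; treating 1 kilobyte as 1000 bytes and 1 GB/s as 10^9 bytes/second, counting only the permission exchange and the data transfer time (the ack and any other overhead are ignored), and the function names themselves are assumptions made for illustration.

```python
def transfer_time_us(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to move the data at line rate, in microseconds."""
    return size_bytes / throughput_bytes_per_s * 1e6


def effective_throughput(size_bytes: float,
                         throughput_bytes_per_s: float,
                         permission_us: float) -> float:
    """Bytes per second achieved when each transfer first waits for permission.

    Simplifying assumption: total time = permission exchange + data transfer.
    """
    total_us = permission_us + transfer_time_us(size_bytes, throughput_bytes_per_s)
    return size_bytes / (total_us * 1e-6)


if __name__ == "__main__":
    size = 1000.0      # 1 kilobyte, taken as 1000 bytes (assumption)
    line_rate = 1e9    # 1 gigabyte/second, taken as 1e9 bytes/s (assumption)
    print(f"data transfer alone: {transfer_time_us(size, line_rate):.2f} us")
    for permission_us in (10.0, 20.0):
        rate = effective_throughput(size, line_rate, permission_us)
        print(f"permission {permission_us:.0f} us: "
              f"{rate / 1e6:.1f} MB/s ({rate / line_rate:.1%} of line rate)")
```

Under these assumptions, the permission exchange dominates the total time, and the effective rate for 1 kilobyte transfers falls to roughly 9% of line rate with a 10 μs exchange and roughly 5% with a 20 μs exchange.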