1. Field of the Invention
The present invention relates to a communications apparatus, and more particularly, to a communications apparatus which has a function of writing payload data included in a received packet directly into a buffer of an application program.
2. Description of the Related Art
Generally, a computer communicates with other computers and devices through a network interface card.
A technique called RDMA (remote direct memory access) is utilized particularly in the field of parallel computing, where communication performance significantly affects the system. RDMA is a technique which allows a data recipient to transfer payload data in a received packet directly to a memory space utilized by an application program (hereinafter called the “application”) without the intervention of an OS space managed by an operating system (OS). Reducing the number of times transferred data is copied between buffers lowers the processing load on a processor, which improves performance and reduces communication latency. Recently, in particular, iWARP, a protocol for utilizing RDMA on TCP/IP, the standard protocol of the Internet, has been standardized by the RDMA Consortium, as shown in “RDMA Protocol Verbs Specification (Version 1.0),” pp 191-201, and “An RDMA Protocol Specification (Version 1.0),” pp 4-7.
FIG. 1 illustrates an exemplary configuration of an RDMA communications apparatus which is based on RDMA to make communications. Referring to FIG. 1, RDMA communications apparatus 9 comprises network adaptor 1, processor 2, main memory 3, I/O device 4, and system bus 5 which interconnects these components.
Main memory 3 has two memory areas, i.e., application memory area 31 which is a memory area for an application to perform processing, and control information memory area 32 which is a memory area for a communication protocol to manage control information. Network adaptor 1 comprises reception processing unit 8, transmission processing unit 13, and connection management memory 12.
To make an RDMA-based communication between RDMA communications apparatus 9 and another RDMA communications apparatus 9, processor 2 in RDMA communications apparatus 9 on the transmission side sets, in control information memory area 32, the location of the transmit data stored in application memory area 31, and requests network adaptor 1 to make a packet transmission. Transmission processing unit 13 in network adaptor 1 finds the location at which the transmit data is stored from control information memory area 32 and reads the transmit data from application memory area 31. Transmission processing unit 13 further reads connection information required to send packets from connection management memory 12, creates transmit packets, and delivers the packets to RDMA communications apparatus 9 on the reception side. Transmission processing unit 13 updates the connection information as required, and stores the updated connection information in connection management memory 12.
In RDMA communications apparatus 9 on the reception side, a packet received from RDMA communications apparatus 9 on the transmission side is supplied to reception processing unit 8 of network adaptor 1. Reception processing unit 8 identifies a connection from a header section of the received packet to read connection information from connection management memory 12. Reception processing unit 8 further determines from which address a payload section of the received packet should be written into application memory area 31 from information described in the header of the received packet and the connection information, to write the payload section of the received packet into application memory area 31. Reception processing unit 8 updates the connection information as required to update connection management memory 12.
In regard to a connection established between RDMA communications apparatus 9 on the transmission side and RDMA communications apparatus 9 on the reception side, connection management memory 12 may hold information for identifying application memory area 31 (the identifier, start address, and size of a buffer used as application memory area 31) and the current status of application memory area 31 into which data can be written (for example, a list of normally received sequence numbers). The header section of the received packet may contain the start sequence number and size of the data included in the payload. The location in which the payload data should be written can then be identified from the information described in the header of the received packet and the connection information. Detailed descriptions of a standard method of implementing RDMA communications, including how the payload write location is specifically identified from the header information and the connection information, can be found in the aforementioned “RDMA Protocol Verbs Specification (Version 1.0),” pp 191-201, and “An RDMA Protocol Specification (Version 1.0),” pp 4-7.
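The identification of the write location described above can be illustrated with a minimal sketch. It is a simplified model of the paragraph, not the mechanism defined in the cited specifications: the `ConnectionInfo` fields, the `base_seq` field (the sequence number assumed to map to the start of the buffer), and the function name are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ConnectionInfo:
    # Hypothetical fields modelled on the connection information described above.
    buffer_id: int      # identifier of the application buffer
    buffer_start: int   # start address of the buffer in application memory
    buffer_size: int    # size of the buffer in bytes
    base_seq: int       # assumption: sequence number that maps to buffer_start

def payload_write_address(conn, start_seq, length):
    """Return the write address for a payload, or None if it does not fit in the buffer."""
    # The sequence space is treated as 32-bit and wrapping, as in TCP.
    offset = (start_seq - conn.base_seq) % (1 << 32)
    if offset + length > conn.buffer_size:
        return None
    return conn.buffer_start + offset

conn = ConnectionInfo(buffer_id=7, buffer_start=0x10000, buffer_size=4096, base_seq=1000)
addr = payload_write_address(conn, start_seq=2460, length=1460)  # 1460 bytes past the start
```

The header supplies `start_seq` and `length`, and the connection information supplies the rest, which is why the connection information must be read before the payload can be written.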
FIG. 2 illustrates an exemplary configuration of conventional reception processing unit 8 in RDMA communications apparatus 9. Conventional reception processing unit 8 comprises packet reception processing unit 81, packet error check unit 83, connection information reading unit 82, protocol processing unit 84, connection information write request unit 87, packet transmission request unit 89, payload data write request unit 85, control information write request unit 88, and main memory write processing unit 86.
Upon receipt of a packet from RDMA communications apparatus 9 on the transmission side, packet reception processing unit 81 determines the type of protocol from a header section in the packet, and then requests packet error check unit 83 to check whether or not any error exists in the packet. Packet error check unit 83 checks whether any data error exists in the overall packet or in the payload section of the packet. Specifically, the error check processing refers to an FCS calculation when Ethernet (registered trademark) is utilized on Layer 2, or a TCP checksum calculation when TCP is utilized on Layer 4. When any error is found in the packet, the packet is discarded and is not subjected to subsequent processing.
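The two error checks named above can be sketched as follows. This is an illustration under assumptions: `zlib.crc32` uses the same CRC-32 polynomial as the Ethernet FCS, but bit and byte ordering on the wire are ignored here, and the checksum function is the generic 16-bit ones' complement sum without the TCP pseudo-header. The function names and sample data are hypothetical.

```python
import struct
import zlib

def crc32_over_chunks(chunks):
    """Accumulate a CRC-32 chunk by chunk; the value is final only after the last chunk."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum of the kind used by TCP (pseudo-header omitted)."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

frame = b"123456789"
fcs = crc32_over_chunks([frame[:4], frame[4:]])   # same value as a single pass over frame
corrupted = b"123456780"                          # last byte altered in transit
error_found = crc32_over_chunks([corrupted]) != fcs
```

Both checks consume every byte they cover, so neither result is available until the last byte of the packet has arrived.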
When no error is found in the packet after the packet has been tested up to the end thereof, connection information reading unit 82 reads connection information from connection management memory 12 based on a connection identifier for uniquely identifying a connection which can be extracted from the header section in the packet. Upon completion of the reading of the connection information, protocol processing unit 84 performs protocol processing based on the read connection information and the payload information of the received packet. Specifically, the protocol processing involves processing for determining from which address payload data is written into application memory area 31, processing for determining the contents of connection information which should be updated, processing for determining whether or not a response packet must be sent, and generating information on the response packet if it must be sent, and processing for determining whether or not control information must be written into control information memory area 32 and determining contents of the control information if needed. A typical example of the response packet is an ACK (ACKnowledgement) packet of TCP.
At the time protocol processing unit 84 determines a location in which payload data is written, payload data write request unit 85 writes the payload data into application memory area 31 through main memory write processing unit 86. Connection information write request unit 87 writes connection information updated by protocol processing unit 84 into connection management memory 12. Packet transmission request unit 89 requests transmission processing unit 13 to send a response packet when protocol processing unit 84 determines that the response packet must be sent. Control information write request unit 88 writes control information into control information memory area 32 through main memory write processing unit 86 after the payload data has been completely written, when protocol processing unit 84 determines that the control information must be written.
A typical example of control information in the RDMA communication is a read complete notice which is generated when an RDMA read has been completed. In the RDMA communication, at the time transfer data has been fully transferred from application memory area 31 of RDMA communications apparatus 9 on the transmission side to application memory area 31 of RDMA communications apparatus 9 on the reception side, control information is written to indicate that RDMA read control has been completed, thereby notifying processor 2 of the completion of an RDMA read. The total size of the transfer data transferred in one RDMA communication is often larger than the payload length of a packet, in which case the RDMA communication is made by dividing the transfer data into a plurality of payloads in a plurality of packets. The control information that provides notification of the completion of the RDMA read control is written only after all packets for the RDMA transfer have been received and the transfer of the payload data to application memory area 31 has been completed.
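The division of transfer data into a plurality of payloads, and the writing of the completion notice only after the last payload, can be sketched as follows. The class and function names are hypothetical, the 1460-byte payload length is only an example value, and the sketch assumes that segments never partially overlap.

```python
def split_into_payloads(transfer_len, max_payload):
    """Return (offset, length) pairs that together cover transfer_len bytes."""
    return [(off, min(max_payload, transfer_len - off))
            for off in range(0, transfer_len, max_payload)]

class TransferTracker:
    """Tracks which segments of one transfer have been written to the application buffer."""

    def __init__(self, transfer_len):
        self.transfer_len = transfer_len
        self.segments = set()
        self.done = False

    def on_payload_written(self, offset, length):
        """Record a written segment; return True once, when the completion notice is due."""
        if self.done:
            return False
        self.segments.add((offset, length))   # a duplicate segment is counted only once
        self.done = sum(n for _, n in self.segments) == self.transfer_len
        return self.done

segments = split_into_payloads(4000, 1460)    # three payloads; the last one is shorter
tracker = TransferTracker(4000)
results = [tracker.on_payload_written(off, n) for off, n in segments]
```

Only the call for the final missing segment returns True, which corresponds to the point at which the control information would be written.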
Next, a conventional reception operation in RDMA communications apparatus 9 will be described with reference to a timing chart of FIG. 3.
Upon receipt of a packet from RDMA communications apparatus 9 on the transmission side, packet reception processing unit 81 receives a header section and, at the time the type of protocol is determined, requests packet error check unit 83 to perform the error check. Packet error check unit 83 checks the overall packet or the overall payload data in the packet to determine whether or not the packet is defective. The packet must have been received up to its end before packet error check unit 83 can complete the determination as to whether or not any error exists in the packet. Upon completion of the error check in packet error check unit 83, connection information reading unit 82 reads connection information, which is then passed to protocol processing unit 84. At the time protocol processing unit 84 determines a location in which the payload is to be written based on the header information of the packet and the connection information, the payload data is written by main memory write processing unit 86. Upon completion of other protocol processing in protocol processing unit 84, connection information write request unit 87 and packet transmission request unit 89 execute their respective processing. The control information is written by main memory write processing unit 86 after the completion of both the other protocol processing in protocol processing unit 84 and the writing of the payload data by main memory write processing unit 86.
JP-A-2004-7354 also describes a communications apparatus which has a function of directly writing payload data included in a received packet into a buffer of an application program. However, the communications apparatus described in JP-A-2004-7354 employs a Layer-4 protocol which is defined exclusively for RDMA communications.
On the other hand, another prior art technique related to communications involves switching between a data transfer which is associated with an error check and a data transfer which omits the error check in accordance with a remaining amount of capacity in a reception buffer, as shown in JP-A-1997-149067. A further prior art technique starts writing a packet into a memory without waiting for the completion of an error check of the packet, and upon detection of an error, discards the data that has been written up to this point which includes the error, as shown in JP-A-1998-341419.
As described above, the conventional RDMA communications apparatuses check whether or not any error exists in a received packet, and write payload data into an application buffer only after they have found that no error exists in the packet. Representative examples of checking whether or not errors exist in a packet are FCS and CRC, and packet data must be received up to the end thereof in order to calculate the FCS or CRC. Generally, a packet size of 1500 bytes is employed in Ethernet (registered trademark), which is used as standard in the Internet. Accordingly, an RDMA communications apparatus on the reception side cannot write payload data into an application buffer until it has fully received 1500 bytes of packet data and confirmed that there is no error in the packet. Of course, when a communication is made in a larger packet size with the intention of increasing transfer efficiency, an even longer waiting time is involved to confirm whether or not there are errors in the packet.
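The waiting time discussed above can be estimated with simple arithmetic. The 1 Gbit/s link rate and the 9000-byte larger packet size below are assumptions chosen only for illustration, and the function name is hypothetical; header overhead, preamble, and inter-frame gaps are ignored.

```python
def full_packet_wait_ns(packet_bytes, link_bps):
    """Time in nanoseconds to receive packet_bytes at link_bps bits per second."""
    return packet_bytes * 8 * 10**9 // link_bps

GIGABIT = 10**9                                   # assumed link rate: 1 Gbit/s

wait_1500 = full_packet_wait_ns(1500, GIGABIT)    # 12000 ns, i.e. 12 microseconds
wait_9000 = full_packet_wait_ns(9000, GIGABIT)    # 72000 ns, i.e. 72 microseconds
```

Under these assumptions the payload write cannot begin for at least 12 microseconds after the first byte arrives, and the wait grows in proportion to the packet size.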
As described above, the conventional RDMA communications apparatuses face the problem of a large delay at RDMA communications apparatus 9 on the reception side, from the time a packet is received until the time the payload of the packet is written into an application buffer. This is because RDMA communications apparatus 9 on the reception side reads the connection information, performs protocol processing such as determining the location in which the payload data is to be written, and writes the payload data into the application buffer only after it has received the full packet and confirmed that there is no error in the packet.
To solve such a problem, the technique described in JP-A-1998-341419 could conceivably be applied to the RDMA communication, so that writing of a received packet into an application memory starts without waiting for the completion of an error check for the packet and, upon detection of an error, all data including the error that has been written into the application memory up to that point is discarded. However, when a communication protocol such as TCP is utilized in Layer 4, an area in which a payload of a re-transmitted packet is to be stored can partially or entirely overlap with an area in which a payload of a packet received in the past has already been stored. For this reason, if payload data is written before an error check, the payload data can overwrite valid data within application memory area 31, so that a simple discard of an erroneous packet could cause a valid section of application memory area 31 to be passed to an application while it remains corrupted by the erroneous packet. Of course, no problem will arise if a Layer 4 protocol defined exclusively for the RDMA communications is utilized, as found in JP-A-2004-7354, to ensure that a payload of a received packet will be stored in an area which does not overlap with any area which has stored a payload of a packet received in the past. However, since the employment of TCP/IP, utilized as a standard in the Internet, enables a reduction in the price of RDMA communications apparatuses and enables coexistence with other communication protocols and applications, the implementation of TCP/IP based RDMA communications, i.e., RDMA over TCP/IP, is important.
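The overlap problem described above can be reproduced with a small sketch. All names and data are hypothetical: a 12-byte buffer stands in for application memory area 31, the first 8 bytes have already been received and verified, and a re-transmitted packet covering bytes 4 to 11 arrives with one corrupted byte and is written before its error check completes.

```python
import zlib

app_memory = bytearray(b"AAAAAAAA" + bytes(4))   # bytes 0..7 valid, bytes 8..11 not yet received
valid_upto = 8                                   # data below this offset may be passed to the application

def write_payload_early(memory, offset, payload):
    """Write a payload into application memory without waiting for the error check."""
    memory[offset:offset + len(payload)] = payload

sent_payload = b"AAAABBBB"                       # what the sender re-transmitted for offset 4
sent_crc = zlib.crc32(sent_payload)
received_payload = b"AXAABBBB"                   # one byte corrupted in transit

write_payload_early(app_memory, 4, received_payload)
error_detected = zlib.crc32(received_payload) != sent_crc

# "Simple discard": the erroneous packet is dropped and the valid range is not extended,
# but the bytes it overwrote inside the already valid range are not restored.
if not error_detected:
    valid_upto = max(valid_upto, 4 + len(received_payload))

delivered = bytes(app_memory[:valid_upto])       # now differs from the original b"AAAAAAAA"
```

The discard leaves bytes 4 to 7 corrupted even though they had been received correctly earlier, which is the failure the paragraph attributes to overlapping re-transmissions in TCP.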