1. Field of the Invention
The present invention relates to a data transfer method for transferring data through a network in a highly reliable manner and at a high speed and to a disk control unit (disk controller) for controlling a plurality of disk drives.
2. Description of Related Art
With enhancement of information communications infrastructures, there is a need for even higher processing capabilities of information communications systems. As LSI circuits have become increasingly miniaturized, system performance now depends on the capability of data transfer between LSI devices. Therefore, vigorous study efforts are underway to enhance the capability of IO systems and augment their functions. Higher transfer speed has been achieved and communication protocol engines with a variety of transport functions have been developed.
For example, InfiniBand transfer modes defined in InfiniBand Architecture Specification Release 1.0a provide a data transfer method wherein an interface between an application process and an IO system is supported by a queue pair (QP), consisting of a send queue (SQ) and a receive queue (RQ), and by a completion queue (CQ) to which a completion status is posted when processing of a request placed in the queue pair is completed. These queues are explained, using FIG. 4.
A process 51 and a process 52 communicate with each other, using two queue pairs for each process. The process 51 has a queue pair 41 consisting of a send queue 11 and a receive queue 21 and a queue pair 42 consisting of a send queue 12 and a receive queue 22. Likewise, the process 52 has a queue pair 43 consisting of a send queue 13 and a receive queue 23 and a queue pair 44 consisting of a send queue 14 and a receive queue 24. In a completion queue 31, a completion status for the queue pair 41 and queue pair 42 is stored. In a completion queue 32, a completion status for the queue pair 43 and queue pair 44 is stored.
As an entry to the send queue, a transfer request is placed. A data unit to be transferred by this transfer request is referred to as a logical record. As an entry to the receive queue, a pointer to a receive buffer is stored. A transfer request placed in the send queue 12 has a pointer to a record buffer 81 within a process buffer 71 and a transfer request placed in the send queue 14 has a pointer to a record buffer 82 within a process buffer 72. Likewise, in the receive queues 22 and 24, respectively, pointers to the record buffers 81 and 82 are stored.
Between two queue pairs that communicate with each other, send queue to receive queue connections are set up. The send queue 12 connects to the receive queue 24 and the send queue 14 connects to the receive queue 22. Then, when a transfer request placed in the send queue 12 is processed, a logical record stored in the record buffer 81 is transferred to the record buffer 82 specified by the receive queue 24. Upon the completion of fault-free transfer of the record, a completion status is posted from the receive queue 24 to the completion queue 32 and a completion status is posted from the send queue 12 to the completion queue 31.
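The relationship among the queue pairs, the completion queues, and the record buffers described above can be illustrated by the following minimal sketch. The class and function names (QueuePair, CompletionQueue, connect, process_send) are hypothetical and are not taken from the InfiniBand specification; the transfer itself is assumed to be fault-free.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CompletionQueue:
    entries: deque = field(default_factory=deque)

    def post(self, qp_id, status):
        # A completion status is posted when a request has been processed.
        self.entries.append((qp_id, status))


@dataclass
class QueuePair:
    qp_id: int
    cq: CompletionQueue
    send_queue: deque = field(default_factory=deque)     # transfer requests
    receive_queue: deque = field(default_factory=deque)  # receive buffers
    peer: "QueuePair | None" = None                      # connected queue pair


def connect(a, b):
    # Set up the send queue to receive queue connections in both directions.
    a.peer, b.peer = b, a


def process_send(qp):
    """Process one transfer request from the send queue of qp."""
    record = qp.send_queue.popleft()                  # logical record to transfer
    target_buffer = qp.peer.receive_queue.popleft()   # buffer named by the peer
    target_buffer[:] = record                         # fault-free transfer assumed
    qp.peer.cq.post(qp.peer.qp_id, "receive complete")
    qp.cq.post(qp.qp_id, "send complete")


# Mirrors FIG. 4: send queue 12 (queue pair 42) to receive queue 24 (queue pair 44).
cq31, cq32 = CompletionQueue(), CompletionQueue()
qp42, qp44 = QueuePair(42, cq31), QueuePair(44, cq32)
connect(qp42, qp44)
buffer82 = bytearray(5)
qp44.receive_queue.append(buffer82)
qp42.send_queue.append(b"hello")
process_send(qp42)
```

In this sketch a receive queue entry is the buffer object itself rather than a pointer to it, which is the closest Python analogue to the pointer stored in the receive queue.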
Control operation of these queue pairs and completion queues is performed by hardware called a host channel adapter (HCA). An example of the HCA configuration is shown in FIG. 5. The HCA comprises a receiving port 613, a transmitting port 623, a receiving link layer logic 631, a transport layer logic 632, a processor 633, a transmitting link layer logic 641, a transport layer logic 642, a processor 643, a memory 650, and a connection interface 660. The HCA communicates with an application process via the connection interface and memory. The receiving side and the transmitting side can operate in parallel, and the processors and the link layer and transport layer logics enable high-speed execution of high-functionality protocol processing.
How a single record transfer request is processed between two HCAs is explained, using FIG. 6, for an instance where a record in a record buffer 83 within a process buffer 73 of the HCA1 side is transferred to a record buffer 84 within a process buffer 74 of the HCA2 side. At the HCA1, the record in the record buffer 83 is disassembled into pieces of data of a size suitable for transmission, and an appropriate header and an error check code (CRC) are attached to each piece of data. Packets 401–403, each containing data with a header and CRC, are transferred. At the HCA2, received packets are checked for errors by means of the CRC. If an error is detected, the HCA2 notifies the HCA1 of the error by returning a NAK (Negative AcKnowledgement). The HCA1 retries the transfer of a packet for which the NAK has been returned. When the HCA2 has received all packets correctly, the HCA2 reassembles the received data into the logical record and stores the record into the record buffer 84. The HCA2 posts a completion status to a completion queue 34 and notifies the HCA1 that reception of the record is complete. When the HCA1 is notified by the HCA2 that the reception is complete, the HCA1 posts a transfer completion status to a completion queue 33 and, at this point of time, the sequence of transfer request processing terminates.
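The packet-level sequence described above (disassembly, CRC check, NAK, retry, and reassembly) can be sketched as follows. This is an illustration only: the payload size, the function names, and the fault injection parameter corrupt_once are hypothetical, and zlib.crc32 merely stands in for the error check code actually used on the link.

```python
import zlib

PAYLOAD_SIZE = 4  # hypothetical packet payload size in bytes


def make_packets(record):
    """Disassemble a logical record into (sequence number, data, CRC) packets."""
    packets = []
    for seq, start in enumerate(range(0, len(record), PAYLOAD_SIZE)):
        data = record[start:start + PAYLOAD_SIZE]
        packets.append((seq, data, zlib.crc32(data)))
    return packets


def transfer_record(record, corrupt_once=()):
    """Send every packet, retrying on NAK; return (received record, retries)."""
    pending_faults = set(corrupt_once)  # sequence numbers damaged on first try
    received = {}
    retries = 0
    for seq, data, crc in make_packets(record):
        while True:
            wire_data = data
            if seq in pending_faults:
                # Simulate a single transmission error on this packet.
                pending_faults.discard(seq)
                wire_data = bytes([data[0] ^ 0xFF]) + data[1:]
            if zlib.crc32(wire_data) == crc:
                received[seq] = wire_data  # target side: CRC check passed
                break
            retries += 1  # target side returns a NAK; initiator retries
    # Only after all packets are correct is the logical record reassembled;
    # posting of the completion statuses would follow at this point.
    reassembled = b"".join(received[s] for s in sorted(received))
    return reassembled, retries


record_out, retry_count = transfer_record(b"logical record", corrupt_once={1})
```

Here the packet with sequence number 1 is damaged once, rejected by the CRC check, and retransmitted before the record is reassembled.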
Another example, in which an RDMA transfer request is processed between two HCAs, is explained, using FIG. 7. In this example of RDMA transfer, data in an area of source of RDMA transfer 85 within application memory space 75 of an initiator is transferred to an area of destination of RDMA transfer 86 within application memory space 76 of a target. In the case of RDMA transfer, because data is directly transferred into destination application memory space, destination memory address information must be attached to the data. Apart from that, the operation is the same as for a single record transfer request. At the HCA1, the data in the area of source of RDMA transfer 85 is divided into pieces of suitable size and placed in packets, and the HCA1 transfers the packets serially to the HCA2. The HCA2 stores the data from each received packet into a designated location within the area of destination of RDMA transfer 86. If necessary, packet transfer is retried, and the data from the packets is reassembled into complete data in the area. When the HCA2 has received all packets correctly, the HCA2 posts a completion status to a completion queue 36 and notifies the HCA1 that reception of the data is complete. When the HCA1 is notified by the HCA2 that the reception is complete, the HCA1 posts a transfer completion status to a completion queue 35 and, at this point of time, the sequence of transfer request processing terminates.
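The distinguishing point of RDMA transfer, namely that each packet carries destination memory address information and is stored directly at its designated location, can be sketched as follows. The names rdma_packets and rdma_write, the payload size, and the use of a bytearray as the target application memory space are assumptions made for illustration; retry on NAK is omitted here.

```python
import zlib

PAYLOAD_SIZE = 4  # hypothetical packet payload size in bytes


def rdma_packets(source, dest_base):
    """Divide the source area into packets tagged with a destination address."""
    for offset in range(0, len(source), PAYLOAD_SIZE):
        data = bytes(source[offset:offset + PAYLOAD_SIZE])
        yield dest_base + offset, data, zlib.crc32(data)


def rdma_write(source, target_memory, dest_base):
    """Store each packet's data directly at its designated target location."""
    if dest_base < 0 or dest_base + len(source) > len(target_memory):
        raise ValueError("destination area exceeds target memory space")
    for address, data, crc in rdma_packets(source, dest_base):
        if zlib.crc32(data) != crc:
            # A real target would return a NAK and the packet would be retried.
            raise ValueError("CRC error")
        target_memory[address:address + len(data)] = data
    # All packets stored correctly: completion would be posted at this point.
    return "complete"


memory76 = bytearray(16)  # stands in for the target application memory space
status = rdma_write(b"ABCDEFGHIJ", memory76, dest_base=3)
```

Because every packet names its own destination address, no separate reassembly buffer is needed; the complete data takes shape in place within the destination area.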
The protocol discussed hereinbefore is a reliable data transfer method which ensures that transferred data is free of errors, and this method is fundamental and commonly used for a wide range of applications. Two essential characteristics of the traditional method of reliable data transfer are:
    1. The target posts the completion status after making sure that a whole logical record, which is a unit of transfer request, is free of errors.
    2. The initiator starts a transfer of the next logical record after confirming the notification of the fault-free transfer completion status of the whole logical record from the target.
The above characteristics are explained, using FIGS. 2 and 3. In FIG. 2, an application (AP) 1 of the HCA1 side starts a transfer of a logical record 221 by issuing a transfer request 121. If the HCA2 detects a transfer error, the transfer is retried. When the HCA2 has received the logical record 221 correctly, the HCA2 posts a completion status 321 to a completion queue of an application 2. Upon receiving the completion status 321, the application 2 can start a process 721 using the logical record 221. The HCA2 that has received the logical record 221 correctly notifies the HCA1 of the reception completion and the HCA1 posts a transfer completion status 361 to a completion queue of the application 1. As is obvious in this example, the target-side application 2 receives the completion status 321 after the whole logical record 221 has been received completely. On the other hand, the initiator-side application 1 can initiate a transfer request of the next logical record after knowing that the HCA2 has received the whole logical record 221 completely.
In FIG. 3, the application 1 of the HCA1 side starts an RDMA transfer by issuing a transfer request 131. In this case, data in the area of source of RDMA transfer is regarded as one logical record. The HCA2 receives a plurality of packets of divided data and issues a retry request when necessary. Upon the completion of fault-free transfers of all packets (transfer of the whole logical record), the HCA2 posts a completion status 331 to a completion queue of the application 2 of the HCA2 side. Upon receiving the completion status 331, the application 2 can start a process 731 using the data in the area of destination of RDMA transfer, that is, the transferred logical record. The HCA2 that has received the logical record correctly notifies the HCA1 of the reception completion and the HCA1 posts a transfer completion status 371 to the application 1 (its completion queue).
As is obvious in this example also, the target-side application 2 receives the completion status 331 after the whole logical record (the data in the area of destination of RDMA transfer) has been received completely. On the other hand, the initiator-side application 1 can initiate a transfer request of the next logical record after knowing that the HCA2 has received the whole logical record (the data in the area of destination of RDMA transfer) completely.
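The ordering imposed by the two characteristics, as seen in FIGS. 2 and 3, can be summarized by the following sketch of an event sequence. The function name run_transfers, the record labels, and the fixed packet count per record are hypothetical; retries are omitted for brevity.

```python
def run_transfers(records, packets_per_record):
    """Return the ordered events for a sequence of logical record transfers."""
    events = []
    for name in records:
        events.append(("AP1", f"transfer request {name}"))
        for p in range(packets_per_record):
            events.append(("HCA2", f"packet {p} of {name} received"))
        # Characteristic 1: the target-side completion status is posted only
        # after the whole logical record has been received correctly.
        events.append(("AP2", f"completion status for {name}"))
        # Characteristic 2: the initiator learns of completion, and only then
        # may the next loop iteration issue the next transfer request.
        events.append(("AP1", f"transfer completion status for {name}"))
    return events


events = run_transfers(["221", "222"], packets_per_record=3)
```

In the resulting sequence, the completion status for record 221 never precedes its last packet, and the transfer request for record 222 never precedes the transfer completion status for record 221.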
In this way, the traditional method of reliable data transfer was required to have the above two essential characteristics as a mechanism for avoiding transfer errors. Another example of the traditional method of reliable data transfer is disclosed in Japanese Published Unexamined Patent Application No. Hei 8-179999. As this example states, a method that assures the integrity of data transferred before an error occurs is known, but such a method is still required to fulfill the above two characteristics.
[Japanese Patent Document Cited 1]
    Japanese Published Unexamined Patent Application No. Hei 8-179999.
[Non-Patent Document Cited 1]
    InfiniBand Architecture Specification Release 1.0a