1. Technical Field
The present invention generally relates to communication protocols between a host computer and an input/output (I/O) device. More specifically, the present invention provides a method by which data that has taken different paths through a network and, as a result, arrives in a different order from that in which it was sent can be efficiently reordered at the destination.
2. Description of Related Art
In an Internet Protocol (IP) Network, the software provides a message passing mechanism that can be used to communicate with Input/Output devices, general purpose computers (host), and special purpose computers. The message passing mechanism consists of a transport protocol, an upper level protocol, and an application programming interface. The key standard transport protocols used on IP networks today are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP provides a reliable service and UDP provides an unreliable service. In the future the Stream Control Transmission Protocol (SCTP) will also be used to provide a reliable service. Processes executing on devices or computers access the IP network through Upper Level Protocols, such as Sockets, iSCSI, and Direct Access File System (DAFS).
Unfortunately, the TCP/IP software consumes a considerable amount of processor and memory resources. This problem has been covered extensively in the literature (see J. Kay, J. Pasquale, “Profiling and reducing processing overheads in TCP/IP”, IEEE/ACM Transactions on Networking, Vol. 4, No. 6, pp. 817-828, Dec. 1996; and D. D. Clark, V. Jacobson, J. Romkey, H. Salwen, “An analysis of TCP processing overhead”, IEEE Communications Magazine, Vol. 27, No. 6, pp. 23-29, June 1989). In the future, the network stack will continue to consume excessive resources for several reasons, including: increased use of networking by applications; use of network security protocols; and underlying fabric bandwidths that are increasing at a higher rate than microprocessor and memory bandwidths. To address this problem, the industry is offloading the network stack processing to an IP Suite Offload Engine (IPSOE).
There are two offload approaches being taken in the industry. The first approach uses the existing TCP/IP network stack, without adding any additional protocols. To remove the need for copies, the industry is pursuing two different approaches. One approach consists of adding Framing, Direct Data Placement (DDP), and Remote Direct Memory Access (RDMA) over the TCP and SCTP protocols. The IP Suite Offload Engine (IPSOE) required is similar whether or not these additional protocols are used; the key difference is that, when they are used, the hardware must support them.
A second approach involves providing a significant amount of buffering on the network adapter and having the hardware, rather than software, copy the data received from the network. At least for multi-threaded host applications, the hardware-based copy operation can remove the copy overhead from the host CPUs. Even without multi-threading, the hardware can handle reordering issues before the CPU reaches the point where the copy operation would be performed, thereby streamlining the copying process. Beyond that, programming APIs that support asynchronous communication can benefit from the copy offload, even when there is no multi-threading.
For these reasons, and for backward compatibility, having a certain amount of buffering on the network adapter will be desirable for some time to come. As more connections support DDP and RDMA, the amount of buffering required would presumably decrease. However, higher link bandwidths will offset this trend to some extent. Hence, the need for efficient buffering and reordering of inbound network traffic will remain relevant for a long time to come.
The present invention provides a mechanism for reordering data at a data destination in a higher performance network. The present invention provides dynamic, adaptive management of receive buffers in a host channel adapter while recovering on the fly the order of data sent over a medium that does not preserve order. The present invention operates for a number of important communication mediums, and without requiring searches of any type.
In an exemplary embodiment, the present invention includes a Reorder Memory Access Controller (RMAC) which is a unit that can be implemented for networks that do not support in-order delivery on a connection or virtual lane basis, or which have upper layer protocols that do not require pre-posted host receive buffers. The RMAC buffers incoming data and recovers the transmit order on the fly.
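As an illustration of how transmit order might be recovered on the fly without searching, the following sketch places each arriving packet in a slot computed directly from its sequence number. The names (`ReorderWindow`, `place`, `drain`) and the fixed-size slot layout are assumptions made for illustration only; the description above does not specify how the RMAC computes buffer positions.

```python
class ReorderWindow:
    """Hypothetical fixed-slot reorder window indexed by sequence number."""

    def __init__(self, num_slots, first_seq=0):
        self.slots = [None] * num_slots   # one slot per in-flight packet
        self.num_slots = num_slots
        self.next_seq = first_seq         # next sequence number owed to the host

    def place(self, seq, payload):
        """Store a packet in the slot derived from its sequence number."""
        if not (self.next_seq <= seq < self.next_seq + self.num_slots):
            return False                  # outside the window: caller must drop or stall
        self.slots[seq % self.num_slots] = payload
        return True

    def drain(self):
        """Return the payloads that are now deliverable in transmit order."""
        out = []
        while True:
            idx = self.next_seq % self.num_slots
            if self.slots[idx] is None:   # next packet in sequence has not arrived
                break
            out.append(self.slots[idx])
            self.slots[idx] = None        # free the slot for a later sequence number
            self.next_seq += 1
        return out
```

In this sketch, a packet that arrives early simply waits in its slot; once the missing packet arrives, `drain` walks forward from `next_seq` and releases everything that has become contiguous. Neither operation scans the buffer for a matching entry.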
To buffer incoming data the RMAC supports a large reorder buffer, e.g., up to 4 Gigabytes (GB) of Double Data Rate (DDR) Synchronous Dynamic Random Access Memory (SDRAM). Although not strictly required, best performance is achieved if there is also a DDR/quad data rate (QDR) SRAM control store, where state information on buffers allocated to currently active connections is maintained.
RMAC interfaces include several request/response and data ports. A request/response port is provided for a Protocol Engine (PE), through which the PE accesses information required for protocol processing and issues RMAC state update commands. The actual transfer of protocol information occurs over another port to an Arrays Interface unit. A request/response port is provided for a Pre-Load Engine (PLE), which directs the RMAC where to place incoming data in the reorder buffer and issues RMAC state update commands. A port to the Control Memory Access Controller (CMAC) is provided for the RMAC to retrieve or update state information in the control store needed to manage the reorder buffer. In addition, a port is provided for the Direct Memory Access Controller (DMAC), which the DMAC uses to transfer data from the reorder buffer to host memory. Data coming from the network is staged on chip through a Data Transfer Buffer (DTB), and read from there by the RMAC on a dedicated port.
In an exemplary embodiment, the present invention provides a method and apparatus for reordering data of a data transmission received from a source device. The method and apparatus receive, in a data transfer buffer, a data packet transmitted over a connection associated with the source device and determine if the connection requires reordering of data packets. If the connection requires reordering of data packets, the data packet is transferred from the data transfer buffer to a reorder buffer and a reorder state cache is updated to reflect the transfer of the data packet to the reorder buffer. In response to receipt of a request to transfer data from the reorder buffer to the host memory, a next data packet sequence number is fetched from the reorder state cache and a position in the reorder buffer of the data associated with the next data packet sequence number entry is identified. The data at this identified position is then transferred to the host memory.
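The sequence of steps in the preceding paragraph can be sketched in software as follows. This is a minimal model under stated assumptions, not the hardware design: the class and field names (`Receiver`, `ReorderState`, `position`) are hypothetical, Python lists stand in for the reorder buffer and host memory, and the handling of connections that do not require reordering (passing data straight to host memory) is an assumption, since the text does not describe that path.

```python
from dataclasses import dataclass, field


@dataclass
class ReorderState:
    """Hypothetical per-connection entry in the reorder state cache."""
    next_seq: int = 0                              # next sequence number owed to the host
    position: dict = field(default_factory=dict)   # sequence number -> reorder buffer index


class Receiver:
    def __init__(self, reorder_connections):
        self.reorder_connections = set(reorder_connections)
        self.reorder_buffer = []    # stands in for the reorder buffer memory
        self.state_cache = {}       # connection id -> ReorderState
        self.host_memory = []       # stands in for host memory

    def receive(self, conn, seq, data):
        """Handle one packet staged in the data transfer buffer."""
        if conn not in self.reorder_connections:
            self.host_memory.append(data)   # assumed path when no reordering is needed
            return
        state = self.state_cache.setdefault(conn, ReorderState())
        self.reorder_buffer.append(data)    # move the packet into the reorder buffer
        state.position[seq] = len(self.reorder_buffer) - 1  # update the state cache

    def transfer_to_host(self, conn):
        """Serve one transfer request; return True if a packet was moved."""
        state = self.state_cache.get(conn)
        if state is None or state.next_seq not in state.position:
            return False                    # next packet in sequence has not arrived
        pos = state.position.pop(state.next_seq)   # identify the position in the buffer
        self.host_memory.append(self.reorder_buffer[pos])
        self.reorder_buffer[pos] = None     # release the slot (not reused in this sketch)
        state.next_seq += 1
        return True
```

For example, if packet 1 of a reordering connection arrives before packet 0, a call to `transfer_to_host` returns `False` until packet 0 has been received, after which two successive calls deliver the packets to host memory in transmit order.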
These and other features of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the preferred embodiments.