The invention generally relates to communications across a data network and in particular to a credit-based flow control scheme for use over a Virtual Interface Architecture or the like.
A standard user-level networking architecture such as the Virtual Interface (VI) Architecture enables distributed applications to perform low-overhead communication over System Area Networks (SANs). The VI Architecture is described in the Virtual Interface Architecture Specification, Version 1.0, Dec. 16, 1997. With the advent of SANs, low-latency, high-bandwidth interconnects have become a reality, which has opened new horizons for cluster computing. The centralized in-kernel protocol processing in legacy transports (e.g., TCP/IP) prevents applications from realizing the raw hardware performance offered by the underlying high-speed networks. The VI Architecture standard has further made it possible to perform low-overhead communication using off-the-shelf SAN hardware. However, building high-level applications using the primitives provided by the VI Architecture is complex and requires substantial development effort, because the VI Architecture does not provide transport-level functionality such as flow control, buffer management, fragmentation and reassembly. Moreover, it is impractical to implement existing network protocols such as the Transmission Control Protocol (TCP) over the VI Architecture because doing so would add unnecessary overhead. TCP uses a sliding-window flow control protocol with sequence numbers, acknowledgments, error detection, retransmission of lost packets, etc., because the underlying network is presumed to be inherently unreliable. SANs, by contrast, have very low error rates, and the reliability levels offered by the VI Architecture (reliable delivery and reliable reception) provide high reliability and treat transport errors as catastrophic.
Thus, due to the reliable delivery and reliable reception of VIs, which break the connection on extremely rare transport errors and guarantee exactly-once, intact, in-order data delivery, many of the functions performed by TCP to ensure reliability are redundant and would add unnecessary overhead.
Therefore, a need exists for a communication service that provides some transport-level services over the VI Architecture, such as flow control, buffer management, fragmentation and reassembly, without adding unnecessary overhead.
According to an embodiment of the invention, a method of sending data from a local endpoint system to a remote endpoint system across a network is provided. The local endpoint system includes a plurality of work queues for posting data transfer requests. It is determined whether a sufficient number of send credits is available at the local endpoint system. If a sufficient number of send credits is available, a data packet is sent from the local endpoint system over the network. Otherwise, if a sufficient number of send credits is not available at the local endpoint system, a credit request packet is sent from the local endpoint system to the remote endpoint system, and the local endpoint system waits for a credit response packet from the remote endpoint system before sending a data packet.
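The send-side decision described above can be illustrated with a minimal sketch. This is not an implementation of the VI Architecture API. The names `SendEndpoint`, `wire`, and `wait_for_credit_response` are hypothetical, the network is modeled as a plain list, and the sketch assumes that one data packet costs one send credit and that a credit request packet costs none.

```python
from typing import Callable, List, Tuple

Packet = Tuple[str, bytes]  # (packet kind, payload); a hypothetical wire format


class SendEndpoint:
    """Illustrative local endpoint that sends only while it holds send credits."""

    def __init__(self, send_credits: int,
                 wait_for_credit_response: Callable[[], int]) -> None:
        self.send_credits = send_credits
        # Stand-in for the network: packets are appended in send order.
        self.wire: List[Packet] = []
        # Hypothetical blocking call that returns the number of credits
        # granted by the remote endpoint in its credit response packet.
        self._wait_for_credit_response = wait_for_credit_response

    def send(self, payload: bytes, credits_needed: int = 1) -> None:
        # Not enough credits: ask the remote endpoint and wait for its reply.
        while self.send_credits < credits_needed:
            self.wire.append(("CREDIT_REQUEST", b""))
            self.send_credits += self._wait_for_credit_response()
        # Enough credits: send the data packet and consume the credits.
        self.wire.append(("DATA", payload))
        self.send_credits -= credits_needed
```

With one initial credit and a remote that grants two credits per response, the second call to `send` first emits a credit request and only then the data packet.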
According to an embodiment of the invention, a method of receiving data at a local endpoint system across a network is provided. The local endpoint system includes a plurality of work queues for posting data transfer requests, one or more registered send and receive buffers, and one or more application receive buffers. A packet is received and it is determined whether the packet is a data packet. If it is a data packet, several steps are performed. The system is polled for any additional packets that have been received by the local endpoint system. The data for all the received packets is copied from the registered receive buffers to one or more application buffers. The registered buffers whose data has been copied are then made available. The number of receive credits is updated based on the additional available receive buffers.
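The receive-side steps described above can be sketched in the same spirit. The names `ReceiveEndpoint`, `deliver`, and `receive` are hypothetical, and the sketch assumes that each registered receive buffer holds exactly one packet and that the number of receive credits equals the number of free registered receive buffers. Handling of non-data packets is left out beyond releasing their buffer.

```python
from collections import deque


class ReceiveEndpoint:
    """Illustrative local endpoint that batches received data packets."""

    def __init__(self, registered_buffers: int) -> None:
        # Assumption: one receive credit per free registered receive buffer.
        self.recv_credits = registered_buffers
        # Registered receive buffers that currently hold a received packet.
        self.filled = deque()
        # Stand-in for the application receive buffer.
        self.app_buffer = bytearray()

    def deliver(self, kind: str, payload: bytes) -> None:
        """Simulate the network placing a packet into a registered buffer."""
        if self.recv_credits == 0:
            raise RuntimeError("no registered receive buffer available")
        self.recv_credits -= 1
        self.filled.append((kind, payload))

    def receive(self) -> int:
        """Process one received packet; return the number of bytes copied."""
        if not self.filled:
            return 0
        kind, payload = self.filled.popleft()
        if kind != "DATA":
            # Control packet: release its buffer; handling is out of scope.
            self.recv_credits += 1
            return 0
        batch = [payload]
        # Poll for additional data packets that have already been received.
        while self.filled and self.filled[0][0] == "DATA":
            batch.append(self.filled.popleft()[1])
        # Copy all received data into the application buffer.
        for chunk in batch:
            self.app_buffer += chunk
        # The copied registered buffers are available again: update credits.
        self.recv_credits += len(batch)
        return sum(len(chunk) for chunk in batch)
```

Draining every already-received packet in a single call means the copy and the credit update happen once per batch rather than once per packet, which is the point of the polling step.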