1. Field of the Invention
The present invention relates to the field of computer networking and data communication.
2. Related Art
Problem
Dialogs (also called virtual circuits) carry data between different application processes. Dialogs can be logically set to carry data over a computer network such as a mesh. In a computer network, dialogs provide data communication between application processes running on different end systems or hosts. Dialogs can also carry data between application processes running on the same host.
Multiple functional layers (e.g., Application, Presentation, Session, Transport, Network, Link, and Physical) are used in a data communication network to provide different services and reliability in order to implement virtual circuits (i.e., dialogs). Each layer has an associated protocol and range of primitives to provide services. Each layer forms a corresponding protocol data unit that includes the data and corresponding layer protocol control information. Peer protocol entities at the same layer in different end systems provide services at that layer by managing corresponding layer protocol data units and protocol control information. This operation of multiple functional layers (e.g., Application, Presentation, Session, Transport, Network, Link, and Physical as used in an OSI or Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite) in a data communication network is well-known and need not be described in further detail. See, e.g., Martin, J., TCP/IP Networking: Architecture, Administration, and Programming, (PTR Prentice-Hall: Englewood Cliffs, N.J. 1994), pp. 29-30 (incorporated herein by reference) and F. Halsall, Data Communications, Computer Networks, and Open Systems, 4th Ed., (Addison-Wesley: U.S.A. 1996), p. 663 (incorporated herein by reference). Layers are implemented as software, firmware, and/or hardware.
Conventional communication systems now have high bandwidth capability. Data throughput for high-speed networking technologies occurs at rates on the order of 100 Megabits/sec to 1 Gigabit/sec. Latency, however, is high. Latency is the time interval between the time a transaction issues and the time the transaction is reported as being completed. In systems with a high latency, the round-trip time for two communicating clients to complete a data request can be on the order of milliseconds.
Latency occurs in conventional communication systems due in part to the overhead involved in the communication layers, including but not limited to, the Transport layer and the layers logically below the Transport layer (e.g., the Network, Link, and Physical layers). However, advancements have been made in lower layer network facilities. The transmission and delivery of messages over some networks is now much more efficient and reliable, especially in closely-coupled, clustered systems.
Transport layer facilities continue to impart substantial latency. Popular transport layer protocols, such as TCP, were developed to support local area and wide-area network (LAN/WAN) environments where the underlying bit rate was moderately high, but reliability was poor, and latency induced by the lower networking layers was high. Transport facilities are included in conventional transport protocols to guarantee reliable transmission and delivery. With the advent of very high-speed, low-latency communication networks like ATM, Fibre Channel, and ServerNet™, facilities that were previously incorporated in a Transport Layer to achieve reliable communication are now being provided by the underlying communication networks themselves. For example, ATM, Fibre Channel, and ServerNet™ include specific lower layer facilities for ensuring reliable transmission and delivery, such as in-order delivery, checksumming, and segmentation and reassembly (SAR).
Conventional high-latency Transport layer protocols and architectures, however, assume lower networking layers (e.g., Network, Link, and Physical layers) are unreliable. Therefore, high-latency transports, such as the TCP/IP protocol suite, are not positioned to leverage advances in lower-layer data transmission reliability. Conventional transport layer protocols are further limited to a push data model of communication where data is sent regardless of whether a receiver can accommodate the data. Such push model data communication causes flow control problems and excessive data copying.
What is needed is a high-speed, low-latency intraconnect architecture having efficient transport layer processing. A standard transport layer protocol and architecture is needed that can leverage improvements in the reliability of data transmission and delivery, especially for closely-coupled, clustered systems. What is needed is a high-speed, low-latency transport intraconnect architecture that eliminates data copies and provides effective flow control.
According to the present invention, a communication intraconnect architecture (CIA) is specified which provides a reliable and efficient transport service between communicating clients using a pull data model. The pull data model is a communication model where a send client of a dialog waits for permission to send data to a receiving client. The receive client “pulls” data from the send client. Flow control is handled by the pull data model since the receive client requests data when the receive client is ready. Moreover, the communication intraconnect architecture, according to the present invention, implements a pull data model which transfers data as efficiently and reliably as a push data model.
The CIA pull data model of the present invention supports receive operations by requiring the sender to bind data bytes to receiver memory addresses. Data transfer between communicating send and receive clients can be conducted entirely by performing write-only operations that write data to memory. Read operations having a high latency can be avoided entirely.
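By way of illustration only, the pull data model described above can be sketched in Python. The class name PullDialog and its methods are hypothetical and do not appear in the CIA specification. The sketch assumes a single dialog in one direction and truncates any send that exceeds the posted buffer.

```python
from collections import deque


class PullDialog:
    """Toy one-direction dialog (hypothetical): a send waits until the
    receive client has posted a buffer, i.e. has given permission."""

    def __init__(self):
        self.posted = deque()   # buffers offered by the receive client
        self.pending = deque()  # sends still waiting for permission

    def post_receive(self, buffer: bytearray) -> None:
        """Receive client signals readiness by posting a destination buffer."""
        self.posted.append(buffer)
        self._match()

    def send(self, data: bytes) -> None:
        """Send client queues data; nothing moves until a buffer is posted."""
        self.pending.append(data)
        self._match()

    def _match(self) -> None:
        # Pair the oldest pending send with the oldest posted buffer.
        while self.posted and self.pending:
            buffer = self.posted.popleft()
            data = self.pending.popleft()
            n = min(len(buffer), len(data))  # simplification: excess is dropped
            buffer[:n] = data[:n]            # write-only transfer, no remote read
```

In this sketch, flow control follows from the structure itself: data is written only into space that the receive client has already offered.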
According to one embodiment of the present invention, a method, system, and computer program product provide transport layer data communication based on a pull data model between communicating clients. To receive data, a receive client builds a CIA control block (CCB) that includes parameters for a dialog receive (d_rcv) primitive. The receive client passes the CCB to a receive-side CIA transport-layer facility. These d_rcv parameters identify a scatter list that defines the destination data areas (i.e., data destination addresses) and how much data space is available at each data destination address. For example, in Receive with Buffer operations, the d_rcv parameters identify, among other things, a receive-side buffer and a desired transfer length for that buffer. Additional d_rcv parameters are used to select available receive services (e.g., an auto receive service or partial receive service).
The receive-side CIA transport facility is also called a receive intraconnect front end (IFE). The receive IFE constructs a receive control block (RCB) based on the parameters passed by the receive client in a d_rcv primitive. The receive IFE sends the RCB in a network packet over a mesh to a send side CIA transport facility, that is, to a send IFE associated with the logical dialog.
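As a minimal sketch of the receive path described above, the following Python fragment models a d_rcv CCB carrying a scatter list and the RCB that a receive IFE might derive from it. All class names, field names, and the function build_rcb are assumptions made for illustration; the actual CCB and RCB layouts are not given in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScatterEntry:
    address: int  # data destination address in receiver memory
    length: int   # data space available at that address, in bytes


@dataclass
class DRcvCCB:
    """Hypothetical control block holding d_rcv parameters."""
    dialog_id: int
    scatter_list: List[ScatterEntry] = field(default_factory=list)
    auto_receive: bool = False     # selects the auto receive service
    partial_receive: bool = False  # selects the partial receive service

    def capacity(self) -> int:
        """Total receive space described by the scatter list."""
        return sum(entry.length for entry in self.scatter_list)


@dataclass
class RCB:
    """Hypothetical receive control block sent to the send IFE."""
    dialog_id: int
    destinations: List[ScatterEntry]
    auto_receive: bool


def build_rcb(ccb: DRcvCCB) -> RCB:
    """Receive IFE step: derive an RCB from the client's d_rcv CCB."""
    return RCB(ccb.dialog_id, list(ccb.scatter_list), ccb.auto_receive)
```

For example, a CCB with scatter entries of 64 and 32 bytes describes 96 bytes of receive space, and the RCB built from it carries the same two destination addresses to the send side.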
At the send side, the send IFE stores the receive control block (RCB). The RCB arrival triggers match processing at the send side of the interface. The RCB includes fields that identify the scatter list (e.g., receive data destination addresses and maximum data transfer lengths and buffer lengths). The RCB includes other fields pertinent to support d_rcv semantics (e.g., auto-receive, buffer pool references). Multiple RCBs can be queued at the send-side to reduce latency and to accommodate multiple requests for data.
To send data, a send client passes d_send parameters for a dialog send (d_send) primitive in a control block (CCB) to the send IFE. The d_send parameters identify a logical dialog and a gather list. Additional fields are used to support other d_send semantics (e.g., partial transfer versus end-of-message indication).
The transport layer at the send-side of a CIA interface (in other words, the send IFE) determines when a match occurs between an RCB and an outstanding d_send CCB. To transfer data, the send IFE binds data specified by the d_send CCB to destination addresses specified by the matching RCB.
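The binding step described above can be illustrated with a short Python sketch. The function name bind and the representation of the scatter list as (address, available length) pairs are assumptions for illustration only. The sketch simply leaves unbound any data that does not fit in the scatter list.

```python
from typing import List, Tuple


def bind(gather: List[bytes],
         scatter: List[Tuple[int, int]]) -> List[Tuple[int, bytes]]:
    """Bind gathered send data to receiver destination addresses.

    gather  : data areas named by the d_send gather list
    scatter : (address, available_length) pairs from the matching RCB
    Returns (destination_address, chunk) pairs in scatter-list order.
    """
    data = b"".join(gather)  # logical byte sequence to be sent
    writes = []
    offset = 0
    for address, length in scatter:
        if offset >= len(data):
            break  # all data has been bound
        chunk = data[offset:offset + length]
        if chunk:  # skip zero-length destinations
            writes.append((address, chunk))
        offset += len(chunk)
    return writes
```

For example, binding the gather list [b"hello", b"world"] to destinations (100, 4) and (200, 10) places b"hell" at address 100 and b"oworld" at address 200.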
In a networking environment where data is to be transferred over a mesh, the send IFE constructs a network packet. The network packet includes a copy of the sender's data and receiver memory destination addresses. When receive and send clients share the same memory space, the send IFE performs the data transfers using memory copies.
At the receive side, the receive IFE deconstructs the received network packet. The receive IFE then stores the send side data from the network packet into the receive buffers specified by the memory destination addresses in the received network packet.
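A minimal sketch of the receive-side store operation follows, assuming that receiver memory is modeled as a Python bytearray and that a network packet is modeled as a list of (destination address, data) pairs. The function name store_packet and the packet representation are hypothetical; the actual packet format is not specified in this section.

```python
from typing import List, Tuple


def store_packet(memory: bytearray, packet: List[Tuple[int, bytes]]) -> None:
    """Write each data chunk of a packet to its destination address.

    Every destination is validated before any byte is written, so a bad
    packet leaves receiver memory untouched. Only writes are performed.
    """
    for address, chunk in packet:
        if address < 0 or address + len(chunk) > len(memory):
            raise ValueError("destination outside receiver memory")
    for address, chunk in packet:
        memory[address:address + len(chunk)] = chunk
```

Because the destination addresses travel with the data, the receive side in this sketch needs no lookup or matching step of its own before storing the data.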
According to another feature of the present invention, two send classes of service are provided, Early-Far-End and Far-End. When a user requests Early-Far-End service, the send IFE sends a Send Complete indication to the send client as soon as the send IFE no longer requires the send client's resources.
When a user requests Far-End service, the send IFE waits for an Acknowledgment packet sent by the receive IFE. Reception of an Acknowledgment packet by the send IFE triggers a Send Complete indication to the send client.
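The difference between the two send classes of service can be expressed as a small decision function. The following Python sketch is illustrative only; the names SendClass and send_complete_due, and the two boolean inputs, are assumptions rather than part of the CIA specification.

```python
from enum import Enum


class SendClass(Enum):
    EARLY_FAR_END = 1
    FAR_END = 2


def send_complete_due(send_class: SendClass,
                      resources_released: bool,
                      ack_received: bool) -> bool:
    """Decide whether the send IFE may indicate Send Complete.

    Early-Far-End: as soon as the send client's resources are no longer
    required by the send IFE.
    Far-End: only once the Acknowledgment packet from the receive IFE
    has arrived.
    """
    if send_class is SendClass.EARLY_FAR_END:
        return resources_released
    return ack_received
```

Under these assumptions, Early-Far-End trades end-to-end confirmation for lower completion latency, while Far-End ties completion to the receive IFE's acknowledgment.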
The present invention has further features and advantages. Multiple, full duplex, reliable virtual circuit connections (i.e., multiple logical dialogs) can be provided for each node. Multiple dialog objects can reference a single node. Multiple outstanding CIA primitive operations are allowed on each dialog. Dialogs can handle byte stream or message oriented data.
Dialog operations and features include, but are not limited to, scatter and gather support, Early-Far-End and Far-End send classes of service, an automatic recurring receive option, partial sends and receives, messages with no substantial restrictions on send/rcv lengths, and multiple dialog priorities.
According to the present invention, a user level management dialog can be used to establish other logical dialogs. Type 1 and Type 2 dialog establishment services are provided.
According to a further feature of the present invention, a communication architecture is provided that utilizes buffer pools and pool managers in a pull data model to provide an efficient and reliable transport service. Buffer pool and pool manager operations provide address bound checking, buffer pool credits, low water mark notification, and data binding to further optimize data transfer performance.
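As a rough illustration of the buffer pool features named above, the following Python sketch models a pool of fixed-size buffers with a credit count, a low water mark callback, and an address bound check. The class BufferPool, its parameters, and the choice of fixed-size contiguous buffers are all assumptions made for the sketch; this section does not describe how pools are actually organized.

```python
from typing import Callable, List, Optional


class BufferPool:
    """Hypothetical pool of equal-sized buffers in one contiguous region."""

    def __init__(self, base: int, buffer_size: int, count: int,
                 low_water_mark: int,
                 on_low_water: Optional[Callable[[int], None]] = None):
        self.base = base
        self.limit = base + buffer_size * count  # one past the last byte
        self.free: List[int] = [base + i * buffer_size for i in range(count)]
        self.low_water_mark = low_water_mark
        self.on_low_water = on_low_water

    @property
    def credits(self) -> int:
        """Number of buffers still available to a sender."""
        return len(self.free)

    def take(self) -> Optional[int]:
        """Consume one credit and return a buffer address, or None if empty."""
        if not self.free:
            return None
        address = self.free.pop(0)
        if self.on_low_water and self.credits <= self.low_water_mark:
            self.on_low_water(self.credits)  # low water mark notification
        return address

    def in_bounds(self, address: int, length: int) -> bool:
        """Address bound check for a write of `length` bytes."""
        return self.base <= address and address + length <= self.limit
```

In this sketch the low water mark callback fires each time the remaining credit count is at or below the configured mark, which gives the pool owner a chance to replenish buffers before senders stall.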
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.