The International Organization for Standardization (ISO) has established the Open Systems Interconnection (OSI) reference model. The OSI reference model provides a network design framework that allows equipment from different vendors to communicate. More specifically, the OSI reference model organizes the communication process into seven distinct but interrelated categories in a layered sequence. Layer 1 (L1) is the Physical Layer, which handles the physical means of sending data. Layer 2 (L2) is the Data Link Layer, which is associated with procedures and protocols for operating the communications lines, including the detection and correction of message errors. Layer 3 (L3) is the Network Layer, which determines how data is transferred between computers. Layer 4 (L4) is the Transport Layer, which defines the rules for information exchange and manages end-to-end delivery of information within and between networks, including error recovery and flow control. Layer 5 (L5) is the Session Layer, which deals with dialog management and controlling the use of the basic communications facility provided by Layer 4. Layer 6 (L6) is the Presentation Layer, and is associated with data formatting, code conversion, and compression and decompression. Layer 7 (L7) is the Application Layer, and addresses functions associated with particular application services, such as file transfer, remote file access and virtual terminals.
In some communication systems, network interface controllers (NICs) may be required to support multiple interfaces to a host system that may be running with a plurality of different levels of offload. The host system interfaces may include legacy L2 services, transport level L4 services, or session level L5 services. For the legacy L2 services, the NIC provides a lower amount of offload, in which pre-formatted packets are simply read from the host system and transmitted. For the transport level L4 services, the NIC provides reliable data transport service on a connection-by-connection basis. A typical implementation of this type of offload is TCP/IP offload. For session level L5 services, the NIC provides upper level protocol services, including protocol-specific services, such as digest or header composition/decomposition, as well as protocol-specific or generic buffer-to-buffer copy services across the network with reliable data transport. A typical implementation of this type of offload is RDMAC protocol or iSCSI protocol.
FIG. 1A is a block diagram of a conventional system 100 that may be required to support multiple interfaces to a host system that may be running with a plurality of different levels of offload. Referring to FIG. 1A, the system 100 may comprise operating system 101, L2 driver 103, L4 driver 105, L5 driver 107, L2 only NIC 109, L4 only NIC 111, L5 only NIC 113, and external switch 115. The system 100 may utilize operating system 101 to support L2, L4 and L5 types of offload. The operating system 101 may utilize separate drivers and separate NICs for each type of offload. For example, L2 type of offload may be managed by a L2 NIC 109 utilizing a L2 driver 103, L4 type of offload may be managed by a L4 NIC 111 utilizing a L4 driver 105, and L5 type of offload may be managed by a L5 NIC 113 utilizing a L5 driver 107.
The conventional system 100 is a multiple support environment in which each of the offload layer protocols is implemented on a separate NIC or Host Bus Adapter (HBA). Since each of the offload layer protocols is implemented on a separate NIC, there is little need to manage the transmit bandwidth between the different levels of services. Each NIC has its own independent connection to the network. For example, it may be a common occurrence for some or all of the protocols for the L2 NIC 109, the L4 NIC 111 and the L5 NIC 113 to be simultaneously transmitting. In this regard, when all the NICs transmit on the same network (e.g., Ethernet) and that network carries all of the above data types from the multiple NICs to the destination(s), such a solution may utilize a single external switch 115, either directly attached or located somewhere between source and destination, to combine traffic on a per-packet basis. Each NIC in this solution may not take the traffic needs of adjacent NIC devices into account. Consequently, if all three NIC devices transmit at a combined rate greater than the capacity of a designated egress port on the switch 115 leading towards the destination, data may accumulate inside the switch and packets from two or more of the NIC devices may be delayed or even dropped regardless of protocol type, connection priority characteristic, or protocol element type. In addition, power, cooling, and/or component costs, both within and outside the system, may be considerable.
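The accumulation and drop behavior at the egress port can be illustrated with a minimal sketch. The function name `simulate_egress`, the fixed per-tick packet rates, and the single shared buffer are all assumptions made for illustration; they do not describe any particular switch.

```python
def simulate_egress(offered_rates, egress_rate, buffer_limit, ticks):
    """Toy model of one switch egress port fed by several independent NICs.

    offered_rates: packets per tick offered by each NIC (hypothetical values)
    egress_rate:   packets per tick the egress port can forward
    buffer_limit:  packets the switch can hold for this port
    Returns (queued, dropped) after `ticks` time steps.
    """
    queued = 0
    dropped = 0
    for _ in range(ticks):
        # Each NIC transmits without regard to the others.
        queued += sum(offered_rates)
        # The egress port forwards what it can in this tick.
        queued -= min(queued, egress_rate)
        # Whatever does not fit in the buffer is dropped,
        # regardless of protocol type or connection priority.
        if queued > buffer_limit:
            dropped += queued - buffer_limit
            queued = buffer_limit
    return queued, dropped


# Three NICs offering 4 packets per tick each into a 10-packet-per-tick port.
print(simulate_egress([4, 4, 4], egress_rate=10, buffer_limit=6, ticks=5))
# The same port when the combined rate stays below the egress rate.
print(simulate_egress([3, 3, 3], egress_rate=10, buffer_limit=6, ticks=5))
```

In the first call the combined offered rate exceeds the egress rate, so the queue fills and packets are eventually dropped; in the second call the queue stays empty.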
A second problem may arise since all offloads of L4 and above must meter out the transmit bandwidth between many different connections. A latency-sensitive connection having a small amount of data to transmit may have to wait until busier connections are idle before it is allowed to transmit. Furthermore, busy connection(s) may operate for long periods, and the small connections with sparse amounts of traffic may not be permitted to transmit until the busy connections have become idle. This may occur because system performance is normally sensitive to throughput for high bandwidth connections while latency is important for connections with a sparse amount of traffic, or because the system may have a policy of sending all the data available for a connection or very large blocks of data per connection.
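The effect of a send-everything-per-connection policy on a sparse connection can be sketched as follows. The function `transmit_order`, the policy names, and the per-packet round-robin alternative are assumptions introduced only for comparison; the text above does not prescribe any scheduling policy.

```python
from collections import deque


def transmit_order(pending, policy):
    """Return the order in which packets leave a single transmitter.

    pending: dict mapping a connection name to its number of queued packets
    policy:  "drain" sends all data available for a connection before moving on;
             "round_robin" (a hypothetical alternative) sends one packet per turn.
    """
    queues = deque((name, count) for name, count in pending.items() if count > 0)
    order = []
    while queues:
        name, count = queues.popleft()
        if policy == "drain":
            order.extend([name] * count)
        elif policy == "round_robin":
            order.append(name)
            if count > 1:
                queues.append((name, count - 1))
        else:
            raise ValueError(f"unknown policy: {policy}")
    return order


pending = {"busy": 5, "sparse": 1}
# Number of packets sent ahead of the sparse connection's only packet.
print(transmit_order(pending, "drain").index("sparse"))
print(transmit_order(pending, "round_robin").index("sparse"))
```

Under the "drain" policy the sparse connection waits behind every packet of the busy connection, while under the assumed round-robin policy it waits behind only one.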
In addition, offload NICs may now transmit data faster than was previously possible. One problem with this acceleration is that it becomes more important to keep the accelerated NIC transmitter updated as to the buffer status on the receiving NIC. For the TCP protocol, for example, the receiver window size may normally be enlarged, by configuration or otherwise, when the network speed is increased. The receiver, therefore, must provide buffering for receive data up to the size of the TCP send window. This may require more costly memory, either in the offload NIC or in the host of the receiver, to achieve the higher throughput.
FIG. 1B is a diagram illustrating transmit behavior characteristic of the system of FIG. 1A. Referring to FIG. 1B, there is shown a connection A 121, a connection B 123 and a transmit behavior 125 for connections 121 and 123 when an external switch is utilized. Connection A, 121, may have a small amount of data to occasionally transmit and connection B, 123, may have a large amount of data to transmit. In the conventional system 100 of FIG. 1A, the transmit behavior 125 may correspond to a transmit pattern when an external switch is utilized to switch between connections A 121 and B 123. Since the large data transmit of connection B 123 may be broken up into individual packets, a switch, such as the switch in the conventional system 100 of FIG. 1A, may be utilized for limiting delay incurred by connection A 121. In this regard, packetized data streams may be merged after they have been packetized by the independent NICs.
FIG. 1C is a diagram illustrating data transmit and acknowledgement receive behavior characteristics of the system of FIG. 1A. Referring to FIG. 1C, transmit data traffic may be represented by transmit connections 131, 137, and 143. Corresponding receive acknowledgements may be represented by receive connections 133, 139, and 145. Bandwidth window size for each of the transmit data connections 131, 137, and 143 may be represented by bandwidth window sizes 135, 141, and 147, respectively. With regard to each of the three connections 131, 137, and 143, the transmitter must have “credit” to transmit packets. This credit may be equal to the amount of memory that the receiver has dedicated for this connection to receive data into. The amount of credit available is referred to as “window size” in TCP. The transmitter may transmit up to the available credit, but then must wait for acknowledges from the receiver which may restore some credit level before continuing. The receiver may be adapted to restore credit for data that has been properly received and/or processed by transmitting acknowledges 133, 139, and/or 145.
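The credit mechanism described above can be sketched in a few lines. The class `CreditedSender` and its method names are hypothetical, and the model deliberately ignores everything in TCP other than the window accounting.

```python
class CreditedSender:
    """Toy transmitter that may only send while it holds credit.

    window_bytes stands for the memory the receiver has dedicated
    to this connection (the "window size").
    """

    def __init__(self, window_bytes):
        self.window_bytes = window_bytes
        self.in_flight = 0  # bytes sent but not yet acknowledged

    def available_credit(self):
        return self.window_bytes - self.in_flight

    def try_send(self, size):
        """Send a packet if enough credit remains; otherwise the sender must wait."""
        if size > self.available_credit():
            return False
        self.in_flight += size
        return True

    def on_ack(self, acked_bytes):
        """An acknowledge from the receiver restores credit."""
        self.in_flight = max(0, self.in_flight - acked_bytes)


sender = CreditedSender(window_bytes=3000)
print(sender.try_send(1500))  # credit available
print(sender.try_send(1500))  # credit now exhausted
print(sender.try_send(1500))  # must wait for an acknowledge
sender.on_ack(1500)
print(sender.try_send(1500))  # credit restored
```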
The first set of transmit data connection 131 and receive acknowledges 133 illustrates behavior at traditional speeds. A receiver may promptly generate acknowledges 133 about every two packets and may communicate the acknowledges 133 within one large packet time. As a result, a minimum possible bandwidth window size 135, which may be utilized to achieve full bandwidth, may be well controlled. The second set of transmit data connection 137 and receive acknowledges 139 illustrates packet processing behavior as the network communicates faster, and is not drawn to the same scale as the first transmit-receive set. The receiver in this case is promptly generating acknowledges 139; however, the generated acknowledges may take longer to transit the network than before. In the same amount of time, much more data may be transmitted, so the receiver may need to be configured with a larger window. To achieve full bandwidth, the window size may be configured at an increased size, which may be equal to, or larger than, the minimum possible bandwidth window size 141. The increased window size may consume more receiver memory. However, the increased window size may be utilized to compensate for network latency and to achieve full bandwidth.
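The growth of the minimum window with network speed follows the standard bandwidth-delay product rule of thumb, which the text does not name but which matches the behavior described. The sketch below assumes that rule; the function name `min_window_bytes` and the example link rates and round-trip time are hypothetical.

```python
def min_window_bytes(link_bits_per_second, round_trip_microseconds):
    """Smallest window, in bytes, that keeps the link full for one round trip.

    Uses the bandwidth-delay product: data sent during the time it takes
    for a packet to reach the receiver and its acknowledge to return.
    Integer arithmetic with rounding up avoids floating-point error.
    """
    bits_in_flight = link_bits_per_second * round_trip_microseconds
    return -(-bits_in_flight // 8_000_000)  # ceiling division


# Same round-trip time, ten times the link speed: ten times the window.
print(min_window_bytes(1_000_000_000, 1000))
print(min_window_bytes(10_000_000_000, 1000))
```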
The third set of transmit data connection 143 and receive acknowledges 145 illustrates packet processing behavior when the receiver's transmitter is characterized by poor TX scheduling behavior. In this case, the acknowledges 145 may be delayed due to waiting for transmission of some other connection, for example, and may emerge as a group later in time. Accordingly, the minimum bandwidth window size 147 that may be required to achieve full bandwidth may be further impacted and significantly increased. If the window size is not adjusted to match or exceed the new minimum bandwidth window size 147, the transmitter may “stutter,” or come to a full stop, waiting for more ACKs from the receiver. Since the window size for any one connection is normally fixed in size, it is important that ACK transmit behavior be predictable to keep window size requirements to a minimum and to maintain full possible bandwidth. Further, if the window size is limited below the minimum required size for full bandwidth, the possible bandwidth of the connection may be reduced. Any additional delay in generation of ACK packets, therefore, may further reduce the connection bandwidth.
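The relationship between a fixed window, ACK delay, and achievable bandwidth can be approximated with a simple model in which at most one window of data is delivered per effective round trip. This model, the function name `achievable_bits_per_second`, and the example figures are assumptions for illustration only.

```python
def achievable_bits_per_second(link_bps, window_bytes, rtt_us, ack_delay_us=0):
    """Approximate throughput of one connection with a fixed window.

    The transmitter can have at most `window_bytes` outstanding, so it
    delivers at most one window per effective round trip, where the
    effective round trip includes any extra delay before ACKs are sent.
    """
    effective_us = rtt_us + ack_delay_us
    window_limited_bps = window_bytes * 8 * 1_000_000 // effective_us
    return min(link_bps, window_limited_bps)


# Window sized exactly for a 1 Gb/s link with a 1000 microsecond round trip.
print(achievable_bits_per_second(1_000_000_000, 125_000, 1000))
# Same window, but ACKs are held back an extra 1000 microseconds.
print(achievable_bits_per_second(1_000_000_000, 125_000, 1000, ack_delay_us=1000))
```

With the window left unchanged, doubling the effective round trip through delayed ACKs halves the throughput in this model, which is the "stutter" effect described above.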
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.