Ever since the introduction of the microprocessor, computer systems have been getting faster and faster. In approximate accordance with Moore's law (based on Intel® Corporation co-founder Gordon Moore's 1965 publication, later revised to predict that the number of transistors on integrated circuits doubles every two years), processor speed has risen at a fairly even rate for nearly three decades. At the same time, the size of both memory and non-volatile storage has also steadily increased, such that many of today's personal computers are more powerful than supercomputers from just 10-15 years ago. In addition, the speed of network communications has likewise seen astronomical increases.
Increases in processor speeds, memory, storage, and network bandwidth technologies have resulted in the build-out and deployment of networks with ever more substantial capacities. More recently, the introduction of cloud-based services, such as those provided by Amazon (e.g., Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3)) and Microsoft (e.g., Azure and Office 365), has resulted in additional build-out of public network infrastructure, along with the deployment of massive data centers that support these services and employ private network infrastructure.
A typical data center deployment includes a large number of server racks, each housing multiple rack-mounted servers or blade servers. Communications between the rack-mounted servers are typically facilitated using the Ethernet (IEEE 802.3) protocol over copper wire cables. In addition to the option of using wire cables, blade servers and network switches and routers may be configured to support communication between blades or cards in a rack over an electrical backplane or mid-plane interconnect.
In addition to high-speed interconnects associated with Ethernet connections, high-speed interconnects may exist in other forms. For example, one form of high-speed interconnect is InfiniBand, whose architecture and protocol are specified via various standards developed by the InfiniBand Trade Association. Another example of a high-speed interconnect is Peripheral Component Interconnect Express (PCI Express or PCIe). The current standardized specification for PCI Express is PCI Express 3.0, which is alternatively referred to as PCIe Gen 3. In addition, both the PCI Express 3.1 and PCI Express 4.0 specifications are being defined, but have yet to be approved by the PCI-SIG (Special Interest Group). Moreover, other non-standardized interconnect technologies have recently been implemented.
An important aspect of data center communication is reliable or confirmed data delivery. Typically, a reliable data transport mechanism is employed to ensure data sent from a source has been successfully received at its intended destination. Current link-layer protocols, such as Ethernet, do not have any inherent facilities to support reliable transmission of data over an Ethernet link. The same is true of the link-layer implementation of InfiniBand. Each addresses reliable transmission at a higher layer, such as TCP/IP. Under TCP, reliable delivery of data is implemented via explicit ACKnowledgements (ACKs) that are returned from a receiver (at an IP destination address) to a sender (at an IP source address) in response to receiving IP packets from the sender. Since packets may be dropped at one of the nodes along a route between a sender and receiver (or even at a receiver if the receiver has inadequate buffer space), the explicit ACKs are used to confirm successful delivery of each packet (noting that a single ACK response may confirm delivery of multiple IP packets). The transmit-ACK scheme requires significant buffer space to be maintained at each of the source and destination devices (in case a dropped packet or packets need to be retransmitted), and also adds processing and complexity to the network stack, which is typically implemented in software. For example, since an ACK may itself be dropped, the sender also employs a timer that triggers retransmission of any packet for which an ACK has not been received within the timer's timeout period. Each ACK consumes precious link bandwidth and creates additional processing overhead.
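The transmit-ACK scheme described above can be illustrated with a minimal sketch. This is a toy model and not an implementation of TCP: the names `Sender`, `Receiver`, and `simulate` are hypothetical, time advances in abstract ticks, the channel delivers instantly unless a transmission is explicitly marked as dropped, and the receiver is assumed to accept packets only in order. It shows the three elements the paragraph names: a retransmit buffer at the sender, cumulative ACKs, and a timeout that triggers retransmission.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Toy sender: buffers every unacknowledged packet for possible retransmission."""
    timeout: int                                  # ticks to wait for an ACK
    unacked: dict = field(default_factory=dict)   # seq -> (payload, deadline)
    transmissions: int = 0                        # counts first sends and retransmissions

    def send(self, seq, payload, now):
        # Buffer the packet and (re)start its retransmission timer.
        self.unacked[seq] = (payload, now + self.timeout)
        self.transmissions += 1
        return seq, payload

    def on_ack(self, ack_seq):
        # Cumulative ACK: one ACK confirms every packet with seq <= ack_seq.
        for seq in [s for s in self.unacked if s <= ack_seq]:
            del self.unacked[seq]

    def expired(self, now):
        # Sequence numbers whose timer has run out without an ACK.
        return sorted(s for s, (_, deadline) in self.unacked.items() if deadline <= now)


@dataclass
class Receiver:
    """Toy receiver: accepts packets in order only and answers with a cumulative ACK."""
    next_expected: int = 0
    delivered: list = field(default_factory=list)

    def on_packet(self, seq, payload):
        if seq == self.next_expected:
            self.delivered.append(payload)
            self.next_expected += 1
        # Highest in-order sequence number received so far (-1 if none yet).
        return self.next_expected - 1


def simulate(payloads, drop_data=frozenset(), drop_acks=frozenset(), timeout=3, max_ticks=50):
    """Run the toy protocol; drop_data and drop_acks hold 0-based attempt indices to lose."""
    sender, receiver = Sender(timeout), Receiver()
    data_count = 0   # index of the next data transmission attempt
    ack_count = 0    # index of the next ACK transmission attempt

    def transmit(seq, now):
        nonlocal data_count, ack_count
        packet = sender.send(seq, payloads[seq], now)
        lost = data_count in drop_data
        data_count += 1
        if lost:
            return                       # packet dropped somewhere along the route
        ack = receiver.on_packet(*packet)
        ack_lost = ack_count in drop_acks
        ack_count += 1
        if not ack_lost:
            sender.on_ack(ack)           # an ACK of -1 confirms nothing

    for seq in range(len(payloads)):
        transmit(seq, now=0)

    now = 0
    while sender.unacked and now < max_ticks:
        now += 1
        for seq in sender.expired(now):
            if seq in sender.unacked:    # may have been confirmed earlier in this tick
                transmit(seq, now)
    return receiver.delivered, sender.transmissions, ack_count
```

For instance, `simulate(["a", "b", "c"], drop_data={1})` loses the second data packet; under the in-order assumption the third packet is discarded on arrival, so both are retransmitted once their timers expire. With `simulate(["a", "b"], drop_acks={0})`, the first ACK is lost, but the cumulative ACK for the second packet confirms both and no retransmission is needed. In both cases the sender has to keep unacknowledged data buffered, and every ACK is extra traffic and processing, which is the overhead the paragraph describes.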