Clusters of interconnected computing devices are sometimes employed to process high volumes of data. A computing cluster is a set of computing devices (also, “nodes”), e.g., configured as server racks comprising computing device “sleds” occupying one or more enclosures, or a computing network comprising multiple computing devices. Various data communications technologies have been deployed to enable the sleds to exchange data, e.g., Ethernet, Fibre Channel, etc. However, these technologies generally exchange data more slowly than processors are able to process it. Increasing parallel processing and networking throughput can therefore improve cluster performance. Different software and hardware techniques have been tried to reduce interconnection overhead and latency, but such techniques are limited by conventional data pathway architectures.
One method of transporting data involves the use of a Peripheral Component Interconnect Express (PCIe) bus. The PCIe bus enables an external signal to interrupt the central processing unit (CPU) when data transport processes are ready to be performed with the source of the external signal. However, maintaining the PCIe bus requires additional hardware components (e.g., bus line drivers), which increases the overall cost of maintaining a computing cluster. Further, for most data transport operations, the PCIe bus requires the CPU to quickly offload data from a PCIe buffer to a memory module or vice versa.
Another technique for transferring data is via a network interface controller (NIC). Under a NIC data transport architecture, a computing device can prepare data for transmission, sharing, or copying to an external device (e.g., a different computing device) under command of an application or operating system, and then transfer the data via a network driver. For example, the application can prepare an outgoing data set in a first memory space (e.g., associated with a computing device on which the application executes) and then transfer the outgoing data set into a second memory space (e.g., associated with a network driver or NIC). In response, the network driver may place the outgoing data set into a network input/output (IO) buffer. The network IO buffer may reside in a dedicated memory space of a network card. Once the outgoing data is in the network IO buffer, the network card can transmit the outgoing data through a network connection configured for use with the NIC, e.g., an optical fiber or an Ethernet cable. Under this architecture, the same outgoing data set is copied multiple times (e.g., at various buffers and memory spaces). This repeated copying can bottleneck the entire data transfer process and cause various inefficiencies. These inefficiencies can, in turn, slow down the data transfer speeds of most cluster computing applications.
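By way of illustration only, the repeated copying described above can be modeled with a short simulation. The following is a minimal sketch and not an implementation of any actual network driver or NIC: the names `CopyCounter` and `send_via_nic` are hypothetical, and each memory space is assumed to be representable as an in-process byte buffer.

```python
from dataclasses import dataclass


@dataclass
class CopyCounter:
    """Tracks how many times a payload is copied and how many bytes move."""
    copies: int = 0
    bytes_copied: int = 0

    def copy(self, data) -> bytearray:
        # bytearray(data) allocates a new buffer and copies every byte into it.
        self.copies += 1
        self.bytes_copied += len(data)
        return bytearray(data)


def send_via_nic(payload: bytes, counter: CopyCounter) -> bytes:
    """Hypothetical model of the NIC data path; returns the bytes 'on the wire'."""
    # The application prepares the outgoing data set in a first memory space
    # (assumption: preparation itself is not counted as a copy here).
    app_buffer = bytearray(payload)

    # Copy 1: the data set moves into a second memory space associated with
    # the network driver or NIC.
    driver_buffer = counter.copy(app_buffer)

    # Copy 2: the driver places the data set into the network IO buffer,
    # which may reside in dedicated memory on the network card.
    nic_io_buffer = counter.copy(driver_buffer)

    # The network card transmits the contents of the IO buffer.
    return bytes(nic_io_buffer)


counter = CopyCounter()
payload = b"x" * 4096
wire_data = send_via_nic(payload, counter)
print(counter.copies, counter.bytes_copied)
```

In this model, the number of bytes moved between buffers is a multiple of the payload size, so the cost of copying grows with both the size of the outgoing data set and the number of intermediate memory spaces it passes through.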
The figures depict various embodiments for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments may be employed.