Clusters of interconnected computing nodes are sometimes employed to process high-volume data or computation tasks. A computing cluster is a set of computing devices, e.g., configured as a computing network. Various data communications technologies have been deployed to enable the computing devices to exchange data, e.g., Ethernet, Fibre Channel, etc. However, these technologies generally exchange data more slowly than processors are able to process data. Different techniques to reduce interconnection overhead and latency have been tried at both the software and hardware levels, but such techniques are limited by the data pathways of conventional system architectures.
To improve performance, some computing devices have been designed to accommodate multiple processors. More recently, specialized processors (e.g., math processors, graphics processing units (GPUs), field programmable gate arrays (FPGAs), etc.) have been adapted for use with various computational processes. These specialized processors are referenced herein as “accelerators,” although various other terms are commonly used to refer to these types of processors. Accelerators are typically used when intensive computation, often parallel mathematical computation, is involved. However, current computational needs have outpaced even the capabilities of individual accelerators. Some computing devices can operate with multiple accelerators. However, accelerators can consume and generate data much more quickly than standard computing buses (e.g., Peripheral Component Interconnect Express, or “PCIe”) can transfer it, and so standard interconnections between accelerators become bottlenecks. Moreover, interconnection topologies are fixed and cannot easily be changed to satisfy application requirements.
The figures depict various embodiments of the disclosed technology for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments may be employed.