Machine learning/deep learning workloads use GPUs to offload computation and perform operations on extremely large amounts of data. The throughput and latency of the interfaces between CPU and GPU, and between GPUs, therefore have a significant effect on overall performance. Some current application workloads require GPU-to-GPU traffic. This traffic is enabled either by a PCIe switch (where the GPUs are endpoints), which allows peer-to-peer (P2P) traffic without involving the CPU, or by a separate high-speed link between the GPUs.
Moreover, datacenters increasingly run machine learning/deep learning workloads on rack-mount systems. These systems include a hardware framework with slots or bays for mounting multiple computing machines (nodes) in a rack, such as network servers, routers, switches, or other network devices. A rack-mount system typically includes a controller and cooling fans that provide thermal control of the nodes, for example by removing heat. However, as programmed workloads run, the nodes in the rack are typically not heated uniformly. Excessive power consumption and thermal hot spots may develop at one or more nodes, or on portions of a node's motherboard. These can degrade performance and/or reduce the reliability of the rack's computing and network infrastructure.