High-performance computing (HPC) has seen a substantial increase in usage and interest in recent years. Historically, HPC was generally associated with so-called “supercomputers.” Supercomputers were introduced in the 1960s, made initially and, for decades, primarily by Seymour Cray at Control Data Corporation (CDC), Cray Research, and subsequent companies bearing Cray's name or monogram. While the supercomputers of the 1970s used only a few processors, machines with thousands of processors began to appear in the 1990s, and more recently massively parallel supercomputers with hundreds of thousands of “off-the-shelf” processors have been implemented.
In an HPC environment, large numbers of computing systems (e.g., blade servers or server modules) are configured to work in parallel to solve complex tasks. Each server may include one or more processors with associated resources (e.g., local memory for each processor), wherein each processor is operated as a compute “node.” The servers typically operate within a collective group called a cluster to perform a collective operation. For more complex tasks, clusters of servers may be configured in an HPC cluster hierarchy or the like, with each cluster dedicated to performing a subtask of the overall complex task.
Various types of network topologies and protocols may be used to interconnect nodes in an HPC environment, with the most commonly used interconnects employing InfiniBand or Ethernet. In a typical HPC use of InfiniBand, the compute nodes run processes that use an Application Programming Interface (API) to exchange data and results with other processes running on other nodes. Examples of these APIs include Message Passing Interface (MPI), Symmetric Hierarchical Memory Access (SHMEM), and Unified Parallel C (UPC). In particular, these processes use a class of operations called “Collectives,” which are used to enable communication and synchronization between multiple processes on multiple nodes.
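By way of illustration only, the semantics of a few common Collectives may be modeled in plain Python, with a list standing in for the per-process values held on each node. This is a sketch under stated assumptions rather than the API of MPI, SHMEM, or UPC: the function names (`broadcast`, `reduce_to_root`, `allreduce`) are hypothetical, no network is involved, and the recursive-doubling exchange shown for `allreduce` is merely one well-known way such an operation may be carried out.

```python
from functools import reduce
import operator


def broadcast(values, root=0):
    """Every rank ends up holding the root rank's value."""
    return [values[root]] * len(values)


def reduce_to_root(values, op=operator.add, root=0):
    """Combine all ranks' values with `op`; only the root holds the result."""
    total = reduce(op, values)
    return [total if rank == root else None for rank in range(len(values))]


def allreduce(values, op=operator.add):
    """Combine all ranks' values so that every rank holds the result.

    Modeled as recursive doubling: in each round, rank r exchanges its
    partial result with partner rank r XOR step, then the step doubles.
    Assumption of this sketch: the number of ranks is a power of two and
    `op` is associative and commutative.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    current = list(values)
    step = 1
    while step < n:
        # Each rank combines its own partial result with its partner's.
        current = [op(current[r], current[r ^ step]) for r in range(n)]
        step *= 2
    return current


if __name__ == "__main__":
    per_rank = [1, 2, 3, 4]  # one value per simulated process
    print(broadcast(per_rank))
    print(reduce_to_root(per_rank))
    print(allreduce(per_rank))
```

In this model, every rank must take part in each round before any rank holds the final result, which is the sense in which a Collective provides synchronization as well as communication.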
These Collective operations require communication between multiple computers in the HPC cluster. As the number of processes involved in the operations grows, the number of additional messages needed to handle possible errors and to synchronize the processes also grows. In addition, the Collective operations are unaware of the physical topology of the interconnect network. These two factors create inefficiencies that degrade the performance of the HPC cluster, causing computations to take longer to complete.
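The two factors may be illustrated with a simplified counting model. The sketch below is an assumption-laden example and not a measurement of any real interconnect: it assumes a synchronization in which every process notifies every other process directly, compares it with a tree-structured gather and release, and then counts how many messages of a fixed pairwise exchange cross between two hypothetical switches under two different placements of processes. All function names and the two-switch layout are hypothetical.

```python
def all_to_all_messages(n):
    """Messages sent when each of n processes notifies every other one."""
    return n * (n - 1)


def tree_messages(n):
    """Messages for a gather up a spanning tree plus a release back down."""
    return 2 * (n - 1)


def neighbor_pairs(n):
    """Directed exchanges between rank r and rank r XOR 1 (n must be even)."""
    if n % 2:
        raise ValueError("this sketch assumes an even number of ranks")
    return [(r, r ^ 1) for r in range(n)]


def cross_switch_messages(placement, pairs):
    """Count messages whose sender and receiver sit on different switches.

    `placement[r]` is the (hypothetical) switch that rank r is attached to.
    """
    return sum(1 for src, dst in pairs if placement[src] != placement[dst])


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, all_to_all_messages(n), tree_messages(n))

    pairs = neighbor_pairs(8)
    block = [0, 0, 0, 0, 1, 1, 1, 1]        # neighboring ranks share a switch
    round_robin = [0, 1, 0, 1, 0, 1, 0, 1]  # neighboring ranks are split apart
    print(cross_switch_messages(block, pairs))
    print(cross_switch_messages(round_robin, pairs))
```

Under these assumptions, the same logical exchange generates no inter-switch traffic with one placement and sends every message across the inter-switch link with the other, which is the kind of inefficiency that may arise when a Collective operation has no knowledge of the physical topology.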