This invention relates to resource management of heterogeneous computing clusters.
Coprocessor-based clusters are those whose nodes have many-core coprocessors such as the NVIDIA Graphics Processing Unit (GPU) or the Intel Many Integrated Core (MIC). The term coprocessor is used here as a generic concept: it is not necessarily a “multicore”/“manycore” processor, but any processing element that can execute portions of the computation. Such a “coprocessor” can be an FPGA (a specialized/customizable computation unit), a standalone processor like the IBM Cell, a GPU, an Intel MIC, or any other many-core processor. The coprocessor need not be connected by a PCI bus; it can be attached through many different types of interconnect. For example, the coprocessor can be on the same chip as the main CPU (such as the AMD Fusion or IBM Cell), or connected by a bus (PCI/PCIe bus).
GPU-based clusters are increasingly being deployed in HPC environments to accelerate a variety of scientific applications. Despite their growing popularity, the GPU devices themselves are under-utilized even for many computationally intensive jobs. This is because, in the typical GPU usage model, a host processor periodically offloads computationally intensive portions of an application to the coprocessor. Since certain portions of code cannot be offloaded to the GPU (for example, code performing network communication in MPI applications), this usage model results in periods of time when the GPU is idle.
GPUs could be time-shared across jobs to “fill” these idle periods, but unlike the effects of sharing CPU resources such as the cache, the effects of sharing the GPU are not well understood. Specifically, two jobs that time-share a single GPU will experience resource contention and interfere with each other. The resulting slowdown could lead to missed job deadlines. Current cluster managers do not support GPU sharing, but instead dedicate GPUs to a job for the job's lifetime.
The typical coprocessor usage model is one in which the host processor in each cluster node intermittently offloads intensive computations to the coprocessor. This usage model creates gaps in the coprocessor usage, i.e., periods when the coprocessors are idle. Coprocessor idle periods occur when a code block runs on the host and not the coprocessor because (i) it is not sufficiently parallelizable to benefit from the coprocessor, (ii) any performance gains are overshadowed by overheads such as PCI data movement, or (iii) it performs system operations such as network or disk I/O that current many-core coprocessors cannot perform (at least in the offload mode).
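As a rough illustration of the gaps described above, the following Python sketch computes the idle periods and the resulting utilization of a single coprocessor, given the time intervals during which a job offloads work to it. The function names (idle_periods, utilization) and the representation of offloads as (start, end) pairs are assumptions made for illustration only; they do not correspond to any particular cluster manager or coprocessor API.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) pair in seconds


def idle_periods(offloads: List[Interval],
                 job_start: float,
                 job_end: float) -> List[Interval]:
    """Return the gaps in [job_start, job_end] not covered by any offload."""
    gaps: List[Interval] = []
    cursor = job_start  # end of the busy time seen so far
    for start, end in sorted(offloads):
        # Clip each offload interval to the job's lifetime.
        start = max(start, job_start)
        end = min(end, job_end)
        if end <= start:
            continue  # interval lies entirely outside the job window
        if start > cursor:
            gaps.append((cursor, start))  # coprocessor idle before this offload
        cursor = max(cursor, end)  # handles overlapping offloads
    if cursor < job_end:
        gaps.append((cursor, job_end))  # trailing idle period
    return gaps


def utilization(offloads: List[Interval],
                job_start: float,
                job_end: float) -> float:
    """Fraction of the job's lifetime during which the coprocessor is busy."""
    idle = sum(end - start
               for start, end in idle_periods(offloads, job_start, job_end))
    return 1.0 - idle / (job_end - job_start)
```

For a job that runs from time 0 to 10 and offloads during (2, 4) and (7, 9), idle_periods reports three gaps covering six of the ten time units. These are the periods that a second job could, in principle, fill if the coprocessor were time-shared.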
Idle periods in coprocessor usage can be reduced or eliminated by “time-sharing” coprocessors across HPC jobs. However, coprocessor time-sharing creates resource contention, which causes jobs to interfere with each other. This inter-job interference slows down jobs, but the precise effect is hard to predict. Most current cluster managers such as PBS Torque and Condor do not generally time-share coprocessors across jobs; rather, they dedicate coprocessors to specific jobs until those jobs complete. In some cases, the cluster managers allow users to specify whether their jobs can share coprocessor resources. Such jobs are allowed to share coprocessors, but the responsibility for any interference-related slowdown rests with the user, since the user specified the job as sharable in the first place.