A compute accelerator (CA) is a specialized type of processor that performs certain mathematical calculations much faster than a conventional central processing unit (CPU). For example, a graphics processing unit (GPU) is a CA specially designed to rapidly manipulate and alter memory for the creation of images intended for output to a display device. Today, GPUs have been adopted as CAs in many fields of high-performance computing outside of graphics processing, such as big data, artificial intelligence, neural networks, and cryptography. Other examples of CAs include specialized silicon, digital signal processors (DSPs), and field-programmable gate arrays (FPGAs).
CAs typically function in groups or farms in which many CAs work together to execute a kernel so as to perform a CA workload for that kernel. As used herein, a “kernel” is unrelated to the kernel of an operating system. In the context of CAs, a “kernel” or “compute kernel” is a small piece of code with one or more loops, and the loop or loops are executed many times by a CA or group of CAs to perform a CA workload. For example, in the CA workload of a transpose operation on a matrix, each column in the original matrix is turned into a row in the solution matrix. Turning each column of a matrix into a row is a simple but repetitive task. A very large matrix may be divided among several CAs, with each CA transposing a portion of the matrix.
As used herein, a “compute accelerator workload” is the set of operations that needs to be performed by one or more CAs in order to finish a distinct job on a working set. For example, to perform the CA workload of a “matrix transpose,” the CA(s) needs to turn all columns of a matrix into rows. As used herein, a “working set” of a CA workload is the data on which the kernel works while performing the operations of a CA workload. For example, the original matrix is the working set for a “matrix transpose” CA workload.
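The terms defined above may be illustrated by a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as an actual CA implementation: worker threads stand in for CAs, and the names transpose_kernel, run_transpose_workload, and num_cas are hypothetical. The kernel is the small loop nest, the original matrix is the working set, and the workload is complete when every column has been turned into a row.

```python
from concurrent.futures import ThreadPoolExecutor


def transpose_kernel(working_set, solution, col_start, col_end):
    """Hypothetical compute kernel: turn columns [col_start, col_end) into rows."""
    for c in range(col_start, col_end):
        for r in range(len(working_set)):
            solution[c][r] = working_set[r][c]


def run_transpose_workload(working_set, num_cas):
    """Divide the columns of the working set among num_cas simulated CAs.

    Assumption: each thread plays the role of one CA; num_cas must be >= 1.
    """
    rows = len(working_set)
    cols = len(working_set[0]) if rows else 0
    solution = [[None] * rows for _ in range(cols)]

    # Ceiling division so that every column is assigned to some CA.
    chunk = max(1, -(-cols // num_cas))

    with ThreadPoolExecutor(max_workers=num_cas) as pool:
        futures = [
            pool.submit(transpose_kernel, working_set, solution,
                        start, min(start + chunk, cols))
            for start in range(0, cols, chunk)
        ]
        for future in futures:
            future.result()  # re-raise any error from a simulated CA

    return solution


if __name__ == "__main__":
    matrix = [[1, 2, 3],
              [4, 5, 6]]
    print(run_transpose_workload(matrix, num_cas=2))
```

In this sketch each simulated CA writes to a disjoint set of rows of the solution matrix, so no coordination between CAs is needed beyond waiting for all of them to finish.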
Depending on the size of a working set or on the workload to be performed on that working set, the CA workload may take a significant amount of time. Some CA workloads may take hours, days, or weeks to finish. Due to the nature of how CAs operate, it is typically impossible to pause a CA workload and resume it again later from the same point of execution. If a CA workload is interrupted, it must be started again from the beginning.
The inability to pause and resume a CA workload is disadvantageous for several reasons. One reason is that CAs may be shared between applications or tenants. If one application uses the CAs for a prolonged period of time, other tenants or applications may not be able to perform any CA workloads during that time. Another reason is that during execution of a CA workload, it may be desirable to migrate the workload from one host computer to another host computer. For example, CAs may reside on different sets of hardware (e.g., different host computers) and the CAs of one or more host computers may be used to execute a CA workload. The migration might be desirable for load balancing reasons, such as to evenly utilize hosts available in a cluster of hosts. The migration might also be desirable for fault tolerance. For example, if certain hardware malfunctions during execution of a CA workload, it might be desirable to pause the workload, move it to another set of hardware (another host computer), and resume the workload where it left off.