The present disclosure relates to computer hardware, and more specifically, to overlapping the execution, and preparation for execution, of hardware accelerator tasks to improve computer system performance.
Computing systems may be configured to execute operations that manipulate large volumes of data according to defined algorithms. Executing these operations includes transferring data between memory and central processing units (CPUs) via input/output (I/O) subsystems configured to, inter alia, provide I/O support to the CPUs and maintain data coherency between memory and the various components of a computing system. The workload of a CPU may be affected by the volume of data being processed and the computational complexity of the algorithms that process the data.
Some customer-specific and/or computation-heavy algorithms may be offloaded from a CPU to a hardware accelerator, such as a Field Programmable Gate Array (FPGA), thereby reducing the workload of the CPU. Communications and coherency between the hardware accelerator and the host machine may be controlled, for example, by a coherent accelerator processor interface, which removes the overhead and complexity of the I/O subsystem. Because the hardware accelerator is a hardware engine, it may be difficult for it to service, or execute, certain I/O functions. Conventionally, hardware accelerators receive help from, or utilize the resources of, host processors and the host operating system (OS) to service some of these I/O functions. Relying on a host to service I/O operations may negatively impact the performance of hardware accelerators.