Hardware accelerators can help a central processing unit (CPU) process workloads. These workloads often use data held in a CPU cache. To make that data available to an accelerator, cache maintenance operations must be performed: a cache flush maintenance operation before the accelerator run, and a cache invalidate maintenance operation after the accelerator run. However, cache maintenance operations can degrade the performance of processing a workload offloaded to an accelerator, and manually executed cache maintenance operations generally take too long for offloading work to accelerators. Offloading small workloads to accelerators is increasingly important for improving processing performance, yet these small workload offloads can suffer the most performance degradation from cache maintenance penalties.
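The ordering of the flush and invalidate operations around an accelerator run can be illustrated with a toy model. The following is a minimal sketch, not a real driver interface: the class `ToySystem`, its methods, and the function `offload` are hypothetical names introduced only for illustration, and the "accelerator" simply doubles a value in memory.

```python
# Toy model of a non-coherent offload. All names are hypothetical and
# exist only for illustration; this is not a real driver or OS API.

class ToySystem:
    def __init__(self):
        self.memory = {}     # shared memory, the only thing the accelerator sees
        self.cpu_cache = {}  # lines held by the CPU cache: address -> value

    def cpu_write(self, addr, value):
        # Write-back behaviour: the store lands in the cache, not in memory.
        self.cpu_cache[addr] = value

    def cpu_read(self, addr):
        # A cached line hides whatever is currently in memory.
        if addr in self.cpu_cache:
            return self.cpu_cache[addr]
        value = self.memory.get(addr, 0)
        self.cpu_cache[addr] = value
        return value

    def cache_flush(self):
        # Write cached lines back to memory (the lines remain cached).
        self.memory.update(self.cpu_cache)

    def cache_invalidate(self):
        # Discard cached lines so that later reads are served from memory.
        self.cpu_cache.clear()

    def accelerator_run(self, src, dst):
        # Non-coherent accelerator: reads and writes memory only.
        self.memory[dst] = self.memory.get(src, 0) * 2


def offload(system, src, dst, value, maintain=True):
    system.cpu_write(src, value)
    system.cpu_read(dst)  # the CPU touches the output buffer and caches a stale line
    if maintain:
        system.cache_flush()       # before the accelerator run
    system.accelerator_run(src, dst)
    if maintain:
        system.cache_invalidate()  # after the accelerator run
    return system.cpu_read(dst)


if __name__ == "__main__":
    print(offload(ToySystem(), src=0, dst=1, value=21))                  # with maintenance
    print(offload(ToySystem(), src=0, dst=1, value=21, maintain=False))  # without
```

With both maintenance operations in place, the CPU reads back the accelerator's result. Without them, the accelerator never sees the input still sitting in the cache, and the CPU keeps reading its stale cached copy of the output.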
Input/output (I/O) coherency can be used to offload work to accelerators without implementing costly cache maintenance procedures. However, an I/O coherent path for offloading workloads introduces its own overhead because of lower-performance signal transmission (e.g., lower bandwidth). This I/O coherency penalty can negatively affect offloaded workload processing due to various factors of the I/O coherent path.
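The trade-off between the two penalties can be sketched with a simple cost model. This is an illustrative simplification under stated assumptions, not a measured result: it assumes the cache maintenance penalty is a fixed cost per offload, and it reduces the I/O coherency penalty to a single reduced-bandwidth fraction. The function names and parameters are hypothetical.

```python
# Illustrative cost model. Assumptions (not from any measurement):
#  - cache maintenance adds a fixed cost per offload;
#  - the I/O coherent path delivers only a fraction of the full bandwidth.

def noncoherent_time(nbytes, bandwidth, maintenance_cost):
    # Full-bandwidth path, plus flush and invalidate around the run.
    return maintenance_cost + nbytes / bandwidth


def io_coherent_time(nbytes, bandwidth, coherent_fraction):
    # No cache maintenance, but only part of the bandwidth is usable.
    return nbytes / (bandwidth * coherent_fraction)


def break_even_bytes(bandwidth, maintenance_cost, coherent_fraction):
    # Workload size at which both paths take the same time, obtained by
    # solving maintenance_cost + n / B = n / (B * f) for n.
    return (maintenance_cost * bandwidth * coherent_fraction
            / (1.0 - coherent_fraction))


if __name__ == "__main__":
    # Placeholder values in arbitrary units, chosen only to show the shape.
    B, M, f = 1000.0, 2.0, 0.5
    for n in (100, 2000, 10000):
        print(n, noncoherent_time(n, B, M), io_coherent_time(n, B, f))
    print("break-even size:", break_even_bytes(B, M, f))
```

Under these assumptions, workloads smaller than the break-even size finish sooner on the I/O coherent path, while larger workloads are dominated by the bandwidth penalty and finish sooner with explicit cache maintenance.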