Open Computing Language (OpenCL) is a parallel computing framework that may be used to write code that executes across heterogeneous platforms and to deploy offload kernels. In OpenCL, parallel compute kernels may be offloaded from a host compute device to a heterogeneous device in the same system, such as a central processing unit (CPU), a graphics processing unit (GPU), a field-programmable gate array (FPGA), or another OpenCL-capable or OpenCL-compatible processor or accelerator of the host compute device. Typically, OpenCL requires a device that is to process OpenCL offload kernels to receive its input data from a host memory. In other words, if an OpenCL kernel were to run inside a data storage device on data stored in that data storage device, the data storage device would first have to transmit the data to a memory of the host compute device and then receive the same data back from the memory of the host compute device before executing the kernel, which results in a wasteful data flow.
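
The round trip described above may be illustrated with a small toy model. The following Python sketch does not use the OpenCL API; it only counts the bytes that would cross the link between a data storage device and host memory in each case. The names `LinkCounter`, `kernel`, `run_via_host_memory`, and `run_in_place` are hypothetical and introduced purely for illustration, and the kernel itself is an arbitrary stand-in.

```python
from dataclasses import dataclass


@dataclass
class LinkCounter:
    """Hypothetical counter of bytes crossing the host <-> storage-device link."""
    bytes_moved: int = 0

    def transfer(self, payload: bytes) -> bytes:
        # Each call models one copy of the payload across the link.
        self.bytes_moved += len(payload)
        return bytes(payload)


def kernel(data: bytes) -> bytes:
    """Stand-in for an offload kernel: increments every byte modulo 256."""
    return bytes((b + 1) % 256 for b in data)


def run_via_host_memory(stored: bytes, link: LinkCounter) -> bytes:
    """Conventional flow: the input is staged in host memory before execution."""
    host_copy = link.transfer(stored)        # storage device -> host memory
    device_input = link.transfer(host_copy)  # host memory -> storage device
    return kernel(device_input)


def run_in_place(stored: bytes, link: LinkCounter) -> bytes:
    """Assumed alternative: the kernel reads the stored data directly."""
    return kernel(stored)                    # no input bytes cross the link


stored = bytes(range(16)) * 64               # 1024 bytes of sample data
via_host = LinkCounter()
in_place = LinkCounter()

result_a = run_via_host_memory(stored, via_host)
result_b = run_in_place(stored, in_place)

print(via_host.bytes_moved, in_place.bytes_moved)  # prints "2048 0"
```

In this model both paths produce the same result, but the conventional path moves the input across the link twice, so the transfer cost grows with twice the input size while contributing nothing to the computation.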