Typical data storage devices designed to be compatible with the non-volatile memory express (NVMe) standard (available at http://www.nvmexpress.org) and the peripheral component interconnect express (PCIe) standard (available at http://pcisig.com) communicate with a host compute device through a programmable memory interface that includes submission queues and completion queues, as defined by the NVMe standards body. A processor of the host compute device allocates and assigns message signaled interrupts (e.g., MSI, MSI-X, etc.) and the submission and completion queues in a volatile memory for the data storage devices. To pass a request to a data storage device, the processor writes the request to the submission queue of the data storage device according to a predefined format and notifies the data storage device of the new request by writing an update to the message signaled interrupt. Similarly, upon completion of the request, the data storage device writes a completion status to the completion queue and writes to a message signaled interrupt assigned to the processor to notify the processor that the request has completed.
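The request and completion flow described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the NVMe command format or any real driver interface: the names `Request`, `Completion`, `SimulatedDevice`, `submit`, and `reap` are hypothetical, the opcodes are illustrative labels, and the notification write is modeled as a plain method call.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    command_id: int
    opcode: str      # "read" or "write"; illustrative labels, not NVMe opcodes
    lba: int
    data: bytes = b""


@dataclass
class Completion:
    command_id: int
    status: str
    data: bytes = b""


class SimulatedDevice:
    """Toy storage device with one submission/completion queue pair (hypothetical)."""

    def __init__(self):
        self.submission_queue = deque()   # host writes requests here
        self.completion_queue = deque()   # device writes completion statuses here
        self.blocks = {}                  # lba -> stored bytes
        self.pending_interrupts = 0       # completions signaled to the host

    def notify(self):
        """Stand-in for the host's notification write: process queued requests."""
        while self.submission_queue:
            req = self.submission_queue.popleft()
            if req.opcode == "write":
                self.blocks[req.lba] = req.data
                done = Completion(req.command_id, "ok")
            elif req.opcode == "read" and req.lba in self.blocks:
                done = Completion(req.command_id, "ok", self.blocks[req.lba])
            else:
                done = Completion(req.command_id, "error")
            self.completion_queue.append(done)
            self.pending_interrupts += 1  # device notifies the host of completion


def submit(device, request):
    """Host side: place the request in the submission queue, then notify the device."""
    device.submission_queue.append(request)
    device.notify()


def reap(device):
    """Host side: consume one completion after being notified."""
    device.pending_interrupts -= 1
    return device.completion_queue.popleft()
```

For instance, submitting `Request(1, "write", lba=7, data=b"abc")` followed by `Request(2, "read", lba=7)` leaves two completions in the queue, and the second one carries the bytes that were written. In this sketch every request costs the host one notification call and one completion to reap, which mirrors the per-request signaling the text describes.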
Typically, when a compute device has multiple data storage devices, such as solid state storage devices (SSDs), a host driver executed by the processor moves data between the devices as follows: it performs a read from the source data storage device, stores the read data in volatile memory (e.g., dynamic random access memory), issues a write command to the destination data storage device, waits for completion, and then deallocates and deletes the data from the source data storage device and the volatile memory. A power loss during any phase of this process may leave the move operation in an incoherent state. Hence, the processor typically journals the state of the move operation and, if an unexpected power loss occurs, finishes the operation based on the journaled state. Further, each update to or read from the message signaled interrupt is an uncached memory-mapped input/output (MMIO) operation that consumes hundreds of processor cycles. As a result, the processor in typical compute devices incurs significant overhead to manage a move operation between two data storage devices.
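The host-driven move and its journaling can be sketched as a resumable state machine. The following is an illustrative model only, under several assumptions: the devices and volatile memory are plain dictionaries, the journal is a dictionary assumed to survive power loss, and the names `start_move`, `run_move`, `PowerLoss`, and the phase labels are hypothetical rather than taken from any real driver.

```python
class PowerLoss(Exception):
    """Raised to simulate an unexpected power loss after a given phase."""


def start_move(lba):
    """Create a journal entry for a new move (assumed to persist across power loss)."""
    return {"lba": lba, "phase": "started"}


def run_move(src, dst, journal, dram, fail_after=None):
    """Run or resume a move until the journal records it as done.

    src, dst   -- dicts standing in for the source and destination devices
    journal    -- persistent record of how far the move has progressed
    dram       -- dict standing in for volatile memory (lost on power loss)
    fail_after -- phase label after which to simulate a power loss, if any
    """
    lba = journal["lba"]
    while journal["phase"] != "done":
        phase = journal["phase"]
        if phase == "started":
            dram["buf"] = src[lba]            # read from the source device
            journal["phase"] = "read_done"
        elif phase == "read_done":
            if "buf" not in dram:             # volatile copy was lost; read again
                dram["buf"] = src[lba]
            dst[lba] = dram["buf"]            # write to destination, wait for completion
            journal["phase"] = "written"
        elif phase == "written":
            src.pop(lba, None)                # deallocate the data on the source
            dram.pop("buf", None)             # release the volatile buffer
            journal["phase"] = "done"
        if phase == fail_after:
            raise PowerLoss(f"power lost after phase {phase!r}")
```

To exercise recovery, call `run_move` with `fail_after="started"`, catch `PowerLoss`, replace `dram` with an empty dictionary to model the lost volatile contents, and call `run_move` again with the same journal. The second call resumes from the journaled phase and completes the move. Each phase in this model corresponds to at least one request and completion on a real device, which is where the per-request notification overhead noted above accumulates.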