Processing systems often utilize a direct memory access (DMA) process to allow input/output (I/O) devices to access system memory substantially independently of the processing cores of the system. In a conventional read operation from an I/O device using DMA, the I/O device is instructed, for example by a device driver, to perform a memory access operation to transfer a copy of the I/O data from the I/O device to system memory. A processing core may then utilize the I/O data by performing another memory access operation to access the data from the system memory and cache the I/O data at a cache hierarchy accessible by the processing core. As such, each read operation from an I/O device involves at least two high-latency memory access operations before the subject data is available for use by a processing core. Moreover, some processing systems utilize double data rate (DDR) dynamic random access memory (DRAM) or another memory architecture in which the memory bus is limited to either a read operation or a write operation at any given time (that is, it cannot perform a read operation and a write operation concurrently), and thus the two memory access operations used to make I/O data available to a processing core in a conventional system impact the memory subsystem's availability to handle memory access operations for other requesters.
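The cost of the conventional path can be illustrated with a toy model. The sketch below is illustrative only and assumes a fixed, hypothetical latency per memory access (`MEM_ACCESS_CYCLES`) and a bus that services requests strictly in the order they are issued to the model. The names `HalfDuplexMemoryBus` and `conventional_dma_read` are hypothetical. The model counts the two serialized memory access operations (the device write, then the core read) and shows how they delay another requester that shares the same read-or-write-only bus.

```python
from dataclasses import dataclass, field

# Hypothetical latency, in bus cycles, of one memory access operation.
# Illustrative assumption only; not a figure taken from the source.
MEM_ACCESS_CYCLES = 100


@dataclass
class HalfDuplexMemoryBus:
    """Toy model of a memory bus that carries either a read or a write at any
    given time, never both concurrently. Requests are served in call order."""

    busy_until: int = 0                      # first cycle at which the bus is free
    log: list = field(default_factory=list)  # entries: (requester, kind, start, end)

    def access(self, requester: str, kind: str, ready_at: int) -> int:
        """Serialize one read or write on the bus; return its completion cycle."""
        if kind not in ("read", "write"):
            raise ValueError(f"unknown access kind: {kind}")
        start = max(ready_at, self.busy_until)  # wait until the bus is idle
        end = start + MEM_ACCESS_CYCLES
        self.busy_until = end
        self.log.append((requester, kind, start, end))
        return end


def conventional_dma_read(bus: HalfDuplexMemoryBus, start: int = 0) -> int:
    """Return the cycle at which the I/O data is cached and usable by a core."""
    # 1. The I/O device, instructed by a driver, writes a copy of the data to memory.
    written = bus.access("io_device", "write", start)
    # 2. A processing core then reads the data from memory into its cache hierarchy.
    return bus.access("core0", "read", written)


if __name__ == "__main__":
    bus = HalfDuplexMemoryBus()
    print(conventional_dma_read(bus))         # 200: two accesses before data is usable
    print(bus.access("core1", "read", 0))     # 300: another requester waits behind both
```

In this model the data becomes usable only after two full memory access latencies, and any other requester issued after the DMA transfer is pushed back by both operations, which is the bus-availability cost described above.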