Modern computing systems include direct memory access (DMA) capability, which allows certain hardware subsystems to move data independently of the central processing unit (CPU). In systems without DMA, the CPU may have to transfer each individual piece of data from a source to a destination. Without DMA, the CPU is typically occupied for the duration of a memory operation (e.g., a read, write, or copy) and is therefore unavailable for other tasks involving CPU bus access.
Using DMA, a computing system can transfer data with much less CPU overhead. A DMA engine may be used to move data between an I/O device and main memory, in either direction, or between two memory regions. DMA allows a CPU to initiate a data transfer and then proceed to perform other operations while the data transfer is managed by the DMA engine. A DMA transfer essentially copies a block of memory from one location to another, such as from system RAM to a buffer on an I/O device.
A DMA operation removes certain data transfer processing overhead from the CPU. Additionally, a DMA operation may be performed asynchronously with respect to the CPU, which allows the transfer to overlap effectively with CPU operations. In other words, the CPU may perform other operations while a DMA operation is in progress.
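The overlap between a transfer and other CPU work can be illustrated with a small simulation. The sketch below is not a real DMA driver: a worker thread stands in for the DMA engine, and the names `start_transfer` and `run_demo` are hypothetical, introduced only for illustration. It shows the general pattern of initiating a transfer, doing unrelated work, and then waiting for completion.

```python
import threading


def start_transfer(src: bytes, dst: bytearray, length: int) -> threading.Thread:
    """Begin copying `length` bytes from src to dst and return immediately.

    A worker thread plays the role of the DMA engine (an assumption of
    this sketch; real engines are programmed through device registers).
    """
    def copy_block() -> None:
        dst[:length] = src[:length]

    worker = threading.Thread(target=copy_block)
    worker.start()
    return worker


def run_demo():
    source = bytes(range(256)) * 16          # 4096-byte source block
    destination = bytearray(len(source))     # zero-filled destination

    # The "CPU" initiates the transfer and is free to continue.
    transfer = start_transfer(source, destination, len(source))

    # Unrelated work performed while the transfer is in flight.
    checksum = sum(range(1000))

    # Wait for the completion signal before touching the destination.
    transfer.join()
    return checksum, bytes(destination) == source
```

Calling `run_demo()` returns the result of the unrelated computation together with a flag indicating whether the destination matches the source once the transfer has completed.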
Current systems for DMA are popular and efficient; however, they suffer from two main drawbacks. First, most systems are based on a hardware design, which is inherently inflexible. A hardware-based DMA engine requires physical addresses for the source and destination of a transfer. Therefore, if virtual addresses are used, the DMA engine must be configured to pin down the data (i.e., prevent the memory regions from being swapped out to secondary storage) prior to translation from virtual to physical addresses, and to keep the memory pinned down until the data transfer is complete, since a data transfer cannot be stopped and restarted if a page fault occurs.
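The pin, translate, transfer, and unpin sequence described above can be sketched with a toy model. Everything here is hypothetical and simplified: `ToyMemory`, `dma_transfer`, the 4096-byte `PAGE_SIZE`, and the `engine` callback are illustrative names, and a plain set stands in for the per-page pin reference counts a real operating system would keep. The sketch assumes `length` is greater than zero.

```python
PAGE_SIZE = 4096  # assumed page size for this illustration


class ToyMemory:
    """Hypothetical page table plus a set of currently pinned pages."""

    def __init__(self, page_table):
        self.page_table = dict(page_table)  # virtual page -> physical frame
        self.pinned = set()

    def pin(self, vaddr, length):
        # Pin every page touched by [vaddr, vaddr + length).
        first = vaddr // PAGE_SIZE
        last = (vaddr + length - 1) // PAGE_SIZE
        pages = list(range(first, last + 1))
        self.pinned.update(pages)
        return pages

    def unpin(self, pages):
        self.pinned.difference_update(pages)

    def translate(self, vaddr):
        # Translation is only safe while the page cannot be swapped out.
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.pinned:
            raise RuntimeError("page must be pinned before translation")
        return self.page_table[page] * PAGE_SIZE + offset


def dma_transfer(memory, vaddr, length, engine):
    """Pin, translate page by page, run the transfer, then unpin."""
    pages = memory.pin(vaddr, length)
    try:
        segments = []
        addr, remaining = vaddr, length
        while remaining > 0:
            # A chunk never crosses a page boundary.
            chunk = min(remaining, PAGE_SIZE - addr % PAGE_SIZE)
            segments.append((memory.translate(addr), chunk))
            addr += chunk
            remaining -= chunk
        engine(segments)  # the memory stays pinned for the whole transfer
    finally:
        memory.unpin(pages)
```

In this model, a transfer that straddles a page boundary yields one physical segment per page, and the pages are released only after the `engine` callback returns, mirroring the requirement that memory remain pinned until the transfer is complete.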
Second, current designs cannot keep up with the complexity of modern bus designs (e.g., front-side buses (FSBs)) and cache coherency protocols. Modern FSBs are complex, highly pipelined buses designed to maximize the performance of CPU access to system memory. At the same time, modern FSB designs strive to provide cache coherency between system memory and caches, and between processors in a multiprocessor system. If the DMA engine fails to support all the features of a modern FSB, system performance suffers, and the DMA engine may even force the FSB to stall or slow down while a DMA transaction is in progress. Further, current hardware-based DMA systems fail to take advantage of the highly threaded nature of the new class of processors that provide fine-grained chip multithreading.
Methods and systems are needed that can overcome the aforementioned shortcomings by providing efficient and flexible data movement, while at the same time taking advantage of the efficient way in which the CPU accesses main memory.