Direct memory access (DMA) is a feature that allows hardware subsystems within a computer system to access system memory independently of the system processor. This frees the system processor to perform other tasks while a DMA engine completes a data transfer, which makes DMA engines especially useful in input/output (I/O) applications. Common hardware subsystems that use DMA engines include graphics cards, sound cards, network cards, and disk drive controllers. DMA engines can also be used for “memory to memory” copying or moving of data within memory. DMA can offload expensive memory operations, such as large scatter-gather operations, from a system processor to a dedicated DMA engine.
A DMA engine can generate addresses and initiate memory read or write cycles. Typically, a DMA engine contains several registers that can be written and read by a system processor, including, for example, a memory address register, a byte count register, and one or more control registers. The control registers may specify the I/O port to use, the direction of the transfer (reading from the I/O device or writing to the I/O device), the transfer unit (byte at a time or word at a time), and the number of bytes to transfer in one burst.
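The register set described above can be modeled in a few lines. The following is only an illustrative sketch: the names `DmaRegisters`, `Direction`, and `run_burst`, the field layout, and the default values are assumptions made for this example and do not describe any particular device.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    READ_FROM_DEVICE = 0  # data flows from the I/O device into memory
    WRITE_TO_DEVICE = 1   # data flows from memory to the I/O device


@dataclass
class DmaRegisters:
    """Hypothetical register file of a simple DMA engine."""
    memory_address: int = 0   # memory address register
    byte_count: int = 0       # byte count register (bytes still to move)
    io_port: int = 0          # control: which I/O port to use
    direction: Direction = Direction.READ_FROM_DEVICE  # control: direction
    unit_bytes: int = 1       # control: transfer unit (1 = byte; wider = word)
    burst_bytes: int = 4      # control: bytes to transfer in one burst


def run_burst(regs: DmaRegisters) -> int:
    """Model one burst: advance the address, decrement the count.

    Returns the number of bytes moved, which is smaller than the burst
    size only when fewer bytes remain.
    """
    moved = min(regs.burst_bytes, regs.byte_count)
    regs.memory_address += moved
    regs.byte_count -= moved
    return moved
```

In this model the system processor would program the registers once and the engine would then call `run_burst` repeatedly until `byte_count` reaches zero, without further processor involvement.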
Sophisticated DMA engines often process data based on instructions in a list or work queue specific to the hardware subsystem that data is being received from or sent to. These instructions are referred to herein as “work queue elements,” “WQEs,” or “control instructions.” Each element in the work queue should provide at least a source location (e.g., in memory or a remote system) from which to fetch data, a target destination (e.g., in memory or a remote system) where the fetched data should be stored, and the amount of data to move from the source location to the target destination. In other embodiments, work queue elements may describe multiple addresses from which to pull data and at which to store it, and may translate scatter-gather lists to determine source and/or target locations.
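As a minimal sketch of the three fields every work queue element carries, the example below models memory as a `bytearray` and executes a queue of elements in order. The names `WorkQueueElement` and `process_queue` are hypothetical, and the model assumes every source and target range lies inside the buffer; remote systems and scatter-gather translation are not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkQueueElement:
    source: int  # offset in memory from which to fetch data
    target: int  # offset in memory at which to store the fetched data
    length: int  # number of bytes to move from source to target


def process_queue(memory: bytearray, queue: deque) -> int:
    """Execute each element in order; return the total bytes moved."""
    total = 0
    while queue:
        wqe = queue.popleft()
        # Copy the source bytes first so overlapping ranges behave predictably.
        data = bytes(memory[wqe.source:wqe.source + wqe.length])
        memory[wqe.target:wqe.target + wqe.length] = data
        total += wqe.length
    return total
```

For example, two elements that each move four bytes can together copy an eight-byte region to a new location in the same buffer.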
In some instances, it can be desirable to have multiple work queue elements held in the DMA engine so that as soon as one completes, the next can start, thereby avoiding latency issues associated with fetching the next work queue element. A common way to do this is to have software “push” the work queue elements to the DMA engine and have the engine hold them internally (e.g., in an array). This approach requires communication from hardware to software about when the next work queue element should be pushed to hardware, and it may require substantial silicon area to store the work queue elements.
Alternatively, the DMA engine can prefetch the work queue elements. In this manner, the DMA engine can fetch the next work queue element while finishing up the previous work queue element, thereby avoiding the latency associated with fetching a work queue element after completing an operation, and negating the need for closely timed software-hardware interaction.
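The latency benefit of prefetching can be illustrated with a simple timing model. This is a sketch under stated assumptions, not a description of any real engine: each work queue element takes a fixed `fetch_latency` to fetch, only one element is prefetched at a time, and the fetch of the next element fully overlaps the current transfer. The function names are hypothetical.

```python
def total_time_without_prefetch(transfer_times, fetch_latency):
    """Each element is fetched only after the previous transfer completes."""
    return sum(fetch_latency + t for t in transfer_times)


def total_time_with_prefetch(transfer_times, fetch_latency):
    """The next element is fetched while the current transfer is running."""
    if not transfer_times:
        return 0
    total = fetch_latency  # the first fetch cannot be hidden
    for t in transfer_times[:-1]:
        # The next fetch overlaps this transfer; the engine waits only
        # if the fetch takes longer than the transfer itself.
        total += max(t, fetch_latency)
    total += transfer_times[-1]  # nothing is left to fetch during the last one
    return total
```

With three transfers of 10 time units each and a fetch latency of 3, the model gives 39 units without prefetching and 33 with it: only the first fetch remains on the critical path.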
When the DMA engine is shared among many threads (unrelated streams of data to be processed, e.g., from different logical ports or partitions, processors, channels, or queue pairs in InfiniBand/HEA terms), the DMA engine must balance the cost of switching between threads against the need to maintain a level of fairness in processing data from the threads. For example, there is often significant overhead associated with switching from one thread to another. A variety of context may be needed for each thread (e.g., head/tail pointers, translation entries, logical partition protection information, miscellaneous control information, etc.), and switching from one thread to another requires storing or updating the existing context and fetching the context for the new thread. As such, it is advantageous to remain on one thread for as long as possible (as long as there are work queue elements). However, a single thread cannot be processed to the exclusion of the other threads.
It is known to accomplish “fairness” between threads by switching from one thread to another once a certain number of bytes of data (a threshold value) has been moved for a current thread.
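A byte-threshold policy of this kind might be sketched as follows. The function name `schedule`, the round-robin ordering of threads, and the choice to check the threshold only at work queue element boundaries (so a thread may overshoot the threshold by part of one element) are all assumptions made for illustration.

```python
from collections import deque


def schedule(threads, threshold):
    """Return the order in which work queue elements are processed.

    threads:   mapping of thread name -> list of element sizes in bytes
    threshold: bytes a thread may move before the engine switches away
    """
    pending = {name: deque(sizes) for name, sizes in threads.items()}
    active = deque(name for name, queue in pending.items() if queue)
    order = []
    while active:
        name = active.popleft()
        queue = pending[name]
        moved = 0
        # Stay on this thread until it runs dry or reaches the threshold.
        while queue and moved < threshold:
            size = queue.popleft()
            moved += size
            order.append((name, size))
        if queue:
            active.append(name)  # work remains: rejoin the back of the line
    return order
```

For instance, `schedule({"A": [100, 100, 100], "B": [50]}, 150)` processes two elements from thread A, switches to B once A has passed the threshold, and then returns to A for its last element.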