DMA engines move a block of data from a source address to a destination address independently of CPU control. An individual DMA transfer is configured using a descriptor, which normally contains the source address, the destination address, a number of control parameters and a link to the next transfer for the DMA engine to process once the current transfer is complete. Usually a DMA engine is constructed from a number of independent contexts that process transfers in parallel, each context having dedicated hardware for first fetching a descriptor and then transferring data from a source to a destination based on the fetched descriptor. The descriptor can be fetched from any memory to which the DMA engine has access, which may be local dedicated memory or external off-chip memory.
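The descriptor and the fetch-then-transfer behaviour of a single context can be sketched in C. This is a minimal software model, not a hardware interface: the `dma_desc` layout, its field names and the `dma_run_chain` function are assumptions made for illustration, and `memcpy` stands in for the data-moving hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout; a real engine defines its own fields. */
struct dma_desc {
    const uint8_t *src;          /* source address */
    uint8_t *dst;                /* destination address */
    uint32_t len;                /* control parameter: number of bytes to move */
    const struct dma_desc *next; /* link to the next descriptor, NULL ends the chain */
};

/* Software model of one context: fetch a descriptor, move the data it
   describes, then follow the link. Returns the number of descriptors fetched.
   The chain must be NULL-terminated, otherwise this loop never returns. */
static unsigned dma_run_chain(const struct dma_desc *d)
{
    unsigned fetched = 0;
    while (d != NULL) {
        assert(d->src != NULL && d->dst != NULL);
        memcpy(d->dst, d->src, d->len); /* stands in for the transfer hardware */
        fetched++;
        d = d->next;                    /* next descriptor fetch */
    }
    return fetched;
}
```

In this model the descriptors may live in any memory the caller can address, which mirrors the point that the engine can fetch them from local or off-chip memory.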
Some transfers may be discontinuous. A discontinuous transfer is one with non-sequential source and/or destination addresses. For example, a transfer of N bytes from A to B can be constructed from a single descriptor, but two descriptors are needed to transfer N bytes from A to B and M bytes from C to D. Some transfers may be circular. A circular transfer is one in which bytes are repeatedly transferred from source to destination, e.g. N bytes repeatedly transferred from A to B. After the first N bytes are transferred, the DMA engine must fetch the same descriptor again to transfer the second N bytes (a DMA engine could be designed to recognize this kind of transfer, but that would require additional hardware).
Because a DMA transfer can only manipulate the source and/or destination address linearly, discontinuous or circular data transfers must be constructed from a series of linked descriptors that together describe the transfer fully.
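The two cases can be illustrated with a small C sketch. The `dma_desc` structure and the helper names below are hypothetical and exist only for this example. The discontinuous case links two descriptors, while the circular case links a descriptor back to itself, so the model takes a fetch limit to stop the otherwise endless loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor, for illustration only. */
struct dma_desc {
    const uint8_t *src;    /* source address */
    uint8_t *dst;          /* destination address */
    uint32_t len;          /* bytes to move */
    struct dma_desc *next; /* link to the next descriptor, or NULL */
};

/* Discontinuous: N bytes from A to B, then M bytes from C to D. */
static void dma_link_discontinuous(struct dma_desc *ab, struct dma_desc *cd)
{
    ab->next = cd;
    cd->next = NULL;
}

/* Circular: the descriptor links to itself, so it is fetched again each lap. */
static void dma_link_circular(struct dma_desc *d)
{
    d->next = d;
}

/* Model of a context that stops after max_fetches descriptor fetches,
   because a circular chain never terminates on its own. */
static unsigned dma_run(const struct dma_desc *d, unsigned max_fetches)
{
    unsigned fetched = 0;
    while (d != NULL && fetched < max_fetches) {
        assert(d->src != NULL && d->dst != NULL);
        memcpy(d->dst, d->src, d->len);
        fetched++;
        d = d->next;
    }
    return fetched;
}
```

Running the circular chain for five laps costs five descriptor fetches in this model, even though the descriptor never changes, which is the repeated fetch described above.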
Equally, because the control parameters describe the transfer size using a finite number of bits, long transfers may also require a series of linked descriptors.
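As a sketch of this limit, the following C code splits one long transfer into a chain of descriptors. The 16-bit length field (`DMA_LEN_BITS`), the `dma_desc` layout and the `dma_split` helper are all assumptions chosen for illustration; the width of the size field in a real engine is implementation specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_LEN_BITS 16u                        /* assumed width of the size field */
#define DMA_MAX_LEN ((1u << DMA_LEN_BITS) - 1u) /* 65535 bytes per descriptor */

/* Hypothetical descriptor, for illustration only. */
struct dma_desc {
    const uint8_t *src;
    uint8_t *dst;
    uint32_t len;          /* must fit in DMA_LEN_BITS bits */
    struct dma_desc *next; /* link to the next descriptor, or NULL */
};

/* Fill descs[] with a linked chain covering `total` bytes.
   Returns the number of descriptors used, or 0 if total is zero
   or the array is too small. */
static size_t dma_split(struct dma_desc *descs, size_t max_descs,
                        const uint8_t *src, uint8_t *dst, size_t total)
{
    assert(descs != NULL);
    size_t needed = total / DMA_MAX_LEN + (total % DMA_MAX_LEN != 0);
    if (needed == 0 || needed > max_descs)
        return 0;
    for (size_t i = 0; i < needed; i++) {
        size_t chunk = total < DMA_MAX_LEN ? total : DMA_MAX_LEN;
        descs[i].src = src;
        descs[i].dst = dst;
        descs[i].len = (uint32_t)chunk;
        descs[i].next = (i + 1 < needed) ? &descs[i + 1] : NULL;
        src += chunk;   /* addresses advance linearly between descriptors */
        dst += chunk;
        total -= chunk;
    }
    return needed;
}
```

Under these assumptions a 150000-byte transfer needs three descriptors: two of 65535 bytes and a final one of 18930 bytes.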
This can create an issue when a transfer made up of a series of descriptors requires high bandwidth. The DMA engine must effectively pause the data transfer to fetch and decode the next descriptor, and while it does so the data throughput temporarily drops.
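The size of the drop can be estimated with a simple model, sketched here in C. It assumes that each descriptor costs a fixed number of stall cycles to fetch and decode and that no data moves during those cycles; the function name and all of its parameters are hypothetical, and a real engine may behave differently.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of cycles spent moving data, assuming every descriptor stalls the
   transfer for fetch_cycles cycles (a simplifying assumption, not a measured
   property of any particular engine). */
static double dma_efficiency(uint32_t bytes_per_desc,
                             uint32_t bytes_per_cycle,
                             uint32_t fetch_cycles)
{
    assert(bytes_per_desc > 0 && bytes_per_cycle > 0);
    /* Cycles needed to move one descriptor's payload, rounded up. */
    uint64_t data_cycles =
        ((uint64_t)bytes_per_desc + bytes_per_cycle - 1) / bytes_per_cycle;
    return (double)data_cycles / (double)(data_cycles + fetch_cycles);
}
```

With an 8-byte-per-cycle data path and a 32-cycle fetch, for example, 1024-byte descriptors give an efficiency of 0.8 in this model, while 64-byte descriptors give only 0.2. Chains of short descriptors are therefore where the fetch overhead is most visible.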
In real-time critical applications the drop in throughput may not be acceptable. This can be solved by “brute force”, applying more and more hardware to the problem. However, the design must then either contain a mixture of distinct high and low bandwidth DMA engines or waste the additional resources provided for high performance transfers whenever a lower bandwidth would suffice. Simply adding more hardware is therefore wasteful, in that it increases power consumption and die size.