1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a direct memory access controller with support for high latency devices.
2. Description of Related Art
Many system-on-a-chip (SOC) designs contain a device called a direct memory access (DMA) controller. The purpose of DMA is to efficiently move blocks of data from one location in memory to another. DMA controllers are usually used to move data between system memory and an input/output (I/O) device, but are also used to move data between one region in system memory and another. A DMA controller is called “direct” because a processor is not involved in moving the data.
Without a DMA controller, data blocks may be moved by having a processor copy data piece by piece from one memory space to another under software control. This is usually undesirable for large blocks of data. When a processor copies a large block piece by piece, the copy is slow because the processor does not have large memory buffers and must move the data in small, inefficient units, such as 32 bits at a time. Also, while the processor is performing the copy, it is not free to do other work; the processor is tied up until the move is completed. It is more efficient to offload these data block moves to a DMA controller, which can perform them much faster and in parallel with other work.
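To make the cost of a processor-driven copy concrete, the following minimal sketch counts how many individual copy operations are needed when a block is moved one word at a time. The function name and the 4 KiB block size are illustrative assumptions and are not taken from any particular system.

```python
def word_copies_needed(block_bytes: int, word_bits: int = 32) -> int:
    """Return how many word-sized copies are needed to move block_bytes.

    Hypothetical helper: assumes exactly one word is moved per copy
    operation and rounds up for a partial trailing word.
    """
    word_bytes = word_bits // 8
    return -(-block_bytes // word_bytes)  # ceiling division


# A hypothetical 4 KiB block moved 32 bits at a time.
print(word_copies_needed(4096))  # 1024
```

Under these assumptions, each of the 1024 copies occupies the processor, which is the work a DMA controller would take over.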
DMA controllers usually have multiple “channels.” As used herein, a “channel” is an independent stream of data to be moved by the DMA controller. Thus, DMA controllers may be programmed to perform several block moves on different channels simultaneously, allowing the DMA device to transfer data to or from several I/O devices at the same time.
Another feature that is typical of DMA controllers is a scatter/gather operation. A scatter/gather operation is one in which the DMA controller does not need to be programmed by the processor for each block of data to be moved from some source to some destination. Rather, the processor sets up a descriptor table or descriptor linked list in system memory. A descriptor table or linked list is a set of descriptors. Each descriptor describes a data block move, including source address, destination address, and number of bytes to transfer. Non-scatter/gather block moves, which are programmed via the DMA registers directly, are referred to as “single programming” DMA block moves.
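The descriptor table described above can be sketched as follows. This is a software model for illustration only, assuming system memory is represented as a byte array; the names Descriptor and run_descriptor_table are hypothetical and do not correspond to any real controller's register layout or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptor:
    """One block move, with the three fields named in the text."""
    src: int     # source address
    dst: int     # destination address
    nbytes: int  # number of bytes to transfer


def run_descriptor_table(memory: bytearray, table: List[Descriptor]) -> int:
    """Execute every block move in table order; return total bytes moved.

    Hypothetical model: 'memory' stands in for system memory, and each
    move is a plain slice copy rather than a sequence of bus transactions.
    """
    moved = 0
    for d in table:
        memory[d.dst:d.dst + d.nbytes] = memory[d.src:d.src + d.nbytes]
        moved += d.nbytes
    return moved


# The processor builds the table once; the model then runs it unattended.
memory = bytearray(b"ABCDEFGH" + bytes(8))
table = [Descriptor(src=0, dst=12, nbytes=4),
         Descriptor(src=4, dst=8, nbytes=4)]
total = run_descriptor_table(memory, table)
print(total, bytes(memory[8:16]))  # 8 b'EFGHABCD'
```

The point of the sketch is that the processor programs nothing per block: it only prepares the table, and every move is driven by the descriptor contents.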
A linked list architecture of a DMA controller is more flexible and dynamic than the table architecture. In the linked list architecture, the processor points one of the DMA channels to the first descriptor in the chain, and each descriptor in the linked list contains a pointer to the next descriptor in memory. The descriptors may be anywhere in system memory, and the processor may add onto the list dynamically as the transfers occur. In either architecture, the DMA controller automatically traverses the table or list and executes the data block moves described by each descriptor until the end of the table or list is reached.
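The linked list traversal can be sketched in the same spirit. This is a minimal model under stated assumptions: descriptors are held in a dictionary keyed by their (arbitrary) memory address, and a next pointer of None marks the end of the chain. The names LinkedDescriptor and run_chain, and the addresses used, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LinkedDescriptor:
    src: int                         # source address
    dst: int                         # destination address
    nbytes: int                      # number of bytes to transfer
    next_addr: Optional[int] = None  # address of next descriptor; None ends the chain


def run_chain(memory: bytearray,
              descriptors: Dict[int, LinkedDescriptor],
              first_addr: int) -> List[int]:
    """Follow next pointers from first_addr, executing each block move.

    Returns the descriptor addresses in the order they were executed.
    """
    executed = []
    addr: Optional[int] = first_addr
    while addr is not None:
        d = descriptors[addr]  # models fetching the descriptor from memory
        memory[d.dst:d.dst + d.nbytes] = memory[d.src:d.src + d.nbytes]
        executed.append(addr)
        addr = d.next_addr
    return executed


memory = bytearray(b"abcdefgh" + bytes(8))
# Descriptors may live anywhere; only the pointers define the order.
descriptors = {
    0x100: LinkedDescriptor(src=0, dst=8, nbytes=4, next_addr=0x40),
    0x40: LinkedDescriptor(src=4, dst=12, nbytes=4),
}
# The processor extends the chain by linking in one more descriptor.
descriptors[0x200] = LinkedDescriptor(src=0, dst=12, nbytes=4)
descriptors[0x40].next_addr = 0x200

order = run_chain(memory, descriptors, first_addr=0x100)
print([hex(a) for a in order])  # ['0x100', '0x40', '0x200']
```

Note that each iteration must read the current descriptor before it can learn where the next one is, which is why descriptor fetch latency matters for chained transfers.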
A DMA device may be architected to have an appropriate number of buffers and to handle an appropriate number of simultaneous outstanding transactions so that a high latency path to the data will not cause meaningful stalls in the data transfers. A typical DMA programming model is to chain data transfers together as a list of scatter/gather descriptors, as described above. These descriptors must be fetched from memory. If, in this environment, the latency to the descriptor memory is as high as the latency to the data, then a problem may be encountered. A problem may arise even when descriptor fetches are low latency and only the data block fetches are high latency, as will be discussed below.
A typical DMA architecture feeds the information fetched in a descriptor directly into the same configuration registers that are loaded by a “single programming” DMA block data move. The request for the next descriptor begins as soon as the DMA device starts the write of the final transaction for the current descriptor. The DMA device must wait until this final write begins, because only at that time are the configuration registers that ran the data block move available again to be loaded. This overlap can reduce, but not avoid, data bus stalls as the DMA device transitions from descriptor to descriptor in a low latency environment. In a high latency environment, however, this scheme may stall the data bus for long periods at every descriptor transition.
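The effect of latency on the descriptor-to-descriptor transition can be illustrated with a deliberately simplified timing model. The model is an assumption made for illustration and does not describe any particular controller: it assumes that the next descriptor fetch starts when the final write starts, that the first data read of the next block cannot be issued until that descriptor has arrived, and that nothing else overlaps. The function name and all cycle counts are hypothetical.

```python
def transition_stall(desc_latency: int,
                     data_read_latency: int,
                     final_write_time: int) -> int:
    """Idle cycles between the end of one block's final write and the
    arrival of the first data of the next block (simplified model).

    Timeline, measured from the start of the final write:
      - the final write occupies the bus for final_write_time cycles;
      - the next descriptor arrives after desc_latency cycles;
      - the first data of the next block arrives data_read_latency
        cycles after the descriptor arrives.
    """
    return max(0, desc_latency + data_read_latency - final_write_time)


# Hypothetical cycle counts chosen only to show the trend.
print(transition_stall(10, 10, 16))    # 4   (low latency everywhere)
print(transition_stall(200, 200, 16))  # 384 (high latency everywhere)
print(transition_stall(10, 200, 16))   # 194 (fast descriptors, slow data)
```

Under these assumptions, the overlap with the final write hides only a fixed number of cycles, so the stall grows with latency; the third case shows that a stall remains even when only the data path is slow.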