1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a direct memory access controller with support for high latency devices.
2. Description of Related Art
Many system-on-a-chip (SOC) designs contain a device called a direct memory access (DMA) controller. The purpose of DMA is to efficiently move blocks of data from one location in memory to another. DMA controllers are usually used to move data between system memory and an input/output (I/O) device, but are also used to move data between one region in system memory and another. A DMA controller is called “direct” because a processor is not involved in moving the data.
Without a DMA controller, a block of data may be moved by having a processor copy it piece-by-piece from one memory space to another under software control. This is usually undesirable for large blocks of data. Having a processor copy a large block piece-by-piece is slow because the processor does not have large memory buffers and must move data in small, inefficient units, such as 32 bits at a time. Also, while the processor is performing the copy, it is not free to do other work; it is tied up until the move is completed. It is far better to offload such block moves to a DMA controller, which can perform them much faster and in parallel with other work.
DMA controllers usually have multiple “channels.” Thus, a DMA controller may be programmed to perform several block moves on different channels simultaneously, allowing it to transfer data to or from several I/O devices at the same time.
Another feature that is typical of DMA controllers is a “scatter/gather” feature. When executing a scatter/gather operation, the DMA controller does not need to be programmed by the processor for each block. Rather, the processor sets up a “descriptor table” or “descriptor linked list” in memory, depending on the scatter/gather architecture. Each descriptor describes a data block move, including source address, destination address, and number of bytes to transfer. Non-scatter/gather block moves, which are programmed via the DMA registers directly, are referred to as “single programming” DMA block moves.
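As a minimal sketch of the descriptor table architecture, assuming a simulated byte-addressable memory, the following Python models each descriptor as a source address, a destination address, and a byte count, and executes the table entries in order. The names `Descriptor` and `run_descriptor_table` are hypothetical, and the end of the table is assumed here to be simply the end of a Python list rather than any particular hardware end-of-table marker.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """Hypothetical descriptor layout describing one data block move."""
    src: int     # source address (offset into the simulated memory)
    dst: int     # destination address
    nbytes: int  # number of bytes to transfer


def run_descriptor_table(memory: bytearray, table: list[Descriptor]) -> int:
    """Execute every descriptor in table order; return the total number of bytes moved."""
    total = 0
    for d in table:
        # Model each block move as a single copy rather than a word-by-word processor loop.
        memory[d.dst:d.dst + d.nbytes] = memory[d.src:d.src + d.nbytes]
        total += d.nbytes
    return total


mem = bytearray(64)
mem[0:4] = b"ABCD"
mem[8:10] = b"XY"
table = [Descriptor(src=0, dst=32, nbytes=4), Descriptor(src=8, dst=40, nbytes=2)]
moved = run_descriptor_table(mem, table)  # 6 bytes moved by two descriptors
```

The processor's only work in this model is building `table`; the copies themselves are driven entirely by the descriptor contents.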
The linked list architecture is flexible and dynamic compared to the table architecture. In the linked list architecture, the processor points one of the DMA channels to the first descriptor in the chain, and each descriptor in the linked list contains a pointer to the next descriptor in memory. The descriptors may be anywhere in memory, and the processor may append to the list dynamically while the transfers are occurring. The DMA controller automatically traverses the table or list and executes the data block moves described by each descriptor until the end of the table or list is reached.
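The linked list traversal can be sketched in a similar hypothetical Python model. Here each descriptor carries a `next_addr` link, descriptors are kept in a dictionary keyed by an assumed descriptor address (standing in for "anywhere in memory"), and `None` marks the end of the list. The channel re-reads the link of the last executed descriptor before declaring the list finished, which is one assumed way of letting the processor append descriptors while transfers are occurring; `LinkedDescriptor`, `DmaChannel`, and their methods are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedDescriptor:
    """Hypothetical linked-list descriptor: one block move plus a link to the next."""
    src: int
    dst: int
    nbytes: int
    next_addr: Optional[int] = None  # address of the next descriptor; None ends the list


class DmaChannel:
    """Hypothetical DMA channel that walks a descriptor linked list one entry per step."""

    def __init__(self, memory: bytearray, descriptors: dict[int, LinkedDescriptor]):
        self.memory = memory
        self.descriptors = descriptors      # descriptor address -> descriptor
        self.pending: Optional[int] = None  # address of the next descriptor to execute
        self.last: Optional[int] = None     # address of the most recently executed descriptor

    def start(self, first_addr: int) -> None:
        # The processor points the channel at the first descriptor in the chain.
        self.pending = first_addr

    def step(self) -> bool:
        """Execute one descriptor; return False if the end of the list has been reached."""
        if self.pending is None and self.last is not None:
            # Re-read the last link in case the processor has appended to the list.
            self.pending = self.descriptors[self.last].next_addr
        if self.pending is None:
            return False
        d = self.descriptors[self.pending]
        self.memory[d.dst:d.dst + d.nbytes] = self.memory[d.src:d.src + d.nbytes]
        self.last = self.pending
        self.pending = d.next_addr
        return True

    def run(self) -> int:
        """Step until the end of the list; return the number of descriptors executed."""
        count = 0
        while self.step():
            count += 1
        return count


mem = bytearray(64)
mem[0:2] = b"hi"
mem[4:6] = b"yo"
descs = {100: LinkedDescriptor(0, 32, 2)}  # one-entry list at assumed address 100
ch = DmaChannel(mem, descs)
ch.start(100)
first = ch.run()                           # executes descriptor 100, then reaches the end
descs[200] = LinkedDescriptor(4, 40, 2)    # processor adds a new descriptor...
descs[100].next_addr = 200                 # ...and links it from the old tail
second = ch.run()                          # channel picks up the appended descriptor
```

In this model, a descriptor appended after the channel has reported the end of the list is still picked up on the next call to `step()`; how a real controller detects such late appends is not something the text above specifies.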
High latency devices present unique challenges if high bus utilization is desired. Each bus has a maximum sustained bandwidth that can be achieved if it is transferring data most of the time without gaps or stalls. When communicating with a high latency device, enough transactions must be outstanding simultaneously that the time it takes to receive data from the high latency device is less than or equal to the time it takes to transfer the data of all the other outstanding transactions queued ahead of it. If this criterion is met, there will seldom be gaps or stalls on the bus during which the DMA controller is waiting for data and has no other data available to transfer.
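The criterion above can be made concrete with a small calculation. Assume each transaction occupies the bus for a fixed time T once its data is available, and the device needs a fixed time L to return data after a read is issued. With N transactions outstanding, N − 1 of them are queued ahead of the newest one, so the criterion reads L ≤ (N − 1) · T, which gives a minimum of N = ⌈L / T⌉ + 1. The sketch below assumes integer nanosecond timings; the function name `min_outstanding` and the example figures are hypothetical, not values from the text.

```python
def min_outstanding(latency_ns: int, transfer_ns: int) -> int:
    """Smallest N satisfying latency_ns <= (N - 1) * transfer_ns.

    latency_ns:  time for the high latency device to return data (assumed fixed)
    transfer_ns: bus time to transfer one transaction's data (assumed fixed)
    """
    if transfer_ns <= 0 or latency_ns < 0:
        raise ValueError("need transfer_ns > 0 and latency_ns >= 0")
    # Integer ceiling of latency_ns / transfer_ns: transactions that must be queued ahead.
    transactions_ahead = -(-latency_ns // transfer_ns)
    return transactions_ahead + 1


# Hypothetical figures: 1000 ns device latency, 64 ns to transfer each transaction.
print(min_outstanding(1000, 64))  # 17
```

At these assumed figures, 16 transactions ahead cover 1024 ns of bus time, just enough to hide the 1000 ns latency; with fewer reads in flight the bus would idle while waiting for data.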
Many busses and interconnects are not able to process 16 or more outstanding read transactions per master, but most busses do have a simple “retry” mechanism. Retry is a response given by the target that tells the master to repeat the same transaction at a later time because the target cannot complete the transaction at that time.
A “delayed read” is a method that high latency targets use to improve bus utilization by not stalling the bus while they fetch high latency data. A target typically has the option of inserting wait states into a transaction to delay its completion until the target can return the data. Wait states are very inefficient for high latency targets because such targets must insert many wait states per transaction, and while the bus is in wait states, no other transaction from any master to any target can run; the bus is stalled. The delayed read avoids stalling the bus for high latency reads.
A delayed read works as follows. When a high latency target receives a new read request, it retries that request and, at the same time, queues the transaction and begins processing it. The retry immediately frees the bus and allows the bus arbiter to grant it to the next transaction from the same or a different master. Once the target has queued a read, that read is pending. If the master comes back with the repeated transaction before the target has the data available to return, the target retries the transaction again and the read remains pending.
When the master comes back with the repeated transaction and the target has the requested data queued and ready to return, the target accepts the transaction and immediately returns the data. Delayed reads allow the target to queue as many transactions as it is designed to handle simultaneously, regardless of how many outstanding transactions the bus supports, because the bus has no knowledge of how many transactions are queued inside the target. Thus, delayed reads are a good way to queue many simultaneous transactions on any given bus.
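The delayed read protocol described above can be sketched as a small cycle-by-cycle simulation. It assumes a target that identifies a repeated transaction by its address, a fixed latency in bus cycles, and a fixed number of reads it is designed to queue; when that queue is full, the sketch simply retries a new request without queuing it. `DelayedReadTarget`, `run_master`, and the `RETRY` sentinel are hypothetical names. The master issues one transaction per cycle and repeats any retried transaction later, with no knowledge of how many reads are queued inside the target.

```python
from dataclasses import dataclass, field

RETRY = object()  # hypothetical sentinel standing in for a bus "retry" response


@dataclass
class DelayedReadTarget:
    """Hypothetical high latency target that services reads as delayed reads."""
    memory: dict[int, int]
    latency: int      # bus cycles from queuing a read until its data is ready (assumed fixed)
    max_pending: int  # number of reads the target is designed to queue simultaneously
    pending: dict[int, int] = field(default_factory=dict)  # address -> cycle data is ready

    def read(self, addr: int, now: int):
        if addr in self.pending:
            if now >= self.pending[addr]:
                # Repeated transaction and data is ready: accept it and return the data.
                del self.pending[addr]
                return self.memory[addr]
            return RETRY  # data not ready yet: retry again, the read stays pending
        if len(self.pending) < self.max_pending:
            # New request: queue it and begin processing it.
            self.pending[addr] = now + self.latency
        # Retry in either case so the bus is freed immediately (no wait states).
        return RETRY


def run_master(target: DelayedReadTarget, addrs: list[int]) -> tuple[dict[int, int], int]:
    """Issue one transaction per bus cycle, round-robin, until every read completes."""
    outstanding = list(addrs)
    results: dict[int, int] = {}
    cycle = 0
    while outstanding:
        addr = outstanding.pop(0)
        value = target.read(addr, cycle)
        if value is RETRY:
            outstanding.append(addr)  # repeat the same transaction at a later time
        else:
            results[addr] = value
        cycle += 1
    return results, cycle


mem = {0: 10, 4: 11, 8: 12, 12: 13}
deep = run_master(DelayedReadTarget(mem, latency=10, max_pending=4), list(mem))
serial = run_master(DelayedReadTarget(mem, latency=10, max_pending=1), list(mem))
print(deep[1], serial[1])  # bus cycles used with a 4-deep and a 1-deep target queue
```

In this model, the 4-deep target completes all four reads in 16 cycles because their latencies overlap, while the 1-deep target must fetch them one after another and takes noticeably longer; each retry costs the bus a single cycle rather than a long run of wait states.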