1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to a direct memory access controller with a barrier and interrupt mechanism for a high latency and out of order direct memory access device.
2. Description of Related Art
Many system-on-a-chip (SOC) designs contain a device called a direct memory access (DMA) controller. The purpose of DMA is to efficiently move blocks of data from one location in memory to another. DMA controllers are usually used to move data between system memory and an input/output (I/O) device, but are also used to move data between one region in system memory and another. A DMA controller is called “direct” because a processor is not involved in moving the data.
Without a DMA controller, data blocks may be moved by having a processor copy data piece-by-piece from one memory space to another under software control. This usually is not preferable for large blocks of data. When a processor copies large blocks of data piece-by-piece, it is slow because the processor does not have large memory buffers and must move data in small, inefficient sizes, such as 32 bits at a time. Also, while the processor is doing the copy, it is not free to do other work. Therefore, the processor is tied up until the move is completed. It is more efficient to offload these data block moves to a DMA controller, which can do them much faster and in parallel with other work.
DMA controllers usually have multiple “channels.” As used herein, a “channel” is an independent stream of data to be moved by the DMA controller. Thus, DMA controllers may be programmed to perform several block moves on different channels simultaneously, allowing the DMA device to transfer data to or from several I/O devices at the same time.
Another feature that is typical of DMA controllers is a scatter/gather operation. A scatter/gather operation is one in which the DMA controller does not need to be programmed by the processor for each block of data to be moved from some source to some destination. Rather, the processor sets up a descriptor table or descriptor linked list in system memory. A descriptor table or linked list is a set of descriptors. Each descriptor describes a data block move, including source address, destination address, and number of bytes to transfer. Non-scatter/gather block moves, which are programmed via the DMA registers directly, are referred to as “single programming” DMA block moves.
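The descriptor described above can be sketched in software. The following is a minimal toy model, not a hardware design: the names Descriptor and run_descriptor_table are hypothetical, system memory is modeled as a Python bytearray, and addresses are simply offsets into that bytearray.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    """One data block move (field names are illustrative assumptions)."""
    src: int      # source address, modeled as an offset into memory
    dst: int      # destination address, modeled as an offset into memory
    nbytes: int   # number of bytes to transfer


def run_descriptor_table(memory: bytearray, table: list) -> int:
    """Execute every block move in the table in order; return bytes moved."""
    moved = 0
    for d in table:
        memory[d.dst:d.dst + d.nbytes] = memory[d.src:d.src + d.nbytes]
        moved += d.nbytes
    return moved


# Gather two scattered source blocks into one contiguous destination region.
memory = bytearray(64)
memory[0:4] = b"ABCD"
memory[16:20] = b"EFGH"
table = [
    Descriptor(src=0, dst=32, nbytes=4),
    Descriptor(src=16, dst=36, nbytes=4),
]
total = run_descriptor_table(memory, table)
```

In this model the processor builds the table once, and the modeled controller performs both block moves without being reprogrammed between them.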
A linked list architecture of a DMA controller is more flexible and dynamic than the table architecture. In the linked list architecture, the processor refers one of the DMA channels to the first descriptor in the chain, and each descriptor in the linked list contains a pointer to the next descriptor in memory. The descriptors may be anywhere in system memory, and the processor may add onto the list dynamically as the transfers occur. The DMA controller automatically traverses the table or list and executes the data block moves described by each descriptor until the end of the table or list is reached.
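The linked list traversal may be illustrated with a small software model. This is a sketch under stated assumptions: LinkedDescriptor, next_addr, and walk_chain are hypothetical names, descriptors are held in a dictionary keyed by an assumed memory address, and a next pointer of None marks the end of the chain. For simplicity the processor appends to the list before the walk begins rather than while transfers are in flight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkedDescriptor:
    src: int
    dst: int
    nbytes: int
    next_addr: Optional[int] = None  # address of next descriptor; None ends the chain


def walk_chain(memory, descriptors, first_addr):
    """Follow next pointers from first_addr, executing each block move."""
    visited = []
    addr = first_addr
    while addr is not None:
        d = descriptors[addr]
        memory[d.dst:d.dst + d.nbytes] = memory[d.src:d.src + d.nbytes]
        visited.append(addr)
        addr = d.next_addr
    return visited


memory = bytearray(b"0123456789" + bytes(22))  # 32 bytes of modeled memory

# Descriptors may live anywhere in memory; only the pointers define the order.
descriptors = {
    0x100: LinkedDescriptor(src=0, dst=16, nbytes=4, next_addr=0x340),
    0x340: LinkedDescriptor(src=4, dst=20, nbytes=4),
}

# The processor adds onto the list by writing a new descriptor and then
# updating the previous tail's next pointer.
descriptors[0x200] = LinkedDescriptor(src=8, dst=24, nbytes=2)
descriptors[0x340].next_addr = 0x200

order = walk_chain(memory, descriptors, first_addr=0x100)
```

The visiting order follows the pointers (0x100, 0x340, 0x200) rather than the numeric order of the addresses, which is what allows descriptors to be placed anywhere in system memory.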
Modern DMA devices may be connected to busses that allow read data to be returned out of order. That is, the DMA controller may issue several read transactions to the bus that are all part of the same or different data block moves and the data may be returned by the target devices in a different order than the order in which the reads were issued. Typically, each read transaction is assigned a “tag” number by the initiator so that when read data comes back from the bus, the initiator will know based on the tag to which transaction the data belongs.
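The role of the tag can be sketched as follows. This is an illustrative model only: ReadInitiator and its methods are hypothetical names, tags are drawn from an unbounded counter (real hardware would typically use a small, reusable pool), and returned data is represented by placeholder bytes.

```python
class ReadInitiator:
    """Toy initiator that matches returned read data to reads by tag."""

    def __init__(self):
        self._next_tag = 0
        self._outstanding = {}  # tag -> (address, nbytes)

    def issue_read(self, address, nbytes):
        """Record a read transaction and return the tag assigned to it."""
        tag = self._next_tag
        self._next_tag += 1
        self._outstanding[tag] = (address, nbytes)
        return tag

    def on_read_data(self, tag, data):
        """Look up which transaction the returned data belongs to."""
        address, nbytes = self._outstanding.pop(tag)
        if len(data) != nbytes:
            raise ValueError("returned data length does not match the read")
        return address, data

    def outstanding(self):
        return len(self._outstanding)


initiator = ReadInitiator()
tags = [initiator.issue_read(addr, 4) for addr in (0x1000, 0x2000, 0x3000)]

# The bus returns data for the third read first, then the first, then the second.
return_order = (tags[2], tags[0], tags[1])
matched = [initiator.on_read_data(t, bytes(4)) for t in return_order]
matched_addresses = [address for address, _ in matched]
```

Even though the data arrives in a different order than the reads were issued, each return is attributed to the correct address because the lookup is keyed on the tag rather than on arrival order.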
The queued transactions can be completed in any order. This allows the DMA device to achieve the best performance by queuing many transactions to the bus at once, including queuing different transactions to different devices. The read transactions can complete in any order, and their associated writes can be started immediately when the read data arrives. Allowing the reads and their associated writes to complete in any order achieves the best performance possible on a given bus, but can cause certain problems.
When system software sets up a large block of memory to be moved between an I/O device and memory or from one region in memory to another, the software will want to know when that block of data has been completely moved so that it can act on the data. The DMA device typically signals this by generating an interrupt. Because the processor or some other device may act on the data when the transfer is complete, it is imperative that the interrupt not be generated until all of the data associated with the move has been transferred; otherwise, the processor may try to act on data that is not yet transferred and will, thus, read incorrect data. With out of order execution, a DMA device cannot simply generate an interrupt when the last transaction in a block completes.
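The problem can be made concrete with a small model. The sketch below is illustrative only and is not the mechanism of the present application: the function names are hypothetical, transactions are identified by small integers in issue order, and the completion order is an assumed example.

```python
def naive_interrupt_point(completion_order, last_issued):
    """Step at which a 'last transaction completed' rule would fire."""
    return completion_order.index(last_issued)


def safe_interrupt_point(completion_order, issued):
    """Step at which every issued transaction has completed, or None."""
    pending = set(issued)
    for step, transaction in enumerate(completion_order):
        pending.discard(transaction)
        if not pending:
            return step
    return None


issued = [0, 1, 2, 3]            # transactions of one block, in issue order
completion_order = [1, 3, 0, 2]  # assumed out of order completion on the bus

naive = naive_interrupt_point(completion_order, last_issued=issued[-1])
safe = safe_interrupt_point(completion_order, issued)

# Transactions still in flight when the naive rule would raise the interrupt.
unfinished_at_naive = set(issued) - set(completion_order[:naive + 1])
```

With this completion order, the last issued transaction finishes second, so a rule keyed on it would signal completion while transactions 0 and 2 are still outstanding.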
Some systems work by having “completion codes” moved to “mailboxes” when a series of data moves have been completed. A mailbox is a messaging device that acts as a first-in-first-out (FIFO) queue for messages. By writing to the mailbox address, the DMA controller may deliver messages to the processor in order. Messages are typically small, on the order of eight or sixteen bytes. When software sets up a series of block moves in a scatter/gather list, the software can insert the completion messages in the descriptor linked list so that the DMA device may move both the data blocks and the completion code messages via the same list of scatter/gather descriptors.
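A mailbox and a descriptor list carrying both data moves and completion messages might be modeled as follows. This is a sketch under stated assumptions: Mailbox and run_list are hypothetical names, descriptors are plain tuples tagged "move" or "mailbox", and the eight-byte message contents are made up for illustration. The model executes the list strictly in order.

```python
from collections import deque


class Mailbox:
    """FIFO of small messages; a software stand-in for the messaging device."""

    def __init__(self):
        self._fifo = deque()

    def write(self, message):
        self._fifo.append(bytes(message))

    def read(self):
        """Return the oldest message, or None if the mailbox is empty."""
        return self._fifo.popleft() if self._fifo else None


def run_list(memory, mailbox, descriptors):
    """Execute data moves and mailbox writes from the same descriptor list."""
    for d in descriptors:
        if d[0] == "move":
            _, src, dst, nbytes = d
            memory[dst:dst + nbytes] = memory[src:src + nbytes]
        elif d[0] == "mailbox":
            mailbox.write(d[1])


memory = bytearray(b"AAAABBBB" + bytes(8))
mailbox = Mailbox()
run_list(memory, mailbox, [
    ("move", 0, 8, 4), ("mailbox", b"BLOCK0OK"),
    ("move", 4, 12, 4), ("mailbox", b"BLOCK1OK"),
])

first = mailbox.read()
second = mailbox.read()
third = mailbox.read()
```

The processor reads the completion messages in the order they were written, and in this in-order model each message is written only after the data move that precedes it in the list.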
However, in order for software to work correctly, when the DMA controller writes a completion message to the mailbox, it is imperative that all descriptors prior to the descriptor writing to the mailbox have completed, because the mailbox, like an interrupt, tells the processor that a certain amount of data has been moved. Because all transactions can complete out of order for performance, the DMA device can write a completion message to the mailbox prior to some of the other transactions from previous descriptors having completed unless there is a mechanism to prevent it.
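The ordering hazard, and the effect of holding the completion message back, can be sketched in a toy model. The function effect_order and its barrier flag are hypothetical and serve only to illustrate the requirement stated above; they are not a description of any particular controller's mechanism. Descriptors are identified by integers in list order, and bus_order is an assumed order in which the bus would otherwise complete them.

```python
def effect_order(descriptor_ids, mailbox_id, bus_order, barrier):
    """Return the order in which descriptors take effect.

    With barrier=True, the mailbox write is held until every descriptor
    that precedes it in the list has completed.
    """
    prior = set(descriptor_ids[:descriptor_ids.index(mailbox_id)])
    done = []
    held = False
    for d in bus_order:
        if d == mailbox_id and barrier and not prior <= set(done):
            held = True  # hold the completion message back for now
            continue
        done.append(d)
        if held and prior <= set(done):
            done.append(mailbox_id)  # all prior descriptors are complete
            held = False
    return done


descriptor_ids = [0, 1, 2, 3]  # descriptor 3 writes the completion message
bus_order = [0, 3, 2, 1]       # assumed out of order completion

unordered = effect_order(descriptor_ids, 3, bus_order, barrier=False)
ordered = effect_order(descriptor_ids, 3, bus_order, barrier=True)

# Data descriptors still incomplete when the message lands without ordering.
early = set(descriptor_ids[:3]) - set(unordered[:unordered.index(3)])
```

Without ordering, the completion message lands while descriptors 1 and 2 are still outstanding. With the hypothetical barrier, the data descriptors remain free to complete out of order among themselves, and only the mailbox write is delayed.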