Direct memory access or “DMA” refers to the concept of performing data transfer without the involvement of the processor or CPU. In response to certain stimuli or commands, a direct memory access device can move data from one memory location or region to another location or region.
As a general matter, general purpose processors are not designed to be efficient in simply transferring data from one memory location to another. It can therefore waste precious processor cycles for a general purpose processor to repeatedly load data and store data to another location, even though a processor can perform these memory access related tasks. In most cases, the use of direct memory access control can free the processor from performing repeated loads and stores, which allows a processor's cycle time to be used for more meaningful processing tasks.
DMA controllers in general are devices that are designed to do repeated memory loads and stores only, but to do so efficiently. More recently, DMA controllers have been embedded into input/output (“IO”) devices that handle large data transfers, such as network controllers and disk controllers. These embedded DMA controllers transfer the data between the memory and the network, disk drive, or wherever the data should be moved. In this manner, the processor is disturbed only by an “interrupt” indicating that the data is now ready or was sent to the desired location.
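The contrast between a processor-driven copy and a descriptor-driven DMA transfer can be sketched in software as follows. This is a minimal model under stated assumptions, not the interface of any real controller: the `dma_desc` layout, `dma_engine_run`, and the `on_done` callback (which stands in for the completion interrupt) are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer descriptor; the fields are illustrative and do not
   correspond to any particular DMA controller. */
typedef struct {
    const uint8_t *src;              /* source address */
    uint8_t       *dst;              /* destination address */
    size_t         len;              /* number of bytes to move */
    void         (*on_done)(void *); /* stands in for the completion interrupt */
    void          *ctx;              /* argument passed to on_done */
} dma_desc;

/* Processor-driven copy: one load and one store per byte, all on the CPU. */
void cpu_copy(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
}

/* Software stand-in for the DMA engine. In hardware the data movement would
   proceed without the CPU; here memcpy merely models it. */
void dma_engine_run(const dma_desc *d)
{
    assert(d != NULL);
    memcpy(d->dst, d->src, d->len);
    if (d->on_done)
        d->on_done(d->ctx);          /* "interrupt": the data is in place */
}

/* Completion handler used by the demo: sets a flag. */
void mark_done(void *ctx)
{
    *(int *)ctx = 1;
}

/* Programs one descriptor and returns 1 if the completion handler ran. */
int dma_demo(uint8_t *dst, const uint8_t *src, size_t len)
{
    int done = 0;
    dma_desc d = { src, dst, len, mark_done, &done };
    dma_engine_run(&d);
    return done;
}
```

In this model the processor's only work for the DMA path is filling in the descriptor and handling the completion notification, whereas `cpu_copy` occupies the processor for the entire transfer.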
While many IO devices now include embedded DMA controllers, many legacy IO devices do not. In such cases, legacy IO devices can require that the processor itself copy data by doing loads and stores. Such IO devices are generally known as PIO (Programmed Input/Output) devices, which require each step of input and output data transfer to be programmed. Typically, slower devices like modems and printers, which do not directly interface with the processor, may be attached to the computer system via serial ports (UARTs) and parallel ports, which are PIO devices. PIO devices can require the CPU to move data to or from the device as each byte is ready, by responding to an interrupt or by polling. Thus, in many cases, these PIO devices require certain state checking and waits or other sequential register accesses to operate.
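The polling style of PIO access described above can be illustrated with a small software model. This is only a sketch: the `pio_dev` structure, the `STATUS_TX_READY` bit, and the assumption that the device needs three status polls to become ready again after each byte are invented for illustration and do not describe any real UART or parallel port.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PIO device model: a status register with a "ready" bit and a
   log of the bytes the device has accepted. All names are illustrative. */
#define STATUS_TX_READY 0x01u

typedef struct {
    uint8_t status;      /* bit 0 set when the device can accept a byte */
    uint8_t busy_ticks;  /* status polls remaining until ready again */
    uint8_t out[64];     /* bytes the device has accepted */
    size_t  out_len;
} pio_dev;

/* One read of the status register; the model also advances device time. */
uint8_t pio_read_status(pio_dev *dev)
{
    if (dev->busy_ticks > 0 && --dev->busy_ticks == 0)
        dev->status |= STATUS_TX_READY;
    return dev->status;
}

/* Write one byte to the data register; the device then becomes busy. */
void pio_write_data(pio_dev *dev, uint8_t byte)
{
    if (dev->out_len < sizeof dev->out)
        dev->out[dev->out_len++] = byte;
    dev->status &= (uint8_t)~STATUS_TX_READY;
    dev->busy_ticks = 3;  /* assumed: ready again on the third status poll */
}

/* The CPU moves every byte itself, checking state before each write.
   Returns the number of polls that found the device not ready. */
size_t pio_send(pio_dev *dev, const uint8_t *buf, size_t len)
{
    size_t wasted_polls = 0;
    for (size_t i = 0; i < len; i++) {
        while (!(pio_read_status(dev) & STATUS_TX_READY))
            wasted_polls++;           /* processor cycles spent waiting */
        pio_write_data(dev, buf[i]);
    }
    return wasted_polls;
}
```

The returned count makes the cost visible: every unsuccessful poll is a register access during which the processor does no other useful work.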
General purpose DMA controllers are designed for efficiency of data transfer. But no matter how efficient the transfer of data may be, slow IO devices are still slow, and there is a need to perform the chores of complicated data access sequences to and from IO devices in a manner that frees the processor from these tasks. Further, IO devices can require strict ordering (sequential access), and even devices with embedded DMA capability may require strictly ordered access to their registers.
Memory ordering or consistency, that is, the way a processor reads the result of a memory write, is an important concept, especially for multi-processor systems. Many forms of memory consistency exist, such as strict ordering (e.g., sequential consistency) and loose ordering (e.g., release consistency). To ensure strict ordering, program order and write atomicity must exist. With program order, previous memory operations are completed before beginning other memory operations. With write atomicity, where more than one copy of data exists, such as with cache-based systems, writes to memory must be visible in the same order to all processors. Additionally, updated memory values after a write operation are not returned to a read before all updates or invalidations of the data are acknowledged.
Loose ordering (or release consistency) refers to a weakly ordered classification of memory operations into data and synchronization operations, where program order is enforced only at the synchronization operations and any operations between two synchronization operations may be reordered. The synchronization operations consist of acquire operations and release operations. Release operations are write operations that grant permission to a shared memory location. Acquire operations are read operations that access shared memory locations. Release operations ensure that memory accesses before the operation have been completed while acquire operations require all prior memory accesses to be complete before the operation completes. System busses may be designed with loose ordering to allow multiple transactions to be completed out of order. This can enable better utilization of the bus bandwidth. However, transactions are likely to have different latencies. A read requires the results to be returned from the target, whereas a write can be considered completed as soon as it is issued. But the actual completion of the write requires that the value be written in the target address so that it will return that value when read back.
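The difference between a write that is considered completed when issued and a write that has actually reached its target can be modeled with a simple write queue. The sketch below is a hypothetical software model, not a description of any particular bus: `posted_bus`, `QUEUE_DEPTH`, and `NUM_REGS` are invented names, and the rule that a read drains all earlier writes is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus model: a write is "complete" for the issuer once it is
   queued, but it reaches the target only when the queue is drained. */
#define QUEUE_DEPTH 8
#define NUM_REGS    4

typedef struct {
    uint32_t regs[NUM_REGS];                        /* target registers */
    struct { size_t idx; uint32_t val; } q[QUEUE_DEPTH];
    size_t   q_len;                                 /* writes still in flight */
} posted_bus;

/* Deliver queued writes to the target in the order they were issued. */
void bus_drain(posted_bus *b)
{
    for (size_t i = 0; i < b->q_len; i++)
        b->regs[b->q[i].idx] = b->q[i].val;
    b->q_len = 0;
}

/* Write: returns as soon as the value is queued, before it reaches the target. */
void bus_write(posted_bus *b, size_t idx, uint32_t val)
{
    assert(idx < NUM_REGS);
    if (b->q_len == QUEUE_DEPTH)
        bus_drain(b);                               /* queue full: drain first */
    b->q[b->q_len].idx = idx;
    b->q[b->q_len].val = val;
    b->q_len++;
}

/* Read: needs a reply from the target, so in this model it first drains all
   earlier writes. The value returned shows that those writes have landed. */
uint32_t bus_read(posted_bus *b, size_t idx)
{
    assert(idx < NUM_REGS);
    bus_drain(b);
    return b->regs[idx];
}
```

After `bus_write` returns, the target register still holds its old value; only the subsequent `bus_read` of the same register confirms that the write has actually completed, at the cost of waiting for the full round trip.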
Main processors typically operate at high frequencies according to the release consistency or loose ordering memory access model. This model generally causes memory access to be performed out of order. Given the speed of the processor, responses to interrupts result in long stall times.
With cache memory systems, when applications on a computer system begin, instructions and data are moved from hard disk into main memory so that a processor can access the data and instructions more quickly. Dynamic random access memory (DRAM) generally is the main memory which serves as the cache memory for the hard disk. When a processor locates data in one of its cache memories, it is referred to as a “hit.” There may be many levels of cache memory, some located on the processor or separate from the processor. A failure to locate data in one of the cache locations is called a “miss.” Each miss introduces a delay or latency. In connection with the use of a high speed processor, a long cache miss latency is associated with processor interrupt responses. Most high performance systems are designed to execute memory accesses out-of-order in an effort to achieve maximum bandwidth. However, with DMA engines, IO devices require in-order accesses. In other words, such DMA engines do not allow accesses to different IO devices while a first processor runs to access a first IO device. A need therefore exists for an IO direct memory access method and device that provides improved, in-order access to IO devices when used with a high performance processor system.
Although a loosely ordered memory model and high frequency processors are often combined to provide high performance, these are really two different things. For such a combined system, the high processor frequency means that any time the processor spends waiting or idle translates to more processor cycles being wasted. In a loosely ordered memory system, strictly ordered accesses are special cases, and they are generally slower. But since IO device access in such systems requires many of these strictly ordered accesses, there is also a need to be able to perform ordered IO device access for each of the devices that such a system targets to control.