In conventional symmetric multiprocessor systems, that is, multiprocessors comprising a plurality of main processor units (MPUs), the MPUs have direct access to common shared memory through load/store instructions. In addition to these load/store instructions, “atomic” read/modify/write capabilities are frequently provided to control synchronization of, and access to, memory shared by programs executing on multiple MPUs. “Atomic” commands can generally be defined as commands that allow data in memory to be read, modified and written as if the sequence were a single operation with respect to other units potentially accessing that data area. This is traditionally done either by a hardware sequence that locks other units out of the memory area until the entire sequence is done, or by a more primitive load-with-reservation and conditional-store technique. The purpose is to ensure that an area of memory is completely updated and consistent before another MPU, or an I/O unit with access to the memory, reads or writes it; that is, other units are kept out until the atomic command or update sequence is “finished” with that memory area.
Atomic commands frequently take the form of special instructions, such as “compare and swap,” “test and set,” “fetch and no-op,” “fetch and store,” and so on. An alternative technique is to provide a more fundamental “load and reserve” and “store conditional” instruction pair in an MPU, from which the atomic operation sequences can be implemented in software. These techniques can work well in a symmetric multiprocessor system consisting of homogeneous MPUs.
In an asymmetric heterogeneous multiprocessor system, the MPUs are arranged in a conventional shared memory style. Specialized processors, called attached processor units (APUs), have their own private instruction and data memory and access the shared memory only indirectly, through block moves performed by a direct memory access (DMA) engine. When a plurality of MPUs and APUs, the latter through their DMA engines, access shared memory as peers, the atomic update mechanism needs to be extended to the DMA engines so that access to data in the shared memory can be coordinated. Without such a mechanism, a system with multiple APUs is left with a master/slave approach in which the MPUs parcel out work to each APU one at a time through commands to the DMA engine. This results in poor system utilization and efficiency, both because the APUs sit idle and because MPU time is consumed assigning work to individual APUs.
Therefore, what is needed is a DMA engine that can be employed by APUs to copy data between APU local storage and shared system memory while participating as a peer with other MPUs and APU/DMA engines in atomic updates of shared memory.