In conventional symmetric multiprocessor systems, a plurality of main processor units (MPUs) have direct access to common shared memory through load/store instructions. In an asymmetric multiprocessor environment, the MPUs are likewise arranged in a conventional shared memory style, but specialized, or attached, processor units (APUs) having their own private instruction and data memory are also present. The APUs, however, have only indirect access to system memory, through a “block” move direct memory access (DMA) controller. This block move DMA controller can transfer data between system memory and the private instruction and data memory (“local store”) of the APU when programmed to do so by software executing in the APU.
In conventional systems having third-party DMA controllers, each relevant device in the heterogeneous system is assigned a DMA channel. Software then uses this channel to effect DMA transfers between system memory and the device. A DMA channel can typically be programmed for only a single DMA operation at a time. By contrast, first-party DMA controllers, wherein the device issues its own DMA commands (as a master), typically utilize a DMA command list placed in system memory by the MPU program. The device consults this list to determine the DMA operations to perform in conjunction with the commands that it is executing. Normally, there are command status words in memory that the device updates based upon the success or failure of each command and its associated DMA operation.
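By way of illustration only, the first-party arrangement described above may be modeled in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular controller: the names `DmaCommand`, `Status`, and `device_process`, the field layout, and the range check used to decide success or failure are all hypothetical, and system memory and device memory are represented by ordinary byte arrays.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = 0   # written by the MPU program when it builds the list
    OK = 1        # written by the device after a successful transfer
    ERROR = 2     # written by the device when the transfer is rejected


@dataclass
class DmaCommand:
    # Hypothetical layout of one entry in the DMA command list.
    to_device: bool                   # True: system memory -> device
    sys_addr: int                     # offset into system memory
    dev_addr: int                     # offset into device memory
    length: int                       # number of bytes to move
    status: Status = Status.PENDING   # the "command status word"


def device_process(commands, system_memory, device_memory):
    """Device acting as master: walk the list and update each status word."""
    for cmd in commands:
        in_range = (
            cmd.length >= 0
            and 0 <= cmd.sys_addr
            and cmd.sys_addr + cmd.length <= len(system_memory)
            and 0 <= cmd.dev_addr
            and cmd.dev_addr + cmd.length <= len(device_memory)
        )
        if not in_range:
            cmd.status = Status.ERROR   # assumed failure condition
            continue
        s = slice(cmd.sys_addr, cmd.sys_addr + cmd.length)
        d = slice(cmd.dev_addr, cmd.dev_addr + cmd.length)
        if cmd.to_device:
            device_memory[d] = system_memory[s]
        else:
            system_memory[s] = device_memory[d]
        cmd.status = Status.OK


# The MPU program builds the command list (held in a plain Python list here).
system_memory = bytearray(b"abcdefgh" + bytes(8))
device_memory = bytearray(8)
command_list = [
    DmaCommand(to_device=True, sys_addr=0, dev_addr=0, length=8),
    DmaCommand(to_device=False, sys_addr=8, dev_addr=0, length=8),
    DmaCommand(to_device=True, sys_addr=12, dev_addr=0, length=8),  # out of range
]
device_process(command_list, system_memory, device_memory)
statuses = [cmd.status for cmd in command_list]
```

After the device has processed the list, the MPU program can inspect `statuses` to learn which operations succeeded. In this model the third entry is rejected because it would read past the end of system memory.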
DMA commands can be placed in a DMA “queue” directly by the program executing on the APU. To provide high performance and efficiency in a heterogeneous system having attached processors with private local stores, it is typically advantageous for the APU program to overlap and coordinate its execution with the operation of the DMA controller. Doing so efficiently generally requires that the APU be able to queue multiple DMA commands and to defer checking their status until program execution depends on the completion of a particular DMA command.
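The pattern of queuing several commands and deferring the status check may be sketched as follows. This is an illustrative toy model only: the `DmaQueue` class, its `enqueue_get` and `wait` methods, and the use of an integer tag to identify each command are assumptions made for the example. A real DMA controller would perform transfers concurrently with the APU program; to keep the sketch deterministic, the model performs queued transfers only when the program waits.

```python
from collections import deque


class DmaQueue:
    """Toy model of a DMA queue fed directly by the APU program."""

    def __init__(self, system_memory, local_store):
        self.system_memory = system_memory
        self.local_store = local_store
        self.pending = deque()    # entries: (tag, sys_addr, ls_addr, length)
        self.status_checks = 0    # number of times the program had to wait

    def enqueue_get(self, tag, sys_addr, ls_addr, length):
        """Queue a system-memory -> local-store transfer and return at once."""
        self.pending.append((tag, sys_addr, ls_addr, length))

    def _execute_one(self):
        # Stand-in for the controller completing the oldest queued command.
        _tag, sys_addr, ls_addr, length = self.pending.popleft()
        self.local_store[ls_addr:ls_addr + length] = \
            self.system_memory[sys_addr:sys_addr + length]

    def wait(self, tag):
        """Return only once no queued command carries the given tag."""
        self.status_checks += 1
        while any(cmd[0] == tag for cmd in self.pending):
            self._execute_one()


BLOCK = 4
system_memory = bytearray(range(16))
local_store = bytearray(16)
dma = DmaQueue(system_memory, local_store)

# Queue every transfer up front, without checking any status.
for i in range(4):
    dma.enqueue_get(tag=i, sys_addr=i * BLOCK, ls_addr=i * BLOCK, length=BLOCK)

# Check status only at the point where the program depends on the data.
sums = []
for i in range(4):
    dma.wait(tag=i)
    sums.append(sum(local_store[i * BLOCK:(i + 1) * BLOCK]))
```

The program issues all four transfers before it touches any data, and each status check is made immediately before the first use of the corresponding block rather than immediately after the command is issued.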
Conventional heterogeneous systems do not provide a solution for an environment wherein the APU has direct access to its own private storage but only indirect access, through DMA, to system memory, and is nevertheless expected to deliver very high performance. For efficiency, a significant number of DMA commands should be queued and executed in parallel, so that the APU program is not continually delayed waiting for data transfers to complete.
Therefore, what is needed is a method of determining the status of previously issued asynchronous DMA commands to allow for efficient data movement and program execution synchronization.