The present invention relates to methods and apparatus for transferring data within a multi-processing system.
Real-time, multimedia applications are becoming increasingly important. These applications require extremely fast processing speeds, such as many thousands of megabits of data per second. While some processing systems employ a single processor to achieve fast processing speeds, others are implemented utilizing multi-processor architectures. In multi-processor systems, a plurality of sub-processors can operate in parallel (or at least in concert) to achieve desired processing results.
In recent years, there has been an insatiable desire for faster data processing throughput because cutting-edge computer applications are becoming more and more complex and are placing ever-increasing demands on processing systems. Graphics applications are among those that place the highest demands on a processing system because they require vast numbers of data accesses, data computations, and data manipulations in relatively short periods of time to achieve desirable visual results.
In some processing systems, direct memory access (DMA) techniques are employed, in which the computer architecture allows data to be sent directly between a device and a memory without involving any microprocessor(s) in the data transfer. The architecture usually includes a memory controller that receives data transfer commands from the device(s) of the system to cause the transfer of data. A conventional DMA command may specify a data block size, a starting virtual address within the system memory from/to which data are to be transferred, and a start address of the device to/from which data are to be transferred. Although the conventional DMA technique is capable of increasing processing speeds as compared with non-direct memory access techniques, it has limitations. For example, in some computing applications, such as graphics processing using a multi-processing system, many DMA transfers from one or more sub-processors might be necessary to achieve desirable results. The conventional approach would require a given sub-processor to issue many DMA commands to effect all of the DMA data transfers, which places a burden on that sub-processor and reduces the processing power available for other work.
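By way of illustration only, the conventional approach described above may be modeled in software as follows. This is a minimal sketch, not an implementation of any particular memory controller. The names DmaCommand and build_commands, the field names, and the addresses are hypothetical and chosen for the example. The sketch assumes that each command carries the three fields named above (block size, starting system memory address, device start address) and that blocks are packed back to back in the device's storage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DmaCommand:
    """Hypothetical model of one conventional DMA command."""
    block_size: int       # number of bytes to transfer
    system_address: int   # starting virtual address in system memory
    device_address: int   # start address in the device's storage


def build_commands(blocks: List[Tuple[int, int]],
                   device_base: int) -> List[DmaCommand]:
    """Build one command per (system_address, block_size) pair.

    Assumption for this sketch: blocks are placed back to back in the
    device's storage, starting at device_base.
    """
    commands = []
    device_address = device_base
    for system_address, block_size in blocks:
        commands.append(DmaCommand(block_size, system_address, device_address))
        device_address += block_size
    return commands


# Three scattered blocks in system memory: (starting address, size in bytes).
scattered = [(0x1000, 128), (0x8000, 256), (0x20000, 128)]
commands = build_commands(scattered, device_base=0x0)

# The issuing sub-processor must create and issue one command per block.
print(len(commands))
```

In this model the number of commands grows linearly with the number of blocks, so a sub-processor that gathers many scattered blocks spends a corresponding amount of its own time issuing commands.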