The present invention relates generally to improvements in array processing, and more particularly to advantageous techniques for providing improved methods and apparatus for data distribution to and collection from multiple memories often associated with and local to processing elements within an array processor.
Various prior art techniques exist for the transfer of data between system memories or between system memories and input/output (I/O) devices. FIG. 1 shows a conventional data processing system 100 comprising a processor local memory 110, a host uniprocessor 120, I/O devices 130 and 140, a system memory 150, which is usually a larger memory store with a longer access delay than the processor local memory, and a direct memory access (DMA) controller 160.
The DMA controller 160 provides a mechanism for transferring data between processor local memory and system memory or I/O devices concurrent with uniprocessor execution. DMA controllers are sometimes referred to as I/O processors or transfer processors in the literature. System performance is improved since the host uniprocessor can perform computations while the DMA controller is transferring new input data to the processor local memory and transferring result data to output devices or the system memory.
A data transfer between a source and a destination is typically specified with the following minimum set of parameters: source address, destination address, and number of data elements to transfer. Addresses are interpreted by the system hardware and uniquely specify I/O devices or memory locations from which data must be read or to which data must be written. Sometimes additional parameters are provided, such as data element size. One of the limitations of conventional DMA controllers is that address generation capabilities for the data source and data destination are often constrained to be the same. For example, when only a source address, a destination address and a transfer count are specified, the implied data access pattern is block-oriented; that is, a sequence of data words from contiguous addresses starting with the source address is copied to a sequence of contiguous addresses starting at the destination address.
Array processing presents challenges for data transfer in terms of addressing flexibility, control and performance. The patterns in which data elements are distributed to and collected from processing element (PE) local memories can significantly affect the overall performance of the processing system. One important application is fast Fourier transform (FFT) processing, which uses bit-reversed addressing to reorder the data elements.
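By way of illustration only, the difference between the block-oriented access pattern and a bit-reversed reordering can be sketched in software as follows. The function names, the list-based memory model and the choice to apply the bit reversal on the destination side are assumptions made for this sketch; they do not describe any particular DMA controller.

```python
def block_transfer(src, src_addr, dst, dst_addr, count):
    """Block-oriented transfer: copy `count` contiguous elements.

    Only a source address, a destination address and a transfer count
    are needed, so both sides step through contiguous addresses.
    """
    for i in range(count):
        dst[dst_addr + i] = src[src_addr + i]


def bit_reverse(index, num_bits):
    """Return `index` with its low `num_bits` bits in reverse order."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_transfer(src, src_addr, dst, dst_addr, count):
    """Read contiguously, write element i at the bit-reversed offset of i.

    Assumption for this sketch: `count` is a power of two, as in a
    radix-2 FFT of that length.
    """
    num_bits = count.bit_length() - 1
    if count <= 0 or (1 << num_bits) != count:
        raise ValueError("count must be a positive power of two")
    for i in range(count):
        dst[dst_addr + bit_reverse(i, num_bits)] = src[src_addr + i]


if __name__ == "__main__":
    system_memory = list(range(100, 108))  # hypothetical source data
    local_memory = [None] * 8              # hypothetical local memory

    block_transfer(system_memory, 0, local_memory, 0, 8)
    print(local_memory)

    bit_reversed_transfer(system_memory, 0, local_memory, 0, 8)
    print(local_memory)
```

In the block-oriented case the source and destination address sequences are identical apart from their starting points, whereas the bit-reversed case requires the destination address generation to differ from that of the source, which is the kind of flexibility a conventional controller often lacks.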
With the advent of the manifold array (ManArray) architecture, it has been recognized that it will be advantageous to have improved techniques for data transfer which efficiently provide these and other capabilities and which are tailored to this new architecture.
As described in greater detail below, the present invention addresses a variety of advantageous approaches for improved data transfer control within a data processing system. In particular, improved techniques are provided for:
(1) Supporting radix 2, 4 and 8 fast Fourier transform algorithms through efficient data reordering or “bit-reversed addressing” across multiple processing elements (PEs), carried out concurrently with FFT computation by a digital signal processor (DSP), and
(2) Parallel data distribution and collection through efficient forms of multicast and “packing-gather” operations.
These and other aspects and advantages of the present invention will be apparent from the drawings and the Detailed Description which follow.