A multiprocessor array is useful in applications in which a task may be divided into individual units of work and the units of work distributed among the processors in the array. One example is the implementation of a Fast Fourier Transform (FFT) for digital signal processing applications. In these types of applications, one of the processors is configured or programmed to operate as the scatter-gather (S-G) processor, and the other processors (“data path processors” or DPPs) are programmed to process the units of work.
The scatter operation performed by the S-G processor involves selecting, for each DPP, the unit of work to be processed by that DPP and providing the unit of work to that DPP. The gather operation, also performed by the S-G processor, involves collecting completed units of work from the DPPs and assembling a data set that represents the completed task. In many applications, each unit of work provided to a DPP is a subset of an input data set. Thus, the scattering of units of work often involves moving data to local memories or caches of the DPPs.
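The scatter and gather operations described above can be illustrated with a minimal sketch. It is not taken from any particular implementation: the function names `scatter`, `process_unit`, `gather`, and `run_task` are hypothetical, the units of work are assumed to be contiguous slices of a list, and squaring each sample stands in for whatever computation a DPP would actually perform.

```python
def scatter(data, num_dpps):
    """Split the input data set into one contiguous unit of work per DPP."""
    base, extra = divmod(len(data), num_dpps)
    units, start = [], 0
    for i in range(num_dpps):
        # The first `extra` DPPs receive one additional element.
        size = base + (1 if i < extra else 0)
        units.append(data[start:start + size])
        start += size
    return units


def process_unit(unit):
    """Placeholder for the work done by a DPP (assumption: square each sample)."""
    return [x * x for x in unit]


def gather(results):
    """Assemble the completed units of work into one output data set."""
    out = []
    for result in results:
        out.extend(result)
    return out


def run_task(data, num_dpps):
    units = scatter(data, num_dpps)              # S-G processor: scatter
    results = [process_unit(u) for u in units]   # DPPs: process units of work
    return gather(results)                       # S-G processor: gather
```

In this sketch the DPPs are modeled as sequential function calls; in a real array each unit would be processed by a separate processor out of its own local memory.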
For each of the DPPs, the S-G processor reads the required subset of data from a memory and writes that data to a memory resource local to the DPP. Once a DPP has completed its unit of work, the S-G processor reads the completed data from the DPP's local memory and writes the necessary data to the S-G processor's local memory. Thus, the S-G processor may perform two read operations and two write operations for each DPP involved in processing the task: one read and one write to scatter the unit of work, and one read and one write to gather the result. These read and write operations may consume considerable resources of the S-G processor and prevent the S-G processor from suitably performing other processing tasks.
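The per-DPP cost described above can be made concrete with a small counting model. This is only a sketch under stated assumptions: the class `SGProcessor` and the function `run_with_counts` are hypothetical, memories are modeled as Python dictionaries, and the transfer of a whole unit of work is counted as a single read or write operation regardless of its size.

```python
class SGProcessor:
    """Hypothetical model of an S-G processor that counts its memory operations."""

    def __init__(self):
        self.reads = 0
        self.writes = 0

    def read(self, memory, key):
        self.reads += 1
        return memory[key]

    def write(self, memory, key, value):
        self.writes += 1
        memory[key] = value


def run_with_counts(input_memory, num_dpps, work):
    sg = SGProcessor()
    sg_local = {}                                # S-G processor's local memory
    dpp_local = [{} for _ in range(num_dpps)]    # one local memory per DPP

    # Scatter: one read and one write by the S-G processor per DPP.
    for i in range(num_dpps):
        subset = sg.read(input_memory, i)
        sg.write(dpp_local[i], "in", subset)

    # Each DPP works out of its own local memory (not counted against the S-G processor).
    for i in range(num_dpps):
        dpp_local[i]["out"] = work(dpp_local[i]["in"])

    # Gather: one more read and one more write by the S-G processor per DPP.
    for i in range(num_dpps):
        result = sg.read(dpp_local[i], "out")
        sg.write(sg_local, i, result)

    return sg_local, sg.reads, sg.writes
```

Under these assumptions, a task spread over N DPPs costs the S-G processor 2N reads and 2N writes, which is the overhead the paragraph above identifies.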
The present invention may address one or more of the above issues.