A number of commonly used digital signal processing (DSP) algorithms, including finite impulse response (FIR) filters, correlation, autocorrelation, and convolution, are computationally similar and may be characterized as "sliding window" algorithms. These algorithms are referred to as "sliding window" algorithms because of the manner in which they access data. The basic data movement pattern of a "sliding window" algorithm is illustrated in FIG. 1. In FIG. 1, a long vector of data 102 needs to be processed by a sliding window algorithm.
However, each step of the algorithm uses only a subvector, or "window", 104 of the full vector 102. The window 104 "slides" across the vector 102 as the algorithm progresses, shifting one place for each algorithm step. For example, in FIG. 1, the window 104 slides in an upward direction across the vector 102.
A good example of a sliding window algorithm is a FIR filter characterized by the following equation. ##EQU1##
The algorithm may be divided into steps, where each step processes a "window" of the input vector, x, and where each window is shifted one position relative to the window of the previous step. For example, each value of index i in the summation above might correspond to one step, and the N+1 values of x[n-i] used during each step would constitute the window as follows:
______________________________________
step      window
______________________________________
i = 0     x[0], x[1], . . ., x[N]
i = 1     x[-1], x[0], . . ., x[N-1]
i = 2     x[-2], x[-1], . . ., x[N-2]
i = 3     x[-3], x[-2], . . ., x[N-3]
. . .     . . .
i = T     x[-T], x[1-T], . . ., x[N-T]
______________________________________
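The step-by-step windowing tabulated above can be sketched in Python. This is only an illustration under stated assumptions: it assumes the FIR equation has the common form y[n] = h[0]x[n] + h[1]x[n-1] + . . . + h[T]x[n-T] for n = 0 through N, and the names `h`, `x_at`, `window`, and `fir_by_steps` are hypothetical, chosen for this example.

```python
def window(x_at, i, N):
    """Return the step-i window x[-i], x[1-i], ..., x[N-i] (N+1 values)."""
    return [x_at(n - i) for n in range(N + 1)]


def fir_by_steps(h, x_at, N):
    """Accumulate y[0..N] one step at a time, one coefficient per step.

    h    : assumed coefficient list h[0..T] (hypothetical name)
    x_at : function returning the input sample at a possibly negative index
    """
    y = [0] * (N + 1)
    for i, coeff in enumerate(h):      # step i = 0, 1, ..., T
        w = window(x_at, i, N)         # window shifted one place per step
        for n in range(N + 1):         # one multiply-accumulate per output
            y[n] += coeff * w[n]
    return y


# Example input: samples x[-2] through x[3], stored with an offset of 2.
samples = [1, 2, 3, 4, 5, 6]
x_at = lambda k: samples[k + 2]
h = [1, 10, 100]                       # T = 2
result = fir_by_steps(h, x_at, N=3)
```

In this sketch each step applies a single coefficient to every element of the current window, so the inner loop corresponds to work that a SIMD machine could spread across its PEs, one output per PE.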
Algorithms of this type are widely used in digital signal processing, image processing, pattern recognition, and other applications. Consequently, for applications of this type, a SIMD vector processor needs an efficient mechanism for handling "sliding window" algorithms.
One prior art approach to handling these algorithms efficiently has been to sidestep the "sliding window" problem: PE-to-PE data movement during computation is avoided by replicating the data within the Vector Data Memory (VDM) so that every PE has all the data it needs in its own slice of VDM. Once the required data is arranged properly in VDM, this method allows multiply-and-accumulate (MAC) oriented computation to proceed rapidly. A principal drawback of this method is that VDM space is wasted, since data is replicated for multiple PEs. For example, a machine with multiple PEs might require multiple copies of the same data within the VDM in order to give each PE its own readily accessible copy. Another drawback of this implementation is that significant execution time and vector addressing may be required to initially replicate the data before MAC oriented computation begins.
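The memory cost of the replication approach can be illustrated with a small Python model. The layout below is an assumption made for the example, not a description of any particular prior art machine: each PE's slice is taken to hold exactly the samples that PE multiplies, and the names `replicate_for_pes` and `mac_from_slices` are hypothetical.

```python
def replicate_for_pes(samples, num_pes, taps):
    """Build one private slice per PE holding every sample that PE needs.

    samples : list holding x[-(taps-1)], ..., x[num_pes-1] in order
    Slice n holds x[n], x[n-1], ..., x[n-(taps-1)] (assumed layout).
    """
    offset = taps - 1                  # list position of x[0]
    return [[samples[offset + n - i] for i in range(taps)]
            for n in range(num_pes)]


def mac_from_slices(slices, h):
    """Each PE computes its output using only its own slice."""
    return [sum(c * v for c, v in zip(h, s)) for s in slices]


samples = [1, 2, 3, 4, 5, 6]           # x[-2] through x[3]
h = [1, 10, 100]                       # assumed coefficients
slices = replicate_for_pes(samples, num_pes=4, taps=3)
outputs = mac_from_slices(slices, h)
stored_words = sum(len(s) for s in slices)   # words held in replicated form
unique_words = len(samples)                  # words in the original vector
```

With four PEs and three coefficients, this model stores twelve words to hold six distinct samples, and the replication pass itself must run before any multiply-accumulate work starts.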
Additionally, many of the prior art SIMD machines have had some mechanism for moving data among PEs. For example, some machines have supported basic "nearest neighbor" communication among PEs, allowing data to be rotated or shifted among PEs. Others have used more complex interconnection networks, such as a shuffle exchange network, to handle more complex data movement patterns. In either case, the typical prior art PE-to-PE communication mechanism permutes the components of a vector whose length equals the number of PEs. A disadvantage of the data movement mechanisms in these prior art architectures is that, for each step of a "sliding window" algorithm, multiple data movement operations may be required to move data to the proper PE locations. Consequently, these data movement operations can consume a large portion of execution time, resulting in inefficient utilization of MAC hardware and reduced overall system performance.
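The overhead of nearest-neighbor communication can be sketched in the same way. The following Python model is illustrative only: it assumes a vector register with one element per PE, and it counts the shift among PEs and the fetch of the new edge sample as two separate data movement operations per step, which is a modeling assumption. The names `shift_right` and `fir_with_shifts` are hypothetical.

```python
def shift_right(regs, fill):
    """Nearest-neighbor shift: PE k receives the value held by PE k-1.

    PE 0 has no left neighbor, so it takes `fill`, a separately fetched sample.
    """
    return [fill] + regs[:-1]


def fir_with_shifts(samples, h, num_pes):
    """samples holds x[-T], ..., x[num_pes-1]; returns (y[0..num_pes-1], moves)."""
    T = len(h) - 1
    regs = samples[T:T + num_pes]      # step 0 window: x[0..num_pes-1]
    y = [0] * num_pes
    moves = 0                          # data movement operations counted
    for i, coeff in enumerate(h):
        if i > 0:
            # Assumed cost: one shift among PEs plus one fetch of x[-i]
            # into the vacated edge PE.
            regs = shift_right(regs, samples[T - i])
            moves += 2
        y = [acc + coeff * v for acc, v in zip(y, regs)]   # one MAC step
    return y, moves


y, moves = fir_with_shifts([1, 2, 3, 4, 5, 6], [1, 10, 100], num_pes=4)
```

Under these assumptions, every multiply-accumulate step after the first is preceded by two movement operations, which is the kind of ratio that leaves MAC hardware idle for a large share of execution time.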
Therefore, a method of providing data to PEs in a timely and memory efficient manner would be desirable.