1. Field of the Invention
The invention relates to an array processor having a unique architecture for computing a broad range of signal processing, scientific, and engineering problems at ultra-high speed. More particularly, the invention, called a Memory-Linked Wavefront Array Processor (MWAP), comprises a computing architecture that provides global asynchronous communication within the processing array and also provides local, data-driven asynchronous control of each processing element.
2. Description of the Prior Art
Signal processing today requires high computation rates. In many cases the signal processing algorithm is straightforward, but the data rate and subsequent processing overwhelm existing computers; as a result, one is restricted to limited applications and/or long computation times. In the field of engineering there is also a need for improved computer speed and reduced cost. System simulations in the areas of hydrodynamics, aerodynamics, electromagnetics, chemistry and heat transfer are usually limited by computer speed, memory and cost. As a result, full simulations of basic phenomena are frequently not feasible in engineering design. The problem is twofold: first, to increase system computation speed by one or two orders of magnitude, and second, to design a system applicable to a multiplicity of problems.
The systolic array, introduced by H. T. Kung (see H. T. Kung, "Let's Design Algorithms for VLSI Systems", in Proc. Caltech Conf. VLSI, Jan. 1979, pp. 66-90), is an array of processors that are locally connected and operate synchronously on the same global clock. Algorithms are executed in a pulsed (systolic flow) fashion. That is, the network of processors rhythmically computes and passes data through the system.
The systolic array has the properties of modularity, regularity, local interconnection, and highly pipelined, highly synchronized multiprocessing. However, it requires global synchronization. That is, data movement is controlled by a global timing reference. In order to synchronize the activities in a systolic array, extra delays are often used to ensure correct timing. For large arrays of processors, synchronization of the entire computing network becomes intolerable or even impossible.
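The lock-step behavior described above can be illustrated with a minimal sketch. It is not taken from the cited reference: the function name `run_systolic`, the use of `None` as an empty pipeline slot, and the per-PE functions are assumptions made purely for illustration. On each tick of the single global clock, every PE latches the value its left neighbor held on the previous tick.

```python
def run_systolic(inputs, stage_fns):
    """Simulate a linear systolic pipeline driven by one global clock.

    Hypothetical model: regs[i] is the output register of PE i, and
    None marks an empty slot (a pipeline bubble), so None must not be
    used as a data value.
    """
    if not stage_fns:
        return list(inputs)
    n = len(stage_fns)
    regs = [None] * n
    outputs = []
    # n extra empty ticks flush the words still inside the array.
    stream = list(inputs) + [None] * n
    for word in stream:
        # One global clock tick: every PE reads the value its left
        # neighbor held before this tick, then all registers update together.
        prev = [word] + regs[:-1]
        regs = [None if v is None else f(v) for f, v in zip(stage_fns, prev)]
        if regs[-1] is not None:
            outputs.append(regs[-1])
    return outputs
```

In this sketch correctness depends on every PE seeing the same tick; a PE that latched early or late would read the wrong word, which is the timing burden the paragraph above attributes to large arrays.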
In its classic form, examples of which are shown in FIG. 1, the systolic array is not programmable; each algorithm requires a separate and distinct array configuration. With increased complexity, the systolic array can be made "hardware programmable" by using matrix switches to reconfigure the array geometry. In any case, the systolic array always requires a direct mapping of the computation algorithm onto physical processor elements.
A second attempted solution, the wavefront array processor, uses the same geometric structures as the systolic array and is generally described in: S. Y. Kung et al., "Wavefront Array Processor: Architecture, Language and Applications", MIT Conf. on Advanced Research in VLSI, Jan. 1982, MIT, Cambridge, MA. It differs from the systolic array in that control flows through the array along with data and parameters. This addition of local control flow to local data flow permits data-driven, self-timed processing. Conceptually, the requirement of correct "timing" is replaced by the requirement for correct "sequencing".
Every processor element (PE) in a wavefront array processor has a bidirectional buffer with independent status flags for each adjacent PE. The flow of data between PEs is asynchronous, with control tokens sent between PEs to indicate data availability and data use. This relaxes the strict timing requirement of the systolic array, simplifies algorithm development, and often results in faster algorithms and processing speed. The wavefront processor thus operates by passing control and data between processors in a wavelike fashion, so that computation flows from one processor to the next as each processor completes a recursion (step) in the algorithm.
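The data-driven handshake described above can be sketched as follows. This is an illustrative model only, not the design in the cited reference: the `Buffer` class, its `full` flag, the function `run_wavefront`, and the random scheduler standing in for arbitrary PE timing are all assumptions. A PE fires only when its input buffer is flagged full and its output buffer is flagged empty, so the result depends on sequencing and not on when each PE happens to run.

```python
import random


class Buffer:
    """One-word buffer with a status flag (hypothetical model)."""

    def __init__(self):
        self.full = False
        self.value = None


def run_wavefront(inputs, stage_fns, seed=0):
    """Run a linear array of self-timed PEs in an arbitrary order."""
    rng = random.Random(seed)
    n = len(stage_fns)
    # bufs[i] feeds PE i; bufs[n] holds the output of the last PE.
    bufs = [Buffer() for _ in range(n + 1)]
    pending = list(inputs)
    outputs = []
    while len(outputs) < len(inputs):
        # Pick any agent: -1 is the data source, n is the sink,
        # and 0..n-1 are the PEs. The order models unknown timing.
        agent = rng.randrange(-1, n + 1)
        if agent == -1:
            if pending and not bufs[0].full:
                bufs[0].value = pending.pop(0)
                bufs[0].full = True
        elif agent == n:
            if bufs[n].full:
                outputs.append(bufs[n].value)
                bufs[n].full = False
        else:
            src, dst = bufs[agent], bufs[agent + 1]
            # Fire only when data is available and the receiver is free.
            if src.full and not dst.full:
                dst.value = stage_fns[agent](src.value)
                dst.full = True
                src.full = False
    return outputs
```

Running the sketch with different `seed` values changes the order in which PEs fire but not the output sequence, which is the sense in which correct "sequencing" replaces correct "timing".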
However, both the systolic and wavefront array processors are deficient in that they require local-type communication and cannot handle global-type communication. As a result, certain useful algorithms cannot be calculated using prior art systolic and wavefront array processors. For example, the Fast Fourier Transform (FFT) is calculated using the following recursion formula (the decimation-in-time constant geometry FFT algorithm):

$$x(m+1,p) = x(m,p) + W(k,N)\,x(m,q)$$
$$x(m+1,q) = x(m,p) - W(k,N)\,x(m,q)$$

with p and q varying from stage to stage. Calculation of this algorithm requires global communication, since the distance $|p-q|$ between data points increases from stage to stage. However, systolic and prior art wavefront array processors require the distance between data items to remain constant from processor (stage) to processor. Thus a systolic or wavefront array processor could not be used to calculate the FFT using the above recursion formula.
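The growth of the distance between paired data points can be seen in a short sketch of the butterfly recursion. The sketch assumes the standard in-place radix-2 decimation-in-time ordering, with bit-reversed input and q = p + 2^m at stage m. That index assignment, and the names `fft_dit` and `bit_reverse`, are assumptions for illustration and are not necessarily the ordering intended by the text above.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dit(x):
    """Radix-2 DIT FFT; returns (spectrum, list of |p - q| per stage)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    data = [0j] * n
    for i, v in enumerate(x):
        data[bit_reverse(i, bits)] = complex(v)
    distances = []
    half = 1  # distance |p - q| used in the current stage
    while half < n:
        distances.append(half)
        step = 2 * half
        for start in range(0, n, step):
            for j in range(half):
                p = start + j
                q = p + half
                w = cmath.exp(-2j * cmath.pi * j / step)  # twiddle W(k, N)
                t = w * data[q]
                # x(m+1,p) = x(m,p) + W*x(m,q); x(m+1,q) = x(m,p) - W*x(m,q)
                data[p], data[q] = data[p] + t, data[p] - t
        half = step
    return data, distances
```

For an 8-point input the second return value is `[1, 2, 4]`: under this ordering, each stage pairs data points twice as far apart as the previous stage, so a fixed nearest-neighbor link between stages cannot deliver both operands.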
Similarly, the computing capability and flexibility of the prior art array processors are limited by the requirement that data pass between the processing elements in the order in which the receiving processing element is to use it. This deficiency in the prior art makes the calculation of certain algorithms difficult and cumbersome.