1. Field of the Invention
The present invention relates to an LSI butterfly processor and method for high performance Fast Fourier Transform (FFT) processors.
2. Description of Related Art
With the present state of the art in VLSI technology, the speed of VLSI computing structures is typically limited by the data bandwidth of a VLSI array processor and not by the silicon circuits themselves. The data bandwidth is directly related to the number of input and output (I/O) pins. The level of integration has reached a point where, due to limited I/O bandwidth, it is not possible to achieve 100% utilization of the execution hardware without significantly increasing the number of I/O pins. In such a situation, the design of VLSI processors should aim to minimize the undesirable effects of limited I/O bandwidth while preserving the tremendous advantages of large computation arrays.
Early integrated FFT processors were based on a radix-2 "butterfly," as illustrated in FIG. 1a, to keep the amount of hardware required on a single chip to a minimum. A "butterfly" is a set of arithmetic operations commonly used in digital signal processing. Recently, due to advancing technology, several radix-4 butterfly-based processors have been disclosed. These processors are a logical evolutionary advance from radix-2 processors that take advantage of a higher level of integration and thus more silicon area. Each radix-4 butterfly requires four data inputs, three "twiddle" inputs (i.e., complex rotation coefficients), and four data outputs, as shown in FIG. 1b. A typical multi-cycle radix-4 butterfly processor includes one complex data input port, one complex twiddle port and one complex data output port, where each port can be 32 to 48 bits wide in such fixed point processors. Thus, a radix-4 butterfly processor already has a large number of I/O pins. Also, it is difficult to conceive of a radix-8 butterfly processor because a radix-8 butterfly requires far more hardware than a radix-4 butterfly. The term "butterfly" processor is used herein to conveniently designate the circuit module, described in detail later herein, which includes a plurality of fan-in inputs and a plurality of fan-out outputs in the schematic representation.
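The arithmetic performed by the radix-2 and radix-4 butterflies described above can be sketched in software as follows. This is a minimal illustration only: it assumes the decimation-in-time form of each butterfly and the forward-transform sign convention, neither of which is stated in the text, and the function names are hypothetical. It is not a description of the circuit of FIG. 1a or FIG. 1b.

```python
# Illustrative sketch only. Assumptions (not taken from the source):
# decimation-in-time form, forward-transform sign convention,
# and all function and variable names.

def radix2_butterfly(a, b, w):
    """Radix-2 butterfly: two data inputs, one twiddle, two data outputs."""
    t = w * b                 # rotate the second input by the twiddle
    return a + t, a - t


def radix4_butterfly(x, w):
    """Radix-4 butterfly: four data inputs, three twiddles, four data outputs."""
    x0, x1, x2, x3 = x
    w1, w2, w3 = w
    # Apply the three twiddles; the first input is not rotated.
    t0, t1, t2, t3 = x0, w1 * x1, w2 * x2, w3 * x3
    # Combine with the 4-point DFT kernel (multiplications by +/-1 and +/-j only).
    y0 = t0 + t1 + t2 + t3
    y1 = t0 - 1j * t1 - t2 + 1j * t3
    y2 = t0 - t1 + t2 - t3
    y3 = t0 + 1j * t1 - t2 - 1j * t3
    return y0, y1, y2, y3


if __name__ == "__main__":
    # With unity twiddles, the radix-4 butterfly is a plain 4-point DFT.
    print(radix2_butterfly(1 + 0j, 2 + 0j, 1))
    print(radix4_butterfly((1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j), (1, 1, 1)))
```

The sketch makes the port counts in the text concrete: the radix-2 function consumes two data values and one twiddle, while the radix-4 function consumes four data values and three twiddles and produces four outputs.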
It is desirable to compute Fast Fourier Transforms (FFTs) using as high a radix as possible to alleviate the I/O bandwidth problem in VLSI FFT processors. The use of higher radices is even more desirable for FFT processors designed to handle exceptionally large FFT sizes, e.g., a 16 million data-point FFT. The I/O bandwidth problem in high performance FFT processors is illustrated in the timing diagrams of FIGS. 2a and 2b. Let Tio be the I/O cycle time and Teu be the execution unit pipeline cycle time. For a dual radix-2 butterfly, four I/O cycles are performed to fetch the data (A1, B1, A2, B2), as shown in FIG. 2a. The datapath then performs two butterfly operations (1, 2) on the fetched data in two cycles. If Tio is equal to Teu, then the datapath cannot be kept busy continuously and is I/O bound, as indicated by the blank cycles in the datapath timing waveform of FIG. 2a. As indicated in FIG. 2a, to make the datapath execution bound, 4 × Tio = 2 × Teu, i.e., the I/O cycle time would have to be one-half the execution pipeline cycle time. FIG. 2b illustrates the same situation for a radix-4 butterfly. Here again, four I/O cycles are required to fetch the data for a single radix-4 butterfly. If it is assumed that the datapath can execute a radix-4 butterfly every two cycles, then once again Tio would have to be (1/2) Teu to make the datapath execution bound. This, then, is the I/O bandwidth problem in high performance pipelined FFT processors with a limited number of I/O ports, specifically three complex ports (namely, an input port, a twiddle port and an output port, as indicated in FIG. 2).
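The cycle-count argument above can be checked with a short calculation. The following is a rough sketch, assuming that data fetch and execution overlap fully in steady state, so that the datapath is busy for the ratio of execution time to fetch time; the function and parameter names are hypothetical and do not appear in the figures.

```python
# Illustrative sketch only. Assumptions (not taken from the source):
# fetch and execution overlap fully in steady state, and all names.

def datapath_utilization(io_cycles, exec_cycles, t_io, t_eu):
    """Fraction of time the datapath is busy for one butterfly group."""
    io_time = io_cycles * t_io        # time to fetch the operands
    exec_time = exec_cycles * t_eu    # time to execute on those operands
    return min(1.0, exec_time / io_time)


def required_io_cycle_time(io_cycles, exec_cycles, t_eu):
    """Largest Tio that keeps the datapath execution bound."""
    return exec_cycles * t_eu / io_cycles


if __name__ == "__main__":
    t_eu = 1.0  # execution cycle time, arbitrary units
    # Dual radix-2 or single radix-4: four I/O cycles, two execution cycles.
    print(datapath_utilization(4, 2, t_io=t_eu, t_eu=t_eu))      # I/O bound
    print(required_io_cycle_time(4, 2, t_eu))                    # Tio needed
    print(datapath_utilization(4, 2, t_io=t_eu / 2, t_eu=t_eu))  # execution bound
```

Under these assumptions, setting Tio equal to Teu leaves the datapath idle half the time, and the break-even point falls at Tio = Teu / 2, in agreement with the relation 4 × Tio = 2 × Teu stated in the text for both the dual radix-2 and the radix-4 cases.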