1. Field of the Invention
The present invention relates to a pipelined FFT (fast Fourier transform) construction, and more particularly, to an improved FFT processor having a CBFP (convergent block floating point) algorithm.
2. Description of the Background Art
The fast Fourier transform (FFT) is one of the most significant algorithms in the DSP (digital signal processing) field; it is a general term for algorithms that efficiently compute the DFT (discrete Fourier transform).
The FFT algorithm is implemented in integrated circuits of one or more physical devices so as to process a signal in real time. The fast Fourier transform operation is performed either by software running on a programmable DSP or by a dedicated FFT processor. The most significant part of the FFT processor hardware is the butterfly processor, which performs the arithmetic operations. An FFT butterfly calculation is implemented as a γ-point data operation, where γ refers to the radix.
An N-point FFT employs N/γ butterfly units per stage (block) over log_γ N stages (hereinafter, "stage"). The operation result of one butterfly stage is applied to the subsequent butterfly stage.
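The stage and butterfly counts stated above can be illustrated with a short sketch. The function name `fft_structure` is hypothetical and is used here only for illustration; it assumes N is an exact power of the radix.

```python
def fft_structure(n_points, radix):
    """Return (stages, butterflies_per_stage) for an N-point radix-gamma FFT.

    Illustrative helper only: stages = log_radix(N), butterflies = N / radix.
    """
    if n_points < radix or radix < 2:
        raise ValueError("need radix >= 2 and n_points >= radix")
    stages = 0
    remaining = n_points
    # Count how many times N can be divided by the radix (integer log).
    while remaining > 1:
        if remaining % radix != 0:
            raise ValueError("n_points must be a power of radix")
        remaining //= radix
        stages += 1
    return stages, n_points // radix
```

For example, `fft_structure(16, 2)` gives 4 stages of 8 butterflies, and `fft_structure(16, 4)` gives 2 stages of 4 butterflies.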
With regard to an N-point direct DFT (discrete Fourier transform), the basic equation is as follows.
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1 \qquad (1)$$
wherein n denotes the time index, k denotes the frequency index, N denotes the number of points, and W_N (= e^{−j(2π/N)}) denotes the twiddle factor.
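As a point of reference, equation 1 can be evaluated directly in software. The following is a minimal sketch, not part of the disclosed hardware; the function name `direct_dft` is an assumption, and the input is assumed to be a non-empty sequence of real or complex samples.

```python
import cmath


def direct_dft(x):
    """Evaluate equation 1 directly: X(k) = sum over n of x(n) * W_N^(n*k)."""
    n_points = len(x)
    # Twiddle factor W_N = e^{-j(2*pi/N)}
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [
        sum(x[n] * w ** (n * k) for n in range(n_points))
        for k in range(n_points)
    ]
```

This direct form needs on the order of N squared complex multiplications, which is the cost the butterfly decomposition is meant to reduce.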
FIG. 1 shows the basic construction of a radix-2 butterfly unit, which expresses equation 1 in butterfly form. The relations between input and output are as follows.
X[k] = x[n] + x[n+N/2] W_N^k
X[k+N/2] = x[n] − x[n+N/2] W_N^k
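The two relations above can be sketched as a single function. The name `radix2_butterfly` and its argument names are hypothetical; `a` stands for x[n] and `b` for x[n+N/2].

```python
import cmath


def radix2_butterfly(a, b, k, n_points):
    """Radix-2 butterfly: returns (a + b*W_N^k, a - b*W_N^k)."""
    # Twiddle factor W_N^k = e^{-j(2*pi*k/N)}
    w = cmath.exp(-2j * cmath.pi * k / n_points)
    t = b * w  # one complex multiplication, shared by both outputs
    return a + t, a - t
```

Note that the product with the twiddle factor is computed once and reused, so each radix-2 butterfly costs one complex multiplication and two complex additions.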
FIG. 2 is a signal flow chart illustrating a 16-point radix-2 FFT processor. The butterfly operation of the 16-point FFT is implemented by 4 butterfly stages (blocks) I, II, III, IV and each stage includes 8 butterflies.
Also, FIG. 3 is a signal flow chart illustrating a radix-4 butterfly unit implementing equation 1 by butterfly, and FIG. 4 is a signal flow chart illustrating a 16-point radix-4 FFT processor. The butterfly calculation of the 16-point FFT is performed by 2 butterfly stages and each stage includes 4 butterflies.
Such a butterfly operation based on the Cooley-Tukey algorithm uses a "divide and conquer" method so that the number of calculations can be reduced to the order of N log N. However, when implementing it in hardware, it becomes difficult to achieve flexibility, regularity, and in-place computation.
FIG. 5 is a schematic block diagram illustrating a conventional FFT processor having a single butterfly unit. As shown therein, RAM 10 serves to relocate input data Data_in, and RAM 13 stores the operation result of a butterfly unit 11. Each of the RAMs 10, 13 is an N-word RAM. The butterfly unit 11 includes complex number multipliers (4 multipliers and 2 adders) and 4 adders. ROM 12 stores the twiddle factor W_N^k, and a controller 14 controls the access operation of the RAMs 10, 13 and the butterfly operation of the butterfly unit 11.
Therefore, the butterfly unit 11 employs the data accessed from the RAM and the twiddle factor read from the ROM 12 to perform the butterfly operations of the N-point FFT, and the operation result is temporarily stored in the RAM 13. When all the butterfly operations are completed, the final output data Data_out is outputted from the RAM 13 under the control of the controller 14.
Here, although the conventional FFT processor is appropriate for a small-point FFT operation, it is not suitable for a large-point FFT calculation. This is because an N-point FFT calculation requires (N/γ) log_γ N radix-γ butterfly operations and 2N words of RAM for storing the intermediate data, the RAM being a major factor determining chip area during FFT processor fabrication. Also, 2N log_γ N read/write accesses to the 2N words of RAM are required. Accordingly, for a large-point FFT the conventional FFT processor suffers a speed decrease and an area increase, and in the worst case a hardware implementation may be impossible.
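To give a feel for how these quantities grow, the expressions above can be evaluated for a given N and radix. This is only an arithmetic sketch; the function name `single_unit_cost` and the dictionary keys are hypothetical, and N is assumed to be an exact power of the radix.

```python
def single_unit_cost(n_points, radix):
    """Evaluate the cost expressions quoted for the single-butterfly design.

    butterflies  = (N / radix) * log_radix(N)
    ram_words    = 2 * N
    ram_accesses = 2 * N * log_radix(N)
    """
    stages = 0
    remaining = n_points
    # Integer logarithm: number of stages log_radix(N).
    while remaining > 1:
        if remaining % radix != 0:
            raise ValueError("n_points must be a power of radix")
        remaining //= radix
        stages += 1
    return {
        "butterflies": (n_points // radix) * stages,
        "ram_words": 2 * n_points,
        "ram_accesses": 2 * n_points * stages,
    }
```

For a 16-point radix-2 FFT this gives 32 butterflies and 128 RAM accesses, while for an 8192-point radix-2 FFT the access count already exceeds two hundred thousand.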
FIG. 6 is a schematic block diagram illustrating a conventional pipelined FFT processor for a radix-4 butterfly operation as disclosed in U.S. Pat. No. 5,163,017.
The pipelined FFT processor includes a RAM 22 for appropriately relocating input data in correspondence to a butterfly operation, a controller 20 for controlling the RAM 22 and an address generator 21, a coefficient ROM 24 for storing therein the twiddle element, a coefficient address generator 23 for controlling the coefficient ROM 24 and a pipelined data path block 25 for performing a butterfly operation. The pipelined FFT structure is provided such that a single butterfly calculation is implemented in a single pipelined cycle.
The operation of the thusly constituted conventional pipelined FFT processor will now be described.
The address generator 21 outputs a read/write address signal in accordance with a control signal from the controller 20, and the coefficient address generator 23 outputs the coefficient address signal to the coefficient ROM 24 so as to read the twiddle coefficient.
The RAM 22 relocates the input data Data_in in accordance with the write address signal of the address generator 21 and outputs 4 data words to the pipelined data path block 25. The coefficient ROM 24 outputs two twiddle coefficients for the radix-4 butterfly operation in accordance with the coefficient address signal from the coefficient address generator 23. Here, the coefficient ROM includes two storage ROMs so as to simultaneously read two twiddle coefficients, and three ROMs may be employed when three twiddle coefficients are to be read simultaneously.
Therefore, the pipelined data path block 25 employs the data accessed from the RAM 22 and two twiddle coefficients read from the coefficient ROM 24 and implements a butterfly operation with 16 additions/subtractions and 3 complex number multiplications. At this time, 3 complex number twiddle coefficients read from the coefficient ROM 24 are employed for the complex number multiplications, and the output data Data_out generated from the respective butterfly operations are stored in the RAM 22.
As described above, the conventional pipelined FFT processor partially stores the input data Data_in and the output data Data_out in the RAM 22 so as to implement the butterfly operation. Accordingly, the above structure has the advantage of significantly reducing the memory (RAM) required to store the intermediate data when compared to the FFT processor shown in FIG. 5.
However, the conventional pipelined FFT processor additionally requires the RAM 22 and the address generator 21 to relocate the input data Data_in, and includes a complicated pipelined data path block 25 for enabling the butterfly operation. Here, the detailed description of the pipelined data path block 25 is omitted for convenience. Also, eight pipeline stages must be traversed in order to implement a single butterfly in the data path block 25, so that a plurality of pipeline delays disadvantageously occur.
A block floating point algorithm advantageously processes block data at high speed and is widely employed in butterfly operations. Since a general butterfly processor includes fixed-point multipliers and adders, the data range increases with each multiplication, addition and subtraction, thereby generating overflow. Accordingly, the overflow must be detected so that the overflowed data can be shifted appropriately.
FIG. 7 is a schematic view illustrating a conventional block floating point mechanism. As shown therein, the block floating point mechanism includes a shifter 30, a butterfly processor 31 connected to the shifter 30 and an overflow detector 32 connected to the shifter 30 and the butterfly processor 31.
The shifter 30 receives source data for operation from a memory (not shown). At this time, the source data for the butterfly operation of the first stage (block) is not shifted in the shifter 30 and is instead transmitted directly to the butterfly processor 31. The butterfly processor 31 receives the source data and implements the butterfly operation, and the overflow detector 32 detects the overflow from the result data of the butterfly processor 31. When the last butterfly operation of the first stage is completed and the overflow of the last result data has been detected, the largest overflow bit number M1 for the first stage is applied from the overflow detector 32 to the shifter 30. The final result data of the butterfly operation at the first stage is transmitted to the memory so as to be employed as source data for the butterfly operation at the second stage (block).
Also, the shifter 30 receives the source data from the memory for the butterfly operation of the second stage and shifts the received source data by the overflow bit number M1. The shifted data is sent to the butterfly processor 31 for the butterfly operation, and the overflow detector 32 examines the result data of the butterfly processor 31 to obtain the largest overflow bit number M2, which is provided to the shifter 30. The result data of the second stage butterfly operation is transmitted to the memory and serves as source data for the third stage butterfly operation. These steps are repeated until the butterfly operations of all stages are completed.
As described above, the conventional BFP (block floating point) mechanism provides a method of handling the bit overflow caused by multiplication and addition during fixed-point operation. In the BFP mechanism, the overflow of all the data in one block (stage) is examined, and all the data is shifted by Mk, the largest overflow in the block, so as to compensate for the overflow error arising from the butterfly calculation results. Also, the BFP construction has the advantage that the butterfly operations of the (K−1)-th stage and the K-th stage are directly connected without requiring a pipeline wait.
However, although the block floating point mechanism improves the accuracy of the FFT operation, when the block is large, as in a large-point FFT operation (multi-step butterfly operation), the accuracy does not improve significantly.
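The shift, butterfly, and overflow-detect sequence of the conventional block floating point mechanism can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the circuit of FIG. 7: all function names are hypothetical, the data are real integers, the butterfly is reduced to additions and subtractions without twiddle factors, and the memory is assumed to hold enough guard bits to store the unshifted results of a stage.

```python
def overflow_bits(value, word_bits):
    """Bits by which a signed value exceeds a word_bits two's-complement word."""
    limit = 1 << (word_bits - 1)  # representable range is [-limit, limit - 1]
    bits = 0
    # Arithmetic right shift until the value fits in the word.
    while not (-limit <= (value >> bits) <= limit - 1):
        bits += 1
    return bits


def add_sub_butterflies(data):
    """Simplified butterfly stage: sums in the first half, differences in the second."""
    half = len(data) // 2
    top = [data[i] + data[i + half] for i in range(half)]
    bottom = [data[i] - data[i + half] for i in range(half)]
    return top + bottom


def bfp_stage(source, shift, word_bits, butterfly):
    """One block floating point stage.

    Shift the whole block by the overflow bit number of the previous stage,
    run the butterflies, then detect the largest overflow bit number, which
    becomes the shift applied to the source data of the next stage.
    """
    shifted = [v >> shift for v in source]          # role of the shifter
    result = butterfly(shifted)                     # role of the butterfly processor
    next_shift = max(overflow_bits(v, word_bits) for v in result)  # overflow detector
    return result, next_shift
```

With an 8-bit word and the block `[100, 90, -120, 60]`, the first stage (shift 0) produces `[-20, 150, 220, 30]` and reports M1 = 1; the second stage shifts that block by one bit before its butterflies and reports M2 = 0. Because one shift amount is applied to the whole block, small values lose precision whenever a single large value overflows, which is the limitation noted for large blocks.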
Therefore, it is an object of the present invention to provide an FFT processor which enables effective management of an operation time while decreasing chip area, thereby implementing an improved pipelined mechanism.
It is another object of the present invention to provide a multi-functional FFT processor capable of implementing 2K-point butterfly operation and 8K-point butterfly operation.
It is a further object of the present invention to provide an FFT processor capable of implementing an FFT/inverse FFT operation.
It is a still further object of the present invention to provide an FFT processor capable of improving output data reliability by implementing a convergent block floating point circuit that applies a floating point concept to fixed-point data operation.
To achieve the above-described objects, there is provided a pipelined FFT (fast Fourier transform) processor including a CBFP (convergent block floating point) algorithm according to the present invention, which includes an inverse multiplexer for inverse-multiplexing 8K-/2K-point input data, first to sixth radix-4 operation circuits for receiving an output of the inverse multiplexer and performing a butterfly operation, a multiplexer connected between the first and second radix-4 operation circuits for selectively outputting an output of the inverse multiplexer or of a first butterfly unit, a radix-2 operation circuit connected to the sixth radix-4 operation circuit for performing a butterfly operation, a convergent block floating point circuit connected to respective output terminals of the radix-4 operation circuits and the radix-2 operation circuit for scaling a butterfly operation result, an addition circuit for accumulating and adding scaling indexes outputted from the convergent block floating point circuit, and a decoder for scaling an output of the radix-2 operation circuit in accordance with the scaling indexes outputted from the addition circuit.
The features and advantages of the present invention will become more readily apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific example, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.