The fast Fourier transform (FFT) is widely used in image processing, wireless communication, voice recognition, spectrum analysis, radar processing, remote sensing and measurement, geological exploration, and other fields. In one-dimensional signal processing, various highly efficient FFT algorithms are in use. For transform lengths that are an integer power of two, algorithms such as the Cooley-Tukey algorithm, the Sande-Tukey algorithm, and the split-radix algorithm can be used. For transform lengths that are not an integer power of two, algorithms such as the Good-Thomas prime factor algorithm and the Winograd nested algorithm can be used. In two-dimensional signal processing, a row-column decomposition algorithm or a vector radix algorithm can be used.
The vector radix algorithm processes a two-dimensional array as a one-dimensional sequence, and accesses butterfly operands in parallel by dividing N×N point data into multiple independent small-point units, thereby implementing a 2D-FFT using only a vector radix 2×2 butterfly operation hardware unit. Compared with a conventional row-column decomposition algorithm, the vector radix algorithm significantly saves hardware resources and improves calculation speed. However, although the algorithm effectively reduces computation when processing small-scale data arrays, when it handles large-scale data arrays, such as an 8192×8192 two-dimensional FFT calculation, the data control process becomes complex and the requirement for on-chip memory is extremely high.
Compared with the vector radix algorithm, the row-column decomposition algorithm is more mature and more widely used. In the row-column decomposition algorithm, an FFT is first computed on each row of the two-dimensional array to obtain an intermediate result, and then an FFT is computed on each column of the intermediate result to obtain the final two-dimensional FFT result. The algorithm thus divides the FFT calculation of the two-dimensional array into multiple one-dimensional FFT calculations, which simplifies the data flow diagram, reduces the required on-chip memory capacity, and makes a VLSI design with reduced silicon area feasible. The present invention employs the row-column decomposition algorithm.
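The row-column decomposition described above can be illustrated in software. The following is a minimal sketch only, not the hardware implementation of the invention: the function names (fft1d, fft2d_row_column, dft2d_direct) are hypothetical, the one-dimensional FFT is a textbook recursive radix-2 Cooley-Tukey routine chosen for brevity, and the array dimensions are assumed to be powers of two.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def fft2d_row_column(a):
    """2D FFT by row-column decomposition (illustrative sketch)."""
    # Pass 1: one-dimensional FFT of every row gives the intermediate result.
    intermediate = [fft1d(row) for row in a]
    # Pass 2: one-dimensional FFT of every column of the intermediate result.
    columns_done = [fft1d(col) for col in transpose(intermediate)]
    # Transpose back so that result[u][v] is indexed (row frequency, column frequency).
    return transpose(columns_done)


def dft2d_direct(a):
    """Direct 2D DFT from the definition, used only as a reference check."""
    n_rows, n_cols = len(a), len(a[0])
    out = [[0j] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            s = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    angle = -2 * cmath.pi * (u * r / n_rows + v * c / n_cols)
                    s += a[r][c] * cmath.exp(1j * angle)
            out[u][v] = s
    return out


sample = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
fast = fft2d_row_column(sample)
slow = dft2d_direct(sample)
max_error = max(abs(fast[u][v] - slow[u][v]) for u in range(4) for v in range(4))
```

In this sketch the column pass is expressed as a transpose followed by row FFTs, which mirrors why a transposition step is needed between the two passes when the data reside in a memory that is efficient only for row-wise access.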
Transposition of a large-scale data array between the row transformation and the column transformation is mainly implemented by two methods. The first is to use a corner turning memory: the two-dimensional array obtained after the row transformation is saved in the memory, and the row-column conversion is performed by changing the read mode of the memory. The second is to write the two-dimensional array obtained after the row transformation into the memory according to transposed addresses, and then to read the array out sequentially.
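The difference between the two methods lies only in where the address remapping takes place. The following sketch, which models the memory as a flat Python list and uses hypothetical function names, shows that both approaches deliver the same column-ordered data stream; it is an illustration of the addressing, not of any particular memory controller.

```python
def write_sequential_read_by_column(matrix):
    """Method 1: write in row order, then change the read mode to column order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    memory = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            memory[r * n_cols + c] = matrix[r][c]  # normal row-major write address
    # Read with a stride of n_cols so that each column comes out contiguously.
    return [memory[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]


def write_transposed_read_sequential(matrix):
    """Method 2: write to transposed addresses, then read sequentially."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    memory = [None] * (n_rows * n_cols)
    for r in range(n_rows):
        for c in range(n_cols):
            memory[c * n_rows + r] = matrix[r][c]  # transposed write address
    return list(memory)  # plain sequential read


row_results = [[1, 2, 3], [4, 5, 6]]
stream_1 = write_sequential_read_by_column(row_results)
stream_2 = write_transposed_read_sequential(row_results)
# Both streams are [1, 4, 2, 5, 3, 6]
```

In the first method the non-sequential accesses occur on the read side, and in the second they occur on the write side, so in either case one of the two passes accesses the memory with a large stride.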
Methods for implementing image transposition with a corner turning memory mainly comprise an input/output balancing method, a block matrix method, a row-in and column-out method, and a two-frame or three-frame corner turning algorithm, among which the row-in and column-out method and the block matrix method are the two used for transposing FFT row transformation results. The row-in and column-out method sequentially stores the data obtained after the two-dimensional row transformation into an off-chip SDRAM in the row direction, and then continuously reads a series of data in the column direction. However, in the worst case, every read of data on a column requires a row activation, which takes several clock cycles and reduces the efficiency of accessing the off-chip SDRAM. The block matrix method is slightly better than the row-in and column-out method: it stores the segmented two-dimensional data into the off-chip SDRAM in rows, namely storing one row of the two-dimensional data into a regional block of the off-chip SDRAM. However, since cross-row reading often takes place, the efficiency of accessing the off-chip SDRAM is still comparatively low.
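The worst case of the row-in and column-out method can be made concrete with a simplified count of row activations. The model below is an assumption made for illustration only: it considers a single SDRAM bank with one open row at a time, a hypothetical parameter page_words giving the number of data words per SDRAM row, and a linear mapping of addresses to SDRAM rows. Real devices have multiple banks and additional timing constraints that this sketch ignores.

```python
def count_row_activations(addresses, page_words):
    """Count how often the open SDRAM row changes over an access sequence.

    Simplified single-bank, open-row model (an assumption for illustration).
    """
    activations = 0
    open_row = None
    for address in addresses:
        sdram_row = address // page_words  # linear address-to-row mapping (assumed)
        if sdram_row != open_row:
            activations += 1  # a new row must be activated before the access
            open_row = sdram_row
    return activations


n = 8           # the array is n x n words (small value chosen for the example)
page_words = 8  # hypothetical SDRAM row size, equal to one array row here

# Row-in: data are written sequentially in the row direction.
write_addresses = [r * n + c for r in range(n) for c in range(n)]
# Column-out: data are read in the column direction, with a stride of n words.
read_addresses = [r * n + c for c in range(n) for r in range(n)]

write_activations = count_row_activations(write_addresses, page_words)
read_activations = count_row_activations(read_addresses, page_words)
# write_activations is 8 (one per SDRAM row), read_activations is 64 (one per word)
```

Under these assumptions the sequential write activates each SDRAM row once, whereas the column-direction read activates a row for every single word, which corresponds to the worst case described above.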