1. Field of the Invention
The present invention relates to a fast Fourier transform (FFT) operating apparatus and an operation method thereof. More particularly, the present invention relates to an FFT operating apparatus and a method thereof for carrying out the FFT operation, which is the kernel function of DMT (Discrete MultiTone) and OFDM (Orthogonal Frequency Division Multiplexing) modems, in a programmable processor that is used with a variety of standards, processes high speed telecommunication algorithms on a real-time basis, and guarantees flexibility in system design.
2. Description of the Related Art
Generally, fast Fourier transforms (FFTs) are used in a variety of communication systems, such as asymmetric digital subscriber line (ADSL), wireless asynchronous transfer mode (ATM), and short distance wireless communication networks, and in applications such as matched filters, spectrum analysis, and radar. The FFT is required for the implementation of OFDM, i.e., the next-generation high speed telecommunication algorithm. The FFT is an algorithm that transforms a signal in a time domain into a signal in a frequency domain. Since the FFT significantly reduces the number of operations required for a Discrete Fourier Transform (DFT) by using the periodicity of trigonometric functions, operations are carried out with increased efficiency. The DFT is expressed by the following formula 1:
\[
X(k) = \sum_{n=0}^{N-1} x(n)\, w_N^{kn}, \quad k = 0, 1, \ldots, N-1, \qquad w_N^{kn} = e^{-j 2\pi nk/N} \qquad \text{[Formula 1]}
\]
By re-arranging x(n) in formula 1 into even-numbered and odd-numbered samples, the N-point DFT is divided into two N/2-point DFTs and expressed as the following formula 2:
\[
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, w_N^{nk} \\
&= \sum_{n=0,\ \text{even}}^{N-1} x(n)\, w_N^{nk} + \sum_{n=0,\ \text{odd}}^{N-1} x(n)\, w_N^{nk} \\
&= \sum_{l=0}^{N/2-1} x(2l)\, w_N^{2lk} + \sum_{l=0}^{N/2-1} x(2l+1)\, w_N^{(2l+1)k} \\
&= \sum_{n=0}^{N/2-1} x(2n)\, w_{N/2}^{nk} + w_N^{k} \sum_{n=0}^{N/2-1} x(2n+1)\, w_{N/2}^{nk}
\end{aligned}
\qquad \text{[Formula 2]}
\]
As formula 2 is repeated, the N-point DFT is divided into several 2-point DFTs, and this process is referred to as radix-2 DIT (Decimation-in-Time) FFT.
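The repeated even/odd split can be illustrated with a minimal Python sketch. It is not taken from the apparatus described here; the function names `dft_direct` and `fft_radix2_dit` are hypothetical, and the sketch assumes the input length is a power of two. It relies on the identity w_N^{(2l+1)k} = w_N^{k} w_{N/2}^{lk}, so each level combines two half-length transforms with one twiddle multiplication per output pair.

```python
import cmath


def dft_direct(x):
    """Evaluate formula 1 term by term (N*N complex multiplications)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def fft_radix2_dit(x):
    """Recursive radix-2 DIT FFT; len(x) is assumed to be a power of two."""
    n_points = len(x)
    if n_points < 1 or n_points & (n_points - 1):
        raise ValueError("length must be a power of two")
    if n_points == 1:
        return [complex(x[0])]
    even = fft_radix2_dit(x[0::2])   # N/2-point DFT of x(2n)
    odd = fft_radix2_dit(x[1::2])    # N/2-point DFT of x(2n+1)
    half = n_points // 2
    out = [0j] * n_points
    for k in range(half):
        # twiddle factor w_N^k applied to the odd-sample transform
        t = cmath.exp(-2j * cmath.pi * k / n_points) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t  # uses w_N^(k + N/2) = -w_N^k
    return out
```

Calling `fft_radix2_dit` and `dft_direct` on the same power-of-two-length list should give the same values up to floating-point rounding.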
Among methods used to split the DFT of formula 1, radix-2 and radix-4 DIT FFTs are the methods most frequently used.
The radix-2 DIT FFT is split into odd-numbered and even-numbered samples as in formula 2, while the radix-4 DIT FFT is split into four sets. In a comparison of these two FFTs, the radix-2 DIT FFT has a simpler butterfly structure, and thus requires fewer multipliers and less area. However, the number of stages increases in the radix-2 DIT FFT, and thus it uses many more operation cycles than the radix-4 DIT FFT. The radix-4 DIT FFT permits high speed processing, but it has a complicated butterfly structure and increases the number of multipliers. Also, calculations for butterfly input data addresses are complicated and difficult to implement. Additionally, because the radix-4 DIT FFT is performed on an FFT having a length of 4^n, it has to be used in combination with the radix-2 DIT FFT for an FFT having a length of 2^n.
Further, the FFT is divided into DIT (Decimation-In-Time) FFT and DIF (Decimation-In-Frequency) FFT according to whether the division operation is based on a time domain or a frequency domain. Formula 2, which is divided with respect to the time domain, is categorized as a DIT FFT. If the division operation is performed with respect to X(k) in the frequency domain, the FFT is a DIF FFT.
In a digital signal processor, the DIT FFT is usually used as the FFT. While the DIF FFT performs addition/subtraction and then multiplication, the DIT FFT, as shown in FIG. 1, performs multiplication and then addition/subtraction. Accordingly, for a digital signal processor based on a multiplier-accumulator, the DIT FFT is more suitable for operations.
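The difference in operation order can be seen in a small Python sketch of the two radix-2 butterflies. The function names are hypothetical and the sketch uses Python complex numbers rather than any particular processor's fixed-point data path. In the DIT form the product w·b feeds directly into an addition and a subtraction, which is the multiply-then-accumulate pattern; in the DIF form the multiplication comes last.

```python
def dit_butterfly(a, b, w):
    """Radix-2 DIT butterfly: multiply first, then add/subtract."""
    t = w * b            # multiplication by the twiddle factor
    return a + t, a - t  # addition/subtraction of the product


def dif_butterfly(a, b, w):
    """Radix-2 DIF butterfly: add/subtract first, then multiply."""
    return a + b, (a - b) * w
```

For example, `dit_butterfly(1, 2, -1j)` forms the product `-2j` first and then returns `(1-2j)` and `(1+2j)`.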
For example, a DSP 56600 core is a fixed-point digital signal processor that consists of one 16×16 multiplier-accumulator (MAC) and one 40-bit ALU (arithmetic and logic unit), and carries out a radix-2 complex FFT butterfly operation using two parallel move instructions. Since the DSP 56600 core has the configuration of a single multiplier-accumulator, the DSP 56600 core has a small area, but less operating efficiency than a dual multiplier-accumulator. The DSP 56600 core requires 8N+9 cycles to perform N radix-2 complex FFT butterfly operations.
FIG. 2 shows another example of an operator using the DIT FFT, i.e., a Carmel™ DSP core by Infineon Technologies AG. The Carmel™ DSP core is a 16-bit fixed-point decimation core, which includes two multiplexers 11, 11′ to select values for a data memory, two latch registers 12, 12′ to store selected outputs from the multiplexers 11, 11′, and data bus switches 13, 13′ to switch data from data operations and data from the data memory so that the data are input to a corresponding operator in accordance with a desired operation. The Carmel™ DSP core also includes registers 14, 14′ storing data for input to a next-stage multiplier-accumulator, a first arithmetic unit 15 having a 16×16 MAC, a 40-bit ALU, an exponenter, and a shifter for a block floating point operation, a second arithmetic unit 16 having a 16×16 MAC and a 40-bit ALU, and an accumulator bank 17 to accumulate and store results obtained in the first and second arithmetic units 15, 16 and switched by the data bus switches 13, 13′. The Carmel™ DSP core, which adopts a CLIW (Configurable Long Instruction Word) architecture, carries out up to 6 operations, including 2 parallel data moves, in a single cycle. Also, since the Carmel™ DSP core supports an automatic scaling mode, an overflow generated in the FFT operations is handled without having to use an additional cycle. However, the Carmel™ DSP core has a complex hardware configuration, since it is designed with the CLIW architecture to allow parallel processing of the operations. The Carmel™ DSP core requires 2N+2 cycles to perform N radix-2 complex FFT butterfly operations.
FIG. 3 shows another example of an operator using the DIT FFT, i.e., a Starcore™ SC140 operator. The SC140, applying a VLIW (Very Long Instruction Word) architecture, includes two data memory buses 21, 21′ to send/receive data to and from the data memory. The SC140 also includes a data register that stores inputs and outputs of the operation units, eight shifter/limiters 22 to shift or limit the operated data stored in the data register and load the data onto the data memory buses 21, 21′, and four 40-bit ALUs 24, 25, 26, 27. Since each of the ALUs 24, 25, 26, 27 has a MAC, it is possible to carry out up to four MAC operations or ALU operations in a single cycle. As a result, using the four MACs, the FFT operations are carried out with fewer operation cycles than in a digital signal processor that has a single or dual MAC.
However, the Starcore™ SC140 has a large size and consumes a lot of power due to the integration of many operation components. Further, it is difficult to efficiently allot the operation components due to data dependency, and it is difficult to read or write the required data from/into the memory during a single cycle due to a lack of data buses. As a result, the performance of the four MAC structure cannot reach twice that of the dual MAC structure.
In performing N complex FFT butterfly operations using the SC140, 1.5N cycles are required. The above digital signal processors focus on increasing the number of operators to accelerate the FFT butterfly operation or on adjusting the data path to fit the butterfly operation flow. However, because the number of operators is limited, the achievable reduction in the number of butterfly operation cycles is also limited.
For the N-point FFT, (N/2)log2 N butterflies are needed. Thus, assuming that two cycles are required for each butterfly operation and that other influences are not considered, (2N/2)log2 N cycles are needed for the N-point FFT. In practice, additional operation cycles are incurred during the FFT operation for data movement and data address calculations.
Table 1 compares the number of butterfly operation cycles and the number of N-point FFT operation cycles for the Carmel DSP core and the TMS320C62x. As shown in Table 1, additional cycles are required beyond the butterfly operation cycles. For the Carmel DSP core, (2N/2)log2 N cycles are needed for the butterfly operations, and for the TMS320C62x, (4N/2)log2 N cycles are needed.
TABLE 1

Processor     Number of butterfly operation cycles   Number of N-point FFT operation cycles
Carmel DSP    2                                      (2N/2)log2 N + 5N/4 + 10 log2 N + 4
TMS320C62x    4                                      (4N/2)log2 N + 7 log2 N + N/4 + 9
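To give a feel for how much of the total is overhead, the Table 1 expressions can be evaluated for a concrete length. The following Python sketch simply transcribes the two expressions; the function names are hypothetical, and N is assumed to be a power of two of at least 4 so that every term is an integer.

```python
def log2_exact(n_points):
    """Return log2 N for a power-of-two N of at least 4."""
    if n_points < 4 or n_points & (n_points - 1):
        raise ValueError("N must be a power of two, at least 4")
    return n_points.bit_length() - 1


def carmel_fft_cycles(n_points):
    """(butterfly cycles, additional cycles) from the Carmel DSP row."""
    stages = log2_exact(n_points)
    butterfly = (2 * n_points // 2) * stages
    additional = 5 * n_points // 4 + 10 * stages + 4
    return butterfly, additional


def tms320c62x_fft_cycles(n_points):
    """(butterfly cycles, additional cycles) from the TMS320C62x row."""
    stages = log2_exact(n_points)
    butterfly = (4 * n_points // 2) * stages
    additional = 7 * stages + n_points // 4 + 9
    return butterfly, additional
```

For N = 256, the Carmel DSP expression gives 2048 butterfly cycles plus 404 additional cycles, and the TMS320C62x expression gives 4096 butterfly cycles plus 129 additional cycles.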
FIG. 4 shows an operation of a general 8-point radix-2 DIT FFT. In the N-point FFT operation, there are log2 N stages and N−1 groups. Accordingly, there are three stages and seven groups shown in FIG. 4, and as the operation advances from one stage to the next, the number of butterflies in each group increases or decreases.
The FFT operation is carried out in one stage and repeated in the next stage. Within a stage, the operation is carried out by the group. In using C or assembly language to implement the FFT, as shown in FIG. 5, three looping instructions are used for the operations of the stages, the groups, and the butterflies in each group, which may vary according to the architectures of a programmable processor and the program. Generally, three or four cycles are required to carry out the looping instruction in the digital signal processor. Assuming that L cycles are required for a single butterfly operation and M cycles are required to carry out the looping instruction, the number of the cycles to carry out the N-point FFT operation is obtained through the following formula 3.

(L×N/2)log2 N + M×(N−1) + M log2 N = α  [Formula 3]
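The three-loop structure can be sketched in Python as follows. This is an illustrative sketch, not the program of FIG. 5: the function names are hypothetical, and it assumes an in-place radix-2 DIT FFT with bit-reversed input ordering and a power-of-two length. The group counter is included only to show that the middle loop body runs N−1 times in total.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    reversed_index = 0
    for _ in range(bits):
        reversed_index = (reversed_index << 1) | (index & 1)
        index >>= 1
    return reversed_index


def fft_three_loops(x):
    """Return (spectrum, number of groups processed)."""
    n_points = len(x)
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("length must be a power of two, at least 2")
    data = [0j] * n_points
    for i in range(n_points):
        data[bit_reverse(i, stages)] = x[i]
    groups = 0
    for stage in range(1, stages + 1):            # loop 1: stages
        half = 1 << (stage - 1)                   # butterflies per group
        span = 2 * half                           # addresses per group
        for start in range(0, n_points, span):    # loop 2: groups
            for j in range(half):                 # loop 3: butterflies
                w = cmath.exp(-2j * cmath.pi * j / span)
                a, b = start + j, start + j + half
                t = w * data[b]
                data[a], data[b] = data[a] + t, data[a] - t
            groups += 1
    return data, groups
```

For an 8-point input, the function processes three stages and reports seven groups, in line with the log2 N stages and N−1 groups stated for FIG. 4.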
In formula 3, the value of the expression (L×N/2)log2 N, which is determined by L, may be changed according to the number of the MACs and the ALUs in the digital signal processor, and the value of the expression M×(N−1)+M log2 N, which is determined by M, may be changed according to the configuration of a program controller in the digital signal processor.
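The two contributions on the left-hand side of formula 3 can be evaluated separately to see how they compare. The Python sketch below is only a transcription of those two expressions; the function name and the sample values of L and M are assumptions chosen for illustration, and N is assumed to be a power of two.

```python
def formula3_cycles(n_points, butterfly_cycles, loop_cycles):
    """Return (butterfly term, looping term) of formula 3.

    butterfly_cycles is L and loop_cycles is M in the text.
    """
    stages = n_points.bit_length() - 1
    if n_points < 2 or (1 << stages) != n_points:
        raise ValueError("N must be a power of two, at least 2")
    butterfly_term = (butterfly_cycles * n_points // 2) * stages
    looping_term = loop_cycles * (n_points - 1) + loop_cycles * stages
    return butterfly_term, looping_term
```

With N = 8, L = 2, and M = 3, the butterfly term is 24 cycles and the looping term is 30 cycles, so in this small case the looping overhead exceeds the butterfly work itself.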
In the butterfly operation for a group of a stage, the address of input data is increased by one. When the group is altered, the address of input data of a butterfly varies according to the size of the group. In formula 3, α is used to denote the number of the required cycles and the cycles required for the data move. If parallel processing is feasible as in the VLIW processor, the number of the additional operation cycles, except for the butterfly, is reduced to some degree by parallel-processing diverse instructions through the assembly coding. However, the reductions due to parallel processing are not sufficient. Referring to FIG. 4, address modification according to the alteration of the group is described. For example, “a” in the first butterfly of the stage 2 (① in FIG. 4, group 1) is memory address 0, and “b” is memory address 2. In FIG. 4, “a” in the second butterfly of the stage 2 (② in FIG. 4, group 1) is memory address 1, and “b” is memory address 3. In FIG. 4, “a” in the third butterfly of the stage 2 (③ in FIG. 4, group 2) is memory address 4, and “b” is memory address 6. The address of the input data “a” in group 1 increases from 0 by a value of 1. As the operation progresses from group 1 to group 2, the address of “a” changes from 1 to 4. That is, as the group is changed, the address increment of the input data also changes.
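The address pattern in this example can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch assumes the in-place radix-2 DIT layout implied by the addresses quoted above, in which a group at a given stage spans 2^stage consecutive addresses and “b” lies 2^(stage−1) addresses after “a”.

```python
def butterfly_addresses(n_points, stage):
    """List (group, address of a, address of b) for one stage."""
    half = 1 << (stage - 1)   # butterflies per group, and distance a -> b
    span = 2 * half           # addresses covered by one group
    result = []
    for group, start in enumerate(range(0, n_points, span), start=1):
        for j in range(half):
            result.append((group, start + j, start + j + half))
    return result


def a_increments(addresses):
    """Differences between the 'a' addresses of successive butterflies."""
    a_values = [a for _, a, _ in addresses]
    return [nxt - cur for cur, nxt in zip(a_values, a_values[1:])]
```

For the 8-point case, `butterfly_addresses(8, 2)` yields the pairs (0, 2) and (1, 3) in group 1 and (4, 6) and (5, 7) in group 2, and `a_increments` gives 1, 3, 1: the increment is 1 inside a group and 3 at the group boundary, which is the non-uniform step that costs extra address-calculation cycles.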
As described above, to reduce the number of operation cycles of the N-point FFT in a programmable processor such as a digital signal processor, it is necessary to minimize the additional operation cycles other than the butterfly operation cycles. However, since conventional digital signal processors do not support a hardware structure to reduce the additional operation cycles, it is difficult to reduce the number of the operation cycles.