The Fast Fourier Transform (FFT) is an algorithm widely used in the field of signal processing. For example, an Orthogonal Frequency Division Multiplexing (OFDM) receiver uses the FFT to extract a complex symbol sequence from a received OFDM signal. Examples of such receivers include a communication terminal of a Long Term Evolution (LTE) system, a wireless Local Area Network (LAN) device and a digital television broadcast receiver. The N-point Discrete Fourier Transform (DFT) is given by the following formulae (1) and (2). Here, $X(n)$ is the time domain sequence, $Y(k)$ is the frequency domain sequence and $W^{nk}$ is the twiddle factor.
$$Y(k) = \sum_{n=0}^{N-1} X(n)\, W^{nk}, \qquad k = 0, \ldots, N-1 \tag{1}$$

$$W^{nk} = \exp\!\left(-j\,\frac{2\pi nk}{N}\right) \tag{2}$$
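As a baseline, formulae (1) and (2) can be evaluated directly. The minimal Python sketch below does this. The function name `dft` is hypothetical. Each of the N outputs sums N products, so the direct computation needs on the order of N² complex multiplications.

```python
import cmath


def dft(x):
    """Direct N-point DFT following formulae (1) and (2); O(N^2) complex multiplications."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


# An impulse at n = 0 has a flat spectrum: Y(k) = 1 for every k.
print([round(abs(v), 6) for v in dft([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
```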
The key idea of the FFT algorithm is to use the periodicity of the twiddle factor $W^{nk}$ to decompose an N-point DFT into a plurality of small DFTs whose size equals the radix (e.g., 2-point DFTs for Radix-2). As a result, the FFT algorithm requires significantly less computation than directly evaluating the DFT of formulae (1) and (2). The FFT algorithm is widely known. It is described in detail in, for example, “Digital Signal Processing: Principles, Algorithms and Applications”, John G. Proakis and Dimitris G. Manolakis, Prentice-Hall (1996). A detailed explanation of the FFT algorithm is therefore omitted.
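To illustrate the decomposition, the following is a minimal sketch of one common way to exploit the periodicity of $W^{nk}$: a recursive Radix-2 decimation-in-frequency split. It is not the circuit described later. The names `fft_dif` and `mult_counts` are hypothetical. The multiplication count assumes one twiddle multiplication per butterfly, with trivial ones included, for the (N/2)·log2 N butterflies of a Radix-2 FFT.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of two.

    The periodicity of W^{nk} splits one N-point DFT into two N/2-point DFTs:
      Y(2r)   = DFT_{N/2}[ x(n) + x(n + N/2) ]
      Y(2r+1) = DFT_{N/2}[ (x(n) - x(n + N/2)) * W^n ],  n = 0, ..., N/2 - 1
    """
    n_points = len(x)
    if n_points == 1:
        return [complex(x[0])]
    half = n_points // 2
    top = [x[n] + x[n + half] for n in range(half)]
    bottom = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / n_points)
              for n in range(half)]
    y = [0j] * n_points
    y[0::2] = fft_dif(top)      # even-indexed frequency bins
    y[1::2] = fft_dif(bottom)   # odd-indexed frequency bins
    return y


def mult_counts(n_points):
    """Complex multiplications: direct DFT (N^2) vs. radix-2 FFT ((N/2) * log2 N butterflies)."""
    return n_points * n_points, (n_points // 2) * (n_points.bit_length() - 1)


print(mult_counts(1024))  # (1048576, 5120)
```

For N = 1024, the multiplication count drops by roughly a factor of 200.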
The FFT algorithm has various variants. They differ in (a) the radix (e.g., Radix-2 or Radix-4), (b) decimation in frequency (DIF) or decimation in time (DIT) and (c) the shape of the data flow graph (DFG). Hereinafter, the Radix-2 DIF FFT is described as an example. FIG. 9 illustrates the basic flow graph of a Radix-2 DIF FFT butterfly operation. FIG. 10 illustrates the data flow graph of a 16-point ($N = 2^4 = 16$) Radix-2 DIF FFT.

An N-point FFT can be implemented as a combination of the butterfly operations illustrated in FIG. 9. It performs L stages of N/2 butterfly operations each, where $L = \log_2 N$. As illustrated in FIG. 10, in the Radix-2 DIF FFT, when the input data sequence (i.e., the time domain sequence) is arranged in natural order, the output data sequence (i.e., the frequency domain sequence) is in bit-reversed order. That is, the frequency domain data output at DFG index i is $Y(\mathrm{br}_L(i))$. A DFG index indicates the position of a data output in the data flow graph.

Here, $\mathrm{br}_L(i)$ is the natural number obtained by reversing the bit order of the L-bit binary representation of the natural number i, so that the MSB (Most Significant Bit) becomes the LSB (Least Significant Bit) and vice versa. For example, when i = 13 in decimal, the four-bit binary representation of 13 is “1101”. The binary representation of $\mathrm{br}_4(13)$ is then “1011”, i.e., $\mathrm{br}_4(13) = 11$ in decimal. Likewise, reversing the five-bit representation “01101” gives “10110” for $\mathrm{br}_5(13)$, i.e., $\mathrm{br}_5(13) = 22$ in decimal.
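Both $\mathrm{br}_L(i)$ and the stage-by-stage evaluation of the DFG can be sketched in Python. This sketch assumes the standard Radix-2 DIF butterfly for FIG. 9: the upper output is a + b, and the lower output is (a − b)·W. At stage S, element i (with the bit of weight $2^{L-S}$ clear) is paired with element $i + 2^{L-S}$, using the twiddle exponent $(i \bmod 2^{L-S}) \cdot 2^{S-1}$. The function names are hypothetical. The result reproduces the bit-reversed output order described above.

```python
import cmath


def br(i, n_bits):
    """br_L(i): reverse the L-bit binary representation of i (MSB <-> LSB)."""
    return int(format(i, f"0{n_bits}b")[::-1], 2)


def dif_fft_dfg(x):
    """Radix-2 DIF FFT evaluated stage by stage on a natural-order input.

    Returns G_L, where G_L(i) = Y(br_L(i)), i.e. the output is in bit-reversed order.
    """
    n_points = len(x)
    n_stages = n_points.bit_length() - 1           # L = log2 N
    g = list(map(complex, x))                      # G_0 = X in natural order
    for s in range(1, n_stages + 1):
        d = 1 << (n_stages - s)                    # butterfly span 2^(L-S)
        for i in range(n_points):
            if i & d == 0:                         # upper input of a butterfly pair
                a, b = g[i], g[i + d]
                w = cmath.exp(-2j * cmath.pi * (i % d) * (1 << (s - 1)) / n_points)
                g[i], g[i + d] = a + b, (a - b) * w
    return g


print(br(13, 4), br(13, 5))  # 11 22
```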
Further, the Single-path Delay Feedback (SDF) architecture is a known method of implementing the FFT algorithm with pipeline processing. For example, Non-Patent Literature 1 discloses details of the SDF architecture. FIG. 11 illustrates the configuration of an FFT circuit having the Radix-2 DIF SDF architecture disclosed in Non-Patent Literature 1. The FFT circuit 8 illustrated in FIG. 11 includes a pipeline of L butterfly processing elements (hereinafter, butterfly PEs) 80_1 to 80_L connected in series. The FFT circuit 8 also includes a sequence transforming unit 90. Here, $L = \log_2 N$, where N is the number of FFT points.
The butterfly PE 80_1 receives the time domain sequence X(n) in natural order. It performs the N/2 butterfly operations of the first stage in FIG. 10 and outputs the butterfly operation results to the butterfly PE 80_2 of the next stage. The butterfly PEs 80_2 to 80_L perform the butterfly operations of the second to Lth stages. Accordingly, the butterfly PE 80_L of the Lth stage outputs the frequency domain sequence Y(k) in bit-reversed order. The sequence transforming unit 90 then transforms the frequency domain sequence Y(k) from bit-reversed order into natural order. For convenience of description, the output data (i.e., intermediate result data or frequency domain data) of the butterfly PE 80_S of the Sth stage is denoted by $G_S(i)$. The integer i is a DFG index and satisfies $0 \le i \le N-1$. The integer S is the stage number and satisfies $1 \le S \le L$.
FIG. 12 is a block diagram illustrating the configuration of the butterfly PE 80_S of the Sth stage. The butterfly PE 80_S includes a butterfly processor 810, a delay circuit 820 and a counter 830. The butterfly processor 810 has two input ports IN1 and IN2 and two output ports OUT1 and OUT2. The first input port IN1 receives the output data of the delay circuit 820. The second input port IN2 receives the output data sequence $G_{S-1}(i)$ from the butterfly PE 80_S−1 of the previous stage. The first output port OUT1 is connected to the input port of the delay circuit 820 and supplies data to it. The second output port OUT2 supplies the output data sequence $G_S(i)$ either to the butterfly PE 80_S+1 of the next stage or to the sequence transforming unit 90.
The delay circuit 820 is arranged in a feedback path that feeds an output of the butterfly processor 810 back to its input. The delay circuit 820 is a memory that can store $2^{L-S}$ words of data, and it outputs the stored data in FIFO (First In First Out) order. The delay circuit 820 is, for example, a FIFO buffer or a shift register. The counter 830 is an L-bit counter. It is reset to 0 when the output data $G_{S-1}(0)$ of DFG index “0” is input from the butterfly PE 80_S−1 of the previous stage. The counter 830 supplies its counter value C to the butterfly processor 810.
FIGS. 13A and 13B are block diagrams illustrating the configuration of the butterfly processor 810 illustrated in FIG. 12. The butterfly processor 810 shown in FIGS. 13A and 13B includes an adder 811, a subtractor 812, a multiplier 813, a twiddle factor selecting unit 814 and a selector (multiplexer) 815. As described above, the first input port IN1 receives the output data of the delay circuit 820. The second input port IN2 receives the output data sequence $G_{S-1}(i)$ of the butterfly PE 80_S−1 of the previous stage. The adder 811, the subtractor 812 and the multiplier 813 perform the butterfly operation illustrated in FIG. 9 on these two input data sequences.
The twiddle factor selecting unit 814 provides the multiplier 813 with the twiddle factor $W_N^{k}$, selected based on the counter value C of the counter 830. The selector 815 includes two selector elements 816 and 817. According to the counter value C, the selector element 816 selects either the data supplied from the delay circuit 820 or the output data of the adder 811. It supplies the selected data to the second output port OUT2. Meanwhile, the selector element 817 selects, according to the counter value C, either the data $G_{S-1}(i)$ supplied from the butterfly PE 80_S−1 of the previous stage or the output data of the multiplier 813. It supplies the selected data to the first output port OUT1.
Next, the operation of the butterfly PE 80_S of the Sth stage is described. In the following explanation, $b_P(q)$ denotes the Pth bit of the binary representation of a natural number q, counted from the least significant bit (LSB), where $b_1(q)$ is the LSB. The butterfly PE 80_S performs the butterfly operations of one stage (i.e., N/2 butterfly operations) in order from the top of the data flow graph (e.g., FIG. 10). More specifically, consider the case where the (L−S+1)th bit from the LSB of the counter value C is 0 (i.e., $b_{L-S+1}(C) = 0$). In this case, the selector elements 816 and 817 of the selector 815 each select the port #0 side, as illustrated in FIG. 13A. Consequently, the output data $G_{S-1}(C)$ of the butterfly PE 80_S−1 of the previous stage is fed to the delay circuit 820 without undergoing the butterfly operation.
Meanwhile, when $b_{L-S+1}(C) = 1$, the selector elements 816 and 817 of the selector 815 each select the port #1 side, as illustrated in FIG. 13B. The butterfly processor 810 then performs a butterfly operation on two inputs: the output data $G_{S-1}(C)$ of the butterfly PE 80_S−1 of the previous stage, and the data $G_{S-1}(C-2^{L-S})$, which has been delayed by $2^{L-S}$ cycles in the delay circuit 820. This operation generates the data $G_S(C-2^{L-S})$ and $G_S(C)$. One butterfly operation result, $G_S(C-2^{L-S})$, is fed to the butterfly PE 80_S+1 of the next stage through the selector element 816. The other result, $G_S(C)$, is fed to the delay circuit 820 through the selector element 817. The result $G_S(C)$ is delayed by $2^{L-S}$ cycles in the delay circuit 820. It is then fed to the butterfly PE 80_S+1 of the next stage when $b_{L-S+1}(C) = 0$.
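The port #0/#1 behaviour can be checked with a cycle-level Python model of a single PE and of the chained pipeline. The sketch rests on several assumptions:

- The class and function names are hypothetical.
- The delay circuit is modelled as a zero-initialised FIFO of depth $2^{L-S}$.
- Each PE's counter is derived from a shared cycle count as (cycle − arrival time of $G_{S-1}(0)$) mod N.
- The twiddle exponent $k = (C \bmod 2^{L-S}) \cdot 2^{S-1}$ is the standard Radix-2 DIF choice. It is assumed here as the value that the twiddle factor selecting unit 814 selects.

```python
import cmath
from collections import deque


class SdfButterflyPE:
    """Cycle-level model of butterfly PE 80_S (stage S of L) in a Radix-2 DIF SDF FFT."""

    def __init__(self, stage, n_stages):
        self.n_points = 1 << n_stages                    # N
        self.stage = stage                               # S
        self.span = 1 << (n_stages - stage)              # 2^(L-S): FIFO depth and stage delay
        self.fifo = deque([0j] * self.span)              # delay circuit 820, initially cleared

    def step(self, g_in, c):
        """Consume G_{S-1}(C) at counter value C and return this cycle's output word."""
        delayed = self.fifo.popleft()                    # word written 2^(L-S) cycles ago
        if (c & self.span) == 0:                         # b_{L-S+1}(C) = 0: ports #0
            self.fifo.append(g_in)                       # G_{S-1}(C) -> delay circuit
            return delayed                               # G_S(C - 2^(L-S)) -> next stage
        # b_{L-S+1}(C) = 1: ports #1, butterfly on G_{S-1}(C - 2^(L-S)) and G_{S-1}(C)
        k = (c % self.span) << (self.stage - 1)          # assumed twiddle exponent from C
        w = cmath.exp(-2j * cmath.pi * k / self.n_points)
        self.fifo.append((delayed - g_in) * w)           # G_S(C) -> delay circuit
        return delayed + g_in                            # G_S(C - 2^(L-S)) -> next stage


def sdf_fft_stream(frames, n_stages):
    """Stream FFT frames back to back through L PEs; return each frame's bit-reversed output."""
    n_points = 1 << n_stages
    pes = [SdfButterflyPE(s, n_stages) for s in range(1, n_stages + 1)]
    latency = n_points - 1                               # sum of the FIFO depths 2^(L-S)
    stream = [complex(v) for frame in frames for v in frame] + [0j] * latency
    out = []
    for t, sample in enumerate(stream):
        data, start = sample, 0                          # start: cycle at which G_{S-1}(0) arrives
        for pe in pes:
            data = pe.step(data, (t - start) % n_points) # L-bit counter 830 of this PE
            start += pe.span
        out.append(data)
    flat = out[latency:]                                 # G_L(i) leaves latency cycles after X(i)
    return [flat[f * n_points:(f + 1) * n_points] for f in range(len(frames))]
```

In this model, each PE emits $G_S(C-2^{L-S})$ in every cycle, so each stage adds exactly $2^{L-S}$ cycles of delay. `sdf_fft_stream(frames, 4)[f][i]` then equals $Y(\mathrm{br}_4(i))$ of frame f.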
FIG. 14 is a table illustrating the input and output data of the butterfly processor 810 in the butterfly PE 80_2 of the second stage when N = 16 (i.e., L = 4). For reference, FIG. 14 also shows the counter value C in decimal (DEC.) and binary (BIN.) representation, together with the selected port (#0 or #1) of the selector 815. In the example of FIG. 14, $b_3(C) = 1$ when the counter value C is any one of 4 to 7 or 12 to 15 in decimal. In these cycles, the butterfly operation result $G_2(C-4)$ of the second stage is fed to the butterfly PE 80_3 of the third stage, and $G_2(C)$ is fed to the delay circuit 820. Meanwhile, $b_3(C) = 0$ when the counter value C is any one of 0 to 3 or 8 to 11 in decimal. In these cycles, the output data $G_1(C)$ of the butterfly PE 80_1 of the first stage is fed to the delay circuit 820. At the same time, $G_2(C-4)$, which has been delayed by four cycles in the delay circuit 820, is fed to the butterfly PE 80_3 of the third stage.
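The port column of FIG. 14 depends only on $b_{L-S+1}(C)$, so it can be regenerated for any stage. The following is a small Python sketch of that rule. The function name is hypothetical. For N = 16 and the second stage, it reports port #1 for C = 4 to 7 and 12 to 15.

```python
def selector_ports(n_stages, stage):
    """Port of selector 815 (0 or 1) for every counter value C of butterfly PE 80_S.

    Port #1 (butterfly) is selected when b_{L-S+1}(C) = 1, i.e. when the bit of C
    with weight 2^(L-S) is set; otherwise port #0 (fill the delay circuit) is used.
    """
    return [(c >> (n_stages - stage)) & 1 for c in range(1 << n_stages)]


ports = selector_ports(4, 2)                      # N = 16 (L = 4), second stage
for c, port in enumerate(ports):
    print(f"C={c:2d}  BIN={c:04b}  port #{port}")
```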
The total delay, in cycles, of the L butterfly PEs 80_1 to 80_L in the above-described pipeline FFT circuit 8 having the Radix-2 DIF SDF architecture is given by the following formula (3).
$$\sum_{S=1}^{L} 2^{L-S} = N - 1 \tag{3}$$
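Formula (3) is a geometric series. Each butterfly PE 80_S contributes the $2^{L-S}$-cycle delay of its delay circuit 820. Substituting m = L − S and using $N = 2^L$:

$$\sum_{S=1}^{L} 2^{L-S} = \sum_{m=0}^{L-1} 2^{m} = 2^{L} - 1 = N - 1$$

The per-stage delays therefore halve from N/2 cycles at the first stage down to 1 cycle at the Lth stage.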
Next, the sequence transforming unit 90 illustrated in FIG. 11 will be described. As described above, the sequence transforming unit 90 transforms the frequency domain sequence Y(k), output from the butterfly PE 80_L of the Lth stage, from bit-reversed order to natural order, and outputs the frequency domain sequence Y(k) in natural order. FIG. 15 is a block diagram illustrating a configuration example of the sequence transforming unit 90. The sequence transforming unit 90 shown in FIG. 15 includes a memory 910, an address generating unit 920 and a counter 930.
The counter 930 is an L-bit counter and is reset to 0 when the sequence transforming unit 90 receives the data Y(0). The counter 930 supplies its counter value to the address generating unit 920. The counter 930 also sends a mode signal to the address generating unit 920 according to the number of FFTs processed. More specifically, the counter 930 generates a mode signal with the value “0” when the number of processed FFTs is odd, and with the value “1” when it is even.
The address generating unit 920 sends Write and Read addresses to the memory 910. More specifically, in mode 0, the address generating unit 920 bit-reverses the counter value (0, …, N−1 in decimal) of the L-bit counter 930, i.e., it produces $\mathrm{br}_L(0), \ldots, \mathrm{br}_L(N-1)$. It outputs these values as both the Write and Read addresses. In mode 1, the address generating unit 920 outputs the counter value (0, …, N−1 in decimal) of the L-bit counter 930 itself as both the Write and Read addresses.
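The two address sequences can be sketched in Python as follows. The function names are hypothetical. The Write and Read addresses are identical in each mode, so one list describes both.

```python
def br(i, n_bits):
    """br_L(i): reverse the L-bit binary representation of i."""
    return int(format(i, f"0{n_bits}b")[::-1], 2)


def address_sequence(mode, n_stages):
    """Write/Read address (the two are equal) for counter values 0, ..., N-1 in the given mode."""
    n_points = 1 << n_stages
    if mode == 0:
        return [br(c, n_stages) for c in range(n_points)]  # mode 0: bit-reversed counter value
    return list(range(n_points))                           # mode 1: counter value itself


print(address_sequence(0, 3))  # [0, 4, 2, 6, 1, 5, 3, 7]
print(address_sequence(1, 3))  # [0, 1, 2, 3, 4, 5, 6, 7]
```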
The memory 910 is an N-word memory. It writes and reads the frequency domain sequence Y(k) at the Write and Read addresses generated by the address generating unit 920. In this way, it reorders Y(k) from bit-reversed order and outputs it in natural order.
The sequence transforming unit 90 shown in FIG. 15 operates as follows. When it receives the first frequency domain sequence Y(k), the sequence transforming unit 90 operates in mode 0. The input sequence $Y(\mathrm{br}_L(0)), \ldots, Y(\mathrm{br}_L(N-1))$ arrives in bit-reversed order. It is written into the memory 910 at the Write addresses $\mathrm{br}_L(0), \ldots, \mathrm{br}_L(N-1)$, which are obtained by bit-reversing the counter value of the counter 930. Consequently, once the N-word frequency domain sequence Y(k) has been written into the memory 910 during mode 0, the memory 910 holds it in natural order.

The N-word frequency domain sequence Y(k) written during mode 0 is read out during mode 1. The Read address during mode 1 is the counter value of the counter 930 itself, so the memory 910 outputs the frequency domain sequence Y(k) in natural order. In addition, during mode 1, the frequency domain sequence Y(k) of the next FFT is written into each address immediately after that address has been read. This prevents unread frequency domain data from being overwritten.
When the N-word frequency domain sequence Y(k) is written into the memory 910 during mode 1, the memory 910 holds it in bit-reversed order. This sequence is read out during mode 0. The Read address during mode 0 is obtained by bit-reversing the counter value of the counter 930, so the memory 910 again outputs the frequency domain sequence Y(k) in natural order.
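The alternating-mode behaviour of the single N-word memory can be checked with a small Python model. The model makes the following assumptions:

- The function names are hypothetical.
- Frames arrive back to back, and the first FFT uses mode 0.
- In each cycle, the word at the current address is read before the incoming word is written to that same address.
- An extra dummy frame drains the last FFT.

Each word is tagged as (frame, k) so that the output order is easy to see.

```python
def br(i, n_bits):
    """br_L(i): reverse the L-bit binary representation of i."""
    return int(format(i, f"0{n_bits}b")[::-1], 2)


def reorder_stream(frames_bitrev, n_stages):
    """Model of sequence transforming unit 90 built from one N-word memory 910.

    frames_bitrev[f][c] holds Y_f(br_L(c)), the word arriving at counter value c.
    Frames alternate between mode 0 and mode 1; in every cycle the word at the
    current address is read first and then overwritten by the incoming word.
    Returns, for each input frame, the N words read out one frame later.
    """
    n_points = 1 << n_stages
    memory = [None] * n_points                             # memory 910
    flush = [[None] * n_points]                            # dummy frame to drain the last FFT
    read_out = []
    for f, frame in enumerate(frames_bitrev + flush):
        mode = f % 2                                       # first FFT -> mode 0
        words = []
        for c in range(n_points):
            addr = br(c, n_stages) if mode == 0 else c     # Write address == Read address
            words.append(memory[addr])                     # read the previous FFT's word
            memory[addr] = frame[c]                        # then write the new word
        read_out.append(words)
    return read_out[1:]                                    # drop the read-out of empty memory


# Frame f carries the tag (f, k) for Y_f(k), delivered in bit-reversed order.
frames = [[(f, br(c, 3)) for c in range(8)] for f in range(3)]
for words in reorder_stream(frames, 3):
    print(words)
```

Every frame comes out in natural order exactly one frame (N cycles) after it went in, whichever mode it was written in.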
The memory 910 temporarily stores N words of data, so the delay of the sequence transforming unit 90 is N cycles. Adding this to the delay of the butterfly PEs 80_1 to 80_L given by formula (3), the total delay of the pipeline FFT circuit 8 is 2N−1 cycles, as expressed by the following formula (4).
$$N + \sum_{S=1}^{L} 2^{L-S} = 2N - 1 \tag{4}$$
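As a worked example, take N = 16 (L = 4). The butterfly PEs contribute 8, 4, 2 and 1 cycles, and the sequence transforming unit contributes 16 cycles:

$$16 + \left(2^{3} + 2^{2} + 2^{1} + 2^{0}\right) = 16 + 15 = 31 = 2 \cdot 16 - 1$$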