The Fast Fourier Transform (FFT) is an efficient way of computing the Discrete Fourier Transform (DFT), which is needed in many signal processing scenarios.
In mobile communication scenarios in particular, the FFT is required for various purposes. Various approaches are known for the conventional case in which a single data stream is to be subjected to FFT transformation. A single data stream is often referred to as SISO, “Single Input Single Output”. As a typical SISO scenario, one might consider a case in which a communication network entity such as a base station or Node_B transmits data via a single antenna or antenna element to a mobile station or user equipment with one antenna element (or vice versa).
On the other hand, with further developments in communication technology, scenarios are implemented and under investigation which apply multiple antenna elements for transmission and for reception. In such cases, a so-called “Multiple Input Multiple Output”, MIMO, concept is present. MIMO concepts are often applied in connection with Orthogonal Frequency Division Multiplex, OFDM, systems.
MIMO-OFDM (multiple-input-multiple-output orthogonal frequency division multiplex) systems offer a remarkable increase in link reliability and/or in data rate. However, this technique suffers from higher hardware complexity. For this reason, there is a need for strategies that reduce the hardware expenditure.
Evidently, with multiple input data streams being present simultaneously, i.e. in parallel, each of those data streams has to be subjected to FFT. This poses a problem in terms of processing load, processing speed, and/or complexity for the signal processing methods and hardware used for this purpose.
The FFT transformation is a central process in conventional OFDM (SISO-OFDM: single-input-single-output OFDM) systems. The transition to the MIMO technique results in an OFDM system with several FFT transformation processes in parallel. For instance, MIMO systems with four receiver antenna elements need four FFT transformations. In straightforward solutions, four FFT processing blocks have to be installed. This leads to much higher hardware complexity. Hence, there is a need for a new implementation strategy of the FFT for MIMO systems.
He and Torkelson have presented “A new approach to Pipeline FFT processor” in IEEE Proceedings of IPPS '96, 1996, pp. 766 to 770. This document introduces various pipeline FFT processors for SISO scenarios.
For better understanding of the present invention to be described hereinafter, a brief review and introduction of the FFT pipeline architecture as presented by He and Torkelson is given hereinafter. A particular usable FFT is briefly introduced to obtain an idea of the main structure and its properties.
To this end, the SISO Radix 2² single-path delay feedback (SDF) architecture proposed by He & Torkelson will be considered. This architecture is also referred to as R2²SDF.
FFT for SISO Systems According to He & Torkelson
As mentioned, He & Torkelson proposed a structure of the FFT algorithm in which a Radix 2² single-path delay feedback (SDF) architecture is used. Because of the SDF, the spatial regularity of the resulting architecture/signal flow graph can be exploited. The resulting hardware requirement is minimal for both dominant components: complex multipliers and complex data memory.
For a hardware-oriented implementation, this approach combines the advantages of the signal flow graphs, SFG, of the radix 4 and radix 2 approaches. The radix 4 SFG requires a minimum number of non-trivial multipliers, whereas the radix 2 SFG uses a simple butterfly structure.
FIG. 1 illustrates the resulting signal flow graph structure for N=16 (16-point FFT), i.e. a received data stream to be subjected to FFT is assumed to comprise N=16 samples (N samples forming one symbol). Trivial multiplications, denoted by the multiplier “−j”, appear between a first stage, BF I, and a second stage, BF II, of the SFG. At the first stage, a simple butterfly structure is used. In the second stage, the same calculation process is realized; in addition, the last N/4=4 outputs of the first stage BF I are multiplied by −j. Assuming a complex number Z=R+j*I, with R denoting the real component and I denoting the imaginary component, a multiplication by “−j” yields −j*Z=I−j*R. Evidently, the real and imaginary parts are exchanged and the sign of the resulting imaginary part is inverted. Therefore, this multiplication is regarded as trivial (real-imaginary swapping and sign inversion). These operations are indicated by diamond symbols in FIG. 1. After these two stages, full multipliers are required to compute the product of the decomposed twiddle factor. The multipliers perform a multiplication with multiplication factors W (twiddle factors). Twiddle factors are those coefficients applied to results from a previous stage to combine these in order to form inputs of a next stage.
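The trivial nature of the multiplication by −j can be illustrated with a minimal Python sketch. The function name mul_minus_j and the sample value are chosen here for illustration only and do not appear in the figures.

```python
def mul_minus_j(re, im):
    """Multiply re + j*im by -j using only a swap and one sign inversion."""
    # -j * (re + j*im) = im - j*re: no real multiplication is needed.
    return im, -re


z = complex(3.0, 4.0)                      # illustrative value only
re_out, im_out = mul_minus_j(z.real, z.imag)
reference = -1j * z                        # full complex multiplication
```

Here re_out and im_out agree with the real and imaginary parts of the reference product, which is why no multiplier hardware is needed at these positions.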
Applying the Common Factor Algorithm, CFA, procedure recursively to the remaining DFTs (Discrete Fourier Transforms) of length N/4, the complete radix 2² DIF FFT algorithm is obtained, as shown in FIG. 2. As an explanatory remark, using such an approach, a number of N=16 data sets (samples) of an incoming stream is decomposed in a pipeline fashion into a succession of log₂N=4 stages. That is, for N=16 data samples, a 4-stage FFT SFG and/or architecture will result (total number of stages k=4 in this example). A respective i-th stage (i=1 . . . 4) is designed to process a number of data sets of 2^(log₂N+1−i). Thus, the first stage (i=1) BF I receives/processes 16 data samples, and the fourth stage (i=4) BF IV receives/processes 2 data samples.
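For orientation, the radix 2² decimation-in-frequency recursion can be sketched functionally in Python. This is a block-level model under stated assumptions, not the pipelined hardware: the names dft and fft_radix22 are hypothetical, N is assumed to be a power of 4, and the twiddle exponent n3*(k1+2*k2) follows the usual index decomposition k=k1+2*k2+4*k3 rather than anything stated explicitly in the text above.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]


def fft_radix22(x):
    """Radix 2^2 DIF FFT sketch; len(x) must be a power of 4."""
    x = [complex(v) for v in x]
    n_pts = len(x)
    if n_pts & (n_pts - 1) or n_pts.bit_length() % 2 == 0:
        raise ValueError("length must be a power of 4")
    if n_pts == 1:
        return x
    half, quarter = n_pts // 2, n_pts // 4
    # First butterfly stage: sums in the first half, differences in the second.
    a = ([x[n] + x[n + half] for n in range(half)]
         + [x[n] - x[n + half] for n in range(half)])
    # Trivial multiplication by -j on the last N/4 outputs (swap and negate).
    for n in range(3 * quarter, n_pts):
        a[n] = complex(a[n].imag, -a[n].real)
    out = [0j] * n_pts
    for k1 in range(2):
        base = k1 * half
        for k2 in range(2):
            sign = 1 if k2 == 0 else -1
            # Second butterfly stage followed by the non-trivial twiddle factors.
            h = [(a[base + n3] + sign * a[base + quarter + n3])
                 * cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / n_pts)
                 for n3 in range(quarter)]
            sub = fft_radix22(h)           # remaining DFT of length N/4
            for k3 in range(quarter):
                out[k1 + 2 * k2 + 4 * k3] = sub[k3]
    return out


x = [complex(n, (n * n) % 5) for n in range(16)]   # arbitrary test data
max_error = max(abs(a - b) for a, b in zip(fft_radix22(x), dft(x)))
```

For N=16 the recursion passes through two such double stages, which corresponds to the four butterfly stages of the signal flow graph; max_error stays at the level of floating-point rounding.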
Architecture
In the following, the architecture will be described with reference to a DFT example for N=16 samples.
As shown in FIG. 2, the FFT structure for N=16 data samples has four butterfly stages BFI, . . . , BFIV. Note that BFI, . . . , BFIV denote the stages and do not denote the BF types employed in a respective stage. It can be seen that the non-trivial multipliers are located between the second stage, BFII, and the third stage, BFIII, according to the signal processing order. In addition, the rotations (trivial multiplications) by −j are done after the first stage, BFI, and after the third stage, BFIII. FIG. 3 illustrates the resulting pipeline architecture. The blocks above the butterfly structures indicate FIFO memories, and the numbers indicated therein give the delay imposed thereby, i.e. the number of samples buffered by these.
The FIFO memories are located in the single delay feedback path of the structure. FIFO memories are particularly useful in terms of hardware, but the FIFO property could also be realized by another memory type in combination with appropriate addressing of the memory in order to read out the stored data in FIFO fashion.
For instance, the FIFO in the first stage after the input port has a length of 8 symbols. Evidently, the number of delay elements, i.e. the number of samples buffered in the feedback path of an i-th stage out of k stages, is N/2 for i=1, N/4 for i=2, N/8 for i=3, and N/16 for i=4, and can generally be expressed as N/2^i for an i-th stage. The data control for the butterflies is indicated by the bar at the bottom of the figure, which schematically indicates control signals supplied to the four stages 1 . . . 4 of the pipeline architecture. Butterfly stages of type I (BF2I) receive a single control signal only and are applied in stages i=1 and i=3, and butterfly stages of type II (BF2II) receive two control signals and are applied in stages i=2 and i=4. The twiddle factors W(n) are, for example, read out from a memory (not shown in FIG. 3) with appropriate timing. The timing of the control signals supplied to the BF2I and BF2II stages as well as for twiddle factor generation/supply depends on the clock rate of the FFT device.
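The per-stage figures given above can be tabulated with a short Python sketch. The function name stage_parameters is hypothetical, and N is assumed to be a power of two.

```python
def stage_parameters(n_points):
    """Return (stage i, samples processed, FIFO depth) for each pipeline stage."""
    k = n_points.bit_length() - 1          # number of stages, log2(N)
    return [(i, 2 ** (k + 1 - i), n_points // 2 ** i) for i in range(1, k + 1)]


params = stage_parameters(16)
total_fifo = sum(depth for _, _, depth in params)
```

For N=16 this gives block sizes 16, 8, 4, 2 and FIFO depths 8, 4, 2, 1. The depths add up to N−1=15 complex samples, which is the total data memory of the pipeline.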
The internal structure of the respective butterfly stage is shown in FIG. 4 (BF2I) and FIG. 5 (BF2II). Note that input and output ports are divided into a real (index r) and imaginary (index i) part. N denotes the number of symbols contained in the stream to be subjected to FFT processing and n is an index variable with 1<=n<=N. (The memory “capacity” of e.g. the FIFO in the feedback path depends on the stage index i with 1<=i<=k.)
FIGS. 11A and 12 show details of the data control in terms of control signals applied and timing relations there between, as will be described later on.
The calculation process at each stage is done in two steps.
In the first step (control signal s=0), the data sequence x(n) (n=1 . . . 16/2) is read at the input ports xr(n+N/2)/xi(n+N/2) and is directly written to the ports Zr(n+N/2)/Zi(n+N/2) which are connected to the FIFO. At the same time, the FIFO content is read at the ports xr(n)/xi(n) and is directly written, as the other output port pair, to the ports Zr(n)/Zi(n) which are connected to the next pipeline stage.
In the second step (control signal s=1), after N/2=8 symbols, the stored data and the remaining input symbols x(n) (n=9 . . . 16) are used to compute the stage output where one half is written to the next stage (ports Zr(n)/Zi(n)) and the other half is stored in the FIFO memory (ports Zr(n+N/2)/Zi(n+N/2)).
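The two-step operation can be mimicked by a cycle-level Python sketch of a single BF2I-type stage. The function name sdf_stage is hypothetical. As an assumption following the usual decimation-in-frequency convention, the sums are forwarded to the next stage and the differences are stored in the FIFO; the text above does not state which half goes where.

```python
from collections import deque


def sdf_stage(samples, depth):
    """Simulate one single-path delay feedback stage with a FIFO of `depth`."""
    fifo = deque([0] * depth)              # feedback memory, initially empty
    out = []
    for cycle, x in enumerate(samples):
        s = (cycle // depth) % 2           # control signal: 0, then 1, per half
        stored = fifo.popleft()
        if s == 0:
            out.append(stored)             # FIFO content goes to the next stage
            fifo.append(x)                 # input is written to the FIFO
        else:
            out.append(stored + x)         # butterfly sum goes to the next stage
            fifo.append(stored - x)        # butterfly difference goes to the FIFO
    return out


block = list(range(16))                    # one block of N=16 samples
out = sdf_stage(block + [0] * 8, depth=8)  # 8 extra cycles flush the FIFO
sums, diffs = out[8:16], out[16:24]
```

The stage output is delayed by the FIFO depth: the sums x(n)+x(n+N/2) leave the stage while the second half of the block arrives, and the differences follow during the first half of the next block.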
To accomplish such processing, the internal structure uses adders/subtractors and internal signal feeding paths as shown in FIG. 4. In addition, supplying the signals to FIFO memory and/or next stage Butterfly stage is accomplished using switches under control of the control signal s. The operational condition of a respective switch is denoted by 0 and/or 1 which represents the respective state of the control signal s applied in order for the switch to be in the respective operational condition. An adder is illustrated by the encircled “+”, a subtractor is illustrated by the encircled “+” with an additional subscript “−”.
The calculation process of the butterfly stage BF2II differs slightly from that of BF2I. Since these stages additionally include the −j rotations, i.e. the “trivial” multiplications by “−j”, the real and imaginary parts of the input signals have to be swapped. In addition, the signs also have to be changed, as shown in FIG. 5. This is controlled by the signal t. The negated signal t is logically combined in an AND gate with the signal s and controls the swapping paths at the input terminals xr(n+N/2), xi(n+N/2) as well as the adders/subtractors in the signal paths associated with the signals xi(n) and xi(n+N/2). Thus, for s=1 and t=0 the swapping and the conversion of the adder occur; otherwise neither takes place. The remaining process and architecture are equal to those of BF2I.
FIG. 11A shows details of control signals with a corresponding timing relation being illustrated in FIG. 12.
As shown in FIG. 11A, a clock signal clk is supplied to the (FIFO) memory, a twiddle factor generation means (e.g. including a memory from which the factors are read out) and the BF2II stage. A signal supplied to the BF2II stage from a preceding stage is denoted with x, and signals s and t as explained before are also supplied. A signal leaving the BF2II stage to a subsequent multiplier is denoted with z and supplied to the multiplier for multiplication with a twiddle factor w. Afterwards, the multiplied signal is forwarded to the next stage (not shown in FIG. 11A). (Note that substantially the same holds for a stage of type BF2I, with the difference that the control signal t is not applied and that a signal z leaving a stage of BF2I type will be supplied to a BF2II stage (input signal x) and not to a multiplier performing multiplication with twiddle factors.)
FIG. 12 shows the timing relation there between. In the lower part of FIG. 12, the signals z, w and clk are supplied in synchronism with each other. With each clock cycle clk, a new signal z is supplied to the multiplier which is in synchronism therewith supplied with a corresponding weight (twiddle) factor w. In the upper part of FIG. 12 it is shown that a sample x of a sequence of 1 . . . N samples (forming one OFDM symbol) is supplied with each clock cycle clk. Initially, the signal s assumes a low level (s=0) for the first N/2 samples. Thereafter, starting with sample N/2+1, it assumes a high level until N samples have been supplied. (Thereafter, a new OFDM symbol sequence starts and s=0). As to the signal t, this signal assumes a high level for the first 3*N/4 samples and changes afterwards (starting with sample 3/4*N+1) for the last N/4 samples to the low level.
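The control signal timing described for FIG. 12, together with the AND combination of s and the negated t, can be written down as a small Python sketch. The function name control_signals is hypothetical, and the sample index n runs from 1 to N as in the description.

```python
def control_signals(n_samples):
    """Return (n, s, t, swap) for each sample n = 1 .. N of one OFDM symbol."""
    rows = []
    for n in range(1, n_samples + 1):
        s = 0 if n <= n_samples // 2 else 1        # low for the first N/2 samples
        t = 1 if n <= 3 * n_samples // 4 else 0    # high for the first 3N/4 samples
        swap = (s == 1) and (t == 0)               # s AND (NOT t)
        rows.append((n, s, t, swap))
    return rows


rows = control_signals(16)
swap_samples = [n for n, _, _, swap in rows if swap]
```

With N=16 the swap condition is active only for samples 13 to 16, i.e. for the last N/4 samples, in line with the −j rotation being applied to the last quarter of the block.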
Finally, Table 1 shows the complexity of this prior art FFT architecture, which is used in the further development of the multi-stream transformation for MIMO-OFDM systems.
TABLE 1
Computational Complexity of the FFT.

          Multiplier       Adder           Memory Size   Control
R2²SDF    log₄(NFFT) − 1   4·log₄(NFFT)    NFFT − 1      Simple

FFT for MIMO Systems
Now, two straightforward architecture alternatives for MIMO systems based on this FFT structure are presented. Notwithstanding this, other FFT structures could be used. In the following, the previously described FFT structure (R2²SDF) is implemented for MIMO systems. There are two possible strategies to realize the transformation process for an MR-antenna system, i.e. a system having a number of MR antennas.
FIG. 6 shows a fully parallel implementation with one FFT block per data stream to be transformed. Thus, on the one hand, a number MR of FFT blocks can be implemented, i.e. one for each stream (see FIG. 6 for the example of MR=4). It can be seen that the complexity of such a system grows linearly with the number of antennas (i.e. MR times the complexity of one FFT).
On the other hand, to reduce the complexity of the system, the transformation process can be done successively by a smaller number (MFFT) of FFT blocks (straightforward successive FFT solution). In order to transform successively MR parallel streams, the FFT has (or the FFTs have) to work at a higher rate. Because of the used FFT pipeline structure, the frequency can be increased arbitrarily.
FIG. 7 illustrates such a successive transformation process for MR=4 and MFFT=1, i.e. using a single FFT only. For this processing, the input streams are multiplexed upstream of the FFT using a multiplexer MUX and demultiplexed downstream of the FFT using a demultiplexer DeMUX. This strategy results in a reduction of computational complexity, depending on the sharing ratio (MR/MFFT). Unfortunately, each stream requires an additional input buffer that collects one OFDM symbol before sending it to the FFT.
FIG. 8 illustrates the timing of signal processing of this structure as shown in FIG. 7. In a first step, NFFT symbols of each stream (example: number of streams MR=4) are written to the corresponding stream buffer. Due to the MR streams arriving in parallel, the MR buffers are simultaneously getting filled. Finally, after the buffering period, each buffer successively shifts its content into the FFT block, which works at a higher rate. Since the buffer content of the streams is used sequentially and new data symbols are continuously fed to the FFT at the same time, another buffer (not shown) is needed.
In a first buffer area I, samples of MR data streams are buffered. Assuming a multiplexing sequence of MR streams 1 . . . 4, the samples of stream 1 are used as FFT input first.
In the meantime, further data samples of following symbols are buffered in a buffer area II for streams 2 . . . 4. Samples of stream 2 will be subjected to FFT processing next, which is the reason why buffer area II for stream 2 will not fill much. Since streams 3 and 4 will be subjected to FFT processing second to last and last, respectively, the respective buffer area II for these streams will be filled to a greater extent. The indicated multiples of NFFT show the additional amount of buffer memory required for buffer area II.
The need for and the size of the additional buffer area can also be seen on the time axis t in FIG. 8. At the time when the first sequence is fed into the FFT, the incoming values of the remaining sequences have to be buffered until the FFT block has finalized the input process for the first sequence. For MR=4, the FFT is able to read the second sequence after N/MR=0.25N time steps. This results in an absolute value of t=1.25N. For the 3rd and 4th sequences, the waiting or buffer time is 2N/MR=0.5N (absolute: t=1.5N) and 3N/MR=0.75N (absolute: t=1.75N), respectively. Consequently, the data input for all sequences is finalized after N time steps, and at the time t=2N the next OFDM symbol period begins.
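The waiting times quoted above follow a simple pattern that can be reproduced with a short Python sketch. The function name buffer_schedule is hypothetical; a single FFT (MFFT=1) running at MR times the sample rate is assumed, and all times are expressed in units of N.

```python
from fractions import Fraction


def buffer_schedule(m_r):
    """Return (stream, waiting time, absolute start time), in units of N."""
    schedule = []
    for m in range(1, m_r + 1):
        wait = Fraction(m - 1, m_r)        # stream m waits (m-1)*N/M_R
        schedule.append((m, wait, 1 + wait))
    return schedule


schedule = buffer_schedule(4)
total_wait = sum(wait for _, wait, _ in schedule)
```

For MR=4 the absolute start times are N, 1.25N, 1.5N and 1.75N. The waiting times add up to 1.5N, which corresponds to the 1.5·NFFT complex symbols attributed to buffer area II in the four-antenna, single-FFT case.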
Assuming an FFT processing rate four times higher than the symbol rate, the additional memory size for buffering is
$$\frac{1}{2}\left(\frac{M_R^2}{M_{FFT}} - M_R\right)\frac{N_{FFT}}{4} \qquad \text{Eq. (1)}$$
In addition, the FFT uses a memory of size NFFT−1. Thus, the overall memory size (in complex symbols) is given by
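$$\underbrace{M_R N_{FFT}}_{\text{Buffer I}} + \underbrace{\left(\frac{M_R^2}{M_{FFT}} - M_R\right)\frac{N_{FFT}}{8}}_{\text{Buffer II}} + \underbrace{\left(N_{FFT} - 1\right) M_{FFT}}_{\text{FFT}} \qquad \text{Eq. (2)}$$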
For a system with four antennas (MR=4) and one FFT (MFFT=1), the above equation can be simplified to
$$\underbrace{4 N_{FFT}}_{\text{Buffer I}} + \underbrace{1.5 N_{FFT}}_{\text{Buffer II}} + \underbrace{\left(N_{FFT} - 1\right)}_{\text{FFT}} = 6.5 N_{FFT} - 1 \qquad \text{Eq. (3)}$$
For MIMO receivers with MR antennas, MR independent data symbol streams have to be transformed. Usually, according to the approach introduced with reference to FIG. 6, the data symbols are fed into MR FFT blocks. Especially for large FFT lengths, this results in highly complex system architectures.
As shown in the successive processing alternative introduced with reference to FIGS. 7 and 8, it is possible to reduce the architecture complexity down to that of a single FFT. Unfortunately, the memory consumption of this option increases from 4NFFT−4 (parallel FFT solution) to 6.5NFFT−1 complex symbols.
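The two memory figures can be compared numerically with a minimal Python sketch of Eq. (2) and of the parallel solution. The function names and the example length NFFT=64 are illustrative assumptions; the factor 1/8 in the buffer II term is taken over from Eq. (2) and therefore inherits its assumption of an FFT rate four times the symbol rate.

```python
def successive_memory(m_r, m_fft, n_fft):
    """Overall memory in complex symbols for the successive solution, Eq. (2)."""
    buffer_1 = m_r * n_fft                          # one OFDM symbol per stream
    buffer_2 = (m_r ** 2 / m_fft - m_r) * n_fft / 8
    fft_memory = (n_fft - 1) * m_fft                # FIFO memory of the FFT(s)
    return buffer_1 + buffer_2 + fft_memory


def parallel_memory(m_r, n_fft):
    """Memory of M_R parallel FFT blocks, each with N_FFT - 1 complex symbols."""
    return m_r * (n_fft - 1)


n_fft = 64                                          # illustrative FFT length
mem_successive = successive_memory(4, 1, n_fft)     # 6.5 * N_FFT - 1
mem_parallel = parallel_memory(4, n_fft)            # 4 * N_FFT - 4
```

For NFFT=64 this gives 415 complex symbols for the successive solution against 252 for the parallel one, so the saving in multipliers and adders is paid for with roughly 65% more memory in this example.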