The invention relates to the field of modulation/demodulation of information signals, and, more particularly, to transforming a stream of complex symbols initially formed of N complex samples into a stream of respective groups of 2N real output data using interleaved type processing. The invention applies to systems for transmitting orthogonal frequency division multiplex (OFDM) coded information. These systems form, for example, the sending portion of a very high speed digital modulation/demodulation device (VDSL modem).
In OFDM coding, the signal to be transmitted is coded on N carriers which are phase-modulated and amplitude-modulated as a function of the content of the information to be transmitted. Each carrier has a predetermined frequency and all the frequencies of the carriers are a submultiple of a predetermined sampling frequency. Also each symbol formed of N digital carriers, which are N complex samples sampled at the sampling frequency, must be transformed into a group of 2N real data sampled at twice the sampling frequency. This allows transmission over a transmission channel, such as a telephone line.
The transformation of an initial complex symbol respectively formed of N initial complex samples into a group of 2N real output data can be performed in several ways. A first approach performs an inverse Fourier transform of twice the size, that is, of size 2N. However, this approach requires the addition of an extra processing stage as well as the addition of extra memory.
A second approach performs an inverse Fourier transform of the same size, that is, of size N. This is followed by a complex filtering to eliminate part of the spectrum. However, such an implementation leads to a relatively complicated hardware embodiment.
A third approach also performs an inverse Fourier transform of size N, but this is followed by real filtering. This approach is simpler to implement than the previous one, but its accuracy is only approximate. The noise level relative to the signal may turn out to be relatively large, thus leading to signal degradations. Also, improving the performance of this approach, that is, reducing the noise relative to the signal, requires the use of an extremely large real filter. This involves an expensive hardware implementation.
Another approach performs the transformation of the stream of initial complex signals respectively formed of N initial complex samples into a stream of respective groups of 2N real output data. This is done by interleaved type processing whose theoretical formulation is well known to one skilled in the art.
The main characteristics of interleaved type processing are recalled here for reference. The real signal x(t) corresponding, for example, to an OFDM symbol, is defined by formula (I):
$$x(t) = \sum_{k=1}^{N-1} M_k \cdot \cos\left(2\pi f_k t + \phi_k\right) \qquad \text{(I)}$$
The symbol Mk denotes the amplitude of the carrier of rank k, φk denotes its phase, fk denotes its frequency and N−1 the number of carriers. When the frequencies of the carriers are all multiples of a frequency f1, then formula (I) becomes formula (II) in complex notation:
$$x(t) = \mathrm{Re}\left[\sum_{k=1}^{N-1} C_k \cdot e^{2j\pi k f_1 t}\right] \qquad \text{(II)}$$
The symbol Ck denotes the initial complex sample representative of the carrier of rank k. Ck is defined by the formula (III):
$$C_k = M_k \cdot e^{j\phi_k} \qquad \text{(III)}$$
With a sampling of the signal at the frequency Nf1 and by extending the length of the symbol to N carriers (by adding the carrier C0 taken equal to 0), it can then be shown that the N real output data of even ranks, corresponding to the N complex samples of the input symbol, are given by formula (IV):
$$\{x_{2p}\} = \mathrm{Re}\left(\mathrm{IFFT}_N\left\{\left(C_k + \bar{C}_{N-k}\right) + j\left(C_k - \bar{C}_{N-k}\right) e^{j\pi k/N}\right\}\right) \qquad \text{(IV)}$$
The real data of odd ranks $x_{2p+1}$ are given by formula (V):
$$\{x_{2p+1}\} = \mathrm{Im}\left(\mathrm{IFFT}_N\left\{\left(C_k + \bar{C}_{N-k}\right) + j\left(C_k - \bar{C}_{N-k}\right) e^{j\pi k/N}\right\}\right) \qquad \text{(V)}$$
In these formulas (IV) and (V), $\bar{C}_{N-k}$ represents the complex conjugate of the complex number $C_{N-k}$, $\mathrm{IFFT}_N$ represents the inverse Fourier transform operator of size N, Im denotes the imaginary part of a complex number, and Re denotes the real part of a complex number.
The processing of the interleaved type includes a preprocessing phase in which, for each initial symbol received formed of N initial complex samples Ck, an auxiliary symbol formed of N auxiliary complex samples Ak is formulated. Each auxiliary complex sample Ak is defined by formula (VI):
$$A_k = \left(C_k + \bar{C}_{N-k}\right) + j\left(C_k - \bar{C}_{N-k}\right) e^{j\pi k/N} \qquad \text{(VI)}$$
After this preprocessing, a processing phase is performed which includes, for each auxiliary symbol formed of the auxiliary samples Ak, an inverse Fourier transform calculation of size N. The result of this inverse Fourier transform is a set of N complex output coefficients Xk which, after rearrangement so as to retrieve the input order, makes it easily possible to obtain the 2N real data corresponding to the input symbol. This is so since the real data of even and odd ranks correspond respectively to the real parts and imaginary parts of the complex output samples successively obtained after rearrangement.
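The relationship between formulas (II), (IV), (V) and (VI) can be checked numerically. The following is a minimal sketch, not the implementation described here: it assumes C0 = 0, takes the index N−k modulo N for k = 0, evaluates the real signal of formula (II) at 2N instants per symbol period (twice the sampling frequency), and uses an unnormalised inverse transform, which is why a factor 1/2 appears. The function names are hypothetical.

```python
import cmath
import math


def idft_sum(a):
    """Unnormalised inverse DFT: S_p = sum_k a[k] * exp(2j*pi*k*p/N)."""
    n = len(a)
    return [
        sum(a[k] * cmath.exp(2j * math.pi * k * p / n) for k in range(n))
        for p in range(n)
    ]


def auxiliary_symbol(c):
    """Formula (VI): build the N auxiliary samples A_k from the samples C_k."""
    n = len(c)
    out = []
    for k in range(n):
        mirrored = c[(n - k) % n].conjugate()  # conj(C_{N-k}), index taken mod N
        out.append((c[k] + mirrored)
                   + 1j * (c[k] - mirrored) * cmath.exp(1j * math.pi * k / n))
    return out


def interleaved_transform(c):
    """Return 2N real data: even ranks from Re, odd ranks from Im (IV, V)."""
    x = []
    for value in idft_sum(auxiliary_symbol(c)):
        x.append(value.real / 2)  # x_{2p}; 1/2 comes from the unnormalised sum
        x.append(value.imag / 2)  # x_{2p+1}
    return x


def direct_reference(c):
    """Formula (II) evaluated at t = m / (2*N*f1), m = 0 .. 2N-1."""
    n = len(c)
    return [
        sum(c[k] * cmath.exp(1j * math.pi * k * m / n) for k in range(n)).real
        for m in range(2 * n)
    ]
```

A single transform of size N thus yields all 2N real data, which is the point of the interleaved formulation compared with a transform of size 2N.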
At present, the only known implementation of this interleaved processing is an entirely software implementation which turns out to be relatively complex to use in industrial devices, such as modems, for example. Furthermore, the larger the size of the Fourier transform and the greater the increase in processing speed, the more severe the implementation constraints become.
Moreover, numerous implementations of direct or inverse Fourier transforms which are dedicated or programmed on microprocessors for signal processing have been set out in the literature. Most of these implementations use a variation of the Cooley-Tukey algorithm, which makes it possible to reduce the number of arithmetic operations required to calculate the Fourier transform.
The Cooley-Tukey algorithm will be readily understood by one skilled in the art. This algorithm makes it possible, in particular, to reduce the calculation of a fast Fourier transform of initial size $r^p$, where r represents the "radix" according to the terminology customarily used by one skilled in the art, into that of r Fourier transforms of size $r^{p-1}$ and of additional complex additions and multiplications. By iteratively repeating this reduction, we arrive at the calculation of Fourier transforms of size r, which are easily achievable, especially if r is chosen equal to 2 or 4.
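As an illustration of this reduction, the sketch below applies it with r = 2 to an unnormalised inverse transform: a transform of size n is split into two transforms of size n/2, combined by one complex multiplication and two complex additions per output pair. It is a textbook recursive form chosen for brevity, not the pipelined architecture discussed later, and the function names are hypothetical.

```python
import cmath
import math


def naive_idft(a):
    """Reference: direct evaluation of S_p = sum_k a[k] * exp(2j*pi*k*p/n)."""
    n = len(a)
    return [
        sum(a[k] * cmath.exp(2j * math.pi * k * p / n) for k in range(n))
        for p in range(n)
    ]


def radix2_idft(a):
    """Recursive radix-2 Cooley-Tukey inverse transform (size must be 2**p)."""
    n = len(a)
    if n == 1:
        return list(a)
    if n % 2:
        raise ValueError("size must be a power of two")
    even = radix2_idft(a[0::2])  # transform of the even-ranked samples
    odd = radix2_idft(a[1::2])   # transform of the odd-ranked samples
    half = n // 2
    out = [0j] * n
    for p in range(half):
        twiddled = cmath.exp(2j * math.pi * p / n) * odd[p]
        out[p] = even[p] + twiddled          # upper output of the butterfly
        out[p + half] = even[p] - twiddled   # lower output of the butterfly
    return out
```

Each loop iteration is one butterfly of the calculation graph; the recursion depth corresponds to the number of stages.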
The Cooley-Tukey algorithm uses a calculation graph exhibiting a general butterfly-like structure, well known to one skilled in the art, and commonly referred to by the term "butterfly". Several hardware architectures are then possible to implement a butterfly-like calculation structure.
A first approach constructs one hardware operator capable of performing a butterfly type calculation for each butterfly of the graph. However, this approach is only conceivable for the implementation of Fourier transforms of small size.
A second approach constructs just a single hardware operator of the butterfly type, and is intended for performing in succession the calculations corresponding to all the butterflies of all the stages of the graph. This approach requires a very fast hardware operator, and an input memory which is separate from the memory serving to write the intermediate calculation results. This is done to avoid access conflicts when a data block enters the operator while the previous block is still undergoing processing.
An intermediate approach constructs a hardware operator of the butterfly type per stage of the graph, as well as a storage element. Storage elements include delay lines or shift registers, whose function is to input the data into the operator in the correct order with regards to the butterflies of the graph of the relevant stage. Such architectures are said to be serial or pipelined.
More precisely, an electronic device for calculating a Fourier transform of the so-called pipelined architecture includes a plurality of successive processing stages connected in series between the input and the output of the device by internal data paths. These stages respectively include elementary processing means able to perform processing of Fourier transforms of elementary sizes, which are smaller than the initial size, on data blocks of sizes which are successively reduced from one stage to the next. These stages also include elementary storage elements.
The expression xe2x80x9cinitial sizexe2x80x9d of the Fourier transform is understood to mean here and in the subsequent text the size of the blocks received at the input of the device by the first stage. The elementary sizes of the Fourier transforms performed by the various stages can be identical and equal to the radix of the Fourier transform. We then speak of a uniform radix Fourier transform. They may differ from one stage to another in the case of mixed radix Fourier transforms.
Examples of such pipelined architectures are described in the article by Bi and Jones entitled "A Pipelined FFT Processor for Word-Sequential Data", IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 37, No. 12, December 1989, pages 1982-1985, and in the article by Bidet et al., entitled "A Fast Single-Chip Implementation of 8192 Complex Point FFT", IEEE Journal of Solid-State Circuits, Vol. 30, No. 3, March 1995, pages 300-305.
The storage elements described in these known architectures include delay lines which are very simple elements to manage, and which have the advantage of generally being compact (use of three transistors per bit stored). However, these elements are not always available as standard cells in stock libraries of components usable for defining and designing integrated circuits. Furthermore, their electrical characteristics are dependent on the technology used so that the architecture of the circuit must be carefully re-examined whenever the technology evolves. Moreover, such architectures use delay lines whose total storage capacity is equal to 2N0 for an initial size of Fourier transform equal to N0, while the theoretical minimum storage capacity is equal to N0.
An object of the invention is to use a random access memory, such as a single access memory, for storage in each stage of a pipelined architecture.
The use of a random access memory, whether it is dual access (dual port) or single access (single port permitting either write-access or read-access at each cycle of the internal clock of the device), requires specific management of addressing so that the intermediate data in memory can be stored and redelivered in the right order. This management is all the more complex when the radix of the Fourier transform is greater than 2, and in particular, when it is equal to 4. This goes against current teachings on the subject which provide for the use of delay lines or shift registers.
Moreover, it has been found that the use of a random access memory enables the storage capacity to be reduced stage by stage, and, therefore, reduces the total storage capacity of the device relative to the storage capacity required when employing delay lines. Such a component is readily available in ordinary libraries of components, particularly in its simplest form (single access memory), is totally independent of the technology used, and is compatible with very high clock frequencies.
In a pipelined architecture using a single access memory, the latency time, i.e., the duration separating the arrival of the first sample of a symbol and the delivery of the first complex output sample, is around 3N/2 cycles of the basic clock signal regulating the reception of the initial complex samples. This includes N cycles for the filling of the memory of the first stage. The preprocessing includes formulating an auxiliary complex symbol on the basis of each initial complex symbol, and in storing it in a single access memory. This also requires a latency of N cycles of the basic clock signal.
The total duration of latency rises to 5N/2 cycles of the basic clock signal. The latency includes preprocessing followed by transfer of the auxiliary symbol from its storage memory into the memory of the first stage followed by inverse Fourier transform. Such a latency duration may turn out to be incompatible with high speed transmission applications using VDSL modems working at 55 Mbits/second, for example. Stated otherwise, for such high speed applications, the use of a processing of the interleaved type using an architecture of the pipelined type for implementing the inverse Fourier transform calculation, at present proves to be a difficult problem to implement.
The invention provides a solution to this problem, which is to use an inverse Fourier transform implementation with pipelined architecture using a random access memory, such as a single access memory for each processing stage. In particular, the first processing stage uses a random access memory while optimizing the latency so as to obtain, despite the preprocessing phase of the processing of interleaved type, a latency duration equal to that of the inverse Fourier transform calculation means alone, i.e., 3N/2 cycles for an inverse Fourier transform of size N.
The invention therefore proposes a process for transforming a stream of initial complex symbols, respectively formed of N initial complex samples, into a stream of respective groups of 2N real output data by interleaved type processing that includes a preprocessing phase in which for each initial symbol received, an auxiliary symbol formed of N auxiliary complex samples is formulated. A processing phase comprises for each auxiliary symbol an inverse Fourier transform calculation of size N including elementary processing of the butterfly type corresponding to several stages of a general butterfly-like calculation graph, and delivers the 2N real output data corresponding to the initial symbol received.
According to a general characteristic of the invention, the various stages of the graph are implemented within a pipelined architecture, and, upon receiving an initial symbol, two separate random access memories (e.g., single access memories) are simultaneously used to respectively store in a first memory the auxiliary symbol corresponding to this initial symbol, and to perform on the basis of the content of the second memory the elementary processing corresponding to the first stage of the graph. Furthermore, the two memories are swapped with each new receipt of an initial symbol.
Stated otherwise, on receipt of a current initial symbol the corresponding formulated auxiliary symbol is stored in a first memory while the elementary processing corresponding to the first stage of the graph is performed on the basis of the content of a second memory. This second memory contains the formulated auxiliary symbol corresponding to the previous initial symbol. Then, on receipt of the next initial symbol, the corresponding formulated auxiliary symbol is stored in the second memory while the elementary processing corresponding to the first stage of the graph is performed on the basis of the content of the first memory, that is, on the auxiliary symbol corresponding to the current initial symbol. This alternation continues with the receipt of the succeeding symbols.
The simultaneous implementation of the preprocessing and of the elementary processing corresponding to the first stage of the graph in the course of each reception of an initial complex symbol, in combination with the use of two memories used alternately for the preprocessing and for the elementary processing of the Fourier transforms during successive receptions of symbols, makes it possible to dispense with the transferring of the auxiliary samples from one of the memories to the other. The Fourier transform processing begins as soon as one of the memories is full. A saving of a certain number of basic clock cycles is thus achieved.
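The alternating use of the two memories can be modelled in software as a ping-pong buffer. The sketch below is only an illustrative model: the class name, the `first_stage` callback and the zero-initialised memories are assumptions, and the two accesses that occur simultaneously in hardware are executed one after the other here.

```python
class PingPongMemories:
    """Two memories of N words whose roles are swapped at every symbol."""

    def __init__(self, n):
        self.memories = [[0j] * n, [0j] * n]
        self.write_select = 0  # index of the memory receiving the new symbol

    def receive_symbol(self, auxiliary_symbol, first_stage):
        """Store the new auxiliary symbol and process the previous one."""
        write_memory = self.memories[self.write_select]
        read_memory = self.memories[1 - self.write_select]
        # First-stage processing reads the symbol stored during the
        # previous reception; no copy between memories is needed.
        result = first_stage(read_memory)
        # Preprocessing output for the current symbol goes to the other memory.
        write_memory[:] = auxiliary_symbol
        # Swap roles for the next reception.
        self.write_select = 1 - self.write_select
        return result
```

Feeding three symbols in succession returns the processed content of the first symbol on the second call and of the second symbol on the third call, which mirrors the one-symbol offset between storage and first-stage processing.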
According to a relatively straightforward mode of implementation, the two memories are single access random access memories of identical structure. The auxiliary complex samples are formulated and stored as a pair in the corresponding memory. Each pair is formulated and stored in the course of a clock cycle of the basic clock signal regulating the reception of the initial complex samples. The identical structure of the memories allows easy swapping of the memories.
When two single access memories are used in combination with the switching of these memories, the total latency time is equal to 3N/2 cycles of the basic clock signal. There are N cycles for the preprocessing and N/2 cycles to obtain the output samples after Fourier transform.
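The latency figures quoted above reduce to simple arithmetic, sketched below. The function name and the example value N = 1024 are hypothetical; the cycle counts per phase are those stated in the text (N cycles of preprocessing, N cycles to fill the first-stage memory when a transfer is needed, N/2 cycles to obtain the output samples).

```python
def latency_cycles(n, swapped_memories):
    """Latency in basic clock cycles for a symbol of N complex samples."""
    if n % 2:
        raise ValueError("N is assumed even so that N/2 is an integer")
    preprocessing = n                         # formulate and store the auxiliary symbol
    transfer = 0 if swapped_memories else n   # refill of the first-stage memory
    transform_output = n // 2                 # delay before the first output sample
    return preprocessing + transfer + transform_output


# Example with a hypothetical symbol size of N = 1024 samples.
with_swap = latency_cycles(1024, swapped_memories=True)      # 3N/2
without_swap = latency_cycles(1024, swapped_memories=False)  # 5N/2
```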
According to one mode of implementation, the auxiliary sample of rank k is formulated on the basis of a pair of paired initial samples which is formed by the initial sample of rank k within the initial symbol and the initial sample of rank N−k. Two samples are said to be paired when the sum of their respective ranks is equal to N.
Moreover, if r denotes the radix of the inverse elementary Fourier transform associated with the first stage of the graph, each of the two memories is subdivided into r independent memory banks of identical size equal to N/r. There thus are received, with each initial sample, a rank indication representative of its rank within the symbol and a pairing indication indicating whether the initial sample is the first received or the second received of the corresponding pair.
If the pairing indication associated with an initial sample is representative of the first received, this sample is stored in the corresponding memory. If the pairing indication associated with an initial sample is representative of the second received, the initial sample paired with this sample received second is extracted from the corresponding memory. The two paired auxiliary samples of rank k and of rank N−k are formulated on the basis of these two paired initial samples of rank k and of rank N−k. These two paired auxiliary samples are stored in two separate memory banks of the same corresponding memory at the respective storage addresses associated with the two initial samples of rank k and of rank N−k. The elementary Fourier transform processing of the first stage of the graph is performed successively on the N/r groups of auxiliary samples respectively stored at the same address in the r memory banks of the other corresponding memory.
More particularly, when the memory banks are indexed from 0 to r−1 and all are addressable at addresses lying between 0 and N/r−1, and if the pairing indication associated with an initial sample of rank k is representative of the first received, this sample is stored in the memory bank whose index is equal to E[rk/N], where E denotes the integer part operator, at the address k modulo N/r.
However, if the pairing indication associated with an initial sample of rank k is representative of the second received, the paired initial sample stored in the memory bank of index equal to E[r(N−k)/N] at the address (N−k) modulo N/r is read from the memory. The auxiliary sample of rank k is then formulated and stored in the memory bank of index equal to E[rk/N] at the address k modulo N/r. The auxiliary sample of rank N−k is also formulated and stored in the memory bank of index equal to E[r(N−k)/N] at the address (N−k) modulo N/r.
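The bank and address rule, together with the first-received and second-received handling, can be sketched as follows. This is an illustrative model under stated assumptions: the pairing indication is derived here from the set of ranks already received rather than supplied with each sample, ranks 0 and N/2 (which have no distinct partner) are treated as self-paired, and all function names are hypothetical.

```python
import cmath
import math


def bank_and_address(k, n, r):
    """Bank index E[r*k/N] and address k modulo N/r for the sample of rank k."""
    return (r * k) // n, k % (n // r)


def auxiliary(k, c_k, c_partner, n):
    """Formula (VI): A_k from C_k and its paired sample C_{N-k}."""
    mirrored = c_partner.conjugate()
    return (c_k + mirrored) + 1j * (c_k - mirrored) * cmath.exp(1j * math.pi * k / n)


def preprocess(arrivals, n, r):
    """arrivals: iterable of (rank, C_rank). Returns r banks of N/r words."""
    banks = [[None] * (n // r) for _ in range(r)]
    received = set()
    for k, c_k in arrivals:
        partner = (n - k) % n
        bank, addr = bank_and_address(k, n, r)
        received.add(k)
        if partner == k:
            # Assumption: ranks 0 and N/2 are their own partner.
            banks[bank][addr] = auxiliary(k, c_k, c_k, n)
        elif partner not in received:
            # First received of the pair: park the initial sample.
            banks[bank][addr] = c_k
        else:
            # Second received: read the parked partner, then overwrite both
            # locations with the two paired auxiliary samples.
            p_bank, p_addr = bank_and_address(partner, n, r)
            c_partner = banks[p_bank][p_addr]
            banks[bank][addr] = auxiliary(k, c_k, c_partner, n)
            banks[p_bank][p_addr] = auxiliary(partner, c_partner, c_k, n)
    return banks
```

With N = 16 and r = 4, for instance, the sample of rank 5 maps to bank 1 at address 1, and its partner of rank 11 maps to bank 2 at address 3, so the two auxiliary samples land in separate banks.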
The subject of the invention is also a device for transforming a stream of initial complex symbols respectively formed of N initial complex samples into a stream of respective groups of 2N real output data. This device comprises transformation means of the interleaved type comprising preprocessing means able to formulate for each initial symbol received an auxiliary symbol formed of N auxiliary complex samples. The device further comprises processing means able to perform for each auxiliary symbol an inverse Fourier transform calculation of size N including elementary processing of the butterfly type corresponding to several stages of a general butterfly-like calculation graph. The 2N real output data corresponding to the initial symbol received is delivered.
According to a general characteristic of the invention, the processing means are of a pipelined architecture and the device comprises two separate random access memories. On receipt of an initial symbol, the preprocessing means are able to store in a first memory the auxiliary symbol corresponding to this initial symbol. Simultaneously, the elementary processing means of the first stage of the processing means with pipelined architecture are able to perform on the basis of the content of the second memory the elementary processing corresponding to the first stage of the graph. The device furthermore comprises control means able to swap access to the two memories by the elementary processing means with each new receipt of an initial symbol.
According to a relatively straightforward embodiment of the invention, the two memories are single access random access memories of identical structure. The preprocessing means are able to formulate and to store a pair of the auxiliary complex samples in the corresponding memory.
According to one embodiment of the invention, the preprocessing means are able to formulate the auxiliary sample of rank k on the basis of a pair of paired initial samples which is formed by the initial sample of rank k within the initial symbol and the initial sample of rank N−k. If r denotes the radix of the inverse elementary Fourier transform associated with the first stage of the graph, each of the two memories is subdivided into r independent memory banks of identical size equal to N/r. The memory banks are indexed from 0 to r−1 and all are addressable by addresses lying between 0 and N/r−1. With each initial sample received there are associated a rank indication representative of its rank within the symbol and a pairing indication indicating whether the initial sample received is the first received or the second received of the corresponding pair.
If the pairing indication associated with an initial sample of rank k is representative of the first received, the preprocessing means are able to store this sample in the memory bank of index equal to E[rk/N] at the address k modulo N/r. If the pairing indication associated with an initial sample of rank k is representative of the second received, the preprocessing means are able to extract from the memory the paired initial sample stored in the memory bank of index equal to E[r(N−k)/N] at the address (N−k) modulo N/r.
The preprocessing means are thereafter able to formulate the auxiliary sample of rank k and to store it in the memory bank of index equal to E[rk/N] at the address k modulo N/r. The auxiliary sample of rank N−k is formulated and stored in the memory bank of index equal to E[r(N−k)/N] at the address (N−k) modulo N/r.
The elementary processing means successively perform processing of butterfly type on the N/r groups of auxiliary samples respectively stored at the same address in the r memory banks of the corresponding memory.
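The reason the r samples stored at one address across the r banks form a butterfly group can be illustrated with a radix-r first stage. The sketch below assumes a decimation-in-frequency organisation with twiddle factors applied after each butterfly, which is one common arrangement and not necessarily the one used in the device; the remaining stages are replaced by a direct transform of size N/r for brevity, and all names are hypothetical.

```python
import cmath
import math


def naive_idft(a):
    """Reference: direct evaluation of S_p = sum_k a[k] * exp(2j*pi*k*p/n)."""
    n = len(a)
    return [
        sum(a[k] * cmath.exp(2j * math.pi * k * p / n) for k in range(n))
        for p in range(n)
    ]


def first_stage(banks, n, r):
    """Radix-r butterflies on the N/r groups sharing an address in the r banks."""
    m = n // r
    branches = [[0j] * m for _ in range(r)]
    for addr in range(m):
        # Ranks addr, addr + N/r, addr + 2N/r, ... : one word from each bank.
        group = [banks[b][addr] for b in range(r)]
        for q in range(r):
            butterfly = sum(
                group[b] * cmath.exp(2j * math.pi * b * q / r) for b in range(r)
            )
            # Twiddle factor applied before handing over to the next stage.
            branches[q][addr] = butterfly * cmath.exp(2j * math.pi * addr * q / n)
    return branches


def idft_via_first_stage(a, r):
    """Full inverse transform: first stage, then r transforms of size N/r."""
    n = len(a)
    m = n // r
    banks = [a[b * m:(b + 1) * m] for b in range(r)]  # bank E[r*k/N], address k mod N/r
    branches = first_stage(banks, n, r)
    out = [0j] * n
    for q in range(r):
        sub = naive_idft(branches[q])
        for p in range(m):
            out[r * p + q] = sub[p]
    return out
```

Because each butterfly reads exactly one word from each bank, the r banks can be accessed in parallel at a single address per group.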