The invention relates to the field of electronic computing devices, and, more particularly, to an electronic device having a pipeline architecture for computing a Fourier transform, and a related method.
Numerous dedicated Fourier transform implementations, including those programmed on signal processing microprocessors, have been disclosed. Most of these implementations use a variation of the Cooley-Tukey algorithm, which makes it possible to reduce the number of arithmetic operations required for computing the Fourier transform. This algorithm is well known to one skilled in the art.
In particular, the Cooley-Tukey algorithm reduces the computation of a fast Fourier transform of initial size r^p into that of r Fourier transforms of size r^(p-1), and of supplementary complex multiplications and additions. According to the terminology customarily used by one skilled in the art, r represents the radix. Iterative repetition of this reduction produces the computation of Fourier transforms of size r. These computations can easily be carried out, especially if r is chosen equal to 2 or 4. The Cooley-Tukey algorithm uses a computation graph that takes on the appearance of a structure of a general butterfly shape, and is commonly referred to simply as a butterfly. This appearance is well known to one skilled in the art.
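The reduction described above can be illustrated with a minimal sketch. It assumes a decimation-in-frequency formulation and the twiddle convention exp(-2*pi*j*k*m/N); the function names `dft` and `fft_radix` are hypothetical and serve only this illustration.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used for the size-r transforms and as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def fft_radix(x, r=4):
    """One Cooley-Tukey reduction per call, for len(x) == r**p."""
    n = len(x)
    if n <= r:
        return dft(x)
    q = n // r  # size of each sub-transform, r**(p-1)
    subs = [[0j] * q for _ in range(r)]
    for m in range(q):
        # Butterfly: a size-r transform across the r segments of the block.
        group = dft([x[m + s * q] for s in range(r)])
        for k in range(r):
            # Supplementary complex multiplication (twiddle factor).
            subs[k][m] = group[k] * cmath.exp(-2j * cmath.pi * k * m / n)
    out = [0j] * n
    for k in range(r):
        sub = fft_radix(subs[k], r)  # r transforms of size r**(p-1)
        for j in range(q):
            out[r * j + k] = sub[j]
    return out
```

Calling `fft_radix(x, 4)` on a block of 64 complex values should agree with `dft(x)` up to rounding error.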
Several hardware architectures are possible for implementing a butterfly-shaped computation structure. A first approach includes a hardware operator capable of performing a butterfly type computation per butterfly of the graph. However, such an approach may be used only for the implementation of Fourier transforms of small size.
A second approach includes just a single hardware operator of the butterfly type, which performs in succession the computations corresponding to all the butterflies of all the stages of the graph. Such an approach has the drawback of requiring a very fast hardware operator. It also requires an input memory separate from the memory used for writing the intermediate computation results. This avoids access conflicts when a data block enters the operator while the previous block is still being processed. It is therefore necessary to provide two memories of N0 complex words, where N0 denotes the initial size of the Fourier transform. This leads to an overall circuit of considerable size, especially when N0 is large.
An intermediate approach includes a hardware operator of the butterfly type per stage of the graph, as well as a storage element. This storage element includes delay lines or shift registers, whose function is to input the data to the operator in the right order while taking account of the butterflies of the graph of the relevant stage. Such architectures are termed serial or pipeline according to terminology well known by one skilled in the art.
More precisely, an electronic device for computing a Fourier transform having a pipeline architecture comprises a plurality of successive processing stages connected in series between the input and the output of the device by internal data paths. These stages respectively comprise processing means and storage means. The processing means performs processing operations for Fourier transforms of smaller elementary sizes than the initial size on blocks of data whose sizes are reduced in succession from one stage to the next.
The term initial size of the Fourier transform is understood here and in the remainder of the text to mean the size of the blocks received as input to the device by the first stage. The elementary sizes of the Fourier transforms performed by the various stages may be identical and equal to the radix of the Fourier transform; i.e., a Fourier transform with uniform radix. However, they may be different from one stage to another, as in the case of Fourier transforms with mixed radix.
Examples of pipeline architectures are described in an article by Bi and Jones, entitled "A Pipeline FFT Processor for Word-Sequential Data", IEEE Transactions on Acoustic Speech and Signal Processing, vol. 37, No. 12, December 1989, pages 1982-1985, and in an article by Bidget et al., entitled "A Fast Single-Chip Implementation of 8192 Complex Point FFT", IEEE Journal of Solid-State Circuits, vol. 30, No. 3, March 1995, pages 300-305.
The storage means described in these known architectures includes delay lines which are very simple elements to manage. They have the advantage of being generally compact, and use three transistors per stored bit. However, these elements are not always available as standard cells in ordinary libraries of components used in defining and designing integrated circuits. Furthermore, their electrical characteristics are dependent on the technology used, so that the architecture of the circuit must be carefully re-examined each time the technology advances. Such architectures use delay lines whose storage capacity is equal to 2N0 for an initial size of a Fourier transform equal to N0, while the minimum theoretical storage capacity is equal to N0.
The invention provides a different approach to the above described problem. An object of the invention is to provide a device having a pipelined architecture for computing a Fourier transform. The device operates with very high clock frequencies while minimizing the memory size required, which may equal the theoretical minimum. Another object of the invention is to provide such a device using conventional and readily available storage elements, regardless of the implemented technology.
Yet another object of the invention is to provide an electronic device for computing a Fourier transform capable of being easily tested with full scan test methods, which are well known to one skilled in the art. Another object of the invention is to take account of any guard interval separating the various symbols to be processed by Fourier transform, especially in terrestrial applications of digital television which use OFDM (Orthogonal Frequency Division Multiplex) coding for transmission.
The invention therefore provides an electronic device having a pipelined architecture for computing a Fourier transform. The electronic device comprises a plurality of successive processing stages connected in series between the input and the output of the device. These stages respectively comprise processing means and storage means. The processing means performs processing operations for Fourier transforms of smaller elementary sizes than the initial size on blocks of data whose sizes are reduced in succession from one stage to the next.
The electronic device comprises at least one radix 4 processing stage. The radix 4 processing stage includes elementary processing means performing processing operations for Fourier transforms of elementary size equal to 4 on blocks of data. An elementary storage means comprising a random access memory is also included in the radix 4 processing stage. In particular, the random access memory comprises a single-access static memory.
The use of a random access memory, whether dual-access (dual port) or single-access (single port), requires specific management for addressing so that the intermediate data in the memory can be stored and redelivered in the right order. This management is more complex when the radix of the Fourier transform is greater than 2, and in particular, when it is equal to 4. The single access permits either write-access or read-access at each cycle of the internal clock of the device. This approach goes against all current teachings on the subject, which provide for the use of delay lines or shift registers.
The use of random access memories enables the storage capacity to be reduced stage by stage. Therefore, the total storage capacity of the device is reduced relative to the storage capacity required when using delay lines. Such components are more readily available, particularly, in their simplest form, i.e., a single-access static memory. Random access memories are independent of the technology used, and are compatible with very high clock frequencies.
Various internal hardware architectures for the elementary processing means may be used for implementing the processing operations of the butterfly type within each stage. However, it is preferable for the elementary processing means of the radix 4 stage to respectively perform N/4 processing operations of the butterfly type on N/4 distinct groups of four data bits of each data block processed by this stage. The size of each data block equals N.
The elementary processing means make provisions to call each datum (or operand) of the block received once only to perform the various processing operations of the butterfly type. This process is distinguishable from the hardware operator used in the previously discussed article by Bidget et al., where the latter makes provisions to call each operand several times to perform the processing operations. The elementary processing means of a radix 4 stage comprises eight complex adders and one multiplier. In the prior art, delay lines are used and provisions are made for only six adders and one multiplier. The elementary processing means, according to the present invention, stores fewer intermediate data and contributes, in combination with the use of a random access memory, to further minimizing the stage-by-stage storage capacity.
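As an illustration of how a size-4 butterfly can be carried out with eight complex additions while reading each operand once, the following sketch uses the standard two-rank decomposition. Whether the eight adders of the stage are arranged in exactly this way is an assumption, and the name `radix4_butterfly` is hypothetical.

```python
def radix4_butterfly(a, b, c, d):
    """Size-4 DFT of (a, b, c, d) using eight complex additions/subtractions."""
    # First rank: four additions/subtractions, each operand read once.
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = b - d
    # Multiplying by -j only swaps the real and imaginary parts with one sign
    # change, so it needs no true multiplier.
    minus_j_t3 = complex(t3.imag, -t3.real)
    # Second rank: four more additions/subtractions.
    y0 = t0 + t2
    y1 = t1 + minus_j_t3
    y2 = t0 - t2
    y3 = t1 - minus_j_t3
    return y0, y1, y2, y3
```

In this sketch the single multiplier of the stage would be applied afterwards, to scale the outputs by their predetermined coefficients.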
According to one embodiment of the invention, the storage capacity of a radix 4 processing stage is equal to 3N/4 data bits, with N being the size of the data blocks processed by this processing stage. In other words, the invention makes provisions to store in each stage only three quarters of the data received by this stage. This provides a total storage capacity for the device equal to N0, with N0 being the initial size of the blocks processed by the first stage of the device, i.e., the initial size of the Fourier transform. There is a factor of 2 savings in storage capacity using a radix 4 processing stage compared to the prior art devices which use delay lines.
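The total given above can be checked with a short calculation. The sketch assumes that N0 is a power of 4, that every stage is a radix 4 stage storing 3N/4 data, and that the block size is divided by 4 from one stage to the next; the function names are hypothetical.

```python
def stage_sizes(n0):
    """Block sizes seen by successive radix 4 stages: N0, N0/4, ..., 4."""
    sizes = []
    n = n0
    while n >= 4:
        sizes.append(n)
        n //= 4
    return sizes

def total_storage_first_type(n0):
    """Sum of 3N/4 over all stages, counted in stored data."""
    return sum(3 * n // 4 for n in stage_sizes(n0))

def total_storage_delay_lines(n0):
    """Figure quoted in the text for delay-line architectures: 2*N0."""
    return 2 * n0
```

Under these assumptions, N0 = 4096 gives 3072 + 768 + 192 + 48 + 12 + 3 = 4095 stored data, i.e., N0 - 1, against 8192 for the delay-line figure, which is consistent with the factor of about 2 stated above.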
A problem associated with devices that compute Fourier transforms is the dynamic range of the intermediate and output data with respect to the dynamic range of the input data. The term dynamic range is understood to mean the number of bits used to represent the data, including the sign bit. Since the hardware operators of the butterfly type perform complex multiplications and additions, it is unrealistic to retain every additional bit produced, multiplication after multiplication. As a result, it is customary to work with a constant dynamic range. A constant dynamic range is provided by representing the input, the intermediate and the output data using the same number of bits.
Although the dynamic range is constant, the value of the dynamic range of the intermediate data cannot be known in advance. The value of the dynamic range of a datum refers to the range of values within which the datum lies, e.g., between -0.5 and +0.5 or between -0.05 and +0.05, etc. A first approach includes a priori globally extending the dynamic range of the data. That is, the necessary dynamic range is estimated a priori over the data output by the circuit so as not to lose too much accuracy in the significant bits. This assumes that no saturation occurs with regard to the internal computation, and the size of the input data words is subsequently increased by the estimated number of extra bits. The intermediate data and the output data will also be represented with words of this size. Accordingly, this leads to an increase in the size of the internal data paths of the circuit.
When the initial size of the Fourier transform is not too large, it is possible to use all the radix 4 processing stages of the device, each having a storage capacity equal to three quarters of the data received by the corresponding stage. When the initial size of the Fourier transform is large, the a priori estimation of the dynamic range required may lead to an overly large increase in the size of the internal data paths. This requires numerous processing stages, which results in an increase in the area of the circuit. Therefore, it is advantageous to provide radix 4 processing stages of a second type when the initial size of the Fourier transform is large. The second type processing stage has elementary processing means comprising means for determining the dynamic range of the data of each block processed, and for performing a realignment of these data with regard to the dynamic range. In operation, this realignment involves estimating the maximum value of the data of the block, and dividing each datum of the block by this maximum value. However, to perform such a realignment of the data, it is necessary for the radix 4 processing stages of the second type to comprise elementary storage means having a storage capacity equal to N, where N is the size of the data blocks processed by this stage.
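The realignment described above can be sketched as follows. The sketch assumes that the maximum is taken over the absolute values of the real and imaginary components, which the text does not specify, and the name `realign_block` is hypothetical.

```python
def realign_block(block):
    """Scale a block of complex data by its largest component magnitude.

    Returns the realigned block and the scale factor that was divided out.
    """
    # The maximum is known only once the last datum of the block has been
    # seen, which is one way to read the need to hold all N data.
    peak = max(max(abs(z.real), abs(z.imag)) for z in block)
    if peak == 0:
        return list(block), 1.0  # an all-zero block needs no realignment
    return [z / peak for z in block], peak
```

After realignment the largest component of the block has magnitude 1, so the available word length is used in full regardless of how small the incoming data were.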
Although the storage capacity of the radix 4 stage of a second type is greater than the storage capacity of a radix 4 stage of a first type, the storage capacity equal to N nevertheless remains less than that of a radix 4 stage of the prior art using delay lines. The prior art also carries out a realignment of the data, such as described in the article by Bidget et al. No realignment of the data is performed in the first type radix 4 stage. In the multistage devices for computing Fourier transforms, the first stage, i.e., the input stage, does not generally comprise any means for realigning the data since it is generally assumed that the incoming data are already correctly aligned. However, in certain applications requiring very high accuracy in the data, it is possible to use the means for realigning the data actually being incorporated into the first stage.
Regardless of the type of the radix 4 stage, it is possible for the elementary storage means of this stage to consist entirely of a random access memory. However, it is particularly advantageous to associate with this random access memory one or more levels of registers or latches mutually connected in series with the memory. This separates the memory from the operative part of the stage, and allows for the use of automatic tools for generating test vectors. Such automatic test methods are referred to as full scan methods, and are well known to one skilled in the art. These automatic test methods include loading all the latches, performing computations, and rewriting the data to the latches to carry out the test.
It is particularly advantageous with respect to a radix 4 processing stage of the first type (i.e., without realignment of the data) for the elementary storage means to comprise a single-access random access memory and n registers mutually connected in series with the memory. The memory is then able to store N/4-(n-1) words of three data bits, while each register is able to store one word of three data bits. For the processing stages of the second type (i.e., with realignment of the data), the elementary storage means also comprises a single-access random access memory and n registers mutually connected in series with the memory. However, the memory is then able to store N/4-(n-1) words of four data bits while each register is able to store one word of four data bits.
According to one embodiment of the invention and regardless of the type of stage, each radix 4 processing stage comprises an input for sequentially receiving at the frequency of a first clock signal the N data bits of a current block. The data is ordered within four consecutive segments each containing N/4 data bits. Each datum of a segment forms a group of four data bits together with the counterpart data bit of the other three segments. The elementary processing means of the stage comprises an adder/subtracter module for performing, at each cycle of the first clock signal, a processing operation of the butterfly type on each of the groups formed. This processing operation derives successive groups of four intermediate data respectively ordered within four consecutive intermediate segments. The elementary processing means furthermore comprises a multiplier module for multiplying, at each cycle of the first clock signal, the intermediate data by predetermined multiplier coefficients. The processing stage also comprises control means for delivering to the elementary storage means the data contained in at least the first three segments of the current block as they are received. The control means are also able to respectively substitute some of the stored data of the current block with the intermediate data contained in the last three intermediate segments. At each cycle of the first clock signal, the control means also redeliver to the elementary storage means the information removed from the storage means and not used by the adder/subtracter module or the multiplier module.
The control means in a processing stage of the first type (i.e., no realignment of the data) deliver to the elementary storage means the data contained in the first three segments as they are received. The data contained in the last segment are not stored. The control means also respectively substitute the stored data with the intermediate data contained in the last three intermediate segments as the data contained in the fourth segment are received.
The control means in a processing stage of the second type (i.e., realignment of the data) deliver to the elementary storage means the data contained in the four segments of the current block as they are received. The control means also respectively substitute the stored data of the last three segments of the current block with the intermediate data contained in the last three intermediate segments as the data contained in the first segment of the next block are received.
According to another embodiment of the invention, the elementary storage means comprises a first register connected to the output of the memory, and a second register connected to the input of the memory. This embodiment is regardless of the type of radix 4 processing stage. The output of the first register is connected firstly to the input of the second register by a first controllable multiplexer, secondly to the input of the adder/subtracter module, and thirdly to the input of the multiplier module by a second controllable multiplexer. The output of the adder/subtracter module is connected to the input of the first register by the first multiplexer, and to the input of the multiplier module by the second multiplexer. Therefore, the control means comprises two multiplexers, as well as a first counter modulo N (write counter). The first counter modulo N is clocked by the first clock signal, and reinitializes on reception of the first datum of each block. The first counter modulo N also controls the first multiplexer. The control means also comprises a second counter modulo N (read counter) clocked by the first clock signal, and reinitializes on transmission of the first output datum of the stage. The second counter modulo N also controls the second multiplexer. Furthermore, the elementary processing means of the stage comprises means for addressing the memory comprising a counter modulo N/4-1, i.e., N/4-(n-1) with n=2 for the two registers.
The invention also provides a process for controlling a radix 4 processing stage for computing a Fourier transform of a device having a pipelined architecture. For each block of data received as input to the stage, only three quarters of the data of the block are stored in storage means comprising a random access memory.
According to yet another embodiment of the invention, the stage sequentially receives the N data bits of the block. The data are ordered within four consecutive segments each comprising N/4 data bits. Each datum of a segment forms a group of four data bits together with the counterpart data of the other three segments. The data contained in the first three segments are stored in the storage means as they are received. As the data contained in the fourth segment are received, a processing operation of the butterfly type is performed on each of the groups to derive successive groups of four intermediate data bits respectively ordered within four consecutive intermediate segments. The stored data are replaced respectively with the intermediate data contained in the last three intermediate segments.
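The storing and substitution sequence just described can be modelled in software as follows. This is only a behavioural sketch under stated assumptions: the data are treated as complex words, the multiplier coefficients follow the decimation-in-frequency convention exp(-2*pi*j*k*m/N), and the stored intermediate segments are drained after the block rather than while the next block arrives. The names `radix4_stage_first_type` and `dft` are hypothetical.

```python
import cmath

def dft(x):
    """Direct DFT, used only to check the stage output."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]

def radix4_stage_first_type(block):
    """Process one block of N data while storing only 3N/4 of them."""
    n = len(block)
    q = n // 4
    memory = [0j] * (3 * q)              # three quarters of the block
    out = [0j] * n                       # four intermediate segments
    for i, datum in enumerate(block):    # one datum per clock cycle
        seg, m = divmod(i, q)
        if seg < 3:
            memory[seg * q + m] = datum  # first three segments are stored
            continue
        # Fourth segment: its datum is used at once and never stored.
        a, b, c, d = memory[m], memory[q + m], memory[2 * q + m], datum
        y = [a + b + c + d,
             a - 1j * b - c + 1j * d,
             a - b + c - d,
             a + 1j * b - c - 1j * d]
        out[m] = y[0]                    # first intermediate segment leaves
        for k in (1, 2, 3):
            memory[(k - 1) * q + m] = y[k]  # replace the stored inputs
    for k in (1, 2, 3):                  # drain the stored segments
        for m in range(q):
            w = cmath.exp(-2j * cmath.pi * k * m / n)  # assumed coefficient
            out[k * q + m] = memory[(k - 1) * q + m] * w
    return out, len(memory)
```

In this model, applying a transform of size N/4 to intermediate segment k yields the outputs of index 4j+k of the size-N transform, which is what the next stage of the pipeline would go on to compute.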