High-performance complex FFT algorithms require large quantities of RAM to allow data input, processing and output to proceed in parallel. Alternative low-memory algorithms offer much lower performance, either because data must be reformatted before processing or because input, processing and output cannot operate in parallel. Such algorithms are used in modems for digital communications, for instance in a VDSL (very high-speed digital subscriber line) modem, in which the FFT and inverse-FFT (IFFT) processes must be performed in real time.
A conventional method for rapid FFT processing uses three banks of memory which, for each transform (FFT or IFFT) operation, act respectively as an input memory bank for loading data samples, a processing memory bank for processing the data samples, and an output memory bank for delivering the transformed data samples. The architecture of a system for performing such a method is shown in the block diagram of FIG. 1A. FIGS. 1B, 1C and 1D are related block diagrams showing the passage of data through the system in successive transform operations. FIGS. 1E and 1F are associated timing and memory management diagrams.
Referring to FIGS. 1A to 1E, incoming data samples are passed from a RAM input interface 20, via decode logic 28, to one of three banks 22, 24, 26 of RAM according to the FFT operation being performed. In a first time period t1 (FIG. 1B), the incoming data samples are passed to RAM bank 22 (RAM 1), whereas in time periods t2 (FIG. 1C) and t3 (FIG. 1D), incoming samples are passed to the other two banks, 24 (RAM 2) and 26 (RAM 3). In each FFT operation, received samples that have been stored in RAM are passed to a dedicated internal processing engine 30, which performs successive ‘butterfly’ operations to implement the FFT algorithm, the number of butterfly operations depending on the number and size of the samples to be processed. Accordingly, in time period t2, the samples received in RAM 22 in time period t1 are read by processing engine 30, processed, and written back to the same RAM 22. Concurrently, new data samples are loaded into RAM 26, as shown in FIGS. 1C and 1E. In time period t3, the processed samples in RAM 22 are read out to the RAM output interface 32, whilst the input samples loaded into RAM 26 are processed by processing engine 30 and further new data samples are loaded into RAM 24, as shown in FIGS. 1D and 1E. It will be seen that the functions of loading, processing and delivery rotate among the three banks of RAM 22, 24, 26 from one FFT operation to the next, each bank acting successively as an input RAM, a processing RAM and an output RAM.
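The in-place processing step can be sketched as follows. This is a minimal illustration assuming a radix-2 decimation-in-time algorithm and a power-of-two N; the text does not specify the radix or butterfly structure used by processing engine 30. The function name `fft_in_place` is hypothetical. It reads from and writes back to the same buffer, as the processing bank does, and counts the butterflies, which for radix-2 is (N/2)·log2 N.

```python
import cmath
from typing import List


def fft_in_place(buf: List[complex]) -> int:
    """Radix-2 decimation-in-time FFT computed in place in ``buf``.

    Returns the number of butterfly operations performed. The length of
    ``buf`` must be a power of two (an assumption of this sketch).
    """
    n = len(buf)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    # Bit-reversal permutation so that the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            buf[i], buf[j] = buf[j], buf[i]

    butterflies = 0
    size = 2
    while size <= n:  # one pass per stage: log2(N) stages in total
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                a = buf[start + k]
                b = buf[start + k + half] * w
                # One butterfly: read two values, write two results back.
                buf[start + k] = a + b
                buf[start + k + half] = a - b
                w *= w_step
                butterflies += 1
        size *= 2
    return butterflies
```

For example, an 8-point transform takes 12 butterflies: `fft_in_place([complex(v) for v in [1, 2, 3, 4, 0, 0, 0, 0]])` returns 12 and leaves the spectrum in the same list.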
In the example shown in the drawings, an N-point 16-bit FFT is performed and, in order to increase computational accuracy, a 24-bit processor is used. Each bank of RAM contains three N×16-bit RAM instances, making N×48 bits of RAM in each bank, as shown in FIG. 1A. The first instance is used for the real component of the data, the second instance for the imaginary component, and the third instance for sign extension to 24 bits. Since a 24-bit real component and a 24-bit imaginary component together occupy 48 bits, the third instance supplies the extra 8 bits for each of the two components.
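The split of one 24-bit complex sample across three 16-bit words can be sketched as below. The text does not say how the 8 extension bits are laid out, so this sketch assumes that the third instance holds the top 8 bits of the real component in its high byte and the top 8 bits of the imaginary component in its low byte. `sign_extend`, `pack24` and `unpack24` are hypothetical helpers.

```python
MASK16 = 0xFFFF
MASK24 = 0xFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low ``bits`` of ``value`` as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def pack24(re: int, im: int) -> tuple:
    """Split a signed 24-bit complex sample into three 16-bit words.

    Word layout is an assumption: (real low 16, imag low 16, ext) where
    ext = (real top 8 bits << 8) | imag top 8 bits.
    """
    for v in (re, im):
        if not -(1 << 23) <= v < (1 << 23):
            raise ValueError("component does not fit in 24 bits")
    re_u, im_u = re & MASK24, im & MASK24
    word_re = re_u & MASK16                         # instance 1: real, low 16 bits
    word_im = im_u & MASK16                         # instance 2: imaginary, low 16 bits
    word_ext = ((re_u >> 16) << 8) | (im_u >> 16)   # instance 3: top 8 bits of each
    return word_re, word_im, word_ext


def unpack24(word_re: int, word_im: int, word_ext: int) -> tuple:
    """Reassemble the signed 24-bit real and imaginary components."""
    re = ((word_ext >> 8) << 16) | word_re
    im = ((word_ext & 0xFF) << 16) | word_im
    return sign_extend(re, 24), sign_extend(im, 24)
```

A 16-bit input sample of 0xFFFB is read as -5 by `sign_extend(0xFFFB, 16)`; `pack24(-5, 0)` then gives `(0xFFFB, 0, 0xFF00)`, showing that for a sign-extended 16-bit input the third word carries only copies of the sign bit.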
An alternative illustration of the memory management scheme described above is given by the diagram of FIG. 1F. Each line 34 in FIG. 1F corresponds to one of the three types of N×16-bit RAM instance referred to above and shows how its function changes over time. For each such line 34 there are three N×16-bit RAM instances, one in each bank, and at any instant one is used for data input, one for processing and one for data output. Typically, one line is used for the real component of the data, the second for the imaginary component, and the third for sign extension.
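The rotation of roles among the banks can be modelled as a simple schedule. This is a sketch using hypothetical bank labels A, B and C; it assumes input banks are selected in a fixed cyclic order, and the mapping of those labels onto RAM banks 22, 24 and 26 is not fixed by the sketch. A block loaded in period t stays in the same bank while it is processed in period t+1 and delivered in period t+2.

```python
from typing import Dict

BANKS = ["A", "B", "C"]  # hypothetical labels for the three RAM banks


def roles(period: int) -> Dict[str, str]:
    """Map each function to the bank performing it in a 0-based period.

    Assumes input banks are chosen in the cyclic order A, B, C. Processing
    and output only start once the pipeline has filled.
    """
    r = {"input": BANKS[period % 3]}
    if period >= 1:
        r["process"] = BANKS[(period - 1) % 3]
    if period >= 2:
        r["output"] = BANKS[(period - 2) % 3]
    return r


if __name__ == "__main__":
    for t in range(5):
        r = roles(t)
        print(f"t{t + 1}: input={r['input']} "
              f"process={r.get('process', '-')} output={r.get('output', '-')}")
```

Once the pipeline is full (from t3 onwards) the three functions always occupy three different banks, so one transformed block is delivered per period while no data is ever copied between banks.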
In an alternative known method, less memory is used but performance is poorer, because data samples must be loaded into an input memory and pre-sorted into a processing memory before processing begins, and the processed data must then be post-sorted into an output memory. This is illustrated in FIGS. 2A to 2D. The system architecture in this case has a first N×16-bit input RAM 40, a second N×48-bit processing RAM 42 and a third N×16-bit output RAM 44. As shown in FIGS. 2B, 2C and 2D, the three RAMs 40, 42 and 44 are each dedicated to their particular function, in that there is no rotation between functions. Between the input, processing and output steps there is a loading and/or unloading step in which the pre- and post-sorting takes place. This adds significantly to the time taken to complete processing, from receipt of samples via the RAM input interface 20 to delivery of the transformed data samples via the RAM output interface 32.
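The memory saving follows directly from the RAM sizes given for FIGS. 1A and 2A. The sketch below uses hypothetical function names, and the example size N = 4096 is chosen arbitrarily for illustration.

```python
def triple_bank_bits(n: int) -> int:
    """RAM bits for the three rotating banks of FIG. 1A."""
    instances_per_bank = 3  # real, imaginary, sign extension
    return 3 * instances_per_bank * n * 16


def presorted_bits(n: int) -> int:
    """RAM bits for the dedicated input, processing and output RAMs of FIG. 2A."""
    return n * 16 + n * 48 + n * 16


if __name__ == "__main__":
    n = 4096  # arbitrary example size, not taken from the text
    a, b = triple_bank_bits(n), presorted_bits(n)
    print(a, b, f"{b / a:.3f}")  # 589824 327680 0.556
```

For any N the second arrangement needs 80N bits against 144N, about 56% of the RAM, at the cost of the additional sorting steps.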
It will be noted from FIG. 2A that, in this example, the input and output RAMs are only N×16-bit RAMs. This is because the received data samples are real, so their imaginary component is 0 and the bits for sign extension are redundant, and because in the output data, which contains both real and imaginary components, only samples 0 to N/2 are unique: samples (N/2+1) to (N−1) are the complex conjugates of earlier samples, sample N−k being the conjugate of sample k. Similarly, when loading data for an N-point IFFT with a real output, only samples 0 to N/2 are unique, and input samples (N/2+1) to (N−1) are simply complex conjugates of earlier samples.
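The conjugate symmetry described above can be checked numerically. This sketch uses a direct O(N²) DFT purely for verification, and the helper names `dft`, `idft` and `expand_half_spectrum` are hypothetical. It shows that for real input X[N−k] = conj(X[k]), that bins 0 and N/2 are real (so bins 0 to N/2 carry exactly N real values), and that a spectrum rebuilt from those bins inverts to a real signal.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Direct O(N^2) DFT, used here only to check symmetry properties."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: List[complex]) -> List[complex]:
    """Direct O(N^2) inverse DFT with 1/N scaling."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def expand_half_spectrum(half: List[complex], n: int) -> List[complex]:
    """Rebuild all N bins from bins 0..N/2 using X[N-k] = conj(X[k]) (N even)."""
    if n % 2 or len(half) != n // 2 + 1:
        raise ValueError("need N even and N/2 + 1 unique bins")
    full = list(half)
    for k in range(n // 2 + 1, n):
        full.append(half[n - k].conjugate())  # n - k runs from N/2 - 1 down to 1
    return full
```

In use, `X = dft(samples)` for a real 8-point input satisfies `abs(X[8 - k] - X[k].conjugate())` close to zero for k = 1 to 7, and `idft(expand_half_spectrum(X[:5], 8))` returns the original samples with negligible imaginary parts, mirroring the IFFT case in which only samples 0 to N/2 need be loaded.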
It is an object of the present invention to provide a method and a system which combine the advantages of speed and reduced memory requirement.