Various transforms are used in a large variety of data processing applications, such as digital signal processing of received signals in devices compliant with radio access standards using Orthogonal Frequency Division Multiplexing (OFDM), for example UMTS LTE (Universal Mobile Telecommunications System Long Term Evolution). Various transforms may also be used, for example, in signal analysis, compression algorithms, and filtering. Furthermore, transforms may be applicable for use in devices compliant with standards other than UMTS LTE, such as UMTS (Universal Mobile Telecommunications System), GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), and DVB (Digital Video Broadcasting). Computing a transform may be a relatively computationally complex data processing task, and therefore it is often crucial to the overall processing performance that the transform computation is optimized as far as possible.
It is well known in the art that, if N is a power of 2, a size-N transform, e.g. a Fourier transform, can be reduced to log2 N calculation stages, each comprising N/2 size-2 transforms. This rearrangement of the computations reduces the order of the problem from N^2 to N log2 N. In the case of the Fourier transform, the resulting computation method is denoted the fast Fourier transform (FFT).
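To make the reduction concrete, the following is a minimal Python sketch of a recursive radix-2 FFT. It is one of several ways to organize the computation, and the names `fft`, `dft` and the `counter` argument are illustrative assumptions. The sketch counts the size-2 butterflies and can be checked against a direct DFT, whose N outputs each need a sum of N terms.

```python
import cmath


def dft(x):
    """Direct DFT: N outputs, each a sum of N terms (order N^2 operations)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x, counter):
    """Recursive radix-2 FFT; len(x) must be a power of 2.

    counter is a one-element list incremented once per size-2 butterfly.
    """
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], counter)   # size-N/2 transform of even-indexed samples
    odd = fft(x[1::2], counter)    # size-N/2 transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):        # N/2 butterflies combine the two halves
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
        counter[0] += 1
    return out


count = [0]
y = fft([1, 2, 3, 4, 0, 0, 0, 0], count)
print(count[0])  # 12 == (8/2) * log2(8)
```

For N = 8 the counter ends at 12 = (8/2)·log2 8 butterflies, whereas the direct DFT evaluates 64 multiply-accumulate terms. The recursive form has the same butterfly count as the staged form described above; only the order of evaluation differs.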
A transform is often represented as a so-called butterfly diagram. An example butterfly diagram is described below in connection with FIG. 2A. A butterfly diagram comprises a plurality of computation kernels. An example of such a computation kernel 100 is given in FIG. 1. As seen in the figure, a computation kernel can be drawn so that it resembles a butterfly.
The two pieces of input data to the butterfly calculation performed in the computation kernel 100 are denoted x1 and x2 and are supplied at inputs 101 and 102, respectively. The two pieces of output data from the butterfly calculation are denoted y1 and y2 and may be found at the outputs 103 and 104, respectively. The output data value at output 103 of this example butterfly computation is attained by adding, in adder 105, the input data value at input 101 to the input data value at input 102. The output data value at output 104 is attained by changing the sign of the input data value at input 102 in multiplier 107, adding, in adder 106, the input data value at input 101 to the result of multiplier 107, and multiplying, in multiplier 108, the result of adder 106 by a so-called twiddle factor (TF). In other words, y1 = x1 + x2 and y2 = (x1 - x2)·TF.
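The kernel of FIG. 1 can be written as a short Python function. This is a sketch only: the names `butterfly` and `twiddle` are hypothetical, and the twiddle factor is assumed to be the usual forward-DFT factor exp(-2πik/N), which the text does not specify. Applying the twiddle factor after the subtraction, as here, is the form commonly used in decimation-in-frequency radix-2 FFTs.

```python
import cmath


def butterfly(x1, x2, tf):
    """Computation kernel of FIG. 1 (sketch)."""
    y1 = x1 + x2              # adder 105
    neg_x2 = -1 * x2          # sign change, multiplier 107
    y2 = (x1 + neg_x2) * tf   # adder 106, then twiddle factor, multiplier 108
    return y1, y2


def twiddle(k, n):
    """Hypothetical helper (assumption): forward-DFT twiddle factor exp(-2*pi*i*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n)


# twiddle(1, 4) is (approximately) -1j, so y2 = (2 + 3j) * (-1j), about 3 - 2j
y1, y2 = butterfly(3 + 1j, 1 - 2j, twiddle(1, 4))
```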
It is emphasized that the realization of the computation kernel shown in FIG. 1 merely represents one example among many possible realizations. As an example variation, the multiplications may be performed by means other than multipliers, according to any method presently known, such as shift-and-add operations, or according to any method for multiplication that may be discovered in the future.
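As an illustration of multiplication without a hardware multiplier, the following Python sketch multiplies two non-negative integers using shifts and additions only. It assumes integer operands (for example fixed-point scaled values); sign handling and the fixed-point scaling of a real twiddle factor are left out, and the function names are hypothetical.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers with shifts and additions only."""
    result = 0
    while b:
        if b & 1:        # lowest remaining bit of b is set: add the shifted a
            result += a
        a <<= 1          # next power-of-two multiple of a
        b >>= 1          # move on to the next bit of b
    return result


def times_ten(x: int) -> int:
    """Multiplication by a known constant: 10 * x = 8 * x + 2 * x."""
    return (x << 3) + (x << 1)


print(shift_add_multiply(13, 11))  # 143
print(times_ten(7))                # 70
```

When one operand is a constant known in advance, as a twiddle factor often is, the shifts and additions can be fixed at design time, as in `times_ten`.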
An entire transform computation is made up of (N/2)log2 N such butterflies, as exemplified in FIG. 2A, where a size-8 FFT computation 200 is shown. The depiction of the butterflies has been simplified in FIG. 2A to avoid cluttering the figure. It should be understood, however, that the adders and multipliers, for example as shown in FIG. 1, are also present in the implementation of FIG. 2A.
It is to be noted that, in FIG. 2A, there are 3 (=log2 8) computation stages 201, 202, 203, each of which consists of 4 (=8/2) butterfly calculations. Hence, the execution of a transform computation may be represented as a nested loop with log2 N iterations of an outer loop (one iteration per transform stage) and N/2 iterations of an inner loop (one iteration per butterfly calculation). The execution of a transform computation may alternatively be represented as a single loop with log2 N iterations (one iteration per transform stage), with the N/2 butterfly calculations of each stage being computed in parallel. In FIG. 2A, the intermediate results of the transform computation are stored in buffers 204, 205, 206 and the final result of the transform computation is stored in buffer 207. Some implementations use only two buffers in total for this task; the two buffers alternate, through the different stages of the transform computation, in serving as read buffer and write buffer. Yet other implementations, so-called in-place transforms, use only a single buffer. The addresses 208 of the different locations in the buffers 204, 205, 206, 207 are given in both decimal and bit representation for clarity. Also shown in FIG. 2A are exemplary read indices 209, 210, 211 that are used to determine which buffer address to read from during each read cycle. For example, according to the indices and addresses given in FIG. 2A, the first butterfly calculation (corresponding to the read indices 209 with values 0 and 1) of the first stage should use input data from buffer addresses 0 and 4 (and write the result of the butterfly calculation to addresses 0 and 4 of the output buffer), and the second butterfly calculation (corresponding to the read indices 209 with values 2 and 3) of the first stage should use input data from buffer addresses 2 and 6.
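The nested-loop formulation with two alternating buffers can be sketched in Python as follows. Assumptions: the function name is hypothetical, the butterfly is the FIG. 1 kernel (giving a decimation-in-frequency ordering with input distances N/2, N/4, ..., 1), each result is written to the same addresses it was read from, and the butterflies of a stage are visited group by group, which need not match the order defined by the read indices of FIG. 2A. The output is left in bit-reversed order.

```python
import cmath


def fft_dif_pingpong(x):
    """Radix-2 FFT as a nested loop over stages and butterflies, using two
    buffers that swap read/write roles after every stage.
    len(x) must be a power of 2; the result is in bit-reversed order."""
    n = len(x)
    stages = n.bit_length() - 1              # log2(N)
    read_buf = [complex(v) for v in x]
    write_buf = [0j] * n
    half = n // 2                            # distance between the two inputs
    for _ in range(stages):                  # outer loop: one iteration per stage
        for b in range(n // 2):              # inner loop: one iteration per butterfly
            group, j = divmod(b, half)
            lo = group * 2 * half + j        # address of first input (x1)
            hi = lo + half                   # address of second input (x2)
            tf = cmath.exp(-2j * cmath.pi * j / (2 * half))
            x1, x2 = read_buf[lo], read_buf[hi]
            write_buf[lo] = x1 + x2          # y1, written to the same address
            write_buf[hi] = (x1 - x2) * tf   # y2, written to the same address
        read_buf, write_buf = write_buf, read_buf  # read and write buffers alternate
        half //= 2
    return read_buf


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = fft_dif_pingpong(x)
# y[k] holds X[bit_reverse(k)]; y[0] is X[0], the sum of the inputs (36 here)
```

For N = 8 the first stage combines addresses (0, 4), (1, 5), (2, 6), (3, 7) and the second stage (0, 2), (1, 3), (4, 6), (5, 7). These include the pairs given for FIG. 2A.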
To further exemplify, the first butterfly calculation (corresponding to the read indices 210 with values 0 and 1) of the second stage should, according to FIG. 2A, use input data from buffer addresses 0 and 2, and the second butterfly calculation (corresponding to the read indices 210 with values 2 and 3) of the second stage should use input data from buffer addresses 4 and 6.
It is well known in the art that the input of a transform computation such as the one shown in FIG. 2A should be read in bit-reversed order from the input buffer 204. It can be seen in the figure that the bit representation of the read indices 209 of the first stage, when reversed, corresponds to the bit representation of the buffer addresses 208 (for example, read index 1, i.e. 001, reversed gives 100, i.e. buffer address 4). Several variations exist. For example, the input of the transform computation may be read in natural order from the input buffer 204. This, however, would result in an implementation that is different from the one shown in FIG. 2A.
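The bit-reversal relation between the first-stage read indices 209 and the buffer addresses 208 can be checked with a small Python sketch; the helper name `bit_reverse` is illustrative. Only the first two butterflies of the first stage are spelled out in the text, so the remaining pairs printed below follow from applying the same rule.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i, e.g. 001 -> 100 for bits = 3."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # move the lowest bit of i into r
        i >>= 1
    return r


N, BITS = 8, 3
addresses = [bit_reverse(i, BITS) for i in range(N)]
print(addresses)  # [0, 4, 2, 6, 1, 5, 3, 7]

# First-stage butterflies: read indices (0, 1), (2, 3), ... map to address pairs
pairs = [(addresses[2 * b], addresses[2 * b + 1]) for b in range(N // 2)]
print(pairs)      # [(0, 4), (2, 6), (1, 5), (3, 7)]
```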
One problem with existing solutions for transform computation is time consumption. For example, each butterfly computation requires two read accesses to fetch its two pieces of input data and two write accesses to store its result, i.e. two pieces of output data. With a buffer that serves one access per cycle, this generally requires at least four cycles for each butterfly computation. Since the dimensions of a transform may be quite large (N in the order of 1024-8192 is common), each transform computation involves a large number of butterfly calculations. Hence, even though the execution of the butterfly calculations is often pipelined, the execution time of each butterfly calculation is crucial to the overall computation time of a transform.
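A rough cycle budget follows directly from the (N/2)log2 N butterflies and four buffer accesses per butterfly. The Python sketch below is a back-of-the-envelope model under stated assumptions: buffer accesses are the only bottleneck, everything else is hidden by pipelining, and `accesses_per_cycle` is a hypothetical parameter for the number of accesses the buffer serves per cycle.

```python
def butterfly_count(n: int) -> int:
    """Number of butterflies, (N/2) * log2(N), for a power-of-two size N."""
    return (n // 2) * (n.bit_length() - 1)


def min_access_cycles(n: int, accesses_per_cycle: int = 1) -> int:
    """Lower bound on buffer cycles under the assumed model:
    2 reads + 2 writes per butterfly, `accesses_per_cycle` accesses per cycle."""
    accesses = 4 * butterfly_count(n)
    return -(-accesses // accesses_per_cycle)   # ceiling division


for n in (1024, 8192):
    print(n, butterfly_count(n), min_access_cycles(n), min_access_cycles(n, 2))
# 1024 5120 20480 10240
# 8192 53248 212992 106496
```

Under this model, a buffer serving two accesses per cycle halves the bound, which is what dual-port or banked buffers aim for.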
Some implementations use buffers that can handle two read and/or two write accesses in a single cycle, e.g. buffers that comprise two or more address inputs, so-called two-port or dual-port memories. Even though this may reduce the overall computation time, these types of buffers are generally more expensive than buffers with only one address input, and therefore such a solution is not always preferable.
Other implementations may separate each buffer into two or more memory banks, which also enables two read and/or two write accesses in a single cycle. Such solutions generally result in increased complexity compared to solutions where the buffers are not separated into memory banks.
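One common way to let both operands of every butterfly be fetched in the same cycle from two single-port banks is to select the bank by the parity of the address bits. The two addresses of a radix-2 butterfly differ in exactly one bit, so they always fall in different banks. The Python sketch below illustrates this scheme; it is not taken from the text, and `bank_of`, `offset_in_bank` and `stage_pairs` are hypothetical names. The stage pairing (distance N/2 in the first stage, N/4 in the second, and so on) matches the FIG. 2A examples for the first two stages.

```python
def bank_of(address: int) -> int:
    """Hypothetical two-bank mapping: bank = parity (XOR) of all address bits."""
    return bin(address).count("1") & 1


def offset_in_bank(address: int) -> int:
    """Location inside the selected bank. address and address ^ 1 share an
    offset but always sit in different banks, so (bank, offset) is unique."""
    return address >> 1


def stage_pairs(n: int, stage: int):
    """Address pairs combined in 0-based `stage`: distance N/2, then N/4, ..., 1."""
    stride = n >> (stage + 1)
    return [(a, a + stride) for a in range(n) if not a & stride]


N = 8
for s in range(N.bit_length() - 1):
    pairs = stage_pairs(N, s)
    # Both operands of every butterfly lie in different banks: no bank conflict.
    assert all(bank_of(a) != bank_of(b) for a, b in pairs)
    print(s, pairs)
```

The extra address mapping is a small instance of the added complexity that banked solutions bring compared with a single undivided buffer.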
Thus, there is a need for data-processing implementations and methods of transform computation that reduce the overall computation time. This should preferably be achieved without any increase in implementation cost and complexity, or with only a moderate increase in implementation cost and/or complexity.