1. Field of the Invention
The present invention relates to a method for controlling a memory and an operational device using the method. More specifically, the present invention relates to a method for use in, for example, an operational device that performs, in hardware or circuitry, processing such as fast Fourier transform (FFT) or inverse fast Fourier transform (IFFT), which is a kind of fast Fourier transform. The method uses an operational means for performing the FFT or IFFT, and a memory control means disposed between the operational means and a plurality of memory banks for storing data used for the operation. The memory control means controls to which memory bank data is written in order to save the storage capacity of the memory required for performing the FFT or IFFT processing.
2. Description of the Background Art
Recent communication processing often uses orthogonal frequency division multiplexing (OFDM). More specifically, OFDM is used for transmission schemes such as terrestrial digital broadcasting, wireless local area networks (LANs) under IEEE (Institute of Electrical and Electronics Engineers) 802.11a and 802.11g, etc., and power line communication modems.
FFT or IFFT processing accounts for most of the OFDM processing. The OFDM processing is thus required to be implemented with a small, fast-operating FFT or IFFT circuit. In general, the FFT or IFFT processing is implemented in hardware in order to accomplish fast processing. The FFT is outlined, for example, in an article, Takuya Ooura “Rough Note on the Fast Fourier Transform,” pp. 1-3, (Online), Research Institute for Mathematical Sciences, Kyoto University, (searched on Jun. 3, 2005), on the Internet, http://www.kurims.kyoto-u.ac.jp/~ooura/fftman/fft_note_s.pdf.
As described in this document, the discrete Fourier transform (DFT) on N points needs N^2 computations, whereas the FFT needs only a number of computations proportional to N log N. The basic principle of the FFT is that a simple conversion of suffixes can resolve a large DFT into small DFTs that are easier to compute. For example, consider now the computation of the following expression (1) for the DFT on N points.
$$A_k = \sum_{j=0}^{N-1} a_j W_N^{jk}, \qquad W_N = e^{-2\pi i/N} \qquad (1)$$
In this case, each term from A_0 to A_{N-1} is computed with N multiplications, so that a total of N^2 multiplications is needed. If the number N is divisible by two, the suffix k can be classified into even and odd numbers to resolve the DFT on N points into two expressions (2) and (3), each for a DFT on N/2 points.
$$A_{2k} = \sum_{j=0}^{N/2-1} \left(a_j + a_{N/2+j}\right) W_{N/2}^{jk} \qquad (2)$$
$$A_{2k+1} = \sum_{j=0}^{N/2-1} \left(a_j - a_{N/2+j}\right) W_N^{j}\, W_{N/2}^{jk} \qquad (3)$$
The expressions (2) and (3) for the DFT on N/2 points can each be computed with N^2/4 multiplications. The resolving can thus reduce the amount of computation to about half. Repeating the resolving twice or three times will reduce the amount of computation to about one-fourth or one-eighth. This is the basic idea of the Cooley-Tukey FFT, i.e., the radix 2 frequency-decimated Cooley-Tukey FFT.
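The resolving into the expressions (2) and (3) can be checked numerically. The following Python sketch is only an illustration: the function names `dft` and `dif_split` and the sample data are assumptions and do not appear in the referenced document. It computes the DFT directly from the expression (1), then again through one resolving step, and compares the two results.

```python
import cmath

def dft(a):
    """Direct DFT per expression (1): A_k = sum over j of a_j * W_N^(j*k)."""
    n = len(a)
    w = cmath.exp(-2j * cmath.pi / n)          # W_N
    return [sum(a[j] * w ** (j * k) for j in range(n)) for k in range(n)]

def dif_split(a):
    """One resolving step per expressions (2) and (3); len(a) must be even."""
    n = len(a)
    half = n // 2
    w = cmath.exp(-2j * cmath.pi / n)          # W_N
    # Input of the N/2-point DFT that yields the even-numbered outputs A_2k.
    sums = [a[j] + a[half + j] for j in range(half)]
    # Input of the N/2-point DFT that yields the odd-numbered outputs A_2k+1,
    # including the multiplication by W_N^j.
    diffs = [(a[j] - a[half + j]) * w ** j for j in range(half)]
    out = [0j] * n
    out[0::2] = dft(sums)
    out[1::2] = dft(diffs)
    return out

sample = [complex(j, -j) for j in range(8)]    # arbitrary test data
error = max(abs(x - y) for x, y in zip(dft(sample), dif_split(sample)))
print("largest difference between the two results:", error)
```

The difference printed is at the level of floating point rounding, which is what the identity between the expression (1) and the pair of expressions (2) and (3) predicts.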
FIG. 5 shows a data flow of the radix 2 frequency-decimated FFT described in FIG. 1 of the document by Takuya Ooura stated above. Consider now the amount of computation when the resolving into the expressions (2) and (3) is done log_2 N times to ultimately provide DFTs on one point. The resolving itself requires, at each stage, N/2 complex multiplications by W_N^j and N complex additions. The number of complex multiplications is thus reduced to (N/2) log_2 N. The amount of floating point operations is thus on the order of N log_2 N. This is a typical value of the amount of operation with the Cooley-Tukey FFT. Several algorithms for reducing the amount of FFT operation basically reduce the proportionality constant of this order and the terms of lower order than N log_2 N.
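The count of (N/2) log2 N multiplications follows from a simple recurrence: one resolving step costs N/2 multiplications by powers of W and leaves two DFTs on N/2 points. The Python sketch below evaluates that recurrence; the function name `twiddle_multiplications` is an assumption chosen for illustration, and N is assumed to be a power of two.

```python
import math

def twiddle_multiplications(n):
    """Multiplications by powers of W when an n-point DFT (n a power of
    two) is resolved repeatedly down to one-point DFTs."""
    if n == 1:
        return 0                     # a one-point DFT needs no multiplication
    # n/2 multiplications for this stage, plus two half-size problems.
    return n // 2 + 2 * twiddle_multiplications(n // 2)

for n in (16, 1024):
    closed_form = (n // 2) * int(math.log2(n))
    print(n, "points:", n * n, "for the direct DFT,",
          twiddle_multiplications(n), "after resolving,",
          "closed form", closed_form)
```

For every power of two the recurrence and the closed form agree, since each of the log2 N stages contributes N/2 multiplications.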
A description will now be given of a general resolving method using the FFT suffixes. Assume, for example, that N can be factorized as N=N1*N2. The suffix j in the expression (1) is replaced by the two suffixes j1 (=0, 1, 2, . . . , N1−1) and j2 (=0, 1, 2, . . . , N2−1). For certain natural numbers J1 and J2, a mapping for converting j1 and j2 to j is defined by the following expression (4).
    j≡(J1j1+J2j2) mod N  (4)
First, the mapping of the expression (4) must be a one-to-one correspondence. The necessary and sufficient conditions for this are the following conditions (a) and (b), where p and q are certain natural numbers.
    (a) When N1 and N2 are relatively prime, at least one of J1=pN2 and J2=qN1 is satisfied, and gcd(J1, N1)=gcd(J2, N2)=1.
    (b) When N1 and N2 are not relatively prime,
        J1=pN2, and J2 mod N1≢0, and gcd(p, N1)=gcd(J2, N2)=1
        or
        J1 mod N2≢0, and J2=qN1, and gcd(J1, N1)=gcd(q, N2)=1.
Further, a similar mapping is defined for the suffix k by the expression (5).
    k≡(K1k1+K2k2) mod N  (5)
Applying the above conversions to the expression (1) gives the following expression (6).
$$A_{K_1 k_1 + K_2 k_2} = \sum_{j_2=0}^{N_2-1} \sum_{j_1=0}^{N_1-1} a_{J_1 j_1 + J_2 j_2}\, W_N^{J_1 K_1 j_1 k_1}\, W_N^{J_1 K_2 j_1 k_2}\, W_N^{J_2 K_1 j_2 k_1}\, W_N^{J_2 K_2 j_2 k_2} \qquad (6)$$
The second and third terms of W in the expression (6) prevent the order of the operation blocks in the expression (6) from being changed, thereby preventing the resolving into small DFTs. It is seen that if at least one of the conditions on J1K2 and J2K1 defined by the following expression (7) is satisfied, the expression (6) can be resolved into two small DFTs of sizes N1 and N2.
    J1K2≡0 mod N or J2K1≡0 mod N  (7)
Examples satisfying the condition of the expression (7) may be the following two types of resolving, (i) and (ii).
    (i) When N1 and N2 are relatively prime, J1=N2, and J2=N1, and K1=N2, and K2=N1.
    (ii) When N1 and N2 are arbitrary numbers, J1=N2, and J2=1, and K1=1, and K2=N1, or J1=1, and J2=N1, and K1=N2, and K2=1.
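The two types of resolving can be checked by brute force for small sizes. The Python sketch below is an illustration under assumed names (`is_one_to_one` and `eliminated_terms` are not taken from the referenced document): the first function tests whether a mapping of the form of the expression (4) or (5) reaches every suffix exactly once, and the second reports which of the two conditions of the expression (7) holds. The sizes 3 and 5 and the sizes 4 and 4 are arbitrary examples.

```python
def is_one_to_one(n1, n2, c1, c2):
    """True if (i1, i2) -> (c1*i1 + c2*i2) mod (n1*n2) reaches every
    value from 0 to n1*n2 - 1 exactly once."""
    n = n1 * n2
    images = {(c1 * i1 + c2 * i2) % n
              for i1 in range(n1) for i2 in range(n2)}
    return len(images) == n

def eliminated_terms(n1, n2, J1, J2, K1, K2):
    """Pair of flags: (J1*K2 = 0 mod N, J2*K1 = 0 mod N), i.e. whether
    the second and the third term of W in expression (6) vanish."""
    n = n1 * n2
    return ((J1 * K2) % n == 0, (J2 * K1) % n == 0)

# Type (i) with N1 = 3 and N2 = 5 (relatively prime).
print(is_one_to_one(3, 5, 5, 3), eliminated_terms(3, 5, 5, 3, 5, 3))

# Type (ii) with N1 = N2 = 4: J1 = N2, J2 = 1, K1 = 1, K2 = N1.
print(is_one_to_one(4, 4, 4, 1), eliminated_terms(4, 4, 4, 1, 1, 4))

# Type (i) coefficients applied to N1 = N2 = 4, which are not relatively
# prime: the mapping is no longer one-to-one.
print(is_one_to_one(4, 4, 4, 4))
```

With the type (i) coefficients both conditions of the expression (7) hold, whereas with the type (ii) coefficients only one of them holds.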
The first type of resolving is used only when the values N1 and N2 are relatively prime. This resolving eliminates the two terms of W in the expression (6) to resolve the expression (6) into a two-dimensional DFT of sizes N1 and N2. This resolving requires selecting values N1 and N2 that are relatively prime. However, no operation is required for the resolving itself. The DFTs remaining unresolved generally have lengths corresponding to prime numbers, which need a certain amount of computation. The FFT with this resolving is used in the prime factor FFT [5, 2, 7] and the Winograd DFT algorithm [5, 9].
The second type of resolving can select any values N1 and N2. This type of resolving, however, eliminates only one term of W in the expression (6). The resolving thus requires a multiplication by W (twiddle factor multiplication) for the resolving into DFTs of sizes N1 and N2. The value N1 or N2 can be fixed, however, to a number for which the DFT is easily calculated, so that the amount of computation other than for the resolving is reduced. The FFT with this resolving is the Cooley-Tukey FFT [3]. Its basic algorithm is that the value N1 is fixed and the resolving is recursively repeated. The value N1 is called the “radix”. The resolving that eliminates the second term of W in the expression (6) is called the decimation-in-frequency algorithm. The resolving that eliminates the third term of W is called the decimation-in-time algorithm. The Cooley-Tukey FFT has many types, such as the usual Radix-2 FFT, Arbitrary-Radix FFT, Mixed-Radix FFT, and Split-Radix FFT, which is said to have a reduced amount of operation [4, 6, 8]. The foregoing is the outline of the FFT described in the reference document.
Different types of FFT or IFFT processing are used depending on the radix, such as Radix 2 (radix equal to 2), Radix 4 (radix equal to 4), and Radix 8 (radix equal to 8). Radix 4 is often used because it requires only about 75% of the amount of operation that Radix 2 requires to process the same amount of data.
When the FFT or IFFT processing is performed in parallel, the number of data items required simultaneously depends on the processing radix: two items of complex data for Radix 2 and four items of complex data for Radix 4. To provide the data simultaneously, the memory area needs to be divided into a plurality of banks, and the data used simultaneously need to be stored in different memory banks.
FIG. 6 is a schematic block diagram of a conventional Radix 4 FFT operational device. The Radix 4 FFT operational device shown performs the Radix 4 FFT processing in hardware. The operational device includes a Radix 4 FFT operational circuit 1, and a memory 10, such as a random access memory (RAM), which provides complex data to the operational circuit 1. The memory 10 is divided into four memory banks 11-1 to 11-4. These memory banks 11-1 to 11-4 have address generators (adr-gen) 12-1 to 12-4 respectively connected thereto. The address generators respectively provide access addresses to the memory banks. The four memory banks 11-1 to 11-4 can provide four pieces of complex data to the FFT operational circuit 1 simultaneously. The four memory banks can also receive four pieces of complex data simultaneously.
FIG. 7 shows the operational flow of conventional Radix 2 and Radix 4 FFT or IFFT processing. FIG. 7 shows an example for 4^2=16 data items (a0 to a15). Note that in FIG. 7 a solid line shows an addition path, and a dotted line shows a subtraction path.
Attention is now paid to the data used simultaneously in the FFT processing or IFFT processing. The IFFT processing is done in a similar manner to the FFT processing, so that for simplicity only the FFT processing will be described below. A description will be given of the FFT processing (1) for Radix 2 and the FFT processing (2) for Radix 4.
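A figure of about 75% is consistent with one common way of counting, in which every butterfly is charged for its twiddle factor multiplications: one per Radix 2 butterfly and three per Radix 4 butterfly. The text does not state which count it relies on, so the Python sketch below is an assumption made for illustration, as are the function names; multiplications by trivial factors such as 1 are counted as well.

```python
def stage_count(n, radix):
    """Number of stages needed for n points (n must be a power of radix)."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("n must be a power of the radix")
        n //= radix
        stages += 1
    return stages

def twiddle_multiplications(n, radix):
    """Each stage has n/radix butterflies, and each butterfly multiplies
    (radix - 1) of its items by a twiddle factor (assumed counting rule)."""
    return (n // radix) * (radix - 1) * stage_count(n, radix)

for n in (16, 1024):
    r2 = twiddle_multiplications(n, 2)
    r4 = twiddle_multiplications(n, 4)
    print(n, "points: Radix 2 =", r2, ", Radix 4 =", r4, ", ratio =", r4 / r2)
```

Under this counting rule the ratio is 3/4 for every N that is a power of four, because Radix 4 halves the number of stages while charging 3N/4 instead of N/2 multiplications per stage.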
(1) FFT Processing for Radix 2
One pass of the FFT processing shown in the flow in FIG. 7 (Radix 2 FFT basic operation processing) is divided, from left to right, into a first-stage processing T1, a second-stage processing T2, a third-stage processing T3, and a fourth-stage processing T4. In synchronization with a clock signal, not shown, the processing proceeds from the first stage T1 to the fourth stage T4.
The first-stage processing T1 uses the data (a0, a8), (a1, a9), (a2, a10), (a3, a11), (a4, a12), (a5, a13), (a6, a14), and (a7, a15) simultaneously. More specifically, for the first-stage processing T1, the same memory bank (for example, 11-1) needs to store the data a0, a1, a2, a3, a4, a5, a6, and a7, and another memory bank (for example, 11-2) needs to store the data a8, a9, a10, a11, a12, a13, a14, and a15.
The second-stage processing T2 uses the data (a0, a4), (a1, a5), (a2, a6), (a3, a7), (a8, a12), (a9, a13), (a10, a14), and (a11, a15) simultaneously. Specifically, for the second-stage processing T2, the same memory bank (for example, 11-3) needs to store the data a0, a1, a2, a3, a8, a9, a10, and a11, and another memory bank (for example, 11-4) needs to store the data a4, a5, a6, a7, a12, a13, a14, and a15.
The first-stage processing T1 provides the resulting data a0 and a8 simultaneously. For use in the second-stage processing T2, however, the same memory bank needs to store the data a0 and a8. It was thus necessary to use a plurality of clocks to change the locations in which the data are stored, which prevented fast processing.
(2) FFT Processing for Radix 4
One pass of the FFT processing shown in the FFT flow graph (Radix 4 FFT basic operation processing) is divided, from left to right, into a first-stage processing T10 and a second-stage processing T20.
The first-stage processing T10 uses the data (a0, a4, a8, a12), (a1, a5, a9, a13), (a2, a6, a10, a14), and (a3, a7, a11, a15) simultaneously. Specifically, for the first-stage processing T10 the memory bank 11-1 needs to store the data a0, a1, a2, a3, the memory bank 11-2 needs to store the data a4, a5, a6, a7, the memory bank 11-3 needs to store the data a8, a9, a10, a11, and the memory bank 11-4 needs to store the data a12, a13, a14, and a15.
The second-stage processing T20 uses the data (a0, a1, a2, a3), (a4, a5, a6, a7), (a8, a9, a10, a11), (a12, a13, a14, a15) simultaneously. Specifically, for the second-stage processing T20 the memory bank 11-1 needs to store the data a0, a4, a8, a12, the memory bank 11-2 needs to store the data a1, a5, a9, a13, the memory bank 11-3 needs to store the data a2, a6, a10, a14, and the memory bank 11-4 needs to store the data a3, a7, a11, and a15.
The first-stage processing T10 provides the resulting data a0, a4, a8, and a12 simultaneously. For use in the second-stage processing T20, however, the same memory bank needs to store the data a0, a4, a8, and a12. It was thus necessary to use a plurality of clocks to change the locations in which the data are stored, which prevented fast processing.
With reference to FIG. 8, the above problems will be described in more detail. FIG. 8 shows an example of conventional Radix 4 FFT processing (the number of data items is 4^5=1024, data a0 to a1023).
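The bank requirements described for the 16-point example can be reproduced with a short Python sketch. The function names `required_bank` and `items_to_move` and the zero-based numbering of stages and banks are assumptions made for illustration (bank 0 stands for the first bank of a stage, and so on); the formula only encodes the groupings read from the description of FIG. 7.

```python
def required_bank(index, stage, n, radix):
    """Bank that item a[index] must occupy at the input of a stage.

    Items processed together at a (zero-based) stage lie `stride` apart
    and must sit in different banks; a run of `stride` consecutive
    indices shares one bank.
    """
    stride = n // radix ** (stage + 1)
    return (index // stride) % radix

def items_to_move(stage, n, radix):
    """Items whose required bank differs between this stage and the next."""
    return [i for i in range(n)
            if required_bank(i, stage, n, radix)
            != required_bank(i, stage + 1, n, radix)]

# Radix 2, 16 points: a0 and a8 are used together in T1 ...
print([required_bank(i, 0, 16, 2) for i in (0, 8)])
# ... but have to share a bank at the input of T2.
print([required_bank(i, 1, 16, 2) for i in (0, 8)])

# Radix 4, 16 points: a0, a4, a8, a12 at the inputs of T10 and T20.
print([required_bank(i, 0, 16, 4) for i in (0, 4, 8, 12)])
print([required_bank(i, 1, 16, 4) for i in (0, 4, 8, 12)])
print(len(items_to_move(0, 16, 4)), "of 16 items change banks after T10")
```

Items that are produced simultaneously from different banks in one stage are required in a single bank by the next stage, so they cannot all be written back in the same clock cycle without some rearrangement.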
For the Radix 4 FFT on 1024 points that is achieved with four memory banks 11-1 to 11-4, consideration will be given to determining in which of the memory banks 11-1 to 11-4 the data a0 to a1023 are stored.
(1) The Input of the FFT First-Stage Processing T10
    memory bank 11-1: 0, 1, 2, 3, 4, 5, . . . , 255
    memory bank 11-2: 256, 257, 258, 259, . . . , 511
    memory bank 11-3: 512, 513, 514, 515, . . . , 767
    memory bank 11-4: 768, 769, 770, 771, . . . , 1023
(2) The Input of the FFT Second-Stage Processing T20
    memory bank 11-1: 0, 1, . . . , 63, 256, 257, . . . , 319, 512, 513, . . . , 575, 768, 769, . . . , 831
    memory bank 11-2: 64, 65, . . . , 127, 320, 321, . . . , 383, 576, 577, . . . , 639, 832, 833, . . . , 895
    memory bank 11-3: 128, 129, . . . , 191, 384, 385, . . . , 447, 640, 641, . . . , 703, 896, 897, . . . , 959
    memory bank 11-4: 192, 193, . . . , 255, 448, 449, . . . , 511, 704, 705, . . . , 767, 960, 961, . . . , 1023
(3) The Input of the FFT Third-Stage Processing T30
    memory bank 11-1: 0, 1, . . . , 15, 64, 65, . . . , 79, 128, 129, . . . , 143, . . . , 960, 961, . . . , 975
    memory bank 11-2: 16, 17, . . . , 31, 80, 81, . . . , 95, 144, 145, . . . , 159, . . . , 976, 977, . . . , 991
    memory bank 11-3: 32, 33, . . . , 47, 96, 97, . . . , 111, 160, 161, . . . , 175, . . . , 992, 993, . . . , 1007
    memory bank 11-4: 48, 49, . . . , 63, 112, 113, . . . , 127, 176, 177, . . . , 191, . . . , 1008, 1009, . . . , 1023
(4) The Input of the FFT Fourth-Stage Processing T40
    memory bank 11-1: 0, 1, 2, 3, 16, 17, 18, 19, 32, 33, 34, 35, . . . , 1008, 1009, 1010, 1011
    memory bank 11-2: 4, 5, 6, 7, 20, 21, 22, 23, 36, 37, 38, 39, . . . , 1012, 1013, 1014, 1015
    memory bank 11-3: 8, 9, 10, 11, 24, 25, 26, 27, 40, 41, 42, 43, . . . , 1016, 1017, 1018, 1019
    memory bank 11-4: 12, 13, 14, 15, 28, 29, 30, 31, 44, 45, 46, 47, . . . , 1020, 1021, 1022, 1023
(5) The Input of the FFT Fifth-Stage Processing T50
    memory bank 11-1: 0, 4, 8, 12, . . . , 1008, 1012, 1016, 1020
    memory bank 11-2: 1, 5, 9, 13, . . . , 1009, 1013, 1017, 1021
    memory bank 11-3: 2, 6, 10, 14, . . . , 1010, 1014, 1018, 1022
    memory bank 11-4: 3, 7, 11, 15, . . . , 1011, 1015, 1019, 1023
As seen from the above, simply performing the FFT with the data a0 to a1023 kept in their original order cannot provide the data needed in the next stage of the FFT processing. More specifically, the data a0 to a255, which are stored in the memory bank 11-1 for the FFT first-stage processing T10, have to be stored in the four memory banks 11-1 to 11-4, 64 items in each, for the FFT second-stage processing T20.
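The redistribution over the five stages can be tabulated with the following Python sketch. The names `required_bank` and `bank_contents`, the constants, and the zero-based numbering (stage 0 for T10, bank 0 for the memory bank 11-1) are assumptions made for illustration; the rule is the one implied by the bank lists above, namely that a run of N/4^(s+1) consecutive indices shares a bank at zero-based stage s.

```python
from collections import Counter

N, RADIX, STAGES = 1024, 4, 5        # 4**5 = 1024 points, five stages

def required_bank(index, stage):
    """Bank (0..3) that item a[index] must occupy at the input of a stage."""
    stride = N // RADIX ** (stage + 1)
    return (index // stride) % RADIX

def bank_contents(stage):
    """Indices each bank must hold at the input of the given stage."""
    banks = [[] for _ in range(RADIX)]
    for i in range(N):
        banks[required_bank(i, stage)].append(i)
    return banks

# First few indices per bank at every stage, for comparison with the lists.
for stage in range(STAGES):
    print("stage", stage + 1,
          [contents[:4] for contents in bank_contents(stage)])

# Where the items held by the first bank at T10 (a0 to a255) must go at T20.
first_bank = bank_contents(0)[0]
spread = Counter(required_bank(i, 1) for i in first_bank)
print(sorted(spread.items()))
```

The final line shows the 256 items of the first bank divided evenly among the four banks, which is the rearrangement that costs extra clock cycles in the conventional device.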