1. Field of the Invention
The present invention relates to a method for performing a Fast Fourier Transform on a computer system provided with a vector functional unit, and particularly to a Fast Fourier Transform method characterized by a rearrangement process in which each element exchanges positions with the element whose element number is obtained by reversing the bits of its own element number.
2. Description of the Prior Art
FIG. 4A is a diagram showing the procedure and data flow for executing an ISOGEOMETRIC type Fast Fourier Transform. FIG. 4B explains the notation used in FIG. 4A.
In the figure, n denotes the number of data. The circles at the ends of each stage represent the data and their order (position order) in memory: the circles at the left end represent the input data, and those at the right end represent the output data.
In the first stage, vector addition V1+V2 is performed on the vector V1, consisting of the data at Positions 1 to 4, and the vector V2, consisting of the data at Positions 5 to 8; the results are stored at every other position, namely Positions 1, 3, 5 and 7 (solid lines in the figure). Vector subtraction V1-V2 is also calculated, and its results are multiplied by the rotation factor and then stored at Positions 2, 4, 6 and 8. In stages 2 and 3, the same processing is performed on vectors V1 and V2, with the rotation factor updated for each stage.
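The stage structure of FIG. 4A can be sketched in Python as follows. This is a minimal sketch, not the patented method: it assumes complex input of length n = 2^m, a rotation factor W_n^k = exp(-2πik/n), and decimation-in-frequency rotation factors, none of which the figure specifies, and the names `isogeometric_fft` and `bit_reverse` are hypothetical. Every stage reads V1 and V2 with unit stride, writes the sums to 0-based even slots (1-based Positions 1, 3, 5, ...) and the rotated differences to odd slots. Under these assumptions the stages leave the results in bit reversed order, which is why a final reverse binary rearrangement is needed.

```python
import cmath


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k (hypothetical helper)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def isogeometric_fft(x):
    """Constant-geometry radix-2 FFT sketch (hypothetical helper).

    Assumption: W_n = exp(-2*pi*i/n) with DIF-style rotation factors;
    FIG. 4A does not fix these details.
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    half = n // 2
    data = [complex(v) for v in x]
    for s in range(bits):
        out = [0j] * n
        for i in range(half):
            v1, v2 = data[i], data[i + half]  # same index in V1 and V2
            # The rotation factor changes from stage to stage: W_n^((i >> s) << s).
            w = cmath.exp(-2j * cmath.pi * ((i >> s) << s) / n)
            out[2 * i] = v1 + v2              # 1-based Positions 1, 3, 5, ...
            out[2 * i + 1] = (v1 - v2) * w    # 1-based Positions 2, 4, 6, ...
        data = out
    # After the last stage X[k] sits at position bit_reverse(k, bits),
    # so the reverse binary rearrangement of formula (1) is applied here.
    return [data[bit_reverse(k, bits)] for k in range(n)]
```

Comparing the output with a direct O(n^2) DFT is a simple way to check the sketch for small n.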
In the ISOGEOMETRIC type Fast Fourier Transform shown in FIG. 4A, the data must be rearranged at the last or the first stage of the processing, as shown by the FORTRAN statements (1) below (reverse binary expression). In other words, each element of the Fast Fourier Transform must exchange positions with the element whose element number is the bit reversal of its own.

```fortran
      DO 10 I=1,N
   10 A(I)=B(IND(I))
```
where N is the number of data to be rearranged, the array B holds the data to be rearranged (input), and the array A holds the rearranged data (output). The array IND defines the rearrangement: IND(I) is obtained by writing I-1 as a binary number of log2(N) bits, reversing the order of those bits, and adding 1. This is called "reverse binary expression". FIG. 5 shows the case N=32.
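The construction of IND and formula (1) can be sketched in Python as follows, assuming N = 2^n with n known in advance; the function names are hypothetical. Python lists are 0-based, so the 1-based FORTRAN indices are shifted by one when the lists are accessed.

```python
def reverse_binary_ind(n):
    """Return IND(1..N) for N = 2**n; list entry I-1 holds IND(I) (hypothetical helper)."""
    size = 1 << n
    ind = []
    for i in range(1, size + 1):
        k, r = i - 1, 0
        for _ in range(n):          # reverse the n bits of I-1
            r = (r << 1) | (k & 1)
            k >>= 1
        ind.append(r + 1)           # back to a 1-based index
    return ind


def rearrange(b, ind):
    """A(I) = B(IND(I)) for I = 1..N, i.e. formula (1) (hypothetical helper)."""
    return [b[ind[i] - 1] for i in range(len(ind))]


ind = reverse_binary_ind(5)   # N = 32, as in FIG. 5
print(ind[:8])                # [1, 17, 9, 25, 5, 21, 13, 29]
```

Because reversing the bits twice restores the original number, applying `rearrange` twice returns the original array.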
It has been proposed to perform the above data rearrangement using a vector functional unit. FIG. 6 is a block diagram outlining the configuration of a vector functional unit. In the figure, reference numeral 10 denotes a main memory, 2-1 to 2-4 denote memory access controllers, 30 denotes a vector register and 40 denotes an operation pipeline. An actual main memory 10 has, for example, 512 banks, but for ease of understanding the main memory 10 in the figure has only four banks. The memory access controller 2-i (i = 1, 2, 3 or 4) reads data from and writes data to bank i. The vector register 30 has a plurality of element storing areas.
FIG. 7 is a diagram illustrating the processing for the reverse binary expression of formula (1) using a vector functional unit. The array B in the main memory 10 is read out according to IND(I) and loaded into the vector register 30 in order: the first element of the array B is written to the first element storing area of the vector register 30, the 17th element of the array B to the second element storing area, and the remaining elements likewise in the order given in FIG. 5. After the array B has been loaded into the vector register 30, the vector data in the vector register 30 are stored to the area assigned to the array A in the main memory 10, the I-th element of the vector register 30 being written to the I-th element of the array A.
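The two phases of FIG. 7, an indexed (non-linear) load followed by a linear store, can be modelled with a small Python sketch. It is a toy model under stated assumptions, not a description of real hardware: memory is a flat word-addressed list, the vector register is a list with one slot per element storing area, and the class `ToyVectorUnit`, its methods and the memory layout are all hypothetical.

```python
class ToyVectorUnit:
    """Hypothetical toy model of FIG. 6/FIG. 7: a flat word-addressed memory
    and a single vector register with `length` element storing areas."""

    def __init__(self, memory, length):
        self.memory = memory
        self.vreg = [None] * length

    def load_indexed(self, base_b, ind):
        # Non-linear (indexed) load: storing area I receives B(IND(I)).
        for i, idx in enumerate(ind):
            self.vreg[i] = self.memory[base_b + idx - 1]

    def store_linear(self, base_a):
        # Linear store: storing area I is written to A(I).
        for i, value in enumerate(self.vreg):
            self.memory[base_a + i] = value


n_bits = 5
N = 1 << n_bits
# IND(I) as 1-based values: reverse the n_bits-bit binary form of I-1, add 1.
ind = [int(format(i, f"0{n_bits}b")[::-1], 2) + 1 for i in range(N)]

memory = [100 + i for i in range(1, N + 1)] + [0] * N  # B(I) = 100 + I, then A
unit = ToyVectorUnit(memory, N)
unit.load_indexed(0, ind)     # array B assumed to start at word 0
unit.store_linear(N)          # array A assumed to start at word N
print(memory[N:N + 4])        # [101, 117, 109, 125], i.e. B(1), B(17), B(9), B(25)
```

The model deliberately ignores timing; it only shows which memory word feeds which element storing area.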
In such a conventional rearrangement for reverse binary expression, severe bank conflicts occur while the input data are being loaded from the main memory 10 into the vector register 30.
Consider, for example, a vector functional unit with four banks. When N=32, memory bank conflicts occur as shown in FIG. 8, where each element is 8 bytes wide and each bank is 8 bytes wide. The circles indicate that the corresponding bank is hit. In the example of FIG. 8, each bank is hit eight times. The frequency of memory bank conflicts increases in proportion to the number of data N and decreases in inverse proportion to the number of banks, and is given by

$$IB = \frac{N}{IC \times BANK}$$

where N is the vector length, which can be expressed as $N = 2^n$, BANK is the number of banks in the memory, and IC is the bank width divided by the element data width. For FIG. 8, IC = 1 and BANK = 4, so IB = 32 / (1 × 4) = 8, matching the eight hits per bank. Such memory bank conflicts degrade the performance of the ISOGEOMETRIC type Fast Fourier Transform, which is why the ISOGEOMETRIC type Fast Fourier Transform is said to be unsuitable for a vector functional unit.
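One plausible reading of FIG. 8 can be checked with a short Python sketch. It assumes (hypothetically) that element address a, counted from 0, lives in bank (a // IC) % BANK, and it counts runs of consecutive gather accesses that hit the same bank; the function name `gather_bank_runs` is illustrative only.

```python
def gather_bank_runs(n_bits, banks, ic=1):
    """Runs of consecutive gather accesses that fall into the same bank.

    Assumed (hypothetical) interleaving: element address a (0-based) lives in
    bank (a // ic) % banks, where ic = bank width / element width.
    """
    runs = []
    for i in range(1 << n_bits):
        addr = int(format(i, f"0{n_bits}b")[::-1], 2)  # B(IND(i+1)), 0-based
        bank = (addr // ic) % banks
        if runs and runs[-1][0] == bank:
            runs[-1][1] += 1      # same bank hit again by the next access
        else:
            runs.append([bank, 1])
    return [tuple(r) for r in runs]


print(gather_bank_runs(5, 4))   # N = 32, four banks, 8-byte elements and banks
# [(0, 8), (2, 8), (1, 8), (3, 8)]
```

Each run has length N / (IC × BANK) in this model: consecutive indices I-1 that share their top log2(IC × BANK) bits have bit reversed addresses that share their low bits, and hence the same bank.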
FIGS. 9A and 9B illustrate an example in which 16 elements, numbered 0 to 15, are rearranged for a Fast Fourier Transform by exchanging the positions of elements whose element numbers are bit reversals of each other. FIG. 9A shows the original array and FIG. 9B shows the array after rearrangement. The elements are rearranged by non-linear vector loading and linear storing.
Specifically, to rearrange the elements 0 to 15 of FIG. 9A, whose element numbers are 0 to 15, by exchanging the positions of elements with bit reversed element numbers, the bit reversed numbers 0000, 1000, 0100, ..., 0111, 1111 corresponding to the element numbers 0000, 0001, 0010, ..., 1110, 1111 are stored in an array. This array is then used for non-linear vector loading, followed by linear storing into a new array, as shown in FIG. 9B.
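The 16-element case can be reproduced with a brief Python sketch; the variable names are hypothetical. It builds the table of 4-bit reversed numbers, performs the table-driven gather (non-linear load, linear store), and checks that the result is the same as exchanging each pair of elements whose numbers are bit reversals of each other. The equivalence holds because reversing the bits twice returns the original number.

```python
BITS = 4
N = 1 << BITS

# Table of bit reversed element numbers: 0000->0000, 0001->1000, 0010->0100, ...
rev_table = [int(format(k, f"0{BITS}b")[::-1], 2) for k in range(N)]

original = list(range(N))                              # FIG. 9A: elements 0..15
gathered = [original[rev_table[k]] for k in range(N)]  # non-linear load, linear store

# Same result by exchanging each pair (k, rev(k)) once, in place.
swapped = list(original)
for k in range(N):
    j = rev_table[k]
    if j > k:  # swap each pair once; 0000, 0110, 1001, 1111 stay in place
        swapped[k], swapped[j] = swapped[j], swapped[k]

print(gathered)
# [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

The in-place swap is shown only to make the "exchange of positions" concrete; the prior-art method described here uses the table-driven gather.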
As shown in FIG. 9B, the rearrangement of elements with bit reversed element numbers is thus vectorized by means of non-linear vector loading and linear storing.
Some computer systems are provided with dedicated instructions for determining bit reversed element numbers.
Thus, the reverse binary expression processing of elements in a Fast Fourier Transform tends, because of bank conflicts, to degrade the performance of the Fast Fourier Transform as a whole. Moreover, element rearrangement using non-linear vector loading and linear storing can only be performed at low speed and is susceptible to bank conflicts, which impairs the processing efficiency of the Fast Fourier Transform. In addition, a system provided with dedicated instructions for determining bit reversed element numbers requires costly hardware.