Scientific and engineering computations, such as calculations to find the characteristics of a semiconductor device, calculations to determine the states of electrons and calculations to forecast the weather, require large-scale simulations, each handling up to several million variables. A parallel-processing computer, especially one having the so-called distributed-memory configuration, is a powerful means of dealing with such large-scale problems. A parallel-processing computer having a distributed-memory configuration is a system comprising a plurality of processors, each having its own memory, connected to each other by a network. In comparison with the conventional sequential-processing computer, the parallel-processing computer having a distributed-memory configuration offers the advantage that its peak performance can be raised to as high a level as desired by increasing the number of processors employed therein.
In the parallel-processing computer having a distributed-memory configuration, the data to be computed are distributed among the memories of the processors so that the processors can carry out computations on their data in parallel. If a processor requires data owned by another processor in the course of processing, it must wait for the data to be transferred from the other processor before continuing. In general, therefore, the parallel-processing computer having a distributed-memory configuration incurs, in addition to the processing time, an overhead for transferring data from one processor to another. For this reason, to increase computational efficiency, it is necessary to adopt a computation method that has a high degree of parallelism and keeps the time spent on communication between processors as short as possible. In addition, many parallel-processing computers having a distributed-memory configuration include a mechanism for transferring data from one processor to another while that processor is processing other data. In such a configuration, if a computation method can be devised that processes some data while transferring other data at the same time, the transfer time can be hidden behind the processing time, so that computational efficiency is raised.
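As a rough illustration of why overlapping transfers with processing helps, the following sketch models a processor that must receive a number of equal blocks of data and process each one. The function name `pipeline_time` and the assumption of a single transfer mechanism that runs concurrently with computation are hypothetical; the model ignores latency variation and network contention.

```python
def pipeline_time(blocks: int, compute: float, transfer: float, overlap: bool) -> float:
    """Total time to receive and process `blocks` equal blocks of data.

    Hypothetical model: each block needs one transfer of duration `transfer`
    followed by one computation of duration `compute`.
    """
    if blocks < 1:
        raise ValueError("blocks must be at least 1")
    if not overlap:
        # The processor waits for each transfer before computing, so the costs add.
        return blocks * (compute + transfer)
    # With overlap, block b is computed while block b + 1 is in transit, so in
    # steady state only the longer of the two is visible. The first transfer
    # and the last computation cannot be hidden.
    return transfer + (blocks - 1) * max(compute, transfer) + compute


if __name__ == "__main__":
    print(pipeline_time(4, 3.0, 2.0, overlap=False))  # 20.0
    print(pipeline_time(4, 3.0, 2.0, overlap=True))   # 14.0
```

For four blocks with a computation time of 3 and a transfer time of 2 per block, the model gives 20 time units without overlap and 14 with it; the saving grows with the number of blocks.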
The Fourier transformation is one of the processes carried out most frequently in scientific calculations. The Fourier transformation expresses a complex-valued function f(x) defined on an interval of the real numbers as a superposition of complex exponential functions exp(ikx). On a computer, only a finite number of points can be handled, so the Fourier transformation becomes the process of expressing a sequence of complex numbers f0, f1, . . . , fN−1 as a superposition of N complex exponential functions exp(2πikj/N), where k takes every integer value 0, 1, . . . , (N−1), i denotes the imaginary unit and π denotes the ratio of the circumference of a circle to its diameter, as follows:
$$f_j = \sum_{k=0}^{N-1} c_k \exp(2\pi i k j / N)$$
where j takes every integer value 0, 1, . . . , (N−1). That is to say, for given f0, f1, . . . , fN−1, the Fourier transformation is the process of finding the superposition coefficients c0, c1, . . . , cN−1. As is well known, these coefficients are given by the following equation:
$$c_k = \frac{1}{N} \sum_{j=0}^{N-1} f_j \exp(-2\pi i k j / N)$$
where k takes every integer value 0, 1, . . . , (N−1). If the calculation is carried out directly from this definition, however, N sums of N terms each must be evaluated. Thus, in addition to calculating the complex exponential functions exp(−2πikj/N), complex additions and multiplications must each be carried out on the order of N² times. To avoid this large amount of calculation, a technique known as the fast Fourier transformation is widely adopted in practice. The fast Fourier transformation reduces the amount of computation to the order of N log N by restructuring the algorithm of the Fourier transformation. The fast Fourier transformation is described in detail in documents such as G. Golub and C. F. van Loan, “Matrix Computations”, 3rd edition, The Johns Hopkins University Press, 1996, pp. 189-192.
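The two formulas above, and the fast evaluation of the coefficients, can be checked with a short sketch. It assumes plain Python lists of complex numbers and uses the textbook recursive radix-2 Cooley-Tukey algorithm as the fast variant, so N must be a power of 2; the function names are illustrative and not taken from any library.

```python
import cmath


def dft(f):
    """c_k = (1/N) * sum_j f_j * exp(-2*pi*i*k*j/N), evaluated directly in O(N^2)."""
    n = len(f)
    return [sum(f[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n)) / n
            for k in range(n)]


def _fft_unscaled(f):
    """Radix-2 Cooley-Tukey: sum_j f_j * exp(-2*pi*i*k*j/N); len(f) must be a power of 2."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = _fft_unscaled(f[0::2])  # transform of even-indexed samples
    odd = _fft_unscaled(f[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft(f):
    """Same coefficients as dft(), in O(N log N) operations."""
    n = len(f)
    return [x / n for x in _fft_unscaled(f)]


def synthesize(c):
    """Superposition f_j = sum_k c_k * exp(+2*pi*i*k*j/N)."""
    n = len(c)
    return [sum(c[k] * cmath.exp(2j * cmath.pi * k * j / n) for k in range(n))
            for j in range(n)]


if __name__ == "__main__":
    c = fft([1, 2, 3, 4])
    print(c[0].real)  # 2.5
```

Since c0 is (1/N) times the sum of the samples, `fft([1, 2, 3, 4])[0]` is the mean 2.5, and `synthesize(fft(f))` reproduces f up to rounding error.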
The Fourier transformation described above is called a 1-dimensional Fourier transformation. However, computations such as the calculations to find the characteristics of a semiconductor device, the calculations to determine the states of electrons and the calculations to forecast the weather call for a 3-dimensional Fourier transformation. The 3-dimensional Fourier transformation expresses complex-number data {fjx,jy,jz} having three subscripts jx, jy and jz, where jx takes every integer value 0, 1, . . . , (Nx−1), jy every integer value 0, 1, . . . , (Ny−1) and jz every integer value 0, 1, . . . , (Nz−1), as a superposition of Nx×Ny×Nz complex exponential functions exp(2πikxjx/Nx) exp(2πikyjy/Ny) exp(2πikzjz/Nz), where kx takes every integer value 0, 1, . . . , (Nx−1), ky every integer value 0, 1, . . . , (Ny−1) and kz every integer value 0, 1, . . . , (Nz−1), as follows:
$$f_{j_x,j_y,j_z} = \frac{1}{N_x N_y N_z} \sum_{k_x=0}^{N_x-1} \sum_{k_y=0}^{N_y-1} \sum_{k_z=0}^{N_z-1} c_{k_x,k_y,k_z} \exp(2\pi i k_x j_x / N_x) \exp(2\pi i k_y j_y / N_y) \exp(2\pi i k_z j_z / N_z)$$
where jx, jy and jz range over the same values as above. That is to say, for given {fjx,jy,jz}, the 3-dimensional Fourier transformation is the process of finding the superposition coefficients {ckx,ky,kz}. As is well known, these coefficients are given by the following equation:
$$c_{k_x,k_y,k_z} = \sum_{j_x=0}^{N_x-1} \sum_{j_y=0}^{N_y-1} \sum_{j_z=0}^{N_z-1} f_{j_x,j_y,j_z} \exp(-2\pi i k_x j_x / N_x) \exp(-2\pi i k_y j_y / N_y) \exp(-2\pi i k_z j_z / N_z)$$
where kx, ky and kz range over the same values as above. Here the normalization factor 1/(NxNyNz) is written in the superposition rather than in the coefficients, so that the coefficients are plain sums.
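A direct evaluation of the 3-dimensional coefficients, together with the corresponding superposition, is sketched below for small arrays stored as nested Python lists indexed f[jx][jy][jz]. The names are hypothetical. The superposition uses exp(+2πi…) and the factor 1/(NxNyNz), which makes it the exact inverse of the unnormalized coefficient sum. The cost grows as (Nx·Ny·Nz)², so a sketch like this is only useful as a reference for checking faster methods.

```python
import cmath
import itertools


def _grid(nx, ny, nz):
    """All index triples (a, b, c) with 0 <= a < nx, 0 <= b < ny, 0 <= c < nz."""
    return itertools.product(range(nx), range(ny), range(nz))


def dft3d_direct(f):
    """c[kx][ky][kz] = sum over all (jx, jy, jz) of
    f[jx][jy][jz] * exp(-2*pi*i*kx*jx/Nx) * exp(-2*pi*i*ky*jy/Ny) * exp(-2*pi*i*kz*jz/Nz)."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    c = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx, ky, kz in _grid(nx, ny, nz):
        total = 0j
        for jx, jy, jz in _grid(nx, ny, nz):
            # The product of the three exponentials is one exponential of the summed phases.
            phase = kx * jx / nx + ky * jy / ny + kz * jz / nz
            total += f[jx][jy][jz] * cmath.exp(-2j * cmath.pi * phase)
        c[kx][ky][kz] = total
    return c


def synthesize3d(c):
    """f[jx][jy][jz] = (1/(Nx*Ny*Nz)) * sum over all (kx, ky, kz) of
    c[kx][ky][kz] * exp(+2*pi*i*(kx*jx/Nx + ky*jy/Ny + kz*jz/Nz))."""
    nx, ny, nz = len(c), len(c[0]), len(c[0][0])
    f = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for jx, jy, jz in _grid(nx, ny, nz):
        total = 0j
        for kx, ky, kz in _grid(nx, ny, nz):
            phase = kx * jx / nx + ky * jy / ny + kz * jz / nz
            total += c[kx][ky][kz] * cmath.exp(2j * cmath.pi * phase)
        f[jx][jy][jz] = total / (nx * ny * nz)
    return f
```

A round trip `synthesize3d(dft3d_direct(f))` reproduces f up to rounding error, and `c[0][0][0]` is simply the sum of all elements of f.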
Furthermore, it is easy to show that the above sum can be evaluated by carrying out the following three transformations in sequence:
<Transformation in the Y Direction>
$$c^{(1)}_{j_x,k_y,j_z} = \sum_{j_y=0}^{N_y-1} f_{j_x,j_y,j_z} \exp(-2\pi i k_y j_y / N_y)$$
where jx takes every integer value 0, 1, . . . , (Nx−1), ky every integer value 0, 1, . . . , (Ny−1) and jz every integer value 0, 1, . . . , (Nz−1).
<Transformation in the X Direction>
$$c^{(2)}_{k_x,k_y,j_z} = \sum_{j_x=0}^{N_x-1} c^{(1)}_{j_x,k_y,j_z} \exp(-2\pi i k_x j_x / N_x)$$
where kx takes every integer value 0, 1, . . . , (Nx−1), ky every integer value 0, 1, . . . , (Ny−1) and jz every integer value 0, 1, . . . , (Nz−1).
<Transformation in the Z Direction>
$$c_{k_x,k_y,k_z} = \sum_{j_z=0}^{N_z-1} c^{(2)}_{k_x,k_y,j_z} \exp(-2\pi i k_z j_z / N_z)$$
where kx takes every integer value 0, 1, . . . , (Nx−1), ky every integer value 0, 1, . . . , (Ny−1) and kz every integer value 0, 1, . . . , (Nz−1).
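The three-pass decomposition can be sketched as follows: each pass applies a 1-dimensional transformation along one axis, in the order Y, X, Z, overwriting a working copy of the data. For brevity the 1-dimensional transformation here is the direct O(N²) sum rather than a fast one; in practice a fast Fourier transformation would be substituted. The data layout a[jx][jy][jz] and the function names are assumptions made for this illustration.

```python
import cmath


def dft1d(v):
    """Unscaled 1-D transform: out[k] = sum_j v[j] * exp(-2*pi*i*k*j/N)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def dft3d_by_axes(f):
    """3-D transform as Y, X and Z passes of 1-D transforms; f is indexed f[jx][jy][jz]."""
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    a = [[[complex(f[x][y][z]) for z in range(nz)] for y in range(ny)] for x in range(nx)]

    # Y pass: one length-Ny transform for each (jx, jz) pair -> c1[jx][ky][jz]
    for x in range(nx):
        for z in range(nz):
            line = dft1d([a[x][y][z] for y in range(ny)])
            for y in range(ny):
                a[x][y][z] = line[y]

    # X pass: one length-Nx transform for each (ky, jz) pair -> c2[kx][ky][jz]
    for y in range(ny):
        for z in range(nz):
            line = dft1d([a[x][y][z] for x in range(nx)])
            for x in range(nx):
                a[x][y][z] = line[x]

    # Z pass: one length-Nz transform for each (kx, ky) pair -> c[kx][ky][kz]
    for x in range(nx):
        for y in range(ny):
            a[x][y] = dft1d(a[x][y])
    return a
```

The result agrees, up to rounding error, with a direct evaluation of the triple sum for the coefficients, while each pass touches only data lying on one line parallel to a coordinate axis.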
As the above equations show, the transformation in the Y direction consists of 1-dimensional Fourier transformations, each carried out on the Ny pieces of data that share the same subscripts jx and jz; by varying jx and jz, such a transformation is carried out Nx×Nz times to complete the transformation in the Y direction. The transformations in the X and Z directions are carried out in the same way, Ny×Nz times and Nx×Ny times respectively. Thus, if the pieces of 3-dimensional data {fjx,jy,jz} are arranged to form a rectangular solid whose sides have lengths Nx, Ny and Nz, as indicated by reference numeral 1 in FIG. 2, the transformation in the Y direction consists of 1-dimensional Fourier transformations, each carried out on Ny pieces of data 2 lying on a line parallel to the Y axis. By the same token, the transformation in the X direction consists of 1-dimensional Fourier transformations, each carried out on Nx pieces of data 3 lying on a line parallel to the X axis, and the transformation in the Z direction consists of 1-dimensional Fourier transformations, each carried out on Nz pieces of data 4 lying on a line parallel to the Z axis. With this method of computation, in the transformation in the Y direction, the calculations for lines with different X or Z coordinates are independent of each other and can therefore be carried out concurrently. Likewise, in the transformation in the X direction, the calculations for lines with different Y or Z coordinates can be carried out concurrently, and in the transformation in the Z direction, the calculations for lines with different X or Y coordinates can be carried out concurrently.
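The independence of the lines can be illustrated by submitting each line parallel to the Y axis as a separate task. This sketch uses Python's `concurrent.futures.ThreadPoolExecutor` only to show that the Nx×Nz tasks share no data; because of CPython's global interpreter lock it is not a performance demonstration. The function names and the worker count are hypothetical.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft1d(v):
    """Unscaled 1-D transform: out[k] = sum_j v[j] * exp(-2*pi*i*k*j/N)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def y_pass_concurrent(a, workers=4):
    """Y-direction pass with every line (fixed jx, jz) transformed as an independent task.

    No task reads or writes another task's line, so the tasks may run in any
    order or at the same time. Returns the new array and the number of tasks.
    """
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    keys = [(x, z) for x in range(nx) for z in range(nz)]  # one key per line parallel to Y
    lines = [[a[x][y][z] for y in range(ny)] for x, z in keys]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(dft1d, lines))  # map preserves the order of `lines`
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for (x, z), line in zip(keys, results):
        for y in range(ny):
            out[x][y][z] = line[y]
    return out, len(keys)
```

The same pattern applies to the X pass (tasks keyed by (ky, jz)) and the Z pass (tasks keyed by (kx, ky)).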
Conventionally, a method exploiting the parallelism described above is generally adopted to execute the 3-dimensional fast Fourier transformation on a parallel-processing computer having a distributed-memory configuration. One example of such a method is the so-called permutation algorithm, an efficient technique that minimizes the amount of data transferred between processors. This technique is described in detail in documents such as V. Kumar, A. Grama, A. Gupta and G. Karypis, “Introduction to Parallel Computing”, The Benjamin/Cummings Publishing Company, 1994, pp. 377-406. In accordance with this method, as shown in FIG. 3, the 3-dimensional data is first split into as many pieces of data 5 as there are processors, each piece consisting of data on planes perpendicular to the Z axis, and each piece of data 5 is stored in the memory of one of the processors. In this state, the transformation in the Y direction is carried out. Since every line that a processor transforms in the Y direction lies within its own piece, that single processor holds all Ny pieces of data 2 required, and the transformation in the Y direction can be carried out without transferring data between processors. After the transformation in the Y direction is completed, the way of splitting the data is changed. This time, the 3-dimensional data is split into as many pieces of data 6 as there are processors, each piece consisting of data on planes perpendicular to the Y axis, and each piece of data 6 is stored in the memory of one of the processors. To achieve this, every processor needs to transfer data to all the other processors. This process is referred to as permutation. After the permutation process is completed, however, each processor holds in its own memory all Nx pieces of data 3 required for the transformation in the X direction.
Thus, the transformation in the X direction can be carried out without transferring data between processors. Likewise, for the transformation in the Z direction, each processor already holds all Nz pieces of data 4 required, so this transformation, too, can be carried out without transferring data between processors. In this way, the 3-dimensional Fourier transformation is completed. This is how a parallel-processing computer having a distributed-memory configuration is conventionally used to carry out the 3-dimensional fast Fourier transformation.
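The flow of the permutation algorithm can be simulated on a single machine, with one list entry standing in for each processor's memory. The sketch below is a hypothetical illustration, not the implementation from the cited reference: it assumes that Ny and Nz are multiples of the number of processors p, uses a direct 1-dimensional transform for brevity, and counts the elements each processor sends to the others during the permutation.

```python
import cmath


def dft1d(v):
    """Unscaled 1-D transform: out[k] = sum_j v[j] * exp(-2*pi*i*k*j/N)."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def permutation_fft3d(f, p):
    """Simulated slab-decomposition 3-D transform on p 'processors'; f is indexed f[jx][jy][jz].

    Returns the result (indexed c[kx][ky][kz]) and, per processor, the number
    of elements it sent to other processors during the permutation.
    """
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    if ny % p or nz % p:
        raise ValueError("Ny and Nz must be multiples of p")
    bz, by = nz // p, ny // p
    # Step 1: processor r holds Z-planes r*bz .. r*bz+bz-1, stored as zslab[r][x][y][zl].
    zslab = [[[[complex(f[x][y][r * bz + zl]) for zl in range(bz)] for y in range(ny)]
              for x in range(nx)] for r in range(p)]
    # Step 2: Y pass, entirely local, because every Y line lies inside one Z-plane.
    for s in zslab:
        for x in range(nx):
            for zl in range(bz):
                line = dft1d([s[x][y][zl] for y in range(ny)])
                for y in range(ny):
                    s[x][y][zl] = line[y]
    # Step 3: permutation; processor r sends processor q the part with ky in q's Y-block.
    yslab = [[[[0j] * nz for _ in range(by)] for _ in range(nx)] for _ in range(p)]
    sent = [0] * p
    for r in range(p):
        for q in range(p):
            for x in range(nx):
                for yl in range(by):
                    for zl in range(bz):
                        yslab[q][x][yl][r * bz + zl] = zslab[r][x][q * by + yl][zl]
                        if q != r:
                            sent[r] += 1
    # Step 4: X and Z passes, entirely local on each Y-slab.
    for s in yslab:
        for yl in range(by):
            for z in range(nz):
                line = dft1d([s[x][yl][z] for x in range(nx)])
                for x in range(nx):
                    s[x][yl][z] = line[x]
            for x in range(nx):
                s[x][yl] = dft1d(s[x][yl])
    # Gather into one array (ky = q*by + yl) for checking.
    c = [[yslab[q][x][yl] for q in range(p) for yl in range(by)] for x in range(nx)]
    return c, sent
```

With Nx = 2, Ny = Nz = 4 and p = 2, each processor holds 16 elements and sends 8 of them, that is, (p−1)/p of its data; all other steps run without any exchange.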
With the parallel computing method based on the permutation algorithm described above, each processor can carry out the transformations in the Y, X and Z directions completely independently of the other processors. In the permutation process carried out partway through the computation, however, every processor needs to transfer data to all the other processors. In general, in a parallel-processing computer having a distributed-memory configuration, transferring data takes a long time compared with the processing itself, and this gap has become increasingly pronounced as the processing speed of processors has increased. In addition, PC clusters have come into wide use in recent years. A PC cluster is a number of personal computers (PCs) connected to each other by a network such as Ethernet (a registered trademark). In a PC cluster, the data-transfer capability between the personal computers is low compared with that of a parallel-processing computer having a distributed-memory configuration, so data transfer is particularly likely to become the bottleneck that determines the processing time. For these reasons, the conventional method based on the permutation algorithm in many cases fails to deliver sufficient parallel-processing performance when a parallel-processing computer having a distributed-memory configuration is used to execute the 3-dimensional fast Fourier transformation. It is thus an object of the present invention to solve this problem.
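The volume of the all-to-all exchange in the permutation step can be estimated with simple arithmetic. Assuming an even slab decomposition, each processor holds Nx·Ny·Nz/p elements, keeps the 1/p share that stays local, and sends the remaining (p−1)/p in p−1 messages, one to each other processor. The 16-byte element size (one double-precision complex number) and the function name are assumptions.

```python
def permutation_traffic(nx: int, ny: int, nz: int, p: int, bytes_per_element: int = 16) -> dict:
    """Per-processor traffic of the permutation step under an even slab decomposition.

    Assumes Ny and Nz are multiples of p, so every processor holds Nx*Ny*Nz/p
    elements, of which a 1/p share stays local. bytes_per_element=16 assumes
    double-precision complex data.
    """
    if ny % p or nz % p:
        raise ValueError("Ny and Nz must be multiples of p")
    local = nx * ny * nz // p                      # elements held by one processor
    sent = nx * (ny // p) * (nz // p) * (p - 1)    # elements sent to the other p - 1 processors
    return {
        "messages": p - 1,
        "elements_sent": sent,
        "bytes_sent": sent * bytes_per_element,
        "fraction_sent": sent / local,
    }


if __name__ == "__main__":
    print(permutation_traffic(256, 256, 256, 16))
```

For a 256×256×256 array on 16 processors, each processor sends 983,040 elements (15 MiB) in 15 messages, which is 15/16 of its data; as p grows, nearly all of each processor's data must cross the network.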