Fast Fourier transform (hereinafter referred to as FFT) is a technique for computing a discrete Fourier transform at high speed, and three-dimensional FFT is an important computational technique used to analyze various physical problems. As parallel computer systems such as supercomputers achieve higher performance, techniques for performing three-dimensional FFT at high speed have attracted attention.
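For reference, the standard definition of the three-dimensional discrete Fourier transform is given below. It assumes an N_1 × N_2 × N_3 array x and the common sign convention ω_N = e^{-2πi/N}; these symbols are introduced here only for illustration. Because the kernel factors axis by axis, the transform can be computed as three successive passes of one-dimensional FFTs, one along each axis.

```latex
X(k_1,k_2,k_3) = \sum_{n_1=0}^{N_1-1} \omega_{N_1}^{k_1 n_1}
  \left[ \sum_{n_2=0}^{N_2-1} \omega_{N_2}^{k_2 n_2}
  \left[ \sum_{n_3=0}^{N_3-1} x(n_1,n_2,n_3)\, \omega_{N_3}^{k_3 n_3} \right] \right],
\qquad \omega_N = e^{-2\pi i / N}
```

The innermost sum is a one-dimensional DFT along the third axis for each fixed (n_1, n_2), the middle sum is one along the second axis, and the outer sum is one along the first axis.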
In recent parallel computer systems, a large number of central processing units (CPUs) are operated in parallel to improve performance. The process running on each CPU carries out its share of the processing while exchanging numerical data with other processes, and in this way the overall processing is completed. When a parallel computer system performs three-dimensional FFT, the three-dimensional array to be computed (for example, data designated by a user; hereinafter referred to as a global array) is divided, and the local arrays created by the division are allocated to the processes. Typical division methods include one-axis distribution (also referred to as slab decomposition), two-axis distribution (also referred to as column-wise decomposition), and three-axis distribution (also referred to as volumetric decomposition).
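The three division methods can be made concrete with a minimal Python sketch. It assumes a simple block division in which each axis length is divisible by the number of processes along that axis, and it assumes the leading axes are the ones divided. The names block_ranges and decompose are hypothetical helpers for illustration. They are not part of any library or of the embodiment.

```python
from itertools import product

def block_ranges(n, p):
    """Split the indices 0..n-1 into p contiguous blocks of equal length.

    Assumption: n is divisible by p (the traditional restriction).
    """
    if n % p != 0:
        raise ValueError(f"{n} elements cannot be divided evenly among {p} processes")
    b = n // p
    return [range(i * b, (i + 1) * b) for i in range(p)]

def decompose(global_shape, proc_grid):
    """Map each process coordinate to the index ranges of its local array.

    proc_grid[a] is the number of processes along axis a; 1 means the axis
    is not divided.
    """
    per_axis = [block_ranges(n, p) for n, p in zip(global_shape, proc_grid)]
    return {
        coords: tuple(per_axis[a][c] for a, c in enumerate(coords))
        for coords in product(*(range(p) for p in proc_grid))
    }

shape = (8, 8, 8)                     # global array of 8 x 8 x 8 elements
slab = decompose(shape, (4, 1, 1))    # one-axis: 4 processes along the first axis
pencil = decompose(shape, (2, 2, 1))  # two-axis: 2 x 2 processes over the first two axes
volume = decompose(shape, (2, 2, 2))  # three-axis: 2 x 2 x 2 processes

for name, d in (("one-axis", slab), ("two-axis", pencil), ("three-axis", volume)):
    ranges = d[(0, 0, 0)]
    print(name, len(d), "processes, local shape", tuple(len(r) for r in ranges))
```

Under these assumptions the local arrays have shapes (2, 8, 8), (4, 4, 8), and (4, 4, 4) for 4, 4, and 8 processes respectively. Only the one-axis and two-axis cases leave at least one axis whole on every process.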
The division of a global array is subject to restrictions. For example, from the standpoint of the discrete Fourier transform algorithm, it is preferable that the number of elements in each axial direction factor into relatively small primes such as 2, 3, 5, and 7. It is also preferable that the number of elements in each axial direction be divisible by the number of processes. Consequently, in the related art, the number of elements and the number of processes in each axial direction are, for example, set to powers of 2. However, there are cases where the number of elements or the number of processes must be a number that is not a power of 2. In addition, when the data type is a real number type rather than a complex number type, the number of elements in the first dimension is not a power of 2. Accordingly, it is undesirable to limit the numbers of elements and processes to powers of 2 only.
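The two preferences above are distinct from the stricter power-of-2 setting. A minimal Python sketch of checking them for a single axis makes the difference visible. The function names are hypothetical, and the set of small primes (2, 3, 5, 7) is taken from the example in the text.

```python
SMALL_PRIMES = (2, 3, 5, 7)

def is_small_prime_factorable(n):
    """True if n > 0 has no prime factors other than 2, 3, 5 and 7."""
    if n < 1:
        return False
    for q in SMALL_PRIMES:
        while n % q == 0:
            n //= q
    return n == 1

def is_power_of_two(n):
    """True if n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0

def check_axis(n_elements, n_procs):
    """Evaluate one axial direction against the preferences described above."""
    return {
        "small_prime_factors": is_small_prime_factorable(n_elements),
        "divisible_by_procs": n_elements % n_procs == 0,
        "both_powers_of_two": is_power_of_two(n_elements) and is_power_of_two(n_procs),
    }

print(check_axis(256, 16))  # satisfies all three
print(check_axis(360, 12))  # 360 = 2^3 * 3^2 * 5 and 360 / 12 = 30, but not powers of 2
print(check_axis(360, 7))   # small prime factors, but not divisible by 7
```

The case of 360 elements on 12 processes satisfies both algorithmic preferences, yet it is excluded by a power-of-2-only rule.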
In addition, the numbers of discrete points used in scientific and engineering calculations have become more diverse, and parallel computer systems have become more complex. Accordingly, adopting the above-mentioned power-of-2 setting may leave some CPUs unused.
On the other hand, relaxing the restrictions on division (that is, allowing various numbers of elements and processes) often results in processes holding unequal numbers of elements. In particular, in the above-mentioned two-axis and three-axis distributions, the number of elements held by each process changes in the middle of the calculation. Data communication is therefore performed to return the elements to their original arrangement. In parallel distributed processing of three-dimensional FFT, inter-process data communication accounts for a large proportion of the total processing time. An increase in the amount of data communicated therefore degrades the performance of the parallel distributed processing. The related art does not address these problems. Japanese Laid-open Patent Publication No. 2000-200261 and Japanese Laid-open Patent Publication No. 2004-348493 are examples of related art.
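A small example can show how per-process element counts become unequal and then change when the arrangement changes. The following is a minimal Python sketch of a two-axis distribution on a 2 x 2 process grid. It assumes a balanced block rule in which the first n % p processes receive one extra element. It also assumes that the arrangement moves from "first axis whole" to "second axis whole". The source specifies neither the division rule nor the stages, so both are assumptions, and the helper names are hypothetical.

```python
from itertools import product
from math import prod

def block_sizes(n, p):
    """Balanced block sizes for n elements over p processes (assumed rule).

    The first n % p processes receive one extra element, so sizes differ by
    at most one when n is not divisible by p.
    """
    base, extra = divmod(n, p)
    return [base + 1 if r < extra else base for r in range(p)]

def elements_per_process(global_shape, divided_axes, grid):
    """Number of elements held by each process of a two-dimensional grid.

    Array axis divided_axes[i] is divided among grid[i] processes; the
    remaining axis is kept whole on every process.
    """
    blocks = {axis: block_sizes(global_shape[axis], p)
              for axis, p in zip(divided_axes, grid)}
    counts = {}
    for coords in product(*(range(p) for p in grid)):
        local = list(global_shape)
        for axis, c in zip(divided_axes, coords):
            local[axis] = blocks[axis][c]
        counts[coords] = prod(local)
    return counts

shape = (10, 9, 8)  # axis lengths that are not all powers of 2
before = elements_per_process(shape, divided_axes=(1, 2), grid=(2, 2))  # first axis whole
after = elements_per_process(shape, divided_axes=(0, 2), grid=(2, 2))   # second axis whole
print(before)  # {(0, 0): 200, (0, 1): 200, (1, 0): 160, (1, 1): 160}
print(after)   # {(0, 0): 180, (0, 1): 180, (1, 0): 180, (1, 1): 180}
```

In this sketch the total stays at 720 elements. However, process (0, 0) holds 200 elements in the first arrangement and 180 in the second, so the counts per process are neither equal nor constant across the calculation.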
In addition, examples of related art include
(1) M. Eleftheriou, B. G. Fitch, A. Rayshubskiy, T. J. C. Ward, R. S. Germain, “Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements”, IBM Journal of Research and Development, IBM, 2005, 49, 457-464,
(2) M. Eleftheriou, B. G. Fitch, A. Rayshubskiy, T. J. C. Ward, R. S. Germain, “Performance measurements of the 3D FFT on the Blue Gene/L supercomputer”, Euro-Par 2005 Parallel Processing, Springer, 2005, 795-803,
(3) Roland Schulz, “3D FFT with 2D decomposition”, CS project report, 2008, [searched on Mar. 30, 2015], the Internet,
(4) Ning Li, Sylvain Laizet, “2DECOMP&FFT - A Highly Scalable 2D Decomposition Library and FFT Interface”, Cray User Group 2010 conference, 2010, 1-13, [searched on Mar. 30, 2015], the Internet,
(5) Daisuke Takahashi, “FFTE: A Fast Fourier Transform Package”, [searched on Mar. 30, 2015], the Internet,
(6) T. V. T. Duy, T. Ozaki, “A decomposition method with minimum communication amount for parallelization of multi-dimensional FFTs”, CoRR abs/1302.6189, 2013, [searched on Mar. 30, 2015], the Internet, and
(7) “OpenFFT: An Open Source Parallel Package for 3-D FFTs”, the Internet, URL: http://www.openmx-square.org/openfft/.
In one aspect, an object of the embodiments is to provide a technique that both relaxes the restrictions on dividing a three-dimensional array and improves performance when a parallel computer system performs three-dimensional FFT on that array.