1. Field of the Invention
The present invention relates to a method for performing a three-dimensional Fourier transform at high speed on a shared memory scalar parallel computer.
2. Description of the Related Art
Recently, shared memory scalar parallel computers have been widely used in various fields of scientific and engineering computation, especially fluid analysis, astrophysics, weather forecasting, collision analysis, and image analysis. In these fields, a fast multidimensional Fourier transform is required to process large-scale problems quickly.
Additionally, with the progress of computer technology, the architecture of the scalar CPUs used in shared memory scalar parallel computers has changed.
The Itanium 2 processor adopted by PRIMEQUEST provides 128 floating-point registers, and its cache and memory access speed is relatively high. The Fourier transform method developed for Sparc and similar processors builds a multidimensional Fourier transform from a fast one-dimensional transform. A Sparc system has a two-level cache, L1 and L2, and the access speed of the L2 cache is slower than that of the L1 cache. Therefore, a calculation method that holds data in the relatively small L1 cache is advantageous.
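As an illustration of building a multidimensional transform from a one-dimensional transform, the following minimal sketch applies a one-dimensional transform along each of the three axes in turn. It is not the method of the present invention or of any particular Sparc library: the function names dft_1d and dft_3d are hypothetical, the data is held in nested Python lists, and a naive O(n^2) one-dimensional transform stands in for a fast one so that the example stays short.

```python
import cmath

def dft_1d(x):
    """Naive one-dimensional discrete Fourier transform (stand-in for a fast 1-D FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_3d(a):
    """Three-dimensional DFT of a[i][j][k], built from 1-D transforms along each axis."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    out = [[[complex(a[i][j][k]) for k in range(n3)]
            for j in range(n2)] for i in range(n1)]
    # Transform along the third (innermost) axis.
    for i in range(n1):
        for j in range(n2):
            out[i][j] = dft_1d(out[i][j])
    # Transform along the second axis: gather a strided line, transform, scatter back.
    for i in range(n1):
        for k in range(n3):
            line = dft_1d([out[i][j][k] for j in range(n2)])
            for j in range(n2):
                out[i][j][k] = line[j]
    # Transform along the first axis in the same way.
    for j in range(n2):
        for k in range(n3):
            line = dft_1d([out[i][j][k] for i in range(n1)])
            for i in range(n1):
                out[i][j][k] = line[i]
    return out
```

Only the pass along the innermost axis reads contiguous data; the other two passes gather elements at a stride, which is why the cache and memory behavior of the target CPU matters when choosing how such a transform is organized.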
Refer to Non-patent Document 1 for an explanation of common fast Fourier transforms.
[Non-patent Document 1] Charles Van Loan, “Computational Frameworks for the Fast Fourier Transform”, Society for Industrial and Applied Mathematics, 1992
The Fourier transform method developed for the Sparc does not fully exploit the performance of a relatively fast, high-performance CPU that has sufficient registers and a large-capacity cache, especially with respect to contiguous memory and cache access. Furthermore, since the number of registers in the Sparc is relatively small, processing is delayed by a shortage of registers when the arithmetic density is raised by using a larger radix and data is loaded from memory ahead of the point at which each calculation requires it.
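To make the notion of a larger radix concrete, the following is a minimal sketch of a recursive radix-4 decimation-in-time transform. It is offered only as an illustration under stated assumptions and is not the method of the present invention: the function name fft_radix4 is hypothetical, and the input length is assumed to be a power of 4. Each butterfly combines four inputs at once, so a transform of length n takes log4(n) passes over the data instead of the log2(n) passes of a radix-2 transform, at the cost of keeping more operands live at the same time.

```python
import cmath

def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT; len(x) is assumed to be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    assert n % 4 == 0, "length must be a power of 4 in this sketch"
    q = n // 4
    # Transform the four interleaved subsequences x[r], x[r+4], x[r+8], ...
    subs = [fft_radix4(x[r::4]) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Apply twiddle factors exp(-2*pi*i*r*k/n) to subsequences r = 1, 2, 3.
        a = subs[0][k]
        b = cmath.exp(-2j * cmath.pi * k / n) * subs[1][k]
        c = cmath.exp(-2j * cmath.pi * 2 * k / n) * subs[2][k]
        d = cmath.exp(-2j * cmath.pi * 3 * k / n) * subs[3][k]
        # Radix-4 butterfly: a 4-point DFT of (a, b, c, d) using only +, -, and
        # multiplication by +/- i.
        out[k] = a + b + c + d
        out[k + q] = a - 1j * b - c + 1j * d
        out[k + 2 * q] = a - b + c - d
        out[k + 3 * q] = a + 1j * b - c - 1j * d
    return out
```

In the butterfly, the four twiddled values a, b, c, and d and the four outputs are all in use together, which suggests why a larger radix benefits from a CPU with many registers.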
It is therefore important to find a high-speed Fourier transform method for a computer system whose CPU, like the Itanium 2 processor, has a relatively large number of registers, holds much data in the registers, performs high-speed calculation, performs fast contiguous access to data in the cache, and has a large-capacity cache.