In the field of High Performance Computing (HPC), a high degree of parallelization is desired to make full use of hardware performance as the number of computing nodes and the number of Central Processing Unit (CPU) cores increase. In particular, in a multithreaded environment on a shared memory system, it is possible to make full use of hardware performance by creating as many threads as there are CPU cores and binding the threads to the CPU cores in a one-to-one fashion.
In general, when a nested loop is parallelized in a shared memory system, parallelizing the outermost loop reduces the parallelization cost and is therefore efficient. However, if the number of iterations of the outermost loop is smaller than the number of CPU cores, it is not possible to make full use of the hardware performance, because parallelizing the outermost loop alone does not put all the CPU cores to use. To deal with this, a technique is employed in which the nested loop is converted into a single loop so as to expand the iteration space of the loop, and the single loop is then parallelized.
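The conversion of a nested loop into a single loop may be sketched as follows. This is a minimal illustration only; the array sizes `N_OUTER` and `N_INNER` and the function names are hypothetical and are chosen so that the outermost loop has fewer iterations than a typical number of CPU cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: the outer loop has only 4 iterations. */
enum { N_OUTER = 4, N_INNER = 1000 };

/* Nested form: parallelizing only the outer loop yields at most
 * N_OUTER independent pieces of work. */
void scale_nested(double a[N_OUTER][N_INNER], double factor)
{
    for (size_t i = 0; i < N_OUTER; i++)
        for (size_t j = 0; j < N_INNER; j++)
            a[i][j] *= factor;
}

/* Single-loop form: the iteration space is N_OUTER * N_INNER, so the
 * iterations can be divided among many more threads. A parallel-loop
 * directive of the programming model in use would be placed on this loop. */
void scale_collapsed(double a[N_OUTER][N_INNER], double factor)
{
    for (size_t k = 0; k < (size_t)N_OUTER * N_INNER; k++) {
        size_t i = k / N_INNER;   /* recover the outer index */
        size_t j = k % N_INNER;   /* recover the inner index */
        assert(i < N_OUTER);      /* sanity check on the recovered index */
        a[i][j] *= factor;
    }
}
```

Both functions visit the same elements in the same order; only the shape of the iteration space exposed to the parallelization differs.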
There may be cases where access to a multidimensional array within a nested loop is contiguous in the memory. In such cases, a computation expression using the loop control variable obtained after conversion to a single loop is created to obtain the subscripts of the multidimensional array, so that the multidimensional array is accessed as if it were a one-dimensional array. This approach makes it possible to perform Single Instruction Multiple Data vectorization (SIMDization) of the processing. The subscripts of the multidimensional array are numerical values each indicating the position of an element in the multidimensional array. Hereinafter, the computation expression for computing the subscripts is referred to as a subscript expression. The SIMDization is to generate an instruction (SIMD instruction) that achieves parallel processing by executing a single instruction on a plurality of data items at the same time. Generating SIMD instructions through the SIMDization at the time of compiling a program improves the processing efficiency.
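As a minimal sketch of a subscript expression for the contiguous case, assume a row-major array of hypothetical size `ROWS` x `COLS` that is accessed over its full range. The names `row_major_offset`, `subscript_expr`, and `add_flat` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical array shape, stored in row-major order. */
enum { ROWS = 3, COLS = 8 };

/* Position in memory of element [i][j] of a ROWS x COLS array. */
static size_t row_major_offset(size_t i, size_t j)
{
    return i * COLS + j;
}

/* Subscript expression: the position accessed at iteration k of the
 * single loop. For full-range access it simplifies to k itself. */
size_t subscript_expr(size_t k)
{
    assert(k < (size_t)ROWS * COLS);
    return row_major_offset(k / COLS, k % COLS);
}

/* Because the subscript expression reduces to k, the loop body touches
 * consecutive memory locations and is a straightforward candidate for
 * SIMDization by a compiler. */
void add_flat(double *a, const double *b)
{
    for (size_t k = 0; k < (size_t)ROWS * COLS; k++)
        a[k] += b[k];
}
```

The point of the sketch is that the division and remainder cancel out when every element of each row is accessed, leaving a unit-stride loop.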
Even if access to the multidimensional array is not contiguous in the memory, it is still possible to perform the SIMDization by generating an SIMD instruction with masks. The SIMD instruction with masks uses masks to separate portions to be subjected to computation from portions not to be subjected to the computation. The values (true or false) of the masks for the respective elements to be accessed are represented as a mask array.
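The role of the mask array may be illustrated with the following scalar sketch. It emulates in plain C what a masked SIMD instruction does per element and does not use any actual SIMD instruction; the sizes (`ROWS`, `COLS`, `USED`) and function names are hypothetical. Only the first `USED` elements of each row are subjected to computation, so access is not contiguous across rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shape: only the first USED elements of each row are computed. */
enum { ROWS = 3, COLS = 8, USED = 5, TOTAL = ROWS * COLS };

/* Build a mask array holding one true/false value per accessed element. */
void build_mask(bool mask[TOTAL])
{
    assert(USED <= COLS);
    for (size_t k = 0; k < TOTAL; k++)
        mask[k] = (k % COLS) < USED;
}

/* Scalar emulation of a masked operation: the addition takes effect only
 * where the mask is true; other elements keep their previous values. */
void masked_add(double *a, const double *b, const bool *mask)
{
    for (size_t k = 0; k < TOTAL; k++)
        a[k] = mask[k] ? a[k] + b[k] : a[k];
}
```

Note that the mask array in this sketch has `TOTAL` entries, one per element, and is indexed by the loop control variable `k` directly.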
For example, as a technique of converting a nested loop into a single loop, a compiling method has been considered that accelerates vector operation processing with mask data, which arises when a nested loop is converted into a single loop and when loops are combined at compile time. In addition, for the case where a loop for computation defines arrays in different dimensions or of different sizes, a technique has been considered that accelerates the computation of the arrays by reducing the number of loops.
Please see, for example, Japanese Laid-open Patent Publication Nos. 11-242598 and 11-203273.
Consider the case of using an SIMD instruction with masks. If a mask is prepared for every element to be accessed, the data amount of the masks increases with the amount of data to be accessed. If the data amount of the masks is excessive, a large memory capacity is consumed to store the masks, which causes a decrease in the processing efficiency of the system.
To deal with this, there is an attempt to reduce the data amount of masks. For example, in the case where elements to be subjected to computation and elements not to be subjected to the computation appear in a fixed repetitive pattern, a mask pattern of small size corresponding to a single repetition of the pattern is prepared and its masks are used repeatedly. If it is possible to use the mask pattern of small size repeatedly, the data amount of masks is reduced. However, in order to use the mask pattern of small size repeatedly, a complicated expression may be needed to specify the masks, and such a complicated expression is not usable to specify masks in an SIMD instruction with masks.
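The trade-off may be illustrated with the following scalar sketch, under the assumption of a hypothetical `ROWS` x `COLS` array in which the first `USED` elements of every row are subjected to computation. A mask pattern covering a single row is stored once, and the mask for iteration `k` of the single loop is selected by the expression `k % COLS`. All names and sizes are illustrative, and the code emulates masking in plain C rather than issuing an actual SIMD instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shape: the same 8-element mask pattern repeats for every row. */
enum { ROWS = 1000, COLS = 8, USED = 5, TOTAL = ROWS * COLS };

/* Mask pattern for a single repetition: COLS entries instead of TOTAL. */
static const bool pattern[COLS] = {
    true, true, true, true, true, false, false, false
};

/* The mask for iteration k is no longer indexed by k itself but by a
 * remainder expression; this is the kind of more complicated subscript
 * that the small mask pattern requires. */
bool mask_at(size_t k)
{
    assert(k < TOTAL);
    return pattern[k % COLS];
}

/* Scalar emulation of the masked operation using the repeated pattern. */
void masked_add_pattern(double *a, const double *b)
{
    for (size_t k = 0; k < TOTAL; k++)
        a[k] = mask_at(k) ? a[k] + b[k] : a[k];
}
```

In this sketch the mask storage shrinks from `TOTAL` entries to `COLS` entries, at the price of the subscript `k % COLS` for the mask.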
As described above, in the case where access to a multidimensional array is not contiguous in a memory, reducing the data amount of masks at the time of SIMDization requires a complicated subscript expression for the mask array, which ends up making SIMD instructions with masks unusable.