To perform arithmetic operations at high speed by using a vector arithmetic operation unit or a single instruction multiple data (SIMD) arithmetic operation unit, it is important to vectorize software (a program) so that vector arithmetic operation instructions can be applied effectively. Vectorization of software refers to processing such as extracting a common arithmetic operation from loop processing (iteration processing) described in the program to be vectorized, or increasing the loop length (the number of iterations) of that loop processing.
Software commonly includes multi-loop processing such as dual-loop processing, and vectorizing multi-loop processing is more complex than vectorizing single-loop (one-loop) processing. Expectations for techniques that vectorize multi-loop processing are therefore growing.
As one example of such a technique, PTL 1 discloses a compile scheme including a parsing unit, a structure analyzing unit, a data dependency relation analyzing unit, a loop switch analyzing unit, a vector text generating unit, and a code generating unit. In order to vectorize dual-loop processing, this compile scheme performs processing of analyzing the dual-loop processing and thus switching an outer loop and an inner loop in a dual loop.
FIG. 21 illustrates an example in which a general vectorization device including the technique described by PTL 1 vectorizes dual-loop processing by switching an inner loop and an outer loop in the dual-loop processing. As illustrated in FIG. 21(a), an original program to be vectorized by this vectorization device includes dual-loop processing in which a loop length of an outer loop is 10000 and a loop length of an inner loop is 10. In this case, a vector arithmetic operation unit which executes the program illustrated in FIG. 21(a) performs a vector arithmetic operation over the inner loop, which has a loop length of only 10, and repeats that operation 10000 times. The efficiency of the vector arithmetic operation is therefore not satisfactory. For this reason, this vectorization device generates a program illustrated in FIG. 21(b) in which the inner loop and the outer loop in the original program are switched. A vector arithmetic operation unit which executes the program illustrated in FIG. 21(b) needs only to perform a vector arithmetic operation over the inner loop, which now has a loop length of 10000, and to repeat that operation 10 times, so that the efficiency of the vector arithmetic operation is improved.
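The loop switching described above can be sketched as follows. FIG. 21 is not reproduced here, so this is only a minimal illustration under stated assumptions: the element-wise addition used as the loop body and the names `original_order`, `switched_order`, `a`, `b`, and `c` are hypothetical, and only the loop lengths 10000 and 10 come from the text. Switching is valid in this sketch because each iteration writes a distinct element and reads no result of another iteration.

```python
OUTER_LEN = 10000  # loop length of the original outer loop (from the text)
INNER_LEN = 10     # loop length of the original inner loop (from the text)


def original_order(a, b):
    """Original order: the short inner loop (length 10) is run 10000 times."""
    c = [[0] * INNER_LEN for _ in range(OUTER_LEN)]
    for i in range(OUTER_LEN):          # outer loop, length 10000
        for j in range(INNER_LEN):      # inner loop, length 10
            c[i][j] = a[i][j] + b[i][j]  # hypothetical loop body
    return c


def switched_order(a, b):
    """Switched order: the long inner loop (length 10000) is run 10 times."""
    c = [[0] * INNER_LEN for _ in range(OUTER_LEN)]
    for j in range(INNER_LEN):          # outer loop, now length 10
        for i in range(OUTER_LEN):      # inner loop, now length 10000
            c[i][j] = a[i][j] + b[i][j]  # same hypothetical loop body
    return c
```

Both functions compute the same result. Only the order of iteration differs, which is what would allow a vector unit to process a long inner loop a few times instead of a short inner loop many times.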
Furthermore, PTL 2 discloses a vectorization device including a dimension-mismatching array detection means, a dimension-mismatching array duplication means, an expansion means, and a vectorization implementation means. Among a plurality of array variables used by an arithmetic equation included in multi-loop processing in a program, this vectorization device detects array variables whose numbers of dimensions differ. After equalizing the array sizes of the detected array variables, this vectorization device converts the multiple loop into a single loop.
FIG. 22 illustrates an example in which a general vectorization device including the technique described by PTL 2 vectorizes dual-loop processing by converting a dual loop into a single loop. As illustrated in FIG. 22(a), an original program to be vectorized by this vectorization device uses two-dimensional array variables X and Y having an array size of 100×100 (× represents a multiplication in the present application), and a one-dimensional array variable Z having an array size of 100. After detecting that the array size of the array variable Z is different from those of the array variables X and Y, this vectorization device expands the array variable Z to a two-dimensional array having an array size of 100×100. Then, this vectorization device generates a program illustrated in FIG. 22(b) in which each of the array variables X, Y, and Z is converted from a two-dimensional array having a size of 100×100 into a one-dimensional array having a size of 10000.
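The expansion and conversion described above can be sketched as follows. FIG. 22 is not reproduced here, so the arithmetic equation `X[i][j] = Y[i][j] + Z[j]`, the choice of indexing Z by the inner loop variable, and the function names `dual_loop` and `single_loop` are assumptions made for illustration. Only the array names X, Y, and Z and the sizes 100×100, 100, and 10000 come from the text.

```python
N = 100  # X and Y are N x N, Z has N elements (sizes from the text)


def dual_loop(y, z):
    """Original form: a dual loop mixing two-dimensional Y with one-dimensional Z."""
    x = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            x[i][j] = y[i][j] + z[j]  # hypothetical arithmetic equation
    return x


def single_loop(y, z):
    """Converted form: Z is expanded to N x N, then all arrays become length N*N."""
    # Step 1: expand Z to a two-dimensional N x N array by duplicating it per row.
    z_2d = [list(z) for _ in range(N)]
    # Step 2: convert the two-dimensional arrays into one-dimensional arrays.
    y_1d = [value for row in y for value in row]
    z_1d = [value for row in z_2d for value in row]
    # Step 3: a single loop of length N*N replaces the dual loop.
    x_1d = [0] * (N * N)
    for k in range(N * N):
        x_1d[k] = y_1d[k] + z_1d[k]
    return x_1d
```

In this sketch the element at row `i`, column `j` of the dual-loop result corresponds to index `i * N + j` of the single-loop result. The expanded copy of Z holds 10000 elements instead of 100, which is the memory cost of obtaining one loop of length 10000.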
Furthermore, PTL 3 discloses a vectorization processing scheme of a compiler, capable of vectorizing dual-loop processing even when a loop length of an inner loop in the dual-loop processing is not a fixed value and is dependent on a value of a loop variable of an outer loop.
FIG. 23 illustrates an example of a program vectorized by a general vectorization device including the technique described by PTL 3. NI and NJ in FIG. 23 are, respectively, a variable representing a loop length of an outer loop and an array variable representing a loop length of an inner loop. The program illustrated in FIG. 23 is a program in which the loop length of the inner loop is not a fixed value and is dependent on a value of the loop variable of the outer loop. This vectorization device acquires a maximum value that can be taken by the loop length of the inner loop dependent on the loop variable of the outer loop, and replaces the loop length of the inner loop with the maximum value. Then, this vectorization device vectorizes the dual-loop processing in which the loop length of the inner loop has been converted into a fixed value.
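The replacement of a variable inner loop length by its maximum value can be sketched as follows. FIG. 23 is not reproduced here, so the loop body (doubling each element) and the function names are hypothetical. The text does not state how iterations beyond the original inner loop length are handled, so the guard `j < nj[i]` below is an assumption added to keep the two versions equivalent. Only the names NI and NJ (written `ni` and `nj`) come from the text.

```python
def variable_inner_length(a, ni, nj):
    """Original form: the inner loop length nj[i] depends on the outer variable i."""
    out = [list(row) for row in a]
    for i in range(ni):
        for j in range(nj[i]):
            out[i][j] = 2 * out[i][j]  # hypothetical loop body
    return out


def fixed_inner_length(a, ni, nj):
    """Converted form: the inner loop length is fixed to the maximum of nj."""
    out = [list(row) for row in a]
    max_len = max(nj[:ni], default=0)  # maximum inner loop length
    for i in range(ni):
        for j in range(max_len):       # fixed loop length
            if j < nj[i]:              # assumed guard for the extra iterations
                out[i][j] = 2 * out[i][j]
    return out
```

With `nj = [2, 5, 0]`, for example, the converted form runs every inner loop 5 times and the guard leaves the elements outside each original range unchanged.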