Most programs written for computers contain loops and nests of loops that perform repetitive operations on sequences of data. These programs specify that the operations be done in a well-defined order. Because single-instruction, single-data path (SISD) machines have historically been the most pervasive type of machine, that order is one readily executable on an SISD machine. The same order may not be valid on a single-instruction, multiple-data path (SIMD) machine, where successive elements are taken in that order and operated upon independently in parallel. Other orders may exist, however, in which the elements may validly be taken and operated upon independently in parallel. There is therefore a need to ascertain those portions of programs, and those orderings of operations, for which the SIMD machine may be used to execute the repetitive operations. This process of identification, together with the resulting transformation of the program, is known as vectorization. The goal is thus to start with a program written for a scalar SISD machine and to produce object code for execution on a vector SIMD machine.
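The difference between the two orders can be illustrated with a short Python sketch. It assumes a simplified model of SIMD execution in which every operand of a vector statement is fetched before any result is stored, and it uses a hypothetical loop body `a[i] = a[i - k] + b[i]`; the function names `run_scalar` and `run_vector` are illustrative only and do not come from the text.

```python
def run_scalar(a, b, k):
    """SISD order: iteration i sees every value stored by earlier iterations."""
    a = list(a)
    for i in range(k, len(a)):
        a[i] = a[i - k] + b[i]
    return a


def run_vector(a, b, k):
    """Simplified SIMD model (an assumption): fetch all operands, then store all results."""
    a = list(a)
    indices = range(k, len(a))
    operands = [(a[i - k], b[i]) for i in indices]  # every load happens here
    for i, (x, y) in zip(indices, operands):        # every store happens here
        a[i] = x + y
    return a


a, b = [1, 0, 0, 0], [0, 1, 1, 1]
print(run_scalar(a, b, 0), run_vector(a, b, 0))  # [1, 1, 1, 1] [1, 1, 1, 1]
print(run_scalar(a, b, 1), run_vector(a, b, 1))  # [1, 2, 3, 4] [1, 2, 1, 1]
```

With `k = 0` no value passes from one iteration to another and both orders agree; with `k = 1` the vector model reads the old values of `a[1]` and `a[2]` before the earlier iterations have updated them, so the results differ.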
Vectorization is of interest for at least three reasons. First is the large existing body of programs written for SISD machines: an automatic vectorizer makes it possible to execute those programs on an SIMD machine without rewriting them. Second is portability: an automatic vectorizer makes it possible to execute the same program, without change, on both SISD and SIMD machines. Third is speed: an automatic vectorizer makes it feasible for the same program to be optimized for scalar execution when it is run on an SISD machine and for vector execution when it is run on an SIMD machine.
The objective of vectorization is to find sequential scalar operations that can be converted to equivalent parallel vector operations, so as to take advantage of vector SIMD machines. Two sets of operations are equivalent if the program produces the same results with either.
In general, a statement may be vectorized if it does not require, as an input on one iteration of a loop, a value that it computed on an earlier iteration of the same loop. If no value computed in one iteration of a DO-loop is used in a later iteration, then all of the data values can be computed in parallel. This independence of data values from one DO-loop iteration to the next is a key condition for execution on an SIMD machine. In contrast, if a value is computed in one iteration and then used in a later iteration, the DO-loop cannot in general be vectorized.
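A minimal sketch of this test follows, assuming the simplest case: a single loop whose subscripts all have the form `i + constant`, so that the iteration distance between a store of `A(I + w)` and a load of `A(I + r)` is `w - r`. The helper names are hypothetical. A load of an element that is overwritten only in the same or a later iteration (distance of zero or less) is treated as harmless, which assumes a vector model in which all operands are fetched before any result is stored.

```python
def carries_flow_dependence(write_off, read_off, trip_count):
    """True if A(I + write_off), stored in some iteration, is loaded as
    A(I + read_off) by a strictly later iteration of the same loop."""
    # Subscripts coincide when i_write + write_off == i_read + read_off,
    # i.e. when the reading iteration is (write_off - read_off) iterations later.
    distance = write_off - read_off
    return 0 < distance < trip_count


def vectorizable(accesses, trip_count):
    """accesses: (write_off, read_off) pairs for one array in one loop body."""
    return not any(carries_flow_dependence(w, r, trip_count) for w, r in accesses)


n = 100
print(vectorizable([(0, -1)], n))  # A(I) = A(I-1) + B(I): False
print(vectorizable([(0, 1)], n))   # A(I) = A(I+1) + B(I): True
print(vectorizable([(0, 0)], n))   # A(I) = 2 * A(I): True
```

Under these assumptions, `A(I) = A(I-1) + B(I)` is rejected for any trip count greater than 1, because each iteration consumes the value stored by the one before it.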
Vectorization of sequential operations requires that the dependences in the program be determined. A statement in a source program may depend upon another statement either because one of them affects the flow of control that determines whether the other is executed (a control dependence) or because both use the same storage locations (a data dependence). Both kinds of dependence must be considered when the program is analyzed for vectorization.
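The distinction can be sketched in Python as below. This is an illustration under stated assumptions rather than a dependence analyzer: each variable name stands for one storage location (subscripts are ignored), a control dependence is recorded by hand through the hypothetical `guarded_by` field, and data dependences are split into the conventional flow, anti, and output kinds, terms not used in the text above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stmt:
    name: str
    reads: set = field(default_factory=set)    # storage locations loaded
    writes: set = field(default_factory=set)   # storage locations stored
    guarded_by: Optional[str] = None           # hypothetical: statement that decides whether this one runs


def dependences(first: Stmt, second: Stmt) -> list:
    """Kinds of dependence of `second` on `first`, where `first` comes earlier."""
    kinds = []
    if second.guarded_by == first.name:
        kinds.append("control")  # first determines whether second executes
    if first.writes & second.reads:
        kinds.append("flow")     # second reads a location that first wrote
    if first.reads & second.writes:
        kinds.append("anti")     # second overwrites a location that first read
    if first.writes & second.writes:
        kinds.append("output")   # both write the same location
    return kinds


s0 = Stmt("S0", reads={"X"})                                      # IF (X .GT. 0) ...
s1 = Stmt("S1", reads={"B", "C"}, writes={"A"}, guarded_by="S0")  # A = B + C
s2 = Stmt("S2", reads={"A"}, writes={"B"})                        # B = A * 2
print(dependences(s0, s1))  # ['control']
print(dependences(s1, s2))  # ['flow', 'anti']
```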
Among the most pertinent prior art references are those by Allen, "Dependence Analysis for Subscripted Variables and Its Application to Program Transformations", Ph.D. thesis, Computer Science, Rice University, April 1983; and Kennedy, "Automatic Translation of FORTRAN Programs to Vector Form", Rice University Technical Report 476-029-4, October 1980. The aforementioned Allen and Kennedy references describe the scalar-to-vector conversion steps of (a) forming a dependence graph, (b) partitioning the graph into topologically significant regions, (c) examining the regions in order to ascertain which regions are executable on an SIMD machine, and (d) generating scalar or vector code based on the results of (c).
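Steps (a) through (d) can be sketched as follows. The sketch assumes that the "topologically significant regions" of step (b) are the strongly connected components of the dependence graph, found here with Tarjan's algorithm, and that step (c) simply treats any region containing a cycle (a recurrence) as scalar. The dependence edges of step (a) are supplied by hand rather than computed from subscripts, and all function names are hypothetical.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm. Returns SCCs in reverse topological order."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)  # next unused DFS number
        stack.append(v)
        on_stack.add(v)
        for w in edges.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in nodes:
        if v not in index:
            visit(v)
    return sccs


def plan(statements, edges):
    """(a) graph given as `edges`; (b) partition; (c) classify; (d) choose code kind."""
    regions = reversed(strongly_connected_components(statements, edges))
    result = []
    for region in regions:  # now in topological order
        has_cycle = len(region) > 1 or region[0] in edges.get(region[0], ())
        ordered = [s for s in statements if s in region]  # keep source order
        result.append(("scalar" if has_cycle else "vector", ordered))
    return result


statements = ["S1", "S2", "S3"]
edges = {"S1": ["S2"], "S2": ["S3"], "S3": ["S2"]}  # S3 -> S2 via D(I-1)
print(plan(statements, edges))
# [('vector', ['S1']), ('scalar', ['S2', 'S3'])]
```

For the hypothetical loop body `S1: A(I) = B(I) + 1`, `S2: C(I) = A(I) + D(I-1)`, `S3: D(I) = C(I) * 2`, the recurrence between S2 and S3 keeps those two statements scalar, while S1 can still be emitted as a vector statement ahead of them.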
In the prior art, a recurrence among program statements in one or more loops of a nest prevented vectorization not only of the statements in the loops that generated the recurrence but also of those statements in the outer loops containing those loops. It is known that interchanging the order of loops affects the vectorizability of the statements they contain. However, because of the complexity and expense involved in identifying and interchanging loops, loop interchange has been employed only sparingly, and then only on the innermost and next-to-innermost loops.
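The effect of interchange can be sketched with dependence distance vectors, a standard device that the text above does not itself name. The sketch assumes constant distance vectors listed outermost loop first; a loop order is taken to be legal when no permuted vector has a negative first nonzero entry, and the innermost loop is taken to be vectorizable when it carries no dependence. The function names are hypothetical, and enumerating every permutation, as done here, grows factorially with the depth of the nest.

```python
from itertools import permutations


def carrying_level(dist):
    """Loop level (0 = outermost) that carries a dependence, or None if none does."""
    for level, d in enumerate(dist):
        if d != 0:
            return level
    return None


def is_legal(dists):
    """Legal if no dependence has a negative first nonzero entry."""
    for d in dists:
        level = carrying_level(d)
        if level is not None and d[level] < 0:
            return False
    return True


def inner_loop_vectorizable(dists, depth):
    """True if the innermost loop carries no dependence."""
    return all(carrying_level(d) != depth - 1 for d in dists)


def vectorizable_orders(loops, dists):
    """Legal loop orders (outermost first) whose innermost loop can be vectorized."""
    found = []
    for perm in permutations(range(len(loops))):
        permuted = [tuple(d[p] for p in perm) for d in dists]
        if is_legal(permuted) and inner_loop_vectorizable(permuted, len(loops)):
            found.append(tuple(loops[p] for p in perm))
    return found


print(vectorizable_orders(("I", "J"), [(0, 1)]))   # [('J', 'I')]
print(vectorizable_orders(("I", "J"), [(1, -1)]))  # [('I', 'J')]
```

For `A(I,J) = A(I,J-1) + B(I,J)` in a nest with I outermost, the distance vector is `(0, 1)`: the inner J loop carries the recurrence, but after interchange the vector becomes `(1, 0)` and the new inner I loop can be vectorized. For `A(I,J) = A(I-1,J+1)`, with distance `(1, -1)`, interchange would be illegal, so only the original order is returned.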