Supercomputers are high-performance computing platforms that employ a pipelined vector processing approach to solving numerical problems. Vectors are ordered sets of data. Problems that can be structured as a sequence of operations on vectors can see throughput increase by one to two orders of magnitude when executed on a vector machine (compared to execution on a scalar machine of the same cost). Pipelining further increases throughput by hiding memory latency through the prefetching of instructions and data.
A pipelined vector machine is disclosed in U.S. Pat. No. 4,128,880, issued Dec. 5, 1978, to Cray, the disclosure of which is hereby incorporated herein by reference. In the Cray machine, vectors are processed by loading them into operand vector registers, streaming them through a data processing pipeline having a functional unit, and receiving the output in a result vector register. A vector machine according to U.S. Pat. No. 4,128,880 supports fully parallel operation by allowing multiple pipelines to execute concurrently on independent streams of data.
For vectorizable problems, vector processing is faster and more efficient than scalar processing. Overhead associated with maintenance of the loop-control variable (for example, incrementing and checking the count) is reduced. In addition, central memory conflicts are reduced (fewer but bigger requests) and data processing units are used more efficiently (through data streaming).
Vector processing supercomputers are used for a variety of large-scale numerical problems. Applications typically are highly structured computations that model physical processes. They exhibit a heavy dependence on floating-point arithmetic due to the potentially large dynamic range of values within these computations. Problems requiring modeling of heat or fluid flow, or of the behavior of a plasma, are examples of such applications.
Program code for execution on vector processing supercomputers must be vectorized to exploit the performance advantages of vector processing. Vectorization typically breaks up a loop of the form: ##EQU1## into a nested loop of the form: ##EQU2## where VL is the length of the vector registers of the system. This process is known as "strip mining" the loop. In strip mining, the number of iterations of the inner loop is set by the length of a vector register, and the outer loop executes once for each full vector length contained in the original iteration count. Any remaining iterations are performed in a separate loop placed before the nested loop. For each execution of the inner loop, vector-length segments of the original data arrays are loaded into the vector registers. Data in these vector registers can then be processed at the vector-operation rate of one or more elements per clock period.
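The strip-mining transformation described above can be sketched in C as follows. This is only an illustration under stated assumptions: the loop body (an element-wise addition), the function names add_scalar and add_strip_mined, and the value VL = 64 are hypothetical and are not taken from the referenced loops. On a vector machine the inner loop would be replaced by vector load, add, and store instructions; here it remains an ordinary loop so that the structure is visible.

```c
#include <stddef.h>

/* Assumed vector register length; the value 64 is illustrative only. */
#define VL 64

/* Original scalar loop: one element per iteration. */
void add_scalar(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Strip-mined form of the same loop. */
void add_strip_mined(double *a, const double *b, const double *c, size_t n)
{
    /* Iterations that do not fill a whole vector register. */
    size_t rem = n % VL;

    /* Remainder loop, placed before the nested loop. */
    for (size_t i = 0; i < rem; i++)
        a[i] = b[i] + c[i];

    /* Outer loop: one pass per full strip of VL elements. */
    for (size_t s = rem; s < n; s += VL) {
        /* Inner loop: exactly VL iterations, the part a vectorizing
           compiler would map onto vector register operations. */
        for (size_t j = 0; j < VL; j++)
            a[s + j] = b[s + j] + c[s + j];
    }
}
```

Both functions compute the same result for any n; only the iteration structure differs. Because the inner loop always has exactly VL iterations, each pass corresponds to one full vector register of operands.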
Compilers exist that will automatically apply strip mining techniques to scalar loops within program code to create vectorized loops. This capability greatly simplifies the programming of efficient vector code. The programmer simply writes code of the form: ##EQU3## and the compiler vectorizes it.
There are, however, certain types of operations that inhibit the effectiveness of vectorization. One such operation is the automatic addition of alternate loop exits traditionally associated with array bounds checking. If array bounds checking is enabled, even relatively simple program source statements may generate a large number of alternate exit tests within a loop body because the upper and lower bounds must be checked for each array subscript contained in each program source statement. The presence of a large number of alternate exit tests greatly reduces the efficiency of the loop, and may cause a further loss of performance by preventing a compiler from vectorizing the loop. An equivalent loss of efficiency may also occur with scalar computers.
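To make the cost of bounds checking concrete, the following C sketch shows what a single source statement might expand to once checks are inserted. It is an assumption-laden illustration, not the output of any particular compiler: the function copy_checked, its parameters, and the status codes are hypothetical names introduced here. The one assignment carries two subscripts, and each subscript needs a lower-bound and an upper-bound test, so the loop body gains four alternate exits.

```c
/* Hypothetical status codes for this sketch. */
enum { COPY_OK = 0, COPY_BOUNDS_ERROR = 1 };

/* Source statement: a[i] = b[i + off], for i = 0 .. n-1.
   a_len and b_len are the declared sizes of the two arrays. */
int copy_checked(double *a, long a_len,
                 const double *b, long b_len,
                 long n, long off)
{
    for (long i = 0; i < n; i++) {
        long j = i + off;

        /* Alternate exits 1 and 2: bounds of the subscript of a. */
        if (i < 0)      return COPY_BOUNDS_ERROR;
        if (i >= a_len) return COPY_BOUNDS_ERROR;

        /* Alternate exits 3 and 4: bounds of the subscript of b. */
        if (j < 0)      return COPY_BOUNDS_ERROR;
        if (j >= b_len) return COPY_BOUNDS_ERROR;

        a[i] = b[j];
    }
    return COPY_OK;
}
```

Each early return is an alternate exit from the loop. Because any iteration may leave the loop before its assignment executes, the iterations can no longer be treated as one uniform vector operation, which is the inhibition of vectorization described above.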
Therefore, there is a need for a method to remove or eliminate alternate exit tests from a loop body without changing the functionality of the loop.