A key aspect of parallel computing is the ability to exploit parallelism in one or more loops in computer programs. For loops that do not have cross-iteration dependencies, or whose dependencies are linear with respect to the loop index variables, various existing techniques can be used to achieve parallel processing. A suitable reference for such techniques is Wolfe, M., High Performance Compilers for Parallel Computing, Addison-Wesley, 1996, Chapters 1 and 7. Such techniques perform a static analysis of the loop at compile-time. The compiler groups and schedules loop iterations in parallel batches without violating the original semantics of the loop.
There are, however, many cases in which static analysis of the loop is not possible. Compilers, in such cases, cannot attempt any parallelization of the loop before run-time.
As an example, consider the loop of Table 1 below, for which parallelization cannot be performed at compile-time.
TABLE 1
do i = 1, n
   x[u(i)] = . . . .
   . . . .
   . . . .
   . . . .
   . . . .
   y[i] = x[r(i)] . . .
   . . . .
   . . . .
   . . . .
   . . . .
enddo
Specifically, until the indirect loop index variables u(i) and r(i) are known, loop parallelization cannot be attempted for the loop of Table 1.
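As an illustration of what becomes possible once u(i) and r(i) are known, the following is a minimal sketch of a run-time scheduling pass for a loop shaped like that of Table 1. It is not taken from Wolfe or Rauchwerger. It assumes that the only accesses to x in the loop body are the write x[u(i)] and the read x[r(i)], that indices are 0-based, and that the function names schedule_wavefronts and batches are hypothetical. Each iteration is assigned the earliest batch ("wavefront") that follows every earlier iteration with which it has a flow, anti, or output dependence.

```python
def schedule_wavefronts(u, r):
    """Return, for each iteration, the wavefront (batch) number it can run in.

    Assumption: iteration i writes x[u[i]] and then reads x[r[i]], and
    touches no other element of x.
    """
    last_write = {}  # element of x -> latest wavefront that writes it
    last_read = {}   # element of x -> latest wavefront that reads it
    wave = []
    for w, rd in zip(u, r):
        # Flow dependence: the read must follow earlier writes of x[rd].
        k = last_write.get(rd, -1) + 1
        # Output and anti dependence: the write must follow earlier
        # writes and earlier reads of x[w].
        k = max(k, last_write.get(w, -1) + 1, last_read.get(w, -1) + 1)
        wave.append(k)
        last_write[w] = k
        last_read[rd] = max(last_read.get(rd, -1), k)
    return wave


def batches(wave):
    """Group iteration numbers by wavefront, in execution order."""
    groups = {}
    for i, k in enumerate(wave):
        groups.setdefault(k, []).append(i)
    return [groups[k] for k in sorted(groups)]


if __name__ == "__main__":
    u = [0, 1, 2, 0]
    r = [3, 0, 1, 3]
    print(batches(schedule_wavefronts(u, r)))  # [[0], [1], [2, 3]]
```

In this example, iterations 2 and 3 touch disjoint elements of x and can share a batch, whereas iterations 0, 1, and 2 form a chain of flow dependences. None of this can be determined until the contents of u and r are available.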
For a review of run-time parallelization techniques, refer to Rauchwerger, L., Run-Time Parallelization: Its Time Has Come, Journal of Parallel Computing, Special Issue on Language and Compilers, Vol. 24, Nos. 3–4, 1998, pp. 527–556. A preprint of this reference is available via the World Wide Web at the address www.cs.tamu.edu/faculty/rwerger/pubs.
Further difficulties, not discussed by Wolfe or Rauchwerger, arise when the loop body contains one or more conditional statements that can be evaluated only at run-time. As an example, consider the loop of Table 2 below, for which parallelization cannot be attempted by a compiler.
TABLE 2
do i = 1, n
   x[u(i)] = . . . .
   . . . .
   . . . .
   . . . .
   . . . .
   if (cond) then y[i] = x[r(i)] . . .
   else y[i] = x[s(i)] . . .
   . . . .
   . . . .
   . . . .
   . . . .
enddo
The values of r(i) and s(i) in the loop of Table 2 above, as well as the indirect loop index variable u(i), must be known before loop parallelization can be attempted. Further, in each iteration, the value of cond must be known to decide whether r(i) or s(i) is the index actually read in that iteration.
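The effect of the conditional can be seen in a small sketch. It assumes, purely for illustration, that the outcome of cond is available as one boolean per iteration before the loop is scheduled, that indices are 0-based, and that x is accessed only through the write x[u(i)] and the single read selected by cond. The function names effective_reads and is_fully_parallel are hypothetical. The check reports whether all iterations could run in one parallel batch.

```python
def effective_reads(cond, r, s):
    """Pick the index actually read in each iteration: r[i] if cond[i], else s[i]."""
    return [r[i] if cond[i] else s[i] for i in range(len(cond))]


def is_fully_parallel(u, reads):
    """True if no element of x written by one iteration is written or
    read by a different iteration."""
    writer = {}  # element of x -> iteration that writes it
    for i, w in enumerate(u):
        if w in writer:
            return False  # two iterations write the same element
        writer[w] = i
    for i, rd in enumerate(reads):
        j = writer.get(rd)
        if j is not None and j != i:
            return False  # iteration i reads what iteration j writes
    return True


if __name__ == "__main__":
    u = [0, 1, 2]
    r = [0, 1, 2]  # each iteration would read the element it writes
    s = [1, 2, 0]  # each iteration would read another iteration's element
    print(is_fully_parallel(u, effective_reads([True, True, True], r, s)))   # True
    print(is_fully_parallel(u, effective_reads([True, False, True], r, s)))  # False
```

With identical u, r, and s, the same loop is fully parallel for one set of branch outcomes and not for another, which is why the index arrays alone are insufficient.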
Further advances in loop parallelization are clearly needed in view of these and other observations.