1. Field of the Invention
The present invention relates generally to an improved data processing system and in particular to the compilation of computer usable program code. Still more particularly, the present invention relates to a method for the pipelined parallelization of multi-dimensional loops with multiple data dependencies.
2. Description of the Related Art
Many modern computers are capable of performing parallel processing. Parallel processing is the simultaneous use of more than one processor to execute a program. Parallel processing differs from multitasking in that, in multitasking, a single processor executes more than one program concurrently, typically by switching among the programs. Parallel processing can be achieved by using multiple processors in a single computer, or by using multiple computers connected in a network. This latter type of parallel processing uses distributed software to create the effect of multiple parallel processors in a single computer.
In either case, the goal of parallel processing is to make programs run faster by having multiple processors executing the program at the same time. In practice, writing or dividing a program in such a way that separate processors can execute different portions of the program is difficult. The difficulty arises because the various processors can interfere with each other with respect to execution of the program.
Many computer programs contain loops, which must be taken into account when determining how to perform parallel processing with respect to those programs. A loop is a sequence of program instructions that executes multiple times, often iteratively, until some desired result occurs or some time passes. Loops are a large potential source of parallelism in computer programs. Ideally, multiple processors should perform different iterations of a loop simultaneously in order to increase the speed at which the loop is processed.
For example, suppose a particular loop contains twenty iterations. If twenty processors each simultaneously perform a different iteration of the loop, the entire loop can be processed much more quickly than if a single processor alone had performed all twenty iterations.
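The twenty-iteration example can be sketched as follows. This is a minimal illustration only, assuming a hypothetical loop body `body(i)` whose result depends solely on its own index; the function names and the use of a thread pool are assumptions made for this sketch and are not taken from the present disclosure.

```python
from concurrent.futures import ThreadPoolExecutor


def body(i):
    # Hypothetical loop body: the result depends only on the index i,
    # so no iteration needs data produced by another iteration.
    return i * i


def run_serial(n):
    # One processor performs all n iterations, one after another.
    return [body(i) for i in range(n)]


def run_parallel(n, workers):
    # Each iteration is handed to a pool of workers as an independent task.
    # pool.map returns the results in index order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))


serial_result = run_serial(20)
parallel_result = run_parallel(20, workers=20)
```

Both versions produce the same twenty results. The sketch shows how the iterations are divided among workers rather than measuring a speedup; in the standard CPython interpreter, threads do not execute CPU-bound Python code simultaneously.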
One method of allowing a computer program with loops to take advantage of parallel processing is to compile the program to exploit available parallel processing power. A compiler is a computer program that translates a series of program instructions written in a source computer language into program instructions written in a target computer language, or that otherwise modifies the source code. For example, a compiler can change the code of the original source program to better take advantage of available parallel processing power.
However, commercial compilers are lacking with regard to exploitation of the parallelism available in loops. Most compilers are limited to automatically parallelizing DOALL loops. A DOALL loop is a loop that has no data dependencies between its iterations. In contrast, a DOACROSS loop is a loop that has at least one cross-iteration data dependency. Available compilers serialize DOACROSS loops because of a major problem associated with parallelizing DOACROSS loops.
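The distinction between the two loop types can be illustrated with a short sketch. The loop bodies, the function names, and the `order` parameter are hypothetical and are introduced only to show that the iterations of a DOALL loop can run in any order, while the iterations of a DOACROSS loop cannot.

```python
def doall_loop(a, b, order):
    # DOALL: iteration i reads only a[i] and b[i] and writes only c[i].
    c = [0] * len(a)
    for i in order:
        c[i] = a[i] + b[i]
    return c


def doacross_loop(a, order):
    # DOACROSS: iteration i reads x[i - 1], which iteration i - 1 writes.
    # This is a cross-iteration data dependency.
    x = list(a)
    for i in order:
        if i > 0:
            x[i] = x[i - 1] + a[i]
    return x


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
forward = [0, 1, 2, 3]
backward = [3, 2, 1, 0]

doall_forward = doall_loop(a, b, forward)
doall_backward = doall_loop(a, b, backward)
doacross_forward = doacross_loop(a, forward)
doacross_backward = doacross_loop(a, backward)
```

Here the DOALL loop yields `[11, 22, 33, 44]` for either order. The DOACROSS loop yields `[1, 3, 6, 10]` when its iterations run in the original order and `[1, 3, 5, 7]` when they run in reverse, which is why its iterations cannot simply be distributed across processors without regard to ordering.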
The main problem with parallelizing DOACROSS loops is the synchronization operations involved. Synchronization operations are generally very expensive. Using synchronization excessively or carelessly can result in severe performance degradation. This performance degradation defeats the purpose of parallel processing and of compiling the program; thus, available compilers simply do not parallelize DOACROSS loops. As a result, a higher degree of program performance cannot be achieved by using available compilers with respect to programs having DOACROSS loops.
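The kind of synchronization at issue can be sketched as follows. This is an illustrative assumption about how a DOACROSS loop might be run in parallel using one signal per iteration; it is not the method of the present invention, and the function name and loop body are hypothetical.

```python
import threading


def doacross_with_sync(a):
    # Runs the hypothetical loop x[i] = x[i - 1] + a[i] with one thread
    # per iteration and explicit synchronization between iterations.
    n = len(a)
    x = list(a)
    # done[i] is set once x[i] holds its final value.
    done = [threading.Event() for _ in range(n)]

    def iteration(i):
        if i > 0:
            done[i - 1].wait()      # wait until iteration i - 1 has finished
            x[i] = x[i - 1] + a[i]
        done[i].set()               # signal that x[i] is ready

    threads = [threading.Thread(target=iteration, args=(i,)) for i in range(n)]
    # Start the threads in reverse order to show that correctness comes
    # from the wait/set pairs and not from the order of thread creation.
    for t in reversed(threads):
        t.start()
    for t in threads:
        t.join()
    return x


synced_result = doacross_with_sync([1, 2, 3, 4])
```

In this sketch every iteration after the first performs one wait and every iteration performs one signal. Because each iteration depends entirely on the previous one, the iterations still execute one after another, so the synchronization adds cost without adding parallelism. This illustrates how excessive or careless synchronization can degrade performance.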