The invention relates to parallel compiler technology, and specifically to iteration scheduling, in which the iterations of a nested loop are assigned to processor elements in a processor array and scheduled for execution on those processor elements.
Parallel compilers are used to transform a computer program into parallel code that runs on multi-processor systems. Traditionally, software developers design the compiler to optimize code for a fixed type of hardware. A principal objective of the compiler is to organize the computations in the program so that sets of computational tasks may be executed concurrently across multiple processors in the specified hardware architecture.
Parallel compiler technology extends across a broad range of parallel computer architectures. For example, the multi-processor architecture may employ shared memory in which each processor element shares the same memory space, or distributed memory in which each processor has a local memory.
One area of compiler and computer architecture research focuses on optimizing the processing of computer programs with loop nests. Many computational tasks in software applications are expressed in the form of a multi-nested loop with two or more loops around a block of code called the loop body. The loop body contains a series of program statements, typically including operations on arrays whose elements are indexed by functions of loop indices. Such loop nests are often written in a high-level programming language in which the iterations are ordered sequentially. The processing of the loop nest may be optimized by converting the loop nest code to parallel processes that can be executed concurrently.
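By way of illustration only, a loop nest of the kind described above may be sketched as follows. The function name `fir_filter` and the arrays `x`, `w`, and `y` are hypothetical and are not taken from the specification; the sketch merely shows a two-deep nest whose loop body operates on arrays indexed by functions of the loop indices, with iterations ordered sequentially.

```python
def fir_filter(x, w):
    """Sequential two-deep loop nest (hypothetical example).

    Each iteration is identified by the index pair (i, j). The loop body
    reads x[i + j] and w[j] and accumulates into y[i], so the array
    elements are indexed by functions of the loop indices.
    """
    n_out = len(x) - len(w) + 1
    y = [0] * n_out
    for i in range(n_out):           # outer loop
        for j in range(len(w)):      # inner loop
            y[i] += w[j] * x[i + j]  # loop body
    return y
```

In this sketch the iterations execute one after another in the order (0, 0), (0, 1), and so on, which is the sequential ordering that a parallel compiler would seek to replace with concurrent execution.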
One way to optimize loop nest code is to transform the code into a parallel form for execution on an array of processor elements. The objective of this process is to assign iterations in the loop nest to processor elements and schedule a start time for each iteration. The process of assigning iterations to processors and scheduling iterations is a challenging task. Preferably, each iteration in the loop nest should be assigned a processor and a start time so that each processor is kept busy without being overloaded.
The invention provides an efficient method for programmatically scheduling iterations in a sequential loop nest for execution on a parallel array of processor elements. This scheduling method optimizes the use of each processor element without overloading it.
One aspect of the invention is a programmatic method for determining an iteration schedule for a parallel processor array such that the schedule satisfies an initiation interval constraint. The initiation interval is a measure of throughput that represents the shortest time interval between the initiation of successive iterations of the nested loop on a processor element. The scheduling method accepts a mapping of iterations of the nested loop to processor elements in a processor array. Based on this mapping and a specified initiation interval, the method programmatically determines a definition of iteration schedules. Each schedule satisfies the constraint that no more than one iteration is started on a processor element for each initiation interval. This method is implemented so as to keep each processor element as busy as possible. The schedules start a new iteration on each processor element nearly every initiation interval.
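The initiation interval constraint may be illustrated by the following minimal sketch, which tests a candidate schedule by brute force rather than deriving one. It assumes one reading of the constraint, namely that any two iterations mapped to the same processor element must start at least one initiation interval apart. The names `satisfies_initiation_interval`, `processor_of`, and `start_time` are hypothetical, and the sketch is not the scheduling method of the invention.

```python
from collections import defaultdict
from itertools import product


def satisfies_initiation_interval(bounds, processor_of, start_time, ii):
    """Check a candidate schedule against an initiation interval (sketch).

    bounds       -- loop trip counts, e.g. (4, 3) for a two-deep nest
    processor_of -- hypothetical mapping from an iteration to a processor element
    start_time   -- hypothetical mapping from an iteration to its start time
    ii           -- the specified initiation interval
    """
    starts = defaultdict(list)
    # Group the start times of all iterations by the processor they map to.
    for iteration in product(*(range(n) for n in bounds)):
        starts[processor_of(iteration)].append(start_time(iteration))
    # Assumed reading: consecutive starts on one processor are >= ii apart.
    for times in starts.values():
        times.sort()
        if any(later - earlier < ii for earlier, later in zip(times, times[1:])):
            return False
    return True
```

For example, with `bounds = (4, 3)`, a mapping that places iteration (i, j) on processor element i, and an initiation interval of 2, the candidate start time i + 2j passes the check, whereas the candidate start time i + j starts iterations on the same processor element only one time unit apart and fails.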
Another aspect of the invention is a programmatic method for determining a set of iteration schedules that satisfy a specified resource constraint and data flow dependencies among operations in a nested loop. The implementation of this method uses linear programming to guide the selection of iteration schedules such that the selected schedules are likely to satisfy data flow dependencies.
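A data flow dependency check may be sketched as follows, under assumptions that this section does not itself state: that a schedule is linear, so that the start time of an iteration is the dot product of a schedule vector `tau` with the iteration's index vector, and that each dependency is given as a distance vector together with a minimum latency. The function name and argument shapes are hypothetical. The sketch only tests a candidate schedule; it does not perform the linear programming step described above.

```python
def respects_dependences(tau, dependences):
    """Test a linear schedule vector against data flow dependencies (sketch).

    tau         -- assumed linear schedule vector; start time = tau . iteration
    dependences -- list of (distance_vector, latency) pairs, where the
                   consumer iteration equals the producer iteration plus
                   distance_vector and must start at least `latency` later
    """
    return all(
        sum(t * d for t, d in zip(tau, distance)) >= latency
        for distance, latency in dependences
    )
```

For instance, if iteration (i, j) consumes a value produced by iteration (i, j - 1) with a latency of 1, the dependency is `((0, 1), 1)`. Under these assumptions the schedule vector (1, 2) satisfies it, while (1, 0) does not, because it would start the producer and the consumer at the same time.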
The invention provides an efficient and direct method for generating a set of schedules. This method enables the iteration scheduler to select one or more schedules that satisfy desired constraints. Since the scheduler quickly and efficiently provides iteration schedules, it can evaluate each schedule and select one or more that yield a parallel processor array with optimized performance or cost.
Further advantages and features of the invention will become apparent in the following detailed description and accompanying drawings.