As is known, when a data-parallel language, such as Fortran 90, is compiled for a distributed-memory, parallel-processor computer, aggregate data objects, such as arrays, are distributed across the processor memories. This mapping determines the amount of residual communication needed to bring the operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an "alignment" phase that maps all the objects to an abstract template in a Cartesian index space, and then a "distribution" phase that maps the template to the processors. The alignment phase positions all array objects in the program with respect to each other so as to reduce realignment communication cost. The distribution phase then distributes the template-aligned objects to the processors. Thus, it will be understood that this two-phase mapping separates the language issues from the machine issues, with the result that the alignment of the data objects is machine independent. A mapping of this type is used in Fortran D, High Performance Fortran, and CM-Fortran.
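By way of illustration only, the two-phase mapping described above can be sketched as follows. This is a minimal sketch, assuming a one-dimensional template, an affine alignment given by a stride and an offset, and a simple block distribution. The function names `align`, `distribute`, and `owner`, and all of their parameters, are hypothetical and are not taken from any of the languages named above.

```python
def align(index, stride=1, offset=0):
    """Alignment phase (assumed affine): array index -> template cell."""
    return stride * index + offset


def distribute(cell, num_procs, block_size):
    """Distribution phase (assumed block distribution): template cell -> processor.

    Cells beyond the last block are clamped onto the last processor.
    """
    return min(cell // block_size, num_procs - 1)


def owner(index, stride, offset, num_procs, block_size):
    """Compose the two phases: array index -> processor that holds the element."""
    return distribute(align(index, stride, offset), num_procs, block_size)


if __name__ == "__main__":
    # Element 5 of an array aligned with offset 3 lands on template cell 8,
    # which a block distribution with 4 cells per processor gives to processor 2.
    print(owner(5, stride=1, offset=3, num_procs=4, block_size=4))
```

In this sketch only `distribute` refers to machine parameters (`num_procs`, `block_size`), while `align` depends only on the program, which is the sense in which the alignment is machine independent.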
An important goal of an optimized compilation is to produce data and work mappings that reduce "completion time" (sometimes also referred to as "runtime"). Completion time has two components: computation and communication. Communication, in turn, breaks down into two distinct classes: "intrinsic" and "residual" communication. Intrinsic communication arises from computational operations, such as reductions, that require data motion as an integral part of the operation. Residual communication, on the other hand, arises from nonlocal data references required in a computation whose operands are not mapped to the same processors.
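The notion of residual communication can be made concrete with a small count. The following is a sketch under stated assumptions, not a cost model taken from this disclosure: it assumes a one-dimensional template, a block distribution, and an elementwise statement of the form A(i) = B(i + shift), and it counts the operand elements that reside on a different processor from the element they are combined with. The names `owner` and `residual_moves` and all parameters are hypothetical.

```python
def owner(cell, num_procs, block_size):
    """Assumed block distribution: template cell -> processor."""
    return min(cell // block_size, num_procs - 1)


def residual_moves(n, shift, a_offset, b_offset, num_procs, block_size):
    """Count nonlocal operand references for A(i) = B(i + shift), 0 <= i < n - shift.

    a_offset and b_offset are the (assumed) alignment offsets of A and B
    on the template.
    """
    moves = 0
    for i in range(n - shift):
        a_cell = i + a_offset            # template cell of A(i)
        b_cell = (i + shift) + b_offset  # template cell of B(i + shift)
        if owner(a_cell, num_procs, block_size) != owner(b_cell, num_procs, block_size):
            moves += 1
    return moves


# Both arrays at offset 0: operands straddle each block boundary.
misaligned = residual_moves(16, 1, a_offset=0, b_offset=0, num_procs=4, block_size=4)
# A shifted by one template cell: A(i) and B(i + 1) share a cell, so no element moves.
aligned = residual_moves(16, 1, a_offset=1, b_offset=0, num_procs=4, block_size=4)

if __name__ == "__main__":
    print(misaligned, aligned)
```

The computation performed is the same in both cases; only the choice of alignment offsets changes the number of nonlocal references.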
This invention focuses on the alignment issue. Thus, to simplify this disclosure, it will be assumed that data objects are mapped identically to the processors of a computer of the foregoing type if and only if they are aligned. The term "realignment" refers to residual communication due to misalignment, so it is to be understood that an optimization goal is to provide array alignments that minimize realignment cost. For example, communication for transpose, spread, and vector-valued subscript operations can in some cases be avoided, or at least significantly reduced, by suitable alignment choices. This invention causes the communication for such operations to be residual rather than intrinsic, so those operations are subject to the alignment optimization process that is provided by this invention.
Several other workers have considered static alignment. Indeed, the present inventors have published on that subject. This invention extends that work to handle mobile alignment. Knobe, Lukas, and Dally have addressed the issue of dynamic alignment, but their notion of dynamic alignment is limited to quantities that are known only at runtime. As will be seen, this invention extends their work to mobile alignment in the context of loops, where the alignment of an object is an affine function of the loop induction variables (LIVs).
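To illustrate what is meant by an alignment that is an affine function of a loop induction variable, consider a hypothetical loop that, on iteration k, adds the section B(k : k + m - 1) to A(0 : m - 1). The sketch below assumes a one-dimensional template on which A is held at a fixed offset of zero, and compares a static offset for B with an offset of the form coeff * k + const. The names `mobile_offset` and `misaligned_elements` are illustrative only, and the sketch does not model the cost of repositioning B between iterations.

```python
def mobile_offset(k, coeff, const):
    """Alignment offset as an affine function of the loop induction variable k."""
    return coeff * k + const


def misaligned_elements(m, n, coeff, const):
    """Count misaligned operand pairs over the loop
           for k in range(n): A[0:m] += B[k:k+m]
    with A fixed at offset 0 and B at offset coeff * k + const in iteration k.
    """
    count = 0
    for k in range(n):
        off = mobile_offset(k, coeff, const)
        for i in range(m):
            a_cell = i              # template cell of A(i)
            b_cell = (k + i) + off  # template cell of B(k + i) in iteration k
            if a_cell != b_cell:
                count += 1
    return count


# Static alignment (offset 0 in every iteration): aligned only when k == 0.
static = misaligned_elements(m=8, n=4, coeff=0, const=0)
# Mobile alignment (offset -k): the referenced section of B lines up with A
# within every iteration.
mobile = misaligned_elements(m=8, n=4, coeff=-1, const=0)

if __name__ == "__main__":
    print(static, mobile)
```

Under these assumptions no single static offset aligns B with A in more than one iteration, whereas the offset -k does so in each iteration.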