1. Field of the Invention
The present invention relates generally to a compiler and, more particularly, to a compiler for a data parallel computer.
2. Related Art
A data parallel computer has an array of many parallel processors, each with some associated memory and all acting under the direction of a serial computer called a host. The data parallel computer supports parallel values, each having multiple data points called positions. The value of a position is called an element. Each parallel processor stores the element for one such position in its local memory.
All of the parallel processors (or a subset of the parallel processors) can perform a single operation on all of the positions of a parallel value simultaneously. Such an operation is called a parallel operation.
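To make these terms concrete, the following Python sketch simulates parallel values serially. It assumes that position i of every parallel value resides in the local memory of parallel processor i, and it models an operation performed by only a subset of the processors with a Boolean mask. The names `ParallelValue` and `parallel_op` are hypothetical, and the loop merely stands in for what the hardware performs simultaneously.

```python
from typing import Callable, List, Optional

# Hypothetical serial model: element i of a parallel value is held in the
# local memory of parallel processor i.
ParallelValue = List[float]


def parallel_op(op: Callable[[float, float], float],
                a: ParallelValue,
                b: ParallelValue,
                active: Optional[List[bool]] = None) -> ParallelValue:
    """Apply op to every position of a and b (simulated with a loop).

    Processors whose entry in `active` is False do not take part in the
    operation and keep their element of `a` unchanged.
    """
    if len(a) != len(b):
        raise ValueError("parallel values must have the same number of positions")
    if active is None:
        active = [True] * len(a)  # by default, every processor participates
    elif len(active) != len(a):
        raise ValueError("mask must have one entry per position")
    return [op(x, y) if on else x for x, y, on in zip(a, b, active)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    print(parallel_op(lambda p, q: p + q, x, y))
    # [11.0, 22.0, 33.0, 44.0]
    print(parallel_op(lambda p, q: p + q, x, y, [True, False, True, False]))
    # [11.0, 2.0, 33.0, 4.0]
```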
Each of the parallel processors is assigned an identifier. The assignments are made so that the distance between any two parallel processors is indicated by the difference between their identifiers. The direction from a first parallel processor to a second parallel processor is referred to as upward if the identifier of the first parallel processor is less than that of the second and downward if the identifier of the first parallel processor is greater than that of the second.
An offset from a first parallel processor to a second parallel processor is the identifier of the second parallel processor minus the identifier of the first parallel processor. Note that the offset has both a distance component and a direction component. On a data parallel computer whose communications network is implemented as a hypercube, the nearest neighbors of the first parallel processor are those parallel processors whose identifiers differ from that of the first parallel processor in exactly one bit, so that the two identifiers differ by a power of two.
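The definitions of offset, direction and hypercube neighbors can be sketched as small Python functions. The function names are hypothetical, and `hypercube_neighbors` assumes a hypercube of `2**dimension` processors with identifiers 0 through `2**dimension - 1`, in which each link joins two identifiers that differ in exactly one bit.

```python
from typing import List


def offset(src: int, dst: int) -> int:
    """Offset from processor src to processor dst: dst's identifier minus src's."""
    return dst - src


def direction(src: int, dst: int) -> str:
    """'upward' if src's identifier is smaller, 'downward' if larger, else 'none'."""
    if src < dst:
        return "upward"
    if src > dst:
        return "downward"
    return "none"


def hypercube_neighbors(pid: int, dimension: int) -> List[int]:
    """Nearest neighbors of pid in a hypercube of 2**dimension processors.

    Flipping bit k of pid changes its identifier by exactly 2**k, so every
    neighbor's identifier differs from pid's by a power of two.
    """
    if not 0 <= pid < (1 << dimension):
        raise ValueError("identifier out of range for this hypercube")
    return [pid ^ (1 << k) for k in range(dimension)]


if __name__ == "__main__":
    print(offset(5, 2), direction(5, 2))  # -3 downward
    print(hypercube_neighbors(5, 3))      # [4, 7, 1]
```

The offset of -3 from processor 5 to processor 2 encodes both a distance of 3 and a downward direction. The converse of the neighbor rule does not hold in this model: identifiers 3 and 5 differ by 2 but in two bits, so they are not neighbors.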
Once a parallel operation has been performed, results must be distributed among the parallel processors. The distribution is carried out by transmitting the data over a data router. The data router is more fully described in the above-referenced U.S. Patent Application entitled "Method And Apparatus For Simulating M-Dimensional Connection Networks In An N-Dimensional Network Where M is Less Than N".
On some data parallel computers, data can be transferred over the data router by either of two techniques. The first technique is called general communication. General communication is carried out by transmitting data from specified source parallel processors to specified destination parallel processors.
The second technique is called grid communication. Grid communication is carried out by transmitting data from specified source parallel processors along a specified path, which can be specified with an offset and an axis. Grid communication is generally much faster than general communication. Accordingly, parallel communication instructions should be carried out with grid communication wherever possible.
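The contrast between the two techniques can be sketched as a serial Python simulation. The sketch rests on assumptions not taken from the text: processors are laid out in row-major order on a two-dimensional grid, grid shifts wrap around at the edges, and the names `general_send`, `grid_dest`, `grid_shift` and `as_grid_shift` are hypothetical. `as_grid_shift` illustrates, in simplified form, how a pattern given as explicit destinations might be recognized as an offset along an axis so that the faster grid technique could be used instead; it is not presented as the method of any particular compiler.

```python
from typing import List, Optional, Sequence, Tuple


def general_send(values: Sequence[float], dest: Sequence[int],
                 fill: float = 0.0) -> List[float]:
    """General communication: processor i sends values[i] to processor dest[i].

    If two sources name the same destination, the later one overwrites the
    earlier (a simplifying assumption).
    """
    out = [fill] * len(values)
    for src, d in enumerate(dest):
        out[d] = values[src]
    return out


def grid_dest(pid: int, shape: Tuple[int, int], axis: int, off: int) -> int:
    """Identifier reached from pid by moving `off` steps along `axis` (circularly)."""
    rows, cols = shape
    r, c = divmod(pid, cols)  # row-major layout
    if axis == 0:
        r = (r + off) % rows
    else:
        c = (c + off) % cols
    return r * cols + c


def grid_shift(values: Sequence[float], shape: Tuple[int, int],
               axis: int, off: int) -> List[float]:
    """Grid communication: every element moves by the same offset along one axis."""
    out = [0.0] * len(values)  # every slot is overwritten, since a shift is a permutation
    for pid, v in enumerate(values):
        out[grid_dest(pid, shape, axis, off)] = v
    return out


def as_grid_shift(dest: Sequence[int],
                  shape: Tuple[int, int]) -> Optional[Tuple[int, int]]:
    """Return (axis, offset) if the general pattern `dest` is a grid shift, else None.

    Only offsets 1..extent-1 are tried; with wrap-around, an offset of -1
    is the same as extent-1.
    """
    rows, cols = shape
    if len(dest) != rows * cols:
        raise ValueError("pattern size does not match grid shape")
    for axis, extent in ((0, rows), (1, cols)):
        for off in range(1, extent):
            if all(dest[p] == grid_dest(p, shape, axis, off) for p in range(len(dest))):
                return axis, off
    return None


if __name__ == "__main__":
    shape = (2, 3)
    vals = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    dest = [grid_dest(p, shape, axis=1, off=1) for p in range(6)]
    print(general_send(vals, dest))                  # [2.0, 0.0, 1.0, 5.0, 3.0, 4.0]
    print(grid_shift(vals, shape, 1, 1))             # [2.0, 0.0, 1.0, 5.0, 3.0, 4.0]
    print(as_grid_shift(dest, shape))                # (1, 1)
    print(as_grid_shift([5, 0, 3, 1, 4, 2], shape))  # None
```

Both calls move the same data; the difference is that the grid form needs only an axis and an offset rather than one destination per processor, which is the information that lets the faster technique be used.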
Distribution of the results of a parallel operation is specified by parallel communication instructions. A compiler for a data parallel computer must determine how data is to be distributed among the parallel processors. That is, it must determine what data should be sent where. Such a compiler must further generate target code for efficiently carrying out the parallel communication instructions.
A first conventional compiler for a data parallel computer analyzes source code that contains no explicit description of how data is to be distributed among the parallel processors, and automatically generates parallel communication instructions in the target code. The programmer can thus write source code as if for a serial computer. However, the efficiency of the generated parallel communication instructions is limited by the amount of information about the data that the compiler can determine from the source code. Furthermore, the analysis necessary to generate the parallel communication instructions requires substantial overhead. Accordingly, the first conventional compiler generates executable code with mediocre performance and requires a substantial amount of computation time to generate target code.
A second conventional compiler for a data parallel computer enables the programmer to explicitly specify the distribution of data to the various parallel processors with the use of object-oriented domains. Because the programmer generally knows more about how the data will be used than a compiler would normally be able to determine from the source code, the second conventional compiler generally generates more efficient target code than the first conventional compiler. Furthermore, because the programmer specifies the distribution of data to the parallel processors, the second conventional compiler performs less analysis of the source code and therefore requires less overhead. However, the second conventional compiler forces the programmer to use the object-oriented model of computation. Many programmers are unfamiliar with the object-oriented programming model. Also, many applications are not well suited to the object-oriented programming model.
Therefore, what is needed is a compiler for a data parallel computer which incorporates the benefits (but not the drawbacks) of both the first and second conventional compilers for data parallel computers. Specifically, what is needed is a compiler which generates object code that effectively distributes the data among the parallel processors, efficiently transfers the data over the router network among the parallel processors, does not require an inordinate amount of compilation time, and does not require the use of an unconventional programming model.