1. Field of the Invention
The present invention relates generally to an apparatus and method for performing domain decomposition and, more particularly, to an apparatus and method for performing domain decomposition for a data parallel computer.
2. Related Art
A data parallel computer has many processing elements (called PEs), each with some associated memory and all acting under the direction of a serial computer called a host computer. On some data parallel computers, the host computer and the PEs reside in a single machine. In general, the host computer performs serial tasks and dispatches parallel tasks to the PEs. For example, the host computer can distribute the elements of an array to the PEs so that the data parallel computer can process the array in a parallel manner. The speed with which the data parallel computer can process the array depends greatly on how the array elements are mapped to the PEs.
A first conventional mapping technique is to map the array elements as explicitly specified in an application program. Because the programmer (i.e., the operator of the application program) generally knows how the array data will be processed, he or she can usually determine an efficient mapping. Defining a mapping, however, is quite complex. As a result, the first technique adds substantially to the difficulty of writing application programs for, or adapting application programs to, data parallel computers.
A second mapping technique is that used in the Thinking Machines Corporation CM Fortran Release 0.5. According to the second technique, the run-time system handles the array-element-to-PE mapping, thereby relieving the programmer of this task. The second technique works as follows. Each array axis is assigned log2(axis length) "off-chip bits" until there are no more off-chip bits. Off-chip bits are the bits of a data parallel computer address which identify the various PEs. Then, any axis not assigned off-chip bits is assigned log2(axis length) "on-chip bits". On-chip bits are the bits of the data parallel computer address which identify a particular memory location on the PE identified by the off-chip bits.
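The bit assignment of the second technique can be illustrated with a minimal Python sketch. It is not taken from the CM Fortran run-time system; the function name `assign_bits` and its parameters are hypothetical. The sketch assumes that axis lengths and the number of PEs are powers of two, and that an axis reached when only some off-chip bits remain receives those remaining bits and takes the rest of its bits on-chip. The description above does not specify either point.

```python
def assign_bits(axis_lengths, num_pes):
    """Return one (off_chip_bits, on_chip_bits) pair per array axis.

    Hypothetical illustration of the second technique. Assumes
    power-of-two axis lengths and a power-of-two number of PEs.
    """
    def exact_log2(n):
        # Reject values that are not positive powers of two.
        if n < 1 or n & (n - 1):
            raise ValueError(f"{n} is not a power of two")
        return n.bit_length() - 1

    remaining = exact_log2(num_pes)  # off-chip bits still available
    layout = []
    for length in axis_lengths:
        need = exact_log2(length)    # log2(axis length) address bits
        off = min(need, remaining)   # take off-chip bits while any remain
        remaining -= off
        layout.append((off, need - off))  # the rest become on-chip bits
    return layout


# A 64 x 64 x 16 array on 4096 PEs (12 off-chip bits in total).
print(assign_bits([64, 64, 16], 4096))
```

Under these assumptions the first two axes consume all twelve off-chip bits and the third axis is addressed entirely with on-chip bits, so the axes are treated very differently depending only on their order.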
Although the second technique simplifies programming the data parallel computer, applications on which it is invoked exhibit poor performance. Specifically, performance is unbalanced between the array axes, as axes assigned on-chip bits perform substantially better than those assigned off-chip bits.
Therefore, there is a need for an array-to-PE mapping technique which is implemented in the system software (or hardware) of the data parallel computer and which results in efficient processing of arrays.