1. Field of the Invention
The present invention generally relates to a method for overcoming a processing imbalance observed in the Assignee's newly developed Blue Gene/L™ (BG/L) multiprocessor computer. More specifically, introducing a skew term into the distribution of contiguous blocks of array elements spreads the workload over a larger number of processors, thereby improving performance.
2. Description of the Related Art
The problem addressed by the present invention arises from the design of the Assignee's new Blue Gene/L™ machine, currently considered the fastest computer in the world. The interconnection structure of the Blue Gene/L™ machine is that of a three-dimensional torus.
Standard block-cyclic distribution of two-dimensional array data on this machine, as normally used in the LINPACK benchmark (a collection of C routines used to solve a dense system of linear equations), causes an imbalance: if an array (block) row is distributed across a contiguous plane or subplane of the physical machine (a two-dimensional slice of the machine), then an array (block) column is distributed across only a line of the physical machine (i.e., a one-dimensional slice of the machine). This is illustrated in FIGS. 5A and 5B, which are discussed after the block-cyclic distribution of two-dimensional array data has been explained in the following discussion.
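To make the imbalance concrete, the following Python sketch maps the blocks of a two-dimensional array onto a hypothetical 8 × 8 × 8 torus using a standard block-cyclic rule and counts the nodes that hold one block row and one block column. The torus size, the 8 × 64 process grid, and the fold of process row p onto the torus x-coordinate and process column q onto the (y, z) plane are all assumptions chosen for illustration; they are not taken from the BG/L™ configuration or from the LINPACK code.

```python
"""Sketch: standard 2D block-cyclic mapping folded onto a 3D torus.

All sizes and the process-to-torus fold are hypothetical assumptions
chosen only to illustrate the plane-versus-line asymmetry.
"""

X, Y, Z = 8, 8, 8          # assumed torus dimensions (512 nodes)
P, Q = X, Y * Z            # assumed process grid: 8 x 64
NB_ROWS, NB_COLS = 64, 64  # assumed number of block rows / block columns


def owner(i: int, j: int) -> tuple[int, int]:
    """Process-grid coordinates (p, q) owning block (i, j), block-cyclic."""
    return i % P, j % Q


def torus_coord(p: int, q: int) -> tuple[int, int, int]:
    """Assumed fold: process row p -> x, process column q -> (y, z)."""
    return p, q // Z, q % Z


def nodes_for_block_row(i: int) -> set[tuple[int, int, int]]:
    """Distinct torus nodes holding any block of block row i."""
    return {torus_coord(*owner(i, j)) for j in range(NB_COLS)}


def nodes_for_block_col(j: int) -> set[tuple[int, int, int]]:
    """Distinct torus nodes holding any block of block column j."""
    return {torus_coord(*owner(i, j)) for i in range(NB_ROWS)}


print(len(nodes_for_block_row(0)), "nodes hold block row 0")     # 64: a full y-z plane
print(len(nodes_for_block_col(0)), "nodes hold block column 0")  # 8: a single x-line
```

Under these assumptions a block row covers all 64 nodes of a y-z plane, while a block column sits on only 8 nodes along an x-line, which is the plane-versus-line asymmetry described above.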
This distribution causes critical portions of the computation (namely, the panel factorization step) to be parallelized across a much smaller part of the machine, and forces certain broadcast operations to be performed along a line of the machine architecture, such as a row or column of processors, rather than across a plane.
Altering the data mapping so that rows and columns each occupy larger portions of the physical machine can improve performance, both by spreading the critical computations over a larger number of processors and by allowing more communication “pipes” (e.g., physical wires) between the processing units to be used.
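One way to picture such an altered mapping is sketched below in Python. It uses a hypothetical 8 × 8 × 8 torus and 8 × 64 process grid but shifts the process column by a skew term that depends on the block-row index. The particular skew, q = (j + i // P) mod Q, and every parameter here are assumptions made purely for illustration; this sketch is not asserted to be the mapping of the present invention.

```python
"""Hypothetical skewed block-cyclic mapping on an assumed 3D torus.

All sizes and the skew rule q = (j + i // P) % Q are illustrative
assumptions, not the mapping defined by the invention.
"""
from collections import Counter

X, Y, Z = 8, 8, 8          # assumed torus dimensions (512 nodes)
P, Q = X, Y * Z            # assumed process grid: 8 x 64
NB_ROWS, NB_COLS = 64, 64  # assumed number of block rows / block columns


def skewed_owner(i: int, j: int) -> tuple[int, int]:
    """Process (p, q) owning block (i, j) under the hypothetical skew.

    The process row is the usual i % P; the process column is shifted by
    one for every complete cycle of P block rows (the skew term i // P).
    """
    return i % P, (j + i // P) % Q


def torus_coord(p: int, q: int) -> tuple[int, int, int]:
    """Assumed fold: process row p -> x, process column q -> (y, z)."""
    return p, q // Z, q % Z


def nodes_for_block_row(i: int) -> set[tuple[int, int, int]]:
    """Distinct torus nodes holding any block of block row i."""
    return {torus_coord(*skewed_owner(i, j)) for j in range(NB_COLS)}


def nodes_for_block_col(j: int) -> set[tuple[int, int, int]]:
    """Distinct torus nodes holding any block of block column j."""
    return {torus_coord(*skewed_owner(i, j)) for i in range(NB_ROWS)}


# Number of blocks owned by each torus node, to check load balance.
blocks_per_node = Counter(
    torus_coord(*skewed_owner(i, j))
    for i in range(NB_ROWS)
    for j in range(NB_COLS)
)

print(len(nodes_for_block_row(0)), "nodes hold block row 0")     # 64
print(len(nodes_for_block_col(0)), "nodes hold block column 0")  # 64
print(sorted(set(blocks_per_node.values())))                      # [8]
```

With this assumed skew, block column 0 spreads over 64 nodes (an x-z plane) instead of the 8 nodes of a single line, each block row still covers 64 nodes, and every one of the 512 nodes still owns the same number of blocks (8), so the overall load remains balanced.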
Although on-the-fly re-mapping/re-distribution of the data is one possible solution to this problem, it requires time and space to re-map the data and makes the resulting code more complex. Replicating the array is another possible solution, but it incurs the costs of maintaining multiple copies of the data: the additional memory consumed and the complexity of keeping the copies consistent.
Thus, as identified by the present inventors, a need exists to overcome this problem on three-dimensional machines such as the BG/L™ in a manner that avoids the disadvantages of time and space requirements and code complexity.