Manipulation of systems of arrays of numbers has resulted in the development of various matrix operations. One such operation is the transpose, written M^T, where M denotes the matrix and T denotes the transpose operation. The matrix transpose is a permutation frequently performed in linear algebra and is particularly useful in finding the solution set for complex systems of differential equations. One example in which matrix transposition is advantageous is the solution of Poisson's problem by the Fourier Analysis Cyclic Reduction (FACR) method.
A matrix transpose operation involves defining a diagonal from the upper left corner of the matrix to the lower right corner, then swapping each element on the lower left side of the diagonal with the corresponding element on the upper right side. The elements on the diagonal itself are swapped with themselves or, in other words, remain the same. The end result of the matrix transpose is a matrix that has essentially been reflected about its diagonal axis, with the elements transposed across the diagonal line.
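The swap described above can be sketched for a square matrix as follows. This is a minimal illustration only; the function name transpose_in_place and the list-of-lists representation are assumptions made for the example and do not come from the cited art.

```python
def transpose_in_place(m):
    """Transpose a square matrix (a list of equal-length rows) in place."""
    n = len(m)
    for i in range(n):
        # Visit only the entries below the diagonal (j < i) and swap each
        # with its mirror image above the diagonal. Diagonal entries are
        # never touched, so they remain the same.
        for j in range(i):
            m[i][j], m[j][i] = m[j][i], m[i][j]
    return m


matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
transpose_in_place(matrix)
print(matrix)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

Visiting only one side of the diagonal matters: looping over every off-diagonal entry would swap each pair twice and return the original matrix.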
The matrix transpose is particularly important in finding the solution set to complex systems of equations. One drawback of the matrix transpose technique is that performing the transpose operation on large matrices is time-consuming and prone to error and miscalculation. For this reason, the computer environment is particularly well suited to performing matrix transpose operations, because the computer is fast and accurate.
Multiprocessor computer systems having parallel architectures employ a plurality of processors that operate in parallel to simultaneously perform a single set of operations. On parallel multiprocessor systems, a problem is broken up into several smaller sub-problems, which are then distributed to the individual processors, each of which operates on the segment allocated to it. When all the processors have completed their simultaneous operations, the individual parts of the solved sub-problems are re-assembled to produce the final solution. With such a parallel architecture, a complex problem can be solved much more quickly than a single processor could solve the same problem alone.
One such parallel multiprocessor computer architecture is called the mesh; another is the hypercube. In a mesh architecture, the processors, or nodes, are configured as an array, much like a mesh or grid.
After a complex problem has been broken into smaller sub-problems, the pieces must be distributed among the processors for parallel computation. Routing is the way a processor chooses a path in the mesh when it communicates with other processors by sending them messages; it also governs the way those messages are carried through the network.
Various routings are known in the art and associated with the mesh architecture, such as store-and-forward, circuit-switched, wormhole, and virtual cut-through routing. Store-and-forward is often considered the least complex of the routing methods. While there are differences between circuit-switched, wormhole, and virtual cut-through routings, they can all be modeled by the same complexity estimates when there is no congestion on the mesh. These routings are therefore considered to be roughly equally more advanced than the store-and-forward method and can collectively be considered circuit-switched-like.
For most current parallel architectures with circuit-switched-like routing, each node can, in a single step, send out at most one message originating from it, receive at most one message destined for it, and allow any number of by-passing messages through it so long as there is no congestion. Such communication capability is often referred to as the one-port communication model. In an all-port communication model, each processor can communicate with all other processors simultaneously in a single step.
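The send and receive limits of the one-port communication model can be illustrated with a small check. This is a sketch under stated assumptions: the function name is_one_port_step and the representation of a step as a list of (source, destination) pairs are hypothetical, and the check deliberately ignores congestion on the links between nodes.

```python
from collections import Counter


def is_one_port_step(messages):
    """Return True if no node sends or receives more than one message.

    `messages` is a list of (source, destination) node pairs that are
    all transmitted in the same step. By-passing traffic is not limited
    by the model, so it is not counted here.
    """
    sends = Counter(src for src, _ in messages)
    receives = Counter(dst for _, dst in messages)
    return (all(count <= 1 for count in sends.values())
            and all(count <= 1 for count in receives.values()))


# With one element per node on a 3 x 3 mesh, the transpose asks node
# (x, y) to send to node (y, x): each node sends once and receives once.
transpose_step = [((x, y), (y, x)) for x in range(3) for y in range(3)]
print(is_one_port_step(transpose_step))  # True

# Two messages leaving the same node in one step violate the model.
print(is_one_port_step([((0, 0), (1, 1)), ((0, 0), (2, 2))]))  # False
```

Satisfying these limits is necessary but not sufficient for a step to complete without delay, since messages may still contend for the same link along their paths.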
For choosing the particular path in routing on a two-dimensional mesh, there is typically XY-routing or YX-routing. In XY-routing on a two-dimensional mesh, a message travels horizontally and then vertically towards its destination node. In YX-routing, a message travels vertically and then horizontally towards its destination node.
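The two path choices can be sketched as follows. The (x, y) coordinate convention, with x as the horizontal position and y as the vertical position, and the function names xy_route and yx_route are assumptions made for this illustration.

```python
def xy_route(src, dst):
    """Nodes visited when travelling horizontally first, then vertically."""
    x, y = src
    dest_x, dest_y = dst
    path = [(x, y)]
    step = 1 if dest_x >= x else -1
    while x != dest_x:          # horizontal leg: only x changes
        x += step
        path.append((x, y))
    step = 1 if dest_y >= y else -1
    while y != dest_y:          # vertical leg: only y changes
        y += step
        path.append((x, y))
    return path


def yx_route(src, dst):
    """Nodes visited when travelling vertically first, then horizontally."""
    # Swap the two axes, reuse xy_route, then swap the axes back.
    swapped = xy_route((src[1], src[0]), (dst[1], dst[0]))
    return [(x, y) for (y, x) in swapped]


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
print(yx_route((0, 0), (2, 1)))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
```

Both routes have the same length; they differ only in which intermediate nodes and links the message occupies.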
The following complexities render the matrix transpose on a mesh a complicated problem: the limited communication bandwidth of the mesh topology (with the typically used XY- or YX-routing); and the availability of more advanced circuit-switched-like routing.
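The bandwidth limitation can be made concrete by counting how many messages would need each directed link if every node attempted its transpose transfer at once under XY-routing. The sketch below assumes one matrix element per node on an n x n mesh without wraparound edges, with node (x, y) sending to node (y, x); the function names and the link-counting approach are illustrative assumptions and are not taken from the cited references.

```python
from collections import Counter


def xy_links(src, dst):
    """Directed links used when travelling horizontally, then vertically."""
    x, y = src
    links = []
    while x != dst[0]:
        next_x = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst[1]:
        next_y = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def transpose_link_load(n):
    """Count the messages that need each directed link of an n x n mesh."""
    load = Counter()
    for x in range(n):
        for y in range(n):
            load.update(xy_links((x, y), (y, x)))
    return load


load = transpose_link_load(4)
print(max(load.values()))  # 3
```

In this simplified count the busiest link of a 4 x 4 mesh is wanted by three messages at once, so the transfers cannot all proceed in a single congestion-free step and must be scheduled over several steps.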
The following references are considered to be pertinent to understanding the background of the invention.
Eklundh, "A Fast Computer Method for Matrix Transposing", IEEE Trans. Comput., 21(7):801-803, 1972, wherein the recursive exchange algorithm for the matrix transpose problem is discussed.
Stone, "Parallel Processing with the Perfect Shuffle", IEEE Trans. Comput., 20:153-161, 1971, wherein matrix transpose on shuffle-exchange networks is discussed.
Johnsson, "Communication Efficient Basic Linear Algebra Computations on Hypercube Architectures", J. Parallel Distributed Computing, 4(2):133-172, April 1987, and McBryan and Van de Velde, "Hypercube Algorithms and Implementations", SIAM J. Sci. Stat. Comput., 8(2):s227-s287, March 1987, wherein the recursive exchange algorithm is applied to hypercube architectures. Johnsson also gives a transpose algorithm for a torus, i.e., a mesh with wraparound edges, computer with multiple matrix elements per processor. The communication model assumed was all-port and the routing was store-and-forward.
Johnsson and Ho, "Matrix Transposition on Boolean N-Cube Configured Ensemble Architectures", SIAM J. Matrix Anal. Appl., 9(3):419-454, July 1988, and independently, Stout and Wagar, "Passing Messages in Link-Bound Hypercubes", Hypercube Multiprocessors, Society for Industrial and Applied Mathematics, 1987, discussed matrix transpose algorithms on hypercube architectures for the all-port communication model.
Ho and Raghunath, "Efficient Communication Primitives on Circuit-Switched Hypercubes", IEEE Sixth Distributed Memory Computing Conference, pp. 330-397, April 1991, wherein a transpose algorithm for circuit-switched routing on hypercubes with the one-port communication model was discussed.
Nassimi and Sahni, "An Optimal Routing Algorithm for Mesh-Connected Parallel Computers", Journal of ACM, Vol. 27, No. 1, pp. 6-29, January 1980, wherein a transpose algorithm for a mesh computer with store-and-forward routing and one matrix element per processor was given.
Although the recursive exchange algorithm for performing the matrix transpose on the hypercube architecture has been applied to mesh configurations having store-and-forward routing (such as the prior art by Nassimi and Sahni and that by Johnsson), what is absent in the art is a matrix transpose algorithm for the 2-dimensional mesh multiprocessor configuration with circuit-switched-like routing.