Some data modeling applications use matrices that describe transformations of physical parameters. For example, a cosmological model of a galaxy might use a matrix to describe the motion of stars in space, and a finite element model of a material may use a matrix to model stresses in a material at a number of different locations. These matrices transform initial property vectors of the model into final property vectors by standard matrix multiplication.
For any transformation by matrix multiplication, there may be certain vectors for which the transformation merely acts to lengthen or shorten the vector; these vectors are called “eigenvectors” of the transformation. Eigenvectors provide “preferred directions”; vectors parallel to eigenvectors are not rotated by the transformation. The corresponding scaling factor of the lengthening or shortening for a given direction is called the “eigenvalue” for the eigenvector. Different eigenvectors may have different corresponding eigenvalues, and eigenvectors with an eigenvalue of 1 are not lengthened or shortened by the transformation; for these vectors, the transformation preserves length. Eigenvectors and eigenvalues provide a useful mathematical tool to analyze matrix transformations. Therefore, it is desirable to be able to compute eigenvectors and eigenvalues (collectively, “eigenpairs”) for any given matrix.
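The eigenpair relationship described above can be checked numerically. The following is a minimal sketch in Python using NumPy; the 2x2 matrix and the test vector are hypothetical values chosen only for illustration and are not taken from any particular model.

```python
import numpy as np

# Hypothetical symmetric transformation, chosen for illustration only.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices; eigenvalues are returned in
# ascending order (here 1 and 3), with eigenvectors as the columns.
eigenvalues, eigenvectors = np.linalg.eigh(A)

for lam, v in zip(eigenvalues, eigenvectors.T):
    # The transformation only scales an eigenvector: A v = lam * v.
    assert np.allclose(A @ v, lam * v)

# The eigenvector with eigenvalue 1 keeps its length under the transformation.
v_unit = eigenvectors[:, 0]
length_before = np.linalg.norm(v_unit)
length_after = np.linalg.norm(A @ v_unit)

# A vector that is not parallel to an eigenvector is rotated: the 2-D cross
# product of x and A x is non-zero, so the two are not parallel.
x = np.array([1.0, 0.0])
y = A @ x
cross = x[0] * y[1] - x[1] * y[0]
```

In this sketch, `length_before` and `length_after` agree for the eigenvalue-1 direction, while the non-zero `cross` shows that an arbitrary vector changes direction.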
Several techniques are known to calculate eigenpairs of a matrix. One family of “eigensolver” techniques first reduces the matrix to a tridiagonal form; that is, a form in which the main diagonal of the matrix, and the diagonals just above and below it, may contain non-zero numbers, but all other entries are zero. Such an eigensolver computes the eigenpairs of the tridiagonal matrix, then converts the computed eigenvectors back to the original reference system.
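The three steps of such an eigensolver can be sketched as follows. This is an illustrative example only, assuming a small random symmetric matrix and using SciPy's general Hessenberg reduction as a stand-in for a dedicated tridiagonal reduction (for a symmetric input, the Hessenberg form is tridiagonal up to rounding). It is not the implementation of any particular eigensolver.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal

# Hypothetical dense symmetric matrix for illustration.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2.0

# Step 1: orthogonal reduction A = Q T Q^T. For symmetric A, the
# Hessenberg form T is tridiagonal up to rounding error.
T, Q = hessenberg(A, calc_q=True)
d = np.diag(T).copy()        # main diagonal of T
e = np.diag(T, k=-1).copy()  # sub-diagonal of T

# Step 2: eigenpairs of the tridiagonal matrix.
w, Z = eigh_tridiagonal(d, e)

# Step 3: back-transform the eigenvectors to the original reference system.
V = Q @ Z

# Each column of V should satisfy A v = w v for the original matrix.
residual = np.linalg.norm(A @ V - V * w)
```

The eigenvalues `w` are shared by `A` and `T`; only the eigenvectors need the back-transformation by `Q`.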
To achieve the best scalability on many processors, the ACM TOMS 807 algorithm, or successive band reduction (“SBR”), is often employed for the tridiagonal reduction phase. In SBR, the initial, densely populated matrix is reduced in a first stage to a multiple-band intermediate form having many non-zero diagonals, and is then reduced in a second stage from the multiple-band form to the tridiagonal (three-band) form. Accordingly, after the eigenpairs are calculated in a third stage, the eigenvector back-transformation also requires two stages, i.e., from the tridiagonal to the multiple-band reference system in stage four, then to the original dense reference system in stage five. The multistage SBR approach allows highly scalable BLAS-3 computing kernels to be used, but the two-stage eigenvector back-transformation introduces additional floating point operations that affect scalability.
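The five stages can be illustrated with a small, unoptimized sketch. It assumes a hypothetical helper `reduce_to_band` written here with unblocked Householder reflections, and it uses SciPy's general Hessenberg reduction for the band-to-tridiagonal stage without exploiting the band structure. A production SBR code would instead use blocked BLAS-3 kernels and a band-aware second stage, so this shows only the data flow, not the performance characteristics.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal


def reduce_to_band(A, b):
    """Hypothetical stage-1 helper: returns (B, Q1) with A = Q1 B Q1^T,
    where B has no non-zero entries more than b places off the diagonal."""
    n = A.shape[0]
    B = A.copy()
    Q1 = np.eye(n)
    for k in range(n - b - 1):
        x = B[k + b:, k]
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue  # column is already in band form
        v /= norm_v
        # Apply H = I - 2 v v^T from the left and right (similarity transform).
        B[k + b:, :] -= 2.0 * np.outer(v, v @ B[k + b:, :])
        B[:, k + b:] -= 2.0 * np.outer(B[:, k + b:] @ v, v)
        # Accumulate the orthogonal factor Q1 = H_0 H_1 ...
        Q1[:, k + b:] -= 2.0 * np.outer(Q1[:, k + b:] @ v, v)
    return B, Q1


# Hypothetical dense symmetric input and bandwidth, for illustration only.
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = (M + M.T) / 2.0
b = 2

B, Q1 = reduce_to_band(A, b)            # stage 1: dense -> band
T, Q2 = hessenberg(B, calc_q=True)      # stage 2: band -> tridiagonal
w, Z = eigh_tridiagonal(np.diag(T).copy(),
                        np.diag(T, k=-1).copy())  # stage 3: eigenpairs
Y = Q2 @ Z                              # stage 4: tridiagonal -> band system
V = Q1 @ Y                              # stage 5: band -> dense system

residual = np.linalg.norm(A @ V - V * w)
```

Stages four and five are the two extra matrix products, `Q2 @ Z` and `Q1 @ Y`, which correspond to the additional floating point operations of the two-stage back-transformation.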
The Parallel Linear Algebra Software for Multicore Architectures, or “PLASMA”, is publicly available free software from the University of Tennessee, Knoxville, and is the state-of-the-art mathematical library for performing conversion of a dense symmetric array to and from tridiagonal form on shared memory systems with multi-core processors, such as the system shown in FIGS. 1-3 and described below. At the heart of the PLASMA implementation is a tiled storage of the data arrays and a DAG (directed acyclic graph) scheduling of the computational subtasks. PLASMA is an improvement over the older LAPACK and ScaLAPACK libraries in terms of memory usage pressure, process synchronization requirements, task granularity, and load balance, and often results in much better performance and scalability for sufficiently large problems. In particular, LAPACK and ScaLAPACK do not employ SBR calculations. Also, LAPACK is not designed for scalability, and the ScaLAPACK library communicates using message passing, which is more difficult to program and debug.