1. Field of the Invention
The present invention relates generally to a technique for improving the performance of linear algebra routines, with special significance to optimizing the matrix multiplication process as exemplarily implemented in the existing LAPACK (Linear Algebra PACKage) standard. More specifically, a streaming technique allows submatrices of A, B, and C to “play the role” of scalar, vector, and matrix in a general linear algebra subroutine kernel that is selectable from six possible kernels, the kernel being selected based on the matrix size that is best stored in a cache (e.g., the L1 cache).
2. Description of the Related Art
Scientific and engineering computing relies heavily on linear algebra for its computations. Linear algebra routines are also used in games and graphics rendering.
Typically, these linear algebra routines reside in a math library of a computer system that utilizes one or more linear algebra routines as a part of its processing. Linear algebra is also heavily used in analytic methods that include applications such as supply chain management, as well as numeric data mining and economic methods and models.
A number of methods have been used to improve performance from new or existing computer architectures for linear algebra routines.
However, because linear algebra permeates so many calculations and applications, a need continues to exist to optimize the performance of matrix processing. Moreover, the conventional wisdom is that only a single kernel type is available for matrix multiplication. Operation could be improved if five more kernel types were available, so that the most suitable of six kernel types could be selected. Prior to the present invention, such a technique has been unknown and unrecognized.
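By way of illustration only, the notion of six kernel types can be sketched as follows. The text above does not define the six kernels, so this sketch assumes one plausible reading: that they correspond to the six possible nestings of the three loops (i, j, k) in the update C = C + A*B. The names matmul_update and LOOP_ORDERS are hypothetical and do not come from LAPACK or from the present disclosure.

```python
from itertools import permutations

# ASSUMPTION: the six kernel types are modeled here as the six orderings
# of the i, j, k loops; the source text does not spell this out.
LOOP_ORDERS = ["".join(p) for p in permutations("ijk")]


def matmul_update(A, B, C, order):
    """Compute C += A*B in place using the loop nesting named by `order`.

    A is m x k, B is k x n, and C is m x n, each a list of row lists.
    `order` is a permutation of "ijk", read from outermost to innermost.
    """
    extent = {"i": len(A), "j": len(B[0]), "k": len(B)}
    outer, middle, inner = order
    for a in range(extent[outer]):
        for b in range(extent[middle]):
            for c in range(extent[inner]):
                idx = {outer: a, middle: b, inner: c}
                i, j, k = idx["i"], idx["j"], idx["k"]
                # The operand that lacks the innermost index stays fixed
                # throughout the inner loop: C[i][j] when k is innermost,
                # A[i][k] when j is innermost, B[k][j] when i is innermost.
                C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    for order in LOOP_ORDERS:
        C = [[0, 0], [0, 0]]
        print(order, matmul_update(A, B, C, order))
```

All six orderings produce the same product; they differ only in which operand is held fixed in the innermost loop and which operands are swept through, which is one way to picture A, B, and C exchanging the roles of scalar, vector, and matrix. How a kernel would actually be selected for a given cache is not addressed by this sketch.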