1. Field of the Invention
The present invention relates generally to a technique for improving the performance of linear algebra routines, with particular relevance to optimizing the matrix multiplication process as exemplarily implemented in the existing LAPACK (Linear Algebra PACKage) standard. More specifically, a streaming technique provides a steady and timely flow of matrix data from different cache levels. In this technique, submatrices of A, B, and C “play the role” of scalar, vector, and matrix in a general linear algebra subroutine kernel and are selectively stored in different cache levels.
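For orientation only, the idea of operating on submatrices of A, B, and C rather than on individual elements can be sketched as a blocked matrix multiplication. This is a minimal illustration, not the claimed streaming technique: the function name `blocked_matmul` and the parameter `block` are hypothetical, the assignment of submatrices to particular cache levels is not modeled, and the sketch does not fix which operand plays the scalar, vector, or matrix role.

```python
def blocked_matmul(a, b, block=2):
    """Compute C = A * B one submatrix at a time (illustrative sketch only).

    a is an m-by-k list of rows and b is a k-by-n list of rows.
    `block` is a hypothetical tile size; a real kernel would choose it
    so that the working submatrices fit in the intended cache levels.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):          # row range of the C submatrix
        for j0 in range(0, n, block):      # column range of the C submatrix
            for p0 in range(0, k, block):  # shared dimension, one block at a time
                # Multiply one A submatrix by one B submatrix and
                # accumulate the product into the current C submatrix.
                for i in range(i0, min(i0 + block, m)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

Each pass of the three outer loops touches only one submatrix of each operand, which is what makes it possible in principle to keep those submatrices resident in cache while they are reused.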
2. Description of the Related Art
Scientific and engineering computing relies heavily on linear algebra for its computations. Linear algebra routines are also used in games and graphics rendering.
Typically, these linear algebra routines reside in a math library of a computer system that utilizes one or more linear algebra routines as a part of its processing. Linear algebra is also heavily used in analytic methods that include applications such as supply chain management, as well as numeric data mining and economic methods and models.
A number of methods have been used to improve the performance of linear algebra routines on new or existing computer architectures.
However, because linear algebra permeates so many calculations and applications, a need continues to exist to optimize performance of matrix processing.