1. Field of the Invention
The present invention relates generally to techniques for improving the performance of linear algebra routines, and in particular to optimizing the matrix multiplication process, with an exemplary implementation being improvements to the existing LAPACK (Linear Algebra PACKage) standard. More specifically, preloading techniques allow a steady and timely flow of matrix data into the floating point registers of floating point units (FPUs).
2. Description of the Related Art
Scientific and engineering computing relies heavily on linear algebra. Linear algebra routines are also used in games and graphics rendering. Typically, these routines reside in a math library of a computer system that uses one or more of them as a part of its processing. Linear algebra is also heavily used in analytic methods, with applications such as supply chain management, numeric data mining, and economic methods and models.
A number of methods have been used to improve the performance of linear algebra routines on new or existing computer architectures. However, because linear algebra permeates so many calculations and applications, a need continues to exist to optimize the performance of matrix processing.
More specific to the technique of the present invention, the present inventors have recognized that linear algebra processing loses performance when the data to be processed has not been loaded into cache or working registers by the time the linear algebra processing subroutine requires that data.
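By way of illustration only, the timing problem described above, and the general idea of preloading, may be sketched in C as follows. This is a minimal sketch under stated assumptions and is not the implementation of the present invention: the function name matmul_preload, the row-major storage, the square n-by-n shape, and the one-iteration preload distance are all hypothetical choices made for the example. The operands for the next multiply-add are loaded into local variables before the current multiply-add is issued, so that the loads can overlap with the arithmetic.

```c
#include <stddef.h>

/*
 * Illustrative sketch only (hypothetical name and layout, not the patented method).
 * Computes C = C + A * B for row-major n-by-n matrices of doubles.
 * The operands for iteration k+1 are loaded before the multiply-add
 * for iteration k is performed, so the loads overlap the arithmetic.
 */
void matmul_preload(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = c[i * n + j];

            /* Preload the operands for the first iteration (k = 0). */
            double a_next = a[i * n];
            double b_next = b[j];

            for (size_t k = 0; k < n; k++) {
                /* Operands for this iteration were loaded one step earlier. */
                double a_cur = a_next;
                double b_cur = b_next;

                /* Issue the loads for the next iteration before computing. */
                if (k + 1 < n) {
                    a_next = a[i * n + (k + 1)];
                    b_next = b[(k + 1) * n + j];
                }

                sum += a_cur * b_cur;
            }
            c[i * n + j] = sum;
        }
    }
}
```

An optimizing compiler or an out-of-order processor may reorder these loads on its own, so the sketch shows only the intended ordering of loads relative to arithmetic, not a guaranteed machine-level schedule.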