1. Field of the Invention
The present invention relates generally to a technique for improving the performance of linear algebra routines. More specifically, matrix data is stored into and retrieved from memory in blocks whose size is related to the size of the L1 cache, such as 2NB-by-NB/2, where NB² is a fraction of the size of the L1 cache.
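By way of illustration only, the block-size relationship stated above can be worked through numerically. The following minimal sketch assumes a hypothetical 32 KB L1 cache, 8-byte matrix elements, and a fraction of one half; none of these values, nor the function name block_dims, is taken from the present disclosure. It chooses an even NB so that NB/2 is a whole number, and shows that a 2NB-by-NB/2 block holds the same NB² elements as an NB-by-NB block.

```python
import math


def block_dims(l1_bytes, elem_bytes=8, fraction=0.5):
    """Return (NB, (rows, cols)) for a 2NB-by-NB/2 block.

    Assumed, illustrative parameters: l1_bytes is the L1 cache size,
    elem_bytes the size of one matrix element, and fraction the share
    of the L1 cache that one block of NB*NB elements may occupy.
    """
    # Number of matrix elements that fit in the chosen share of L1.
    budget_elems = int(l1_bytes * fraction) // elem_bytes
    # Largest NB with NB*NB <= budget_elems.
    nb = math.isqrt(budget_elems)
    # Round down to an even value so that NB/2 is an integer.
    nb -= nb % 2
    return nb, (2 * nb, nb // 2)


nb, (rows, cols) = block_dims(32 * 1024)
print(nb, rows, cols, rows * cols)  # 44 88 22 1936
```

With these assumed values, NB is 44 and the 88-by-22 block contains 1936 elements, which equals 44² and occupies 15,488 bytes, within half of the assumed cache.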
2. Description of the Related Art
Scientific computing relies heavily on linear algebra. Indeed, virtually the whole field of engineering and scientific computing depends on linear algebra for its computations. Linear algebra routines are also used in games and graphics rendering.
Typically, these linear algebra routines reside in a math library of a computer system that utilizes one or more linear algebra routines as a part of its processing. Linear algebra is also heavily used in analytic methods that include applications such as supply chain management, as well as numeric data mining and economic methods and models.
A number of methods have been used to improve the performance of linear algebra routines on new or existing computer architectures. However, because linear algebra permeates so many calculations and applications, a need continues to exist to optimize the performance of matrix processing. Prior to the present invention, no optimal method and structure as described herein has been proposed.
More specific to the technique of the present invention, and as recognized by the present inventors, linear algebra processing loses performance when the size of the L1 cache is not taken into account.
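To make the role of the L1 cache concrete, the following minimal sketch contrasts an unblocked matrix multiplication with a generic cache-blocked (tiled) version. It illustrates conventional blocking in general and is not the method of the present invention; the function names and the block parameter nb are hypothetical. Python is used only for readability, and any cache benefit would be expected to appear in a compiled implementation rather than in this sketch.

```python
def matmul_naive(a, b):
    """Unblocked triple loop: C = A * B for list-of-lists matrices."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a, b, nb):
    """Same product, computed tile by tile with tiles of at most nb-by-nb.

    The intent of tiling is that the pieces of A, B, and C touched by
    the three inner loops are small enough to stay in the L1 cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, nb):
        for j0 in range(0, m, nb):
            for p0 in range(0, k, nb):
                # Multiply one tile of A by one tile of B and
                # accumulate the result into one tile of C.
                for i in range(i0, min(i0 + nb, n)):
                    for j in range(j0, min(j0 + nb, m)):
                        s = c[i][j]
                        for p in range(p0, min(p0 + nb, k)):
                            s += a[i][p] * b[p][j]
                        c[i][j] = s
    return c
```

Both functions perform the same arithmetic in the same order for each element of C; only the order in which memory is visited differs, which is the property that a choice of nb based on the L1 cache size is meant to exploit.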