1. Field of the Invention
The present invention relates generally to techniques for improving the performance of linear algebra routines. More specifically, a register block data format provides a method to mask a hardware-level instruction shortcoming. One such shortcoming is the lack of a desired conventional hardware or assembly instruction for loading matrix data into the Floating Point Registers (FPRs) in a desired matrix transpose format.
2. Description of the Related Art
Scientific computing relies heavily on linear algebra. In fact, the whole field of engineering and scientific computing depends on linear algebra for its computations. Linear algebra routines are also used in games and graphics rendering. Typically, these routines reside in a math library of a computer system that uses one or more of them as part of its processing. Linear algebra is also heavily used in analytic methods for applications such as supply chain management.
A number of methods have been used to improve the performance of linear algebra routines on new or existing computer architectures. However, because linear algebra permeates so many calculations and applications, there is a continuing need to optimize the performance of matrix processing.
More specifically to the technique of the present invention, and as recognized by the present inventors, linear algebra processing can lose performance on new computer architectures that lack one or more conventional hardware or assembly instructions. These instructions may be absent because they were deliberately excluded to reduce chip complexity and cost.