The invention relates to the field of computerized mathematics.
Matrix multiplication, and linear algebra more generally, is used in a wide range of computerized applications, from image processing to genetic analysis. For example, computers are often called upon to solve systems of linear equations, frequently with many more than two variables. Even more frequently, they are called upon to multiply matrices. Matrix multiplication is used, for instance, in cryptography, random number generation, error correcting codes, and image processing. One example is in cryptanalysis, where chained operations described as matrices must be multiplied together before being analyzed for flaws. Another example is in the design of random-number generators, where exponentiation (i.e., repeated multiplication) of dense matrices is used to determine the period and quality of the generators. The results of matrix mathematics are visible in every computer-generated image that has a reflection, or distortion effects such as light passing through rippling water; graphics cards, for example, use matrix mathematics to account for reflection and refraction.
As a result of its wide usage, matrix multiplication is an integral feature of computer microprocessors, such as CPUs (Central Processing Units), GPUs (Graphics Processing Units), embedded processors, FPGAs (Field-Programmable Gate Arrays), and the like. Matrix multiplication may be part of a system kernel, such as an operating system kernel, a math library kernel, a graphics processing kernel, and/or the like. The matrix multiplication may be performed by a combination of hardware and software components that are coordinated to produce the matrix results, such as in parallel processor operating system kernels that use multiple hardware processors to perform matrix multiplications.
Many techniques have been developed to improve the computational efficiency, speed, memory use, communications use, etc., of computerized matrix multiplication. For example, Strassen's well-known matrix multiplication algorithm is a sub-cubic matrix multiplication algorithm, with a complexity of $O(n^{\log_2 7})$. See Volker Strassen, “Gaussian elimination is not optimal”, in Numerische Mathematik 13, 4 (1969), 354-356. Winograd's matrix multiplication algorithm may reduce the leading coefficient from 7 to 6 by decreasing the number of additions and subtractions from 18 to 15. See Shmuel Winograd, “On multiplication of 2×2 matrices”, in Linear Algebra and its Applications 4, 4 (1971), 381-388.
In practice, Strassen-Winograd's algorithm for matrix multiplication may perform better than some asymptotically faster algorithms due to these smaller hidden constants. The leading coefficient of Strassen-Winograd's algorithm may be optimal, due to a lower bound on the number of additions for matrix multiplication algorithms with a 2×2 base case, obtained by Robert L. Probert, “On the additive complexity of matrix multiplication”, in SIAM J. Comput. 5, 2 (1976), 187-203. As used herein, the term “additions” may, in some circumstances, be interchanged with the term “subtractions”, as appropriate to the context.
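The leading coefficients quoted above (7 for 18 additions, 6 for 15 additions) can be checked with a short operation count. The following is a minimal sketch, assuming a square n×n input with n a power of two, a recursion that runs all the way down to 1×1 blocks, and a cost model in which each scalar multiplication or addition counts as one operation. The function names `operation_count` and `closed_form` are illustrative and do not come from the cited works.

```python
def operation_count(n, additions_per_step):
    """Arithmetic operations used by a 7-multiplication, 2x2-base recursion.

    Assumes n is a power of two and the recursion stops at 1x1 blocks,
    where a single scalar multiplication is performed.
    """
    if n == 1:
        return 1
    half = n // 2
    # 7 recursive block products plus q additions of (n/2) x (n/2) blocks.
    return 7 * operation_count(half, additions_per_step) + additions_per_step * half * half


def closed_form(n, additions_per_step):
    """Closed form (1 + q/3) * n^(log2 7) - (q/3) * n^2 of the recurrence above."""
    q = additions_per_step
    levels = n.bit_length() - 1          # log2(n) for a power of two
    # n^(log2 7) equals 7^(log2 n); the numerator is always divisible by 3.
    return ((3 + q) * 7 ** levels - q * n * n) // 3


def leading_coefficient(additions_per_step):
    """Coefficient of the n^(log2 7) term in the closed form."""
    return 1 + additions_per_step / 3


print(leading_coefficient(18))  # 18 additions per step
print(leading_coefficient(15))  # 15 additions per step
```

Under this cost model the coefficient of the dominant term is 1 + q/3, where q is the number of block additions per recursive step, which is why removing three additions lowers the coefficient by exactly one.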
Strassen-like algorithms are a class of divide-and-conquer algorithms which may utilize a base $\langle n_0, m_0, k_0; t\rangle$-algorithm: multiplying an $n_0 \times m_0$ matrix by an $m_0 \times k_0$ matrix using $t$ scalar multiplications, where $n_0$, $m_0$, $k_0$ and $t$ are positive integers. When multiplying an $n \times m$ matrix by an $m \times k$ matrix, an algorithm may split the matrices into blocks (such as each of size $\frac{n}{n_0} \times \frac{m}{m_0}$ and $\frac{m}{m_0} \times \frac{k}{k_0}$, respectively), and may proceed block-wise, according to the base algorithm. Additions and multiplication by a scalar in the base algorithm may be interpreted as block-wise additions. Multiplications in the base algorithm may be interpreted as block-wise multiplication via recursion. As used herein, a Strassen-like algorithm may be referred to by its base case. Hence, an $\langle n, m, k; t\rangle$-algorithm may refer to either the algorithm's base case or the corresponding block recursive algorithm, as obvious from context.
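The block recursion described above can be illustrated with the simplest instance, Strassen's base case of 2×2 matrices and 7 multiplications. The following is a minimal sketch in plain Python, assuming square matrices stored as lists of rows whose dimension is a power of two. The helper names (`add`, `sub`, `split`, `strassen`) are illustrative, and no attempt is made at the memory or performance optimizations a practical kernel would need.

```python
def add(A, B):
    """Block-wise (entry-wise) sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Block-wise (entry-wise) difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix into four quadrants of half the dimension."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices (n a power of two) with Strassen's recursion."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Each base-case multiplication becomes a recursive block multiplication.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Each base-case addition becomes a block-wise addition.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the four quadrants into one matrix.
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]


print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

This form uses 10 block additions before the recursive products and 8 after them, 18 in total per recursive step.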
Recursive fast matrix multiplication algorithms with reasonable base case size for both square and rectangular matrices have been developed. At least some may have manageable hidden constants, and some are asymptotically faster than Strassen's algorithm (e.g., Kaporin's implementation of Laderman's algorithm; see Igor Kaporin, “The aggregation and cancellation techniques as a practical tool for faster matrix multiplication” in Theoretical Computer Science 315, 2-3, 469-510).
Recently, Smirnov presented several fast matrix multiplication algorithms derived by computer-aided optimization tools, including a $\langle 6, 3, 3; 40\rangle$-algorithm with asymptotic complexity of $O(n^{\log_{54} 40^3})$, i.e., faster than Strassen's algorithm. See A V Smirnov, “The bilinear complexity and practical algorithms for matrix multiplication”, in Computational Mathematics and Mathematical Physics 53, 12 (2013), 1781-1795. Ballard and Benson later presented several additional fast Strassen-like algorithms, also found using computer-aided optimization tools. They implemented several Strassen-like algorithms, including Smirnov's $\langle 6, 3, 3; 40\rangle$-algorithm, on a shared-memory architecture in order to demonstrate that Strassen and Strassen-like algorithms can outperform classical matrix multiplication in practice (such as Intel's Math Kernel Library), on modestly sized problems (at least up to n=13000), in a shared-memory environment. Their experiments also showed Strassen's algorithm outperforming Smirnov's algorithm in some of the cases. See Austin R. Benson and Grey Ballard, “A framework for practical parallel fast matrix multiplication” in ACM SIGPLAN Notices 50, 8 (2015), 42-53.
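The comparison of exponents above can be reproduced numerically. The following is a minimal sketch, assuming the standard formula for the exponent of a block recursive algorithm applied to square matrices: a base case with dimensions n0, m0, k0 and t multiplications yields an exponent of 3·log(t)/log(n0·m0·k0). The function name `exponent` is illustrative.

```python
import math


def exponent(n0, m0, k0, t):
    """Asymptotic exponent of a block recursive algorithm on square matrices.

    Assumes the standard formula 3 * log(t) / log(n0 * m0 * k0) for a base
    case that multiplies an n0 x m0 matrix by an m0 x k0 matrix with t
    scalar multiplications.
    """
    return 3 * math.log(t) / math.log(n0 * m0 * k0)


strassen_exponent = exponent(2, 2, 2, 7)     # equals log2(7)
smirnov_exponent = exponent(6, 3, 3, 40)     # equals log_54(40^3)

print(round(strassen_exponent, 4), round(smirnov_exponent, 4))
```

The exponent for the base case with dimensions 6, 3, 3 and 40 multiplications comes out slightly below that of Strassen's base case, consistent with the statement that it is asymptotically faster. The exponent alone says nothing about hidden constants, which is one reason the asymptotically slower algorithm can still win on some problem sizes.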
Bodrato introduced the intermediate representation method, for repeated squaring and for chain matrix multiplication computations. See Marco Bodrato, “A Strassen-like matrix multiplication suited for squaring and higher power computation”, in Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ACM, 273-280. This method decreases the number of additions between consecutive multiplications. Thus, he obtained an algorithm with a 2×2 base case, which uses 7 multiplications, and has a leading coefficient of 5 for chain multiplication and for repeated squaring, for every multiplication other than the first one. Bodrato also presented an invertible linear function which recursively transforms a $2^k \times 2^k$ matrix to and from the intermediate representation. While this is not the first time that linear transformations have been applied to matrix multiplication, the main focus of previous research on the subject was on improving asymptotic performance rather than reducing the number of additions.
The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.