1. Field of the Invention
The present invention generally relates to improving the efficiency of in-place data transformations such as matrix transposition. More specifically, part of the data to be transformed is first rearranged, if necessary, so that it is stored in memory as contiguous blocks of contiguous data. The data can then be retrieved from memory into cache in units of these blocks, transformed (for example, by a matrix transposition), and written back into the same memory space. The part of the data that cannot be transformed because of space limitations is saved in a buffer and later placed out-of-place back into holes of the transformed data. This continuation-in-part application extends the concept of in-place transformation, such as a matrix transposition, to a small portion of matrix data treated as a unit.
2. Description of the Related Art
The standard format for matrix data is column major order in Fortran and row major order in C. Thus, in matrix multiplication, which involves three matrices, at least one matrix will be stored in memory in a manner in which an entire line of data must be retrieved in order to get only one data element of interest.
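The difference between the two standard formats can be illustrated with a minimal sketch in C. The function names (`row_major_index`, `col_major_index`, `sum_column_row_major`) are hypothetical and chosen only for illustration. The sketch shows that walking down a column of a row-major matrix touches elements that are N positions apart, so each access is likely to fall in a different line of memory.

```c
#include <stddef.h>

/* Offset of element (i, j) of an M-by-N matrix stored in row major order
   (the C convention): elements of one row are adjacent in memory. */
size_t row_major_index(size_t i, size_t j, size_t M, size_t N) {
    (void)M; /* M is not needed for the row-major offset */
    return i * N + j;
}

/* Offset of element (i, j) of an M-by-N matrix stored in column major order
   (the Fortran convention): elements of one column are adjacent in memory. */
size_t col_major_index(size_t i, size_t j, size_t M, size_t N) {
    (void)N; /* N is not needed for the column-major offset */
    return i + j * M;
}

/* Sum column j of a row-major matrix. Successive accesses are N elements
   apart, so for large N each one may pull in a separate line of memory
   to use a single element from it. */
double sum_column_row_major(const double *a, size_t M, size_t N, size_t j) {
    double s = 0.0;
    for (size_t i = 0; i < M; ++i) {
        s += a[row_major_index(i, j, M, N)];
    }
    return s;
}
```

In a product of three matrices held in the same standard format, at least one operand is traversed with this kind of strided access, which is the inefficiency described above.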
The present inventors have developed a number of methods to improve efficiency in matrix processing and overcome this inherent problem with the standard row major and column major formats. These methods include several new data structures that allow matrix data to be stored in memory in nonstandard formats, so that the data is retrieved as needed for processing as contiguous data appropriately arranged in a line of memory, including variations based upon computer architectural features and deficiencies. These new data structures, therefore, greatly improve efficiency in Dense Linear Algebra Factorization Algorithm (DLAFA) processing.
For example, the first of the above-identified applications provides an efficient method to perform an in-place transformation of matrix data as might be used, for example, for an in-place matrix transposition. The second exemplary embodiment extended this concept of in-place matrix transposition to include packed format data (e.g., data of triangular or symmetrical matrices), using column swaths.
The second above-identified copending application provides a method of using square submatrices that can then be transposed in-place, and the third above-identified copending application provides a method of converting triangular/symmetrical matrix data into a rectangular data structure.
The present invention provides additional aspects to concepts of the first of the above-identified copending applications, including some generalizations.
As an example of the type of data transformation that the present invention can make more efficient, there are in-place algorithms for matrix transposition that work on individual matrix elements. Because the individual matrix elements must be referenced in an essentially random order, these codes run very slowly for a matrix with large dimensions M and N. U.S. Pat. No. 7,031,994 to Lao et al. partially addresses this problem.
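An element-wise in-place transposition of this kind can be sketched as follows. This is a textbook cycle-following approach given only for illustration; it is not the method of Lao et al. and not the method of the present invention, and the names `dest` and `transpose_in_place` are hypothetical. It assumes an M-by-N matrix stored in row major order, for which the element at offset k moves to offset (k * M) mod (M * N - 1), with the first and last elements staying in place.

```c
#include <stddef.h>

/* Destination offset of the element currently at offset k when an
   M-by-N row-major matrix (size = M * N elements) is transposed. */
static size_t dest(size_t k, size_t M, size_t size) {
    return (k * M) % (size - 1);
}

/* Transpose an M-by-N row-major matrix in place, one element at a time,
   by following the cycles of the transposition permutation. No bookkeeping
   array is used: a cycle is processed only from its smallest offset. */
void transpose_in_place(double *a, size_t M, size_t N) {
    size_t size = M * N;
    if (size < 3) {
        return; /* nothing can move; also avoids a modulus of zero */
    }
    for (size_t start = 1; start < size - 1; ++start) {
        /* Walk the cycle until an offset not greater than start is found.
           If it is smaller, the cycle was already handled. */
        size_t k = dest(start, M, size);
        while (k > start) {
            k = dest(k, M, size);
        }
        if (k < start) {
            continue;
        }
        /* start is the smallest offset in its cycle: rotate the cycle,
           carrying each displaced value to its destination. */
        double carried = a[start];
        k = start;
        do {
            size_t d = dest(k, M, size);
            double displaced = a[d];
            a[d] = carried;
            carried = displaced;
            k = d;
        } while (k != start);
    }
}
```

For example, calling `transpose_in_place` on the 2-by-3 array `{1, 2, 3, 4, 5, 6}` leaves `{1, 4, 2, 5, 3, 6}`, the 3-by-2 transpose. The successive offsets produced by `dest` jump around the array with no locality, which illustrates why element-wise codes of this kind are slow for large M and N.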
However, as explained in more detail below, the results of Lao have quite limited scope. In many instances where the technique works, a fair amount of extra storage is used. Lao et al. assume that the underlying permutation is known, but give no indication of how this structure is found or of the amount of extra storage required.
Thus, as demonstrated by Lao, a need continues to exist for methods that improve the efficiency of matrix processing in computers, particularly in view of the shortcomings or deficiencies of newer computer architectures.