Operations on large matrices are becoming more common. For some technical problems, solutions may involve matrices as large as 1000-by-1000. One common operation is matrix transposition. For example, it may be necessary to transpose a large matrix to perform a Fast Fourier Transform, an interleaving operation, or other linear algebraic operations.
Large transposition operations can be broken down into a series of smaller transposition operations. For example, to transpose an 8-by-8 matrix, one can break down the matrix into four 4-by-4 matrices. Each of the 4-by-4 matrices can be transposed individually in a series of “inner transposition” operations, after which the larger matrix can be treated as a 2-by-2 matrix, each of whose elements is one of the smaller 4-by-4 matrices. Transposing the positions of the smaller matrices in an “outer transposition” operation, after each of the smaller matrices has been transposed individually, results in a transpose of the larger 8-by-8 matrix.
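The inner/outer scheme above can be sketched in plain Python; the function and variable names here are illustrative, not from the source, and the sketch assumes a square matrix whose size is a multiple of the block size.

```python
# Sketch of cascaded (blocked) transposition: transpose each b-by-b block
# in place ("inner" transpositions), then swap block positions across the
# diagonal ("outer" transposition). Names are illustrative assumptions.

def transpose_block(m, r, c, b):
    # Inner transposition: transpose the b-by-b block whose top-left
    # element is at row r, column c, in place.
    for i in range(b):
        for j in range(i + 1, b):
            m[r + i][c + j], m[r + j][c + i] = m[r + j][c + i], m[r + i][c + j]

def blocked_transpose(m, b):
    # m is an n-by-n matrix (list of lists) with n divisible by b.
    n = len(m)
    # Inner transpositions: transpose every b-by-b block individually.
    for r in range(0, n, b):
        for c in range(0, n, b):
            transpose_block(m, r, c, b)
    # Outer transposition: swap each off-diagonal block with its mirror
    # block across the main diagonal.
    for r in range(0, n, b):
        for c in range(r + b, n, b):
            for i in range(b):
                for j in range(b):
                    m[r + i][c + j], m[c + i][r + j] = (
                        m[c + i][r + j], m[r + i][c + j])

# Transpose an 8-by-8 matrix using four 4-by-4 blocks, as in the example.
matrix = [[row * 8 + col for col in range(8)] for row in range(8)]
blocked_transpose(matrix, 4)
```

This relies on the identity that the transpose of a block matrix [[A, B], [C, D]] is [[Aᵀ, Cᵀ], [Bᵀ, Dᵀ]]: each block is transposed, and the off-diagonal blocks trade places.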
Such a cascaded transposition technique can be used to transpose a matrix of any size. However, when the technique is implemented in hardware, memory speed limitations may come into play. For example, some types of memory, such as DDR SDRAM (Double Data Rate Synchronous Dynamic RAM), may be read much faster in one direction (vertically or horizontally) than in the other. Thus, for large matrices, performing the transposition within an acceptable duration may require fast memories that are expensive in terms of both price and power consumption. For example, if the remainder of the system uses DDR memory, it may be necessary to use quad-data-rate (QDR) memories for the transposition operation.