1. Field of the Invention
The invention relates to computer software. More specifically, the field of the invention is that of computer software for efficiently performing computations with sparse matrices.
2. Description of the Related Art
Sparse matrices play an important role in the numerical solution of many scientific problems. For example, many problems in physical modeling, such as simulating the aerodynamics of aircraft wings, involve the solution of partial differential equations. The finite element method for such problems uses sparse matrices to represent the interactions between parts of the problem domain. In addition, sparse graphs used to model relationships between entities, such as modeling the links between pages on the World Wide Web, are often represented as sparse adjacency matrices (also known as matrix patterns).
Algorithms to solve these problems often use matrix-vector multiplication as a principal operation; when sparse matrices are used, the operation is usually multiplying a sparse matrix by a dense vector. This disclosure describes a technique (matrix pattern compression) and two implementations of it to improve the performance of matrix-vector multiplication using large matrices on standard microprocessors.
As has been shown by Gropp and others, the main performance bottleneck for matrix-vector multiplication with large matrices on modern architectures is memory bandwidth (See W. Gropp, D. Kaushik, D. Keyes, and B. Smith, “Toward realistic performance bounds for implicit CFD codes”, In A. Ecer et al., editors, Proceedings of Parallel CFD'99, Elsevier, 1999). In a matrix-vector multiplication, each element of the matrix is used only once. If the matrix is larger than the system's caches, the entire matrix must be loaded from main memory for each multiplication, and so the bandwidth of main memory becomes the main performance limitation. In modern microprocessor-based systems, the rate at which floating-point operations can be performed exceeds the memory bandwidth by a factor of ten or more. Thus, a method that reduces the required memory traffic for matrix-vector multiplication can improve overall performance, even if it requires additional computation.
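To make the bandwidth argument concrete, the following back-of-the-envelope sketch estimates the floating-point rate achievable for a bandwidth-bound sparse matrix-vector multiplication. The bandwidth and peak-flop figures are illustrative assumptions, not measurements from the cited work; the bytes-per-nonzero figure follows the four-byte column index and eight-byte double value discussed below.

```python
# Back-of-the-envelope estimate; the bandwidth and peak-flop
# figures below are illustrative assumptions, not measured values.

BYTES_PER_NNZ = 8 + 4   # one 8-byte double value + one 4-byte column index
FLOPS_PER_NNZ = 2       # one multiply and one add per nonzero element

bandwidth = 1e10        # assumed: 10 GB/s sustained main-memory bandwidth
peak_flops = 1e10       # assumed: 10 Gflop/s peak floating-point rate

# If every nonzero must be streamed from main memory, bandwidth
# caps the number of nonzeros processed per second, and hence flops.
nnz_per_sec = bandwidth / BYTES_PER_NNZ
achievable = FLOPS_PER_NNZ * nnz_per_sec

print(achievable / peak_flops)  # fraction of peak actually reachable
```

Under these assumed figures the multiplication reaches only about a sixth of peak, which is why shrinking the bytes moved per nonzero translates almost directly into speedup.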
The main data structure currently used for sparse matrix index data is compressed sparse row (CSR). The data structure used for CSR is shown in FIG. 1. An array of row positions contains the index of the beginning of each row within arrays of column numbers and data. The row of each element in the matrix never appears explicitly in the data structure. As elements in the same row are often located in nearby columns, the CSR representation contains a great deal of redundant information. In particular, the column numbers within each row are sorted and are often close to each other. Thus, using four bytes to represent each column number is highly inefficient, as the first three of those bytes are often identical across several elements in a row. Similarly, the same or similar column numbers are often used in nearby rows of a matrix, leading to further redundant information. This disclosure explains two compression schemes that remove some of this redundant index information while still allowing a fast multiplication algorithm.
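As a concrete illustration (not part of the claimed invention), the CSR layout described above and its use in matrix-vector multiplication can be sketched as follows; the array names are hypothetical, chosen to mirror the row-position, column-number, and data arrays of FIG. 1:

```python
# Minimal sketch of compressed sparse row (CSR) matrix-vector
# multiplication. Array names (row_ptr, col_idx, values) are
# illustrative, mirroring the arrays described for FIG. 1.

def csr_matvec(row_ptr, col_idx, values, x):
    """Multiply a CSR matrix by a dense vector x, returning y = A @ x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Row i occupies positions row_ptr[i] .. row_ptr[i+1]-1 of the
        # column-number and data arrays; the row index itself is implicit.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# Example: the 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
print(csr_matvec(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 8.0]
```

Note that every entry of `col_idx` is stored as a full integer even though consecutive entries within a row differ by small amounts, which is precisely the redundancy the compression schemes of this disclosure target.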
Although previous work, such as that of Blandford and Blelloch (See D. K. Blandford, G. E. Blelloch, and I. A. Kash, “Compact representations of separable graphs”, In Proc. ACM-SIAM Symposium on Discrete Algorithms, 2003; and D. K. Blandford, G. E. Blelloch, and I. A. Kash, “An experimental analysis of a compact graph representation”, 2004), provides other forms of compression that are applicable to sparse matrix patterns, those compression schemes do not provide fast matrix-vector multiplication on modern microprocessors. In particular, the compressed formats in this disclosure provide multiplication operations whose speedups are generally proportional to the total compression ratio. The PageRank algorithm (See L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank citation ranking: Bringing order to the Web”, Technical report, Stanford Digital Library Technologies Project, November 1998) is based on a power-method computation of the principal eigenvalues of a sparse matrix and includes multiplication with an adjacency matrix as its fundamental operation. Implementing PageRank with Blandford et al.'s compressed graphs was beneficial on the Intel Pentium III but degraded performance relative to compressed sparse row format on the Intel Pentium 4, as reported in the second of the above-referenced Blandford and Blelloch papers. Their work also does not consider the effect of compression time on overall application performance.
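For context, the power-method computation underlying PageRank reduces to repeated matrix-vector products, which is why the memory traffic of that operation dominates. The following sketch is a generic power iteration, not the patented method and not the full PageRank algorithm (damping and dangling-node handling are omitted); the function names are hypothetical:

```python
# Illustrative sketch (not the claimed invention): power iteration,
# the computation underlying PageRank, driven entirely by repeated
# matrix-vector products -- the operation whose memory traffic the
# compression schemes of this disclosure aim to reduce.

def power_iteration(matvec, n, iters=50):
    """Approximate the principal eigenvector using only matvec calls."""
    x = [1.0] + [0.0] * (n - 1)   # arbitrary starting vector
    for _ in range(iters):
        y = matvec(x)             # one full pass over the matrix per step
        norm = sum(abs(v) for v in y)
        x = [v / norm for v in y]
    return x

# Example with a small dense matrix wrapped as a matvec closure.
A = [[2.0, 1.0], [1.0, 2.0]]
mv = lambda x: [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
print(power_iteration(mv, 2))  # converges toward [0.5, 0.5]
```

Because each iteration streams the entire matrix through memory once, any reduction in the stored size of the matrix index data shortens every iteration of such an algorithm.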