Embodiments of this invention are concerned with the construction and operation of computers handling a large sparse matrix of data values.
Parallel processing, in which a number of processors operate concurrently to carry out portions of an overall computation task, can give a great increase in speed compared with a single processing unit carrying out these portions of the task serially, i.e., in sequence. The term “massively parallel computing” was originally used in the context of so-called supercomputers, in which a large number of processors operated in parallel and the computation was carried out with fine granularity (i.e., with the overall task broken up into small amounts of computation carried out concurrently). The term “massively parallel” is now also applied to a somewhat similar approach carried out using multiple cores on a single chip. (The term “core” indicates a processor able to read and execute instructions.) This arrangement with multiple cores on a single chip is characteristic of graphics processing units (GPUs) as used in computer games devices and on graphics cards for desktop computers. It is possible to program a computer equipped with a GPU so that the GPU is used to carry out parallel computation. Application Programming Interfaces (APIs) for general purpose computation on GPUs have been published by manufacturers of graphics cards, and the use of GPUs for general purpose computing is an area of active interest.
Parallel processing provides an advantage in computing speed and power, but requires additional organization in hardware and software, in particular to distribute portions of the task to different cores and to take account of the fact that the portions of the task carried out asynchronously by the different cores will not finish simultaneously nor even in a predetermined order.
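The bookkeeping described above can be illustrated with a minimal sketch, written here in Python purely for illustration. It assumes a small matrix held as ordinary lists of rows; the function names row_dot and parallel_matvec are hypothetical. Each row's contribution to a matrix-vector product is handed to a worker, and because the workers may finish in any order, each result is written back by its row number rather than by its order of completion. Python threads are used only to show the organization of the work and are not expected to give the speed-up of a GPU.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def row_dot(row_values, x):
    """Dot product of one matrix row with the vector x (one portion of the task)."""
    return sum(a * b for a, b in zip(row_values, x))


def parallel_matvec(rows, x, max_workers=4):
    """Compute y = A*x with one task per row, tolerating out-of-order completion."""
    result = [None] * len(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which row each submitted task belongs to.
        futures = {pool.submit(row_dot, row, x): i for i, row in enumerate(rows)}
        # as_completed yields tasks in whatever order they happen to finish.
        for future in as_completed(futures):
            result[futures[future]] = future.result()
    return result


# Illustrative data only.
rows = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 5, 0],
]
y = parallel_matvec(rows, [1, 2, 3, 4])
```

Storing each result under its row index is what makes the outcome independent of the order in which the workers finish.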
There is a wide range of fields where it is desirable to carry out computer processing of a large sparse matrix of data. A sparse matrix is one in which most of the values are zero.
Notably, it may be desired to compute the solution to a problem in linear algebra of the general format: Ax=b, where A is a large sparse matrix of values, x is a vector of unknown values and b is a vector of known values.
Because of the size of the matrices, an iterative approach is used to compute the solution to the problem. In order to avoid filling a vast amount of memory with zero values, it is known to store a large sparse matrix in “compressed sparse row” (CSR) format, in which the information is stored as three arrays: a vals array containing the non-zero data values, a cidx array containing the matrix column index of each non-zero value, and an ridx array of integers which are pointers to the positions in the vals array (and also in the cidx array) at which each new row begins. The final value of the ridx array gives the total number of elements stored, which is equal to the length of the vals array. This format greatly reduces the memory requirements.
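By way of illustration only, the three CSR arrays may be built and used as in the following minimal Python sketch. It assumes zero-based indexing and a small matrix given as lists of rows; the array names vals, cidx and ridx follow the description above, while the function names dense_to_csr and csr_matvec are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a list-of-rows matrix into the vals, cidx and ridx arrays."""
    vals, cidx, ridx = [], [], []
    for row in dense:
        ridx.append(len(vals))  # position in vals/cidx where this row begins
        for col, value in enumerate(row):
            if value != 0:
                vals.append(value)
                cidx.append(col)
    ridx.append(len(vals))  # final value: total number of stored elements
    return vals, cidx, ridx


def csr_matvec(vals, cidx, ridx, x):
    """Multiply a CSR matrix by the vector x, touching only stored values."""
    y = []
    for row in range(len(ridx) - 1):
        total = 0
        for k in range(ridx[row], ridx[row + 1]):
            total += vals[k] * x[cidx[k]]
        y.append(total)
    return y


# Illustrative 4x4 matrix with five non-zero values and one empty row.
A = [
    [4, 0, 0, 1],
    [0, 3, 0, 0],
    [0, 0, 0, 0],
    [2, 0, 5, 0],
]
vals, cidx, ridx = dense_to_csr(A)
y = csr_matvec(vals, cidx, ridx, [1, 2, 3, 4])
```

For this matrix, vals is [4, 1, 3, 2, 5], cidx is [0, 3, 1, 0, 2] and ridx is [0, 2, 3, 3, 5]. The empty third row appears as two equal consecutive entries in ridx, and only five values are stored instead of sixteen.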
A number of software programs which provide functions to solve problems of this Ax=b form are available and are referred to as “linear solvers.” The CSR format is the standard input format for these linear solvers. Many of these linear solvers perform the computation serially. Linear solvers which operate on parallel processors such as GPUs have been described in the literature, notably by Buatois et al., “Concurrent Number Cruncher: An Efficient Sparse Linear Solver on the GPU,” Lecture Notes in Computer Science, 2007, pp. 358-371, and Wang et al., “Solving Sparse Linear Systems on NVIDIA Tesla GPUs,” Lecture Notes in Computer Science, 2009, pp. 864-873. A library of linear solver programs which operate on parallel processors is available under the trade name SpeedIT from Vratis Ltd, Wroclaw, Poland (www.vratis.com).
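As a hedged illustration of how a linear solver may consume CSR input, the following Python sketch applies the Jacobi iteration to a small system. The Jacobi method is chosen here only because it is short; it is not taken from the solvers cited above, and the function name jacobi_csr and its tol and max_iter parameters are hypothetical. The sketch assumes zero-based CSR arrays, a non-zero stored diagonal value in every row, and a matrix (such as a strictly diagonally dominant one) for which the iteration converges.

```python
def jacobi_csr(vals, cidx, ridx, b, tol=1e-10, max_iter=1000):
    """Iteratively solve Ax=b for a CSR matrix; returns (x, iterations used)."""
    n = len(ridx) - 1
    x = [0.0] * n
    for iteration in range(max_iter):
        x_new = [0.0] * n
        for row in range(n):
            diag = 0.0
            off_diag_sum = 0.0
            for k in range(ridx[row], ridx[row + 1]):
                if cidx[k] == row:
                    diag = vals[k]  # assumed to be stored and non-zero
                else:
                    off_diag_sum += vals[k] * x[cidx[k]]
            x_new[row] = (b[row] - off_diag_sum) / diag
        change = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if change < tol:  # stop once successive iterates agree
            return x, iteration + 1
    return x, max_iter


# Illustrative system: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], b chosen so x = [1,1,1].
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
cidx = [0, 1, 0, 1, 2, 1, 2]
ridx = [0, 2, 5, 7]
b = [3.0, 2.0, 3.0]
x, iterations = jacobi_csr(vals, cidx, ridx, b)
```

Each row of the update reads only the previous iterate, so the rows are independent of one another within an iteration, which is one reason methods of this kind lend themselves to parallel execution.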
As mentioned, the CSR format is a standard storage format for data from a large sparse matrix. There is also an equivalent “compressed sparse column” format which has an array of row indices and an array of pointers to the start of new columns.
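The relationship between the two formats can be sketched as follows, again in Python and for illustration only. The sketch assumes zero-based CSR arrays named as above; the function name csr_to_csc and the output names csc_vals, row_idx and cptr are hypothetical.

```python
def csr_to_csc(vals, cidx, ridx, n_cols):
    """Re-order CSR arrays into compressed sparse column arrays."""
    nnz = len(vals)
    # Count the stored values in each column.
    col_counts = [0] * n_cols
    for c in cidx:
        col_counts[c] += 1
    # A running sum of the counts gives the start position of each column.
    cptr = [0] * (n_cols + 1)
    for c in range(n_cols):
        cptr[c + 1] = cptr[c] + col_counts[c]
    csc_vals = [0] * nnz
    row_idx = [0] * nnz
    next_slot = cptr[:-1]  # slice copy: next free position within each column
    for row in range(len(ridx) - 1):
        for k in range(ridx[row], ridx[row + 1]):
            c = cidx[k]
            dest = next_slot[c]
            csc_vals[dest] = vals[k]
            row_idx[dest] = row
            next_slot[c] += 1
    return csc_vals, row_idx, cptr


# CSR arrays of the illustrative matrix
# [[4,0,0,1],[0,3,0,0],[0,0,0,0],[2,0,5,0]].
csc_vals, row_idx, cptr = csr_to_csc(
    [4, 1, 3, 2, 5], [0, 3, 1, 0, 2], [0, 2, 3, 3, 5], n_cols=4
)
```

For this matrix the column-ordered values are [4, 2, 3, 5, 1], the row indices are [0, 3, 1, 3, 0] and the column pointers are [0, 2, 3, 4, 5], the final pointer again giving the total number of stored values.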