1. Field of the Invention
The present invention relates to a method of solving simultaneous linear equations whose coefficient matrix is a sparse matrix, in particular a band matrix, which is a typical example of a matrix having few nonzero elements. More specifically, the present invention relates to a recording medium storing a program for solving such simultaneous linear equations on a common memory-type scalar parallel computer.
2. Description of the Related Art
When simultaneous linear equations are solved with a computer, a method based on Gaussian elimination is used, in which the equations are expressed in matrix form and are transformed, by computations such as LU decomposition of the matrix, into a form that is easier to solve.
In other words, the simultaneous linear equations can be expressed in a form in which the product of the coefficient matrix and the column vector of variables is equal to a constant column vector. The solution can then be found by performing LU decomposition, which decomposes the coefficient matrix into a lower triangular matrix and an upper triangular matrix, followed by forward substitution (forward elimination) and back substitution. LU decomposition of the coefficient matrix is therefore an important process in solving simultaneous linear equations. As related art that efficiently performs LU decomposition in parallel on a common memory-type scalar parallel computer, the following patent reference filed by the applicant has been disclosed: Japanese Patent Laid-open Publication No. 2002-163246, “Parallel matrix processing in common memory-type scalar parallel computer and recording media”.
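The sequence described above (LU decomposition, then forward substitution, then back substitution) can be illustrated with a minimal sketch for a small dense matrix. The function names lu_decompose and lu_solve are hypothetical and are not taken from the cited reference; the sketch assumes that whole rows are exchanged during partial pivoting and makes no attempt at parallel or band-aware processing.

```python
def lu_decompose(a):
    """Factor a square matrix with partial pivoting.

    Returns (lu, perm): lu holds the multipliers of L below the diagonal
    and U on and above it; perm[i] is the original index of row i.
    """
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the largest-magnitude entry of column k at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]          # exchange whole rows
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]                 # multiplier stored in place of L
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the factors returned by lu_decompose."""
    n = len(lu)
    # Forward substitution with the unit lower triangular factor L.
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with the upper triangular factor U.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    a = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    lu, perm = lu_decompose(a)
    print(lu_solve(lu, perm, [4.0, 10.0, 14.0]))
```

Once the factors are available, each additional right-hand side costs only the two triangular sweeps, which is why the decomposition itself dominates the work.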
The reference discloses an operation method that enables efficient processing, as follows. From the matrix on which LU decomposition is to be performed, a block D on the diagonal at the upper left, corresponding to a plurality of columns on the left side of the matrix, is taken together with the blocks below D in the column direction. The blocks below D are divided into, for example, three portions L1 to L3, and D+L1, D+L2, and D+L3 are allocated individually to three processors, which perform the LU decomposition operation in parallel. Thereafter, the block U, comprising a plurality of rows on the right side of the diagonal block D, is updated, and the update process is then repeated on the remaining part of the matrix by using L1 to L3 and U.
Conventionally, LU decomposition based on Gaussian elimination is used to solve simultaneous linear equations whose coefficient matrix is a band matrix, that is, a sparse matrix in which the nonzero elements are present only near the main diagonal. In such a conventional solution, a compressed storage mode is used when the elements of the matrix are stored in memory: only the band portion, where the nonzero elements are present, is stored, and the zero elements outside the band are omitted. In addition, partial pivoting is adopted to increase the numerical stability of the LU decomposition. However, to keep the storage area of the compressed storage mode small, when row permutation by the pivot is performed, it is performed only on the right side of the matrix. Furthermore, the update process of the LU decomposition uses operations in the form of an outer product of vectors.
Related art concerning the solution of simultaneous linear equations with such a band coefficient matrix is described in the following reference: G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd Ed., The Johns Hopkins University Press, Baltimore and London (1996).
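As an illustration of a compressed storage mode of this kind, the following sketch packs a band matrix with kl subdiagonals and ku superdiagonals into kl + ku + 1 rows. The layout, in which element (i, j) is placed at ab[ku + i - j][j], is an assumption modeled on a convention common in banded linear algebra routines; the text does not specify the exact layout, and the names pack_band and band_get are hypothetical.

```python
def pack_band(a, kl, ku):
    """Pack the band of a square matrix into (kl + ku + 1) rows.

    Assumed layout: element a[i][j] with -ku <= i - j <= kl is stored
    at ab[ku + i - j][j]; elements outside the band are not stored.
    """
    n = len(a)
    ab = [[0.0] * n for _ in range(kl + ku + 1)]
    for j in range(n):
        for i in range(max(0, j - ku), min(n, j + kl + 1)):
            ab[ku + i - j][j] = a[i][j]
    return ab


def band_get(ab, kl, ku, i, j):
    """Read element (i, j) back; positions outside the band are zero."""
    if -ku <= i - j <= kl:
        return ab[ku + i - j][j]
    return 0.0


if __name__ == "__main__":
    a = [
        [4.0, 1.0, 0.0, 0.0],
        [2.0, 5.0, 1.0, 0.0],
        [0.0, 2.0, 6.0, 1.0],
        [0.0, 0.0, 2.0, 7.0],
    ]
    for row in pack_band(a, 1, 1):
        print(row)
```

With this layout each diagonal of the matrix occupies one row of ab, so a matrix of order n needs (kl + ku + 1) × n stored values rather than n × n. A row exchange between two rows of the band matrix can move a nonzero element to a position that has no slot in ab, which is one way to see why pivoting interacts with the size of the storage area.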
Generally, a scalar computer has a CPU with high arithmetic performance but comparatively low memory access performance. A first problem was therefore that operations based on the outer product of vectors, whose performance depends on memory access performance, ran slowly, and efficiency was lower than that of processing on a vector computer.
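The relationship between outer-product updates and a matrix-product update can be sketched as follows. Applying several outer-product (rank-1) updates one after another gives the same result as subtracting a single matrix product whose left factor has the vectors as columns and whose right factor has them as rows; the first form passes over the whole matrix once per vector pair, while the second touches each matrix element once. The function names are hypothetical, and the sketch only shows the algebraic equivalence: it does not model cache behavior, and it ignores the fact that in an actual LU decomposition each vector pair depends on the preceding updates.

```python
def rank1_updates(a, cols, rows):
    """Apply a -= outer(cols[k], rows[k]) for each k, one pair at a time."""
    out = [r[:] for r in a]
    for l, u in zip(cols, rows):
        # One full pass over the matrix per vector pair.
        for i in range(len(out)):
            for j in range(len(out[0])):
                out[i][j] -= l[i] * u[j]
    return out


def block_update(a, cols, rows):
    """Apply the same update as a single matrix product a -= L * U.

    L has the vectors in cols as its columns, U has the vectors in rows
    as its rows; each element of the matrix is read and written once.
    """
    out = [r[:] for r in a]
    for i in range(len(out)):
        for j in range(len(out[0])):
            out[i][j] -= sum(l[i] * u[j] for l, u in zip(cols, rows))
    return out


if __name__ == "__main__":
    a = [[9.0, 8.0, 7.0], [6.0, 5.0, 4.0], [3.0, 2.0, 1.0]]
    cols = [[1.0, 2.0, 3.0], [0.5, 0.0, 1.0]]
    rows = [[1.0, 0.0, 2.0], [2.0, 1.0, 0.0]]
    print(rank1_updates(a, cols, rows))
    print(block_update(a, cols, rows))
```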
A second problem was that the band matrix is stored in memory in the compressed mode, in which only the nonzero elements are stored (for example, each column is stored element by element in the row direction), so that update processing using the matrix product cannot be performed on this format as is. In addition, even if an update using the matrix product is attempted, values outside the storage area may be corrupted when the number of elements in the rows requiring permutation exceeds the storage area of the compressed mode.
A third problem was that, as stated above, because the row permutation of partial pivoting is performed only on the right side of the matrix, updates using blocked matrix operations cannot be performed.
A fourth problem was that, when forward elimination is performed after the LU decomposition is completed, because the row permutation of partial pivoting has been performed only on the right side of the matrix, the permutation must also be applied to the solution vector, and the update processing must be performed as products of a vector and a scalar. If this operation is processed in parallel, the overhead of parallel processing increases and the benefit of parallel processing deteriorates.
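The following sketch illustrates one reading of this fourth problem for a small dense matrix. It assumes that, at each elimination step, rows are exchanged only in the current and later columns, so that multipliers already stored to the left are not moved. Under that assumption the forward elimination cannot permute the right-hand side once at the start; each exchange has to be interleaved with an update of the form "vector minus scalar times vector". The function names factor_right_only and solve_right_only are hypothetical.

```python
def factor_right_only(a):
    """LU with partial pivoting, exchanging rows only in columns k and later.

    Returns (lu, piv), where piv[k] is the row exchanged with row k at step k.
    Multipliers stored in columns to the left of k stay where they are.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = [0] * n
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            for j in range(k, n):  # right side of the matrix only
                lu[k][j], lu[p][j] = lu[p][j], lu[k][j]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def solve_right_only(lu, piv, b):
    """Solve A x = b from the factors returned by factor_right_only."""
    n = len(lu)
    x = b[:]
    # Forward elimination: each step is an exchange followed by a
    # scalar-times-vector update, and the steps depend on one another.
    for k in range(n):
        p = piv[k]
        x[k], x[p] = x[p], x[k]
        for i in range(k + 1, n):
            x[i] -= lu[i][k] * x[k]
    # Back substitution with the upper triangular factor.
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


if __name__ == "__main__":
    a = [[1.0, 2.0, 0.0], [3.0, 1.0, 2.0], [0.0, 4.0, 1.0]]
    lu, piv = factor_right_only(a)
    print(solve_right_only(lu, piv, [3.0, 6.0, 5.0]))
```

Because every step of the forward loop does little arithmetic and must wait for the preceding step, dividing it among processors requires synchronization at each step, which is consistent with the overhead described above.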