1. Field of the Invention
The present invention relates to an arithmetic process using a shared memory type scalar parallel computer.
2. Description of the Related Art
Conventionally, when an inverse matrix of a matrix is obtained using a vector computer, a computing method based on the Gauss-Jordan method is used, and operations are performed quickly by utilizing fast-access memory. Examples include methods such as the double unrolling method, in which the instructions in a loop are unrolled into a single list of instructions and executed once.
Described below is the method of obtaining an inverse matrix by the Gauss-Jordan method (also referred to simply as the Gauss method). In the explanation below, the interchange of pivots is omitted, but in practice row vectors are interchanged to carry out the interchange of pivots.
Assume that A indicates a matrix for which an inverse matrix is to be calculated, and x and y indicate arbitrary column vectors. When the matrix elements are written explicitly, Ax=y is expressed as the following simultaneous linear equations.
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= y_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= y_2 \\
&\;\;\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= y_n
\end{aligned}
$$
If the above listed equations are transformed into By=x, then B is an inverse matrix of A, thereby obtaining an inverse matrix.
    1) The equation in the first row is divided by $a_{11}$.
    2) Compute (the i-th row) − (the first row) × $a_{i1}$ for i>1.
    3) To obtain the coefficient of 1 for $x_2$ of the equation in the second row, multiply the second row by the reciprocal of the coefficient of $x_2$.
    4) Compute (the i-th row) − (the second row) × $a_{i2}$ for i>2.
    5) The above mentioned operation is continued up to the (n−1)th row.
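The steps above can be illustrated with a minimal sketch in Python. It is not the implementation of any particular computer: the function name `gauss_jordan_inverse` and the list-of-rows representation are assumptions made for illustration. The sketch keeps the inverse in the right half of an augmented system [A | I], performs no pivot interchange (as in the simplified explanation), and at each step eliminates the pivot column from every other row, which is the usual Gauss-Jordan formulation and goes slightly beyond the literal wording of the steps.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination.

    Hypothetical helper for illustration. No pivot interchange is done,
    so every pivot encountered must be nonzero.
    """
    n = len(a)
    # Augmented system [A | I]; the right half ends up holding the inverse.
    m = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(a)
    ]
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; a pivot interchange would be needed")
        # Divide row k by the pivot so that the coefficient of x_k becomes 1.
        m[k] = [v / pivot for v in m[k]]
        # Subtract (row k) * a_ik from every other row i.
        for i in range(n):
            if i != k:
                factor = m[i][k]
                m[i] = [vi - factor * vk for vi, vk in zip(m[i], m[k])]
    # Drop the left half, which is now the identity matrix.
    return [row[n:] for row in m]
```

For example, `gauss_jordan_inverse([[2, 1], [1, 3]])` returns approximately `[[0.6, -0.2], [-0.2, 0.4]]`, up to floating-point rounding.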
The interchange of column vectors accompanying the interchange of pivots is described below.
Both sides of Ax=y are multiplied by the matrix P corresponding to the interchange of pivots.
$$PAx = Py = z$$
When the following equation holds with the matrix B,
$$x = Bz$$
then B is expressed as follows.
$$B = (PA)^{-1} = A^{-1}P^{-1}$$
That is, the inverse matrix of A is obtained by right-multiplying the obtained B by P. In practice, this requires interchanging the column vectors of B.
In the equation, $P = P_n P_{n-1} \cdots P_1$, and each factor is an orthogonal transform representing a single interchange, having matrix elements $P_{ii}=0$, $P_{ij}=1$, $P_{jj}=0$, and $P_{ji}=1$.
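The relation between the row interchanges and the final column interchange can be illustrated with a short Python sketch. The function name `invert_with_pivoting`, the in-place storage scheme, and the choice of the largest-magnitude entry as the pivot are assumptions made for illustration, not details taken from the description above. The sketch records each row interchange, computes B = (PA)^-1 in place, and then right-multiplies by P by swapping the columns of B in the reverse of the order in which the rows were interchanged.

```python
def invert_with_pivoting(a):
    """In-place Gauss-Jordan inversion with row interchanges (illustrative)."""
    n = len(a)
    b = [[float(v) for v in row] for row in a]  # working copy, overwritten in place
    swaps = []  # recorded row interchanges (k, p), in the order applied
    for k in range(n):
        # Assumed pivot rule: largest |entry| in column k among rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(b[i][k]))
        if b[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            b[k], b[p] = b[p], b[k]
            swaps.append((k, p))
        pivot = b[k][k]
        # Column k of the implicit identity takes the place of column k of A.
        b[k][k] = 1.0
        b[k] = [v / pivot for v in b[k]]
        for i in range(n):
            if i != k:
                factor = b[i][k]
                b[i][k] = 0.0
                b[i] = [vi - factor * vk for vi, vk in zip(b[i], b[k])]
    # b now holds B = (PA)^-1. Right-multiplying by P = P_n ... P_1 swaps
    # columns, starting with the most recent interchange.
    for k, p in reversed(swaps):
        for row in b:
            row[k], row[p] = row[p], row[k]
    return b
```

Because the interchanges are undone on the columns at the end, the elimination loop itself never has to track how the permutation affects the part of the inverse that has not yet been computed.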
In the vector computer, an inverse matrix is computed by the above mentioned method on the assumption that memory accesses can be performed quickly. However, in the case of the shared memory type scalar computer, the frequency of accessing shared memory increases with the size of the matrix to be computed, which significantly degrades the performance of the computer. Therefore, it is necessary to perform the above mentioned matrix computation by utilizing the fast-access cache memory provided for each processor of the shared memory type scalar computer. That is, since the shared memory is accessed frequently if computation is performed on each row or column of a matrix, it is necessary to use an algorithm that localizes the computation assigned to each processor: the matrix is divided into blocks, each processor processes the largest possible amount of data held in its cache memory, and only then is the shared memory accessed, thereby reducing the frequency of accessing the shared memory.
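The idea of dividing a matrix into blocks so that each unit of work stays within a small region of memory can be sketched as follows. This is only an illustration of the loop structure, using a matrix product as a stand-in computation: the function name `blocked_matmul` and the `block` parameter are assumptions, and pure Python will not exhibit the cache behaviour itself. Each tile of the result depends only on a few tiles of the inputs, so a tile is a natural unit of work to assign to one processor.

```python
def blocked_matmul(a, b, block=2):
    """Return the product of two square matrices, computed tile by tile.

    Illustrative only: `block` stands for a tile size chosen so that the
    tiles being combined fit in a processor's cache.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):            # tile row of the result
        for j0 in range(0, n, block):        # tile column of the result
            for k0 in range(0, n, block):    # input tiles contributing to this result tile
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

The three outer loops visit tiles and the three inner loops stay inside the current tiles, so the data touched between two changes of tile is bounded by the tile size rather than by the full row or column length.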