1. Field of the Invention
The invention relates to a method of operating a vector computer in order to solve an equation including matrix.
2. Description of the Related Art
In the past, the following differential equations (1) to (3) are known in solution of boundary value problems.
In particular, in the case of elliptical partial differential equations, these equations are often used.
A q=rxe2x80x83xe2x80x83(1)
(xcex94xe2x88x92L)p=rxe2x80x83xe2x80x83(2)
xcex94xe2x88x921(xcex94xe2x88x92U)xe2x88x92L)q=pxe2x80x83xe2x80x83(3)
Where, A is an mxc3x97m matrix, xcex94 is an mxc3x97m diagonal matrix, L is a lower mxc3x97m matrix, and U is an upper mxc3x97m matrix, and where p, q and r are respectively vectors of the dimension m.
These boundary value problems arise in a plurality of numerical calculations of technical magnitudes, such as in the calculation of temperature distribution in a three-dimensional body, or calculation of a fluid flow about a given geometric shape. A further simple example is the solution of the Poisson equation with Dirichlet boundary conditions.
Normally, for the numerical solution of the problems, discrete variable method is used, the space in which the solution is to be calculated is covered by a point lattice, and in which then a solution for each lattice point is calculated numerically.
For this purpose, the solution function about each lattice point may be developed into a Taylor series, and then the differential quotients can be replaced by difference quotients. In order to calculate an approximate solution at a lattice point, knowledge of the approximate solution at the adjacent lattice points is sufficient if the remainders of the series development are negligible. Such a method is disclosed for example in xe2x80x9cNumerische Mathematik fxc3xcr Ingenieure und Physikerxe2x80x9d W. Tornig, Volume 2, Springer-Verlag, Berlin.
For example it may be sufficient in a two-dimensional space with a square grid to know the solution at the directly adjacent four lattice points, in order to calculate the approximate solution at the lattice point surrounded by these four lattice points.
FIG. 1 shows a so-called 5-point stencil being used in relation to this case. While in FIG. 2, a 7-point stencil is shown as the corresponding case for three-dimensional space.
By applying the approach described above, with appropriate numbering of the lattice points, to the shown 7-point stencil, the problems may be represented as follows:
aijk,1qijkxe2x88x921+aijk,2qijxe2x88x921k+aijk,3qixe2x88x92ljk+aijk,4qijk+aijk,5qi+1jk+aijk,6qij+1k+aijk,7qiljk+1=rijkxe2x80x83xe2x80x83(4)
For each lattice point (ijk) from the set G with
G={(ijk)1 less than =i less than =nx, 1 less than =j←ny, 1←k←nz}xe2x80x83xe2x80x83(5)
Where, nx, ny, and nz are the number of lattice points in x, y, and z axis directions, respectively.
With appropriate selection of the numbering of the lattice points, the matrix A usually contains only a few extra-diagonal elements which are different from zero.
Equation (1) shown above is then not directly solved, but firstly, using relation of equation (6) and preconditioner (7), is converted into the equation (8).
A=Dxe2x88x92Lxe2x88x92Uxe2x80x83xe2x80x83(6)
M=P1xc3x97P2xe2x80x83xe2x80x83(7)
(P1xe2x88x921A P2xe2x88x921)(P2q)=P1xe2x88x921rxe2x80x83xe2x80x83(8)
Where, D is the diagonal matrix from the elements of the diagonals of the original matrix A.
A frequently used and successful preconditioner is so-called xe2x80x9cincomplete lower upper preconditionerxe2x80x9d (ILU preconditioner).
Also, the following equation holds.
M=(xcex94xe2x88x92Lxe2x80x2)xcex94xe2x88x921(xcex94xe2x88x92Uxe2x80x2)xe2x80x83xe2x80x83(9)
Thus, xcex94=Dxe2x88x92diagonal elements (Lxcex9431 1U)xe2x88x92xcex1 line total (extra-diagonal elements (Lxcex94xe2x88x921U)) with 0 less than =xcex1 less than =1; extra-diagonal elements (F)=Fxe2x88x92diagonal elements (F); and the line total means a diagonal matrix in which each element of the diagonals corresponds to the sum of the elements of the same line.
This is the known Gustavson acceleration (I. Gustafson) xe2x80x9cA Class of First Order Fracturization Methodsxe2x80x9d, BIT 18 (1978), pages 142-156). The matrices Uxe2x80x2 and Lxe2x80x2 are identified in equation (9) with an apostrophe, in order to indicate that they are not necessarily identical to the matrices U and L from equation (6), although this is the case in the ILU approach described. Appropriate alternations to the matrix Uxe2x80x2 and Lxe2x80x2 can lead to improved convergence.
In order to determine xcex94 in equation (9), there are further approaches, roughly the SSOR approach with xcex94=D. Preconditioning corresponds to the solution of the following linear equation (10).
M q=rxe2x80x83xe2x80x83(10)
Where, r is the known vector. A schematic view of this equation is shown in FIG. 3. The solution can be obtained by solving the equations (11) and (12).
(xcex94xe2x88x92L)p=rxe2x80x83xe2x80x83(11)
xcex94xe2x88x921(xcex94xe2x88x92U)q=pxe2x80x83xe2x80x83(12)
If the initial problem was aligned on a structured grid, the elements which are different from zero of the matrices U and L can be arranged regularly. For example, in the case of the 7 point stencil in the regular three-dimensional grid points each of which is included in the above set G and has a lexicographic number of the grid points in the matrix L, elements which are different from zero are present only in the first secondary diagonal, in the nx-th secondary diagonal and in the (nx ny) th secondary diagonal. In other words, only the elements at Li,ixe2x88x921, Li,ixe2x88x92nx and Li,ixe2x88x92nx ny are different from zero.
Here, the n-th secondary diagonal means a sequence which includes elements aligned parallel to a sequence of primary diagonal elements (L1,1, L2,2, . . . ) and is the n-th sequence from the primary diagonal elements.
The equations (11) and (12) can be solved by corresponding forward or reverse substitutions.
p=xcex94xe2x88x921(rxe2x88x92L p)xe2x80x83xe2x80x83(13)
q=pxe2x88x92xcex94xe2x88x921U qxe2x80x83xe2x80x83(14)
An example of pseudo code of an algorithm (1) for these substitutions is shown in FIG. 4.
This method is known as the hyperplane method. It can be shown that hyperplanes can be constructed from lattice points and the lattice points can be calculated independently of one another. This is because that the solution at any optional lattice point of the hyperplane is not dependent on the solution of any other point on the same plane.
Corresponding hyperplanes for three-dimensional space are shown in FIG. 5. The following equation may be applied to the 7-point stencil.
H(l)={i=ix+(iyxe2x88x921)nx+(izxe2x88x921)nxny|ix+iy+iz=l+2}(1 less than =l less than =nl=nx+ny+nzxe2x88x922)xe2x80x83xe2x80x83(15)
The disadvantage of this method resides mainly in the fact that the efficiency of a queue processing or a vector processing in a vector processor is greatly reduced by the indirect addressing in the above described algorithm (1).
It would be more advantageous to be able to access the memory always with a constant stride (see also xe2x80x9cComputer Architecture.xe2x80x9d, Computational Science Education Project, Section 3.3.1 xe2x80x9cInterleaved Memoryxe2x80x9d).
Moreover the method is not suitable for computers with divided parallel memory blocks and a plurality of vector processors, as no parallelism occurs relative to magnitude of the hyperplane, and as many hyperplanes H(l) alone are too small to be logically distributed between a plurality of processors. Furthermore, in the known hyperplane method, no generalization to higher-dimensional spaces or stencils other than the known 5-point or 7-point stencils is known. An approach in order to counter these problems is not to solve the equations (11) and (12) exactly, but to calculate an approximate solution by iteration. Normally, a von-Neumann series development is used for this.
p=(xcex94xe2x88x92L)xe2x88x921r=xcex94xe2x88x921(Ixe2x88x92Lxcex94xe2x88x921)xe2x88x921r=xcex94xe2x88x921r+xcex94xe2x88x921Lxcex94xe2x88x921r+(xcex94xe2x88x921L)2xcex94xe2x88x921r+(xcex94xe2x88x921L)3xcex94xe2x88x921r+ . . . +(xcex94xe2x88x921L)ntrxcex94xe2x88x921rxe2x80x83xe2x80x83(16)
This method is in fact fully capable of vector processing, and can be processed in parallel, as matrix-vector-products are always only calculated. However, in order to obtain an approximate solution, the result of the method is not precise as good as those of the hyperplane method. Also, the von-Neumann series must be calculated up to the members of the third or fourth order. Therefore, the computational outlay is three or four times as great, as a result, the advantage of vectorization and parallelization is removed.
Another approach in order to convert the hyperplane method in a suitable way for vector computers, was disclosed by Takumi Washio and Ken Hayami in xe2x80x9cOverlapped Multicolor MILU Preconditioningxe2x80x9d, J. SCHI Computer, Volume 16, No. 5 on pages 636-651 in May 1995.
In accordance with this approach, for the case of three dimensional space and the 7-point stencil, firstly the hyperplanes are so divided that a number istr of subsets are generated. This is represented as following equation (17).
C(l)={(i,j,r)xcex5G| mod(i+j+kxe2x88x923, istr) +1=l}xe2x80x83xe2x80x83(17)
In other words, the subset C(l) is a unification of the hyperplanes H(l), H(l+istr), H(l+2istr) . . .
Then, each of the subsets, which are also called xe2x80x9ccolorsxe2x80x9d, is calculated in the sequence C(1), C(2), . . . C(istr), C(1), C(2), . . . C(istr) for the forward substitution, and in the sequence C(istr), C(istrxe2x88x921), . . . C(1), C(istr), C(istrxe2x88x921) . . . C(1) for the reverse substitution. This is shown by way of example shown in FIG. 6. Unknown start values are respectively set to zero. In this method, each color subset is calculated at least twice, so that the dependence of H(max(l-istr, 1) on H(l) is taken into account.
An example of pseudo code executing an algorithm (2), which performs this method, is shown in FIG. 7.
Also, a flowchart corresponding to this algorithm is shown in FIG. 8.
By appropriate selection of istr, a situation can be achieved in which each color subset contains so many lattice points that a vector processing is logically possible. For the known 7-point stencil, additional lattice points can be so arranged that for each element from C(l), the memory can be accessed with a constant stride, and thus indirect addressing can be avoided. For example, when nx=ny and with optional nz, istr=nxxe2x88x921 can be selected. In this case, all memory accesses for calculating the lattice points within C(l) can be executed with a constant stride istr, and on the other hand, (nxnynz)/istr is sufficiently large as a vector length.
For example, when a 7-point stencil with a lattice points which in the x-axis direction has nx=10, in the y-axis direction ny=11, and in the z-axis direction nz=10, and with istr=9 determined as above, after a division into color subsets is undertaken, the calculation is done as shown in FIG. 6. It is thus clear that the six further lattice points required for calculating a lattice point i, that is i+1, i+10, i+110, ixe2x88x921, ixe2x88x9210, ixe2x88x92110, which do not belong to the same color subset. Therefore, calculation of a lattice point of a color subset is independent of the calculations of the other lattice points of the same color subset. Indirect addressing is likewise not necessary, since when further lattice points are required, they are immediately visible from knowledge of i. Also, high speed processing on a vector processor is possible, as the corresponding memory areas in which the data of the lattice points i+1, i+10, i+110, ixe2x88x921, ixe2x88x9210 and ixe2x88x92110 are stored can be preloaded.
A further improvement to this multi-color technique, which is likewise disclosed in the above-named article, is the so-called xe2x80x9cOverlapping Multicolorxe2x80x9d technique. Here the convergence of the multi-color technique is improved in that for the forward substitution of the color subsets are executed in the sequence C(istr-w), (C(istr-wxe2x88x921) . . . C(istr), C(1), C(2) . . . C(istr). The overlapping, i.e. the number of color subsets which are additionally calculated, comes to w. A corresponding system may also apply to reverse substitution. This overlapping multi-color technique shows an improved convergence compared to the single multi-color technique, but offers the same advantages as regards vector length and memory access.
A difficulty in the multi-color technique and in the multi-color technique with overlapping is determining the number of color subsets and applying to cases other than the known 7-point stencil or 5-point stencil. Too small a number of color subsets leads to an increased computing outlay, since the start value for the first approximation is too remote from the solution. Too large a number of color subsets leads to a lower computing speed, since the vector lengths for the queue structure of the vector computer become too short.
Determination of the magnitude of the istr in stencils other than the above 5-point and 7-point stencils is not so simple, since a condition remains that the individual lattice points of a color subset must be independent of one another.
The object of the invention is to provide an improved method of solving the above equation (1) on a vector computer with a given queue length.
The method according to the invention is suitable particularly in solving partial elliptic differential equations for a regular square lattice.
An advantage of the invention is that the method is specially adapted to the actualities of a vector computer, such as the queue length and the number of memory banks, so that it operates more rapidly than the known methods.
A further advantage of the method according to the invention is that, without further outlay, other dependencies than the known 5-point stencil or 7-point stencil can be taken into account.
A yet further advantage of the method according to the invention is that it can be well processed in parallel, i.e. can be processed in parallel on a plurality of processors, in particular, in a computer system with distributed memory.
The object, advantages and special aspects of the invention are explained with reference to the description of a special embodiment.