1. Field of the Invention
The present invention relates to a parallel computer system for use in analysis of physical phenomena containing a large number of high precision sum of products operations by way of floating-point arithmetic operation including non-empirical calculation of molecular orbit used in designing the molecular structure of medicines and in prediction of physical properties.
2. Description of the Related Art
Along with the micronization and the increase of speed of semiconductor devices, high performance computers have come to be realized and molecular simulations using non-empirical calculation of molecular orbits have come to be conducted also in the fields of design of molecular structure and of prediction of values of physical properties in pharmacology.
Among the non-empirical calculations of molecular orbits, Hartree-Fock (HF) method which requires a relatively less amount of calculation and which can fully accommodate with qualitative analysis has been most widely used. The HF method has been described in Shigeru Fujinaga, xe2x80x9cMolecular Orbit Methodxe2x80x9d Iwanami Shoten, 1980, in Eiji Osawa, xe2x80x9cMolecular Orbital Methodxe2x80x9d, Kodansha Scientific, 1994, and in Osamu Kikuchi, xe2x80x9cBasics of Quantum Chemistryxe2x80x9d Asakura Shoten, 1997 for example. The outline of the HF method will be described below.
The HF method is formulated as a method for solving Fock equation by a SCF method described later. Here, the Fock equation may be expressed by the following expression which is obtained as a result of implementing one-electron approximation or linear approximation to Schroedinger equation for the whole molecule:
FC=SC xcex5xe2x80x83xe2x80x83(1)
where, N is a total number of atomic orbits contained in the molecule and m is a total number of molecular orbits expressed by linear approximation of the atomic orbits. Because energy of the molecule may be found by solving this Fock equation, it is possible to judge whether the molecule is stable or not by its value.
In the equation (1), F is a matrix of Nxc3x97N called a Fock matrix, S is a matrix of Nxc3x97N called an overlapping matrix, C is a matrix of Nxc3x97m representing a coefficient and xcex5 is a diagonal matrix of mxc3x97m representing energy of each electron occupying the molecular orbit.
Here, an element Frs (r, s=1 to N) of the Fock matrix may be expressed by the following equation:
Frs=hrs+grs=hrs+xcexa3[t, u=1xcx9cN]Ptu((rs, tu) xe2x88x92(xc2xd)(rt, su))xe2x80x83xe2x80x83(2)
hrs in this equation (2) is an integral amount representing energy to one electron and is calculated by a number proportional to N2 in one time of calculation of the equation (1).
It is noted that in this specification, xcexa3[i, j=1 to N] f(i, j) indicates an arithmetic operation for finding a total sum from 1 to N for i and j with respect to a function f(i, j). xcexa3[i =1 to N]f(i) indicates an arithmetic operation for finding a total sum from 1 to N for i with respect to a function f(i).
Ptu in the equation (2) is called a density matrix and is expressed as follows by using the above-mentioned matrix C:
Ptu=xcexa3(j=1xcx9cm)Cfjxc2x7Cujxe2x80x83xe2x80x83(3)
(rs, tu) (r, s, t, u=1 to N) in the equation (2) is a physical amount called two-electron integral and is represented as follows by using the atomic orbit xi(r)(i=1 to N, r is a coordinate):
(rs, tu)=∫∫xr(r1)xs(r1)(1/r12)xc3x97xt(r2) xu(r2)dr1xc2x7dr2xe2x80x83xe2x80x83(4)
Here, r1 and r2 are coordinate systems independent from each other and double integration is carried out across the respective whole spaces. r12 represents a distance between the coordinate systems r1 and r2. Because r, s, t and u exist by the number of atomic orbits, respectively, the two-electron integral is required by a number proportional to N4 in one time of calculation of the equation (1).
The element Srs of the overlapping matrix S may be represented by the following equation:
Srs=∫xr(r1)xs(r1)dr1xe2x80x83xe2x80x83(5)
Because the HF method is represented as described above, it is a question of finding m characteristic values xcex5i and characteristic vector Ci (i=1 to m) represented by the equation (1). However, as it is apparent from the equations (2) and (3), because the Fock matrix contained in the equation (1) may be found by using the vector Ci representing a coefficient, the value of F cannot be found unless Ci obtained by solving the equation (1) is used.
Accordingly, new Ci is found by setting an adequate value as the initial guess of Ci at first, by finding F by using that Ci and by solving the question of the characteristic value of the equation (1). Then, the equation (1) is solved by calculating new F by using this Ci. This calculation is carried out repeatedly and is ended at last when there is almost no difference between the Ci used in the calculation of F and the found Ci. This method is called as SCF (self-consistent field) method and is widely used in the calculation of molecular orbits.
Because a number of two-electron integrals represented by the equation (1) is proportional to fourth power of the total number N of the atomic orbit, the value of N amounts around 1000 and the number of two-electron integrals amounts to the fourth power of that, i.e., in the order of 100 trillions, when a molecule composed of around 100 atoms which often appears in the field of biology for example. Although a method of judging and cutting off ones whose value is small before calculating the two-electron integrals has been often used here, a number of two-electron integrals which need to be calculated is around 100 millions which is still an enormous amount of number.
Therefore, because there is no memory space for calculating and storing the two-electron integrals once even though the same two-electron integrals are used in each repetition of the SCF method, a direct method of calculating the two-electron integrals again per each repetition is normally used. Because the most of the calculation time is occupied by the calculation of the two-electron integrals in the calculation of molecular orbit of this direct method, it is important to increase the speed of this part.
Here, a Gaussian function which allows the two-electron integrals to be found analytically is normally used for the atomic orbit xi represented by the equation (4). As a method for calculating the two-electron integrals at high speed using the atomic orbit of this Gaussian function, there has been known a method shown in a document 1, xe2x80x9cS. Ohara and A. Sakai, J. Chem. Phys. 84, 3963 (1986)xe2x80x9d(hereinafter referred to as Ohara""s method).
The Ohara""s method is represented by a format of recurrence formula containing auxiliary integration by introducing a value of the auxiliary integration expanding the two-electron integral. One two-electron integral may be expressed by the format of sum of products arithmetic operation containing lower order auxiliary integral by this recurrence formula. The two-electron integral may be found by developing into a format containing only the lowest order auxiliary integral in accordance to the recurrence formula and by finding the higher order auxiliary integral sequentially by the sum of products arithmetic operation. A concrete calculation method of the Ohara""s method will be shown below.
At first, the atomic orbit x represented by the Gaussian function is represented by the following equation in the Ohara""s method:
x(rxe2x88x92R; n, xcex6)=(rxxe2x88x92Rx)nx(ryxe2x88x92Ry)nyxc3x97(rzxe2x88x92Rz)nzexp[xe2x88x92xcex6(rxe2x88x92R)2]xe2x80x83xe2x80x83(6)
Here, r and R are vectors representing spatial positions. R in particular represents the center of atom. n is a vector composed of an integer greater than zero and is called as an orbital quantum number vector. This orbital quantum number vector has three components of nx, ny and nz of x, y and z similarly to r.
xcex6 is a constant called an orbital exponent which changes corresponding to the type of atom and orbit. The sum of three components of the orbital quantum number vector is called an orbital quantum number.
xcex=nx+ny+nzxe2x80x83xe2x80x83(7)
When the orbital quantum numbers xcex are 0, 1, 2 and 3, respectively, the corresponding Gaussian functions are called as s function, p function, d function and f function and are handled as wave functions corresponding respectively to a s orbit, p orbit, d orbit and f orbit of the atom.
The wave function to each orbit is approximated by a linear coupling of those functions. For instance, a wave function corresponding to is orbit of a hydrogen atom is represented by a linear coupling in which orbital exponent xcex6 takes several different kinds of values, wherein n=(0, 0, 0) in the equation (7). The type of xcex6 at this time is different depending on a base function system.
It is noted that when the orbital quantum number xcex is not 0, there exist a plurality of wave functions. In case of a d function in which the orbital quantum number xcex is represented by 2, there exist six kinds of wave functions of dx2, dy2, dz2, dxy, dyz and dzx corresponding to the orbital quantum number vector n=(2, 0, 0), (0, 2, 0), (0, 0, 2), (1, 1, 0) (0, 1, 1) and (1, 0, 1).
Accordingly, when one d function is contained in the two-electron integral expressed by the equation (4), six kinds of two-electron integrals may be obtained corresponding those functions. Here, in the Gaussian function expressed by the equation (6), a set of the Gaussian function in which the orbital quantum number, the center coordinate R of the atom and the parameter xcex6 are the same is called as a shell. Accordingly, they are categorized according to the orbital quantum number xcex as p-shell, d-shell and the like and the d-shell contains six Gaussian functions for example.
It is noted that there is a case when the two-electron integral expressed by the left side of the equation (4) is represented by the type of the respective wave functions such as p and s like (p*s*, p*s*). In this case, xe2x80x98*xe2x80x99 is given to the name of the type to distinct from the name of the wave function.
The first document described above shows a method for calculating the value of the two-electron integrals efficiently by a recursive equation when wave functions a, b, c and d contained in the two-electron integrals expressed by the equation (1) are represented by using the Gaussian function represented by the equation (3). According to the Ohara""s method, an auxiliary physical amount of auxiliary integral (ab, cd)(m) (m is an integer greater than 0) is introduced to the two-electron integrals (ab, cd) to lead the recursive equation of the following format.
(ab, cd)=(ab, cd)(0)xe2x80x83xe2x80x83(8)
(a+lib, cd)(m)=PO(ab, cd)(m)
+P1(ab, cd)(m+1)
+Ni(a)xc3x97P2(axe2x88x92lib, cd)(m)
+Ni(a)xc3x97P3(axe2x88x92lib, cd)(m+1)
+Ni(b)xc3x97P4(abxe2x88x92li, cd)(m)
+Ni(b)xc3x97P5(abxe2x88x92li, cd)(m+1)
+Ni(c)xc3x97P6(ab, cxe2x88x92lid)(m+1)
+Ni(d)xc3x97P7(ab, cdxe2x88x92li)(m+1)xe2x80x83xe2x80x83(9)
(where, i=x, y, z)
(ab, cd)(m)=(s*s*, s*s*)(m)
=P8xc3x97K(A, B, xcex6a, xcex6b)xc3x97K(C, D, xcex6c, xcex6d)xc3x97Fm(T)xe2x80x83xe2x80x83(10)
Here, the wave functions a, b, c and d are all the Gaussian functions expressed by the equation (6) and each has a specific orbital quantum number vector n. The symbol a+1i means a Gaussian function in which the value of i component (i=x, y, z) is increased by one among the orbital quantum number vector of the Gaussian function a. Accordingly, when a is a px function expressed by n=(1, 0, 0) for example, a+1x becomes a dx2 function expressed by n=(2, 0, 0).
Further, the symbol axe2x88x921i means a Gaussian function in which the value of i component (i=x, y, z) is reduced by one among the orbital quantum number vector of the Gaussian function a. The symbol Ni(a) represents the i component of the orbital quantum number vector of the Gaussian function a. Accordingly, when the i component of the orbital quantum number vector of the wave function b is zero, the term related to b xe2x88x921i in the equation (9) is zeroed.
A relational expression similar to the equation (9) may be lead also for (ab+1i, cd)(m), (ab, c+1id)(m) and (ab, cd+1i)(m) when the following relationship which holds for the auxiliary integral is utilized:
(ab, cd)(m)=(ba, cd)(m)=(cd, ab)(m)=(dc, ab)(m)
Still more, the coefficients P0 through P7 in the auxiliary integral of the equation (9) are parameters calculated from the coordinates A, B, C and D of the atomic nucleus, i.e., the center, and xcex6a, xcex6b, xcex6c and xcex6d corresponding to the orbital exponent xcex6 of the equation (6) of the wave functions a, b, c and d and are expressed by the following equations:
P0=Pixe2x88x92Aixe2x80x83xe2x80x83(11)
P1=Wixe2x88x92Pixe2x80x83xe2x80x83(12)
P2=1/(2xcex6)xe2x80x83xe2x80x83(13)
xe2x80x83P3=xe2x88x92xcfx81/(2xcex62)xe2x80x83xe2x80x83(14)
P4=1/(2xcex6)xe2x80x83xe2x80x83(15)
P5=xe2x88x92xcfx81/(2xcex62)xe2x80x83xe2x80x83(16)
P6=1/(2(xcex6+xcex7))xe2x80x83xe2x80x83(17)
P6=1/(2(xcex6+xcex7))xe2x80x83xe2x80x83(18)
where,
xcex6=xcex6a+xcex6bxe2x80x83xe2x80x83(19)
xcex7=xcex6c+xcex6dxe2x80x83xe2x80x83(20)
P=(xcex6aA+xcex6bB)/xcex6xe2x80x83xe2x80x83(21)
Q=(xcex6cC+xcex6dD)/xcex7xe2x80x83xe2x80x83(22)
W=(xcex6P+xcex7Q)/(xcex6+xcex7)xe2x80x83xe2x80x83(23)
xcfx81=(xcex6h/(xcex6+xcex7)xe2x80x83xe2x80x83(24)
The coefficient P8, the parameter T and the function K (R0, R1, xcex60, xcex61) and the function Fm(T) forming the right side of the equation (10) may be expressed by the following relational expressions, respectively:
P8=(xcex6+xcex7)xe2x88x921/2xe2x80x83xe2x80x83(25)
T=xcfx81(Pxe2x88x92Q)2xe2x80x83xe2x80x83(26)
K(R0, R1, xcex60, xcex61)={21/2xc2x7xcfx805/4/(xcex60+xcex61)}x exp[xe2x88x92{xcex60xcex61/(xcex60+xcex61)}(R0xe2x88x92R1)2]xe2x80x83xe2x80x83(27)
Fm(T)=(0 to 1) ∫t2mxc2x7exp[xe2x88x92Tt2]dtxe2x80x83xe2x80x83(28)
Here, the function Fm(T) in the equation (28) is called an error function and the above-mentioned first document describes a method for calculating it by using Tailor expansion. It is noted that (0 to 1) S in the equation (28) denotes finite integration from 0 to 1.
The two-electron integral is represented recursively as shown in the equations (8) through (10) by the Ohara""s method as described above, the target value of the two-electron integral may be obtained repeatedly applying the equation (9) to cause auxiliary integral by which the orbital quantum number is 0 to appear on the right side and by finding the auxiliary integral by which the orbital quantum number is 0 by using the equation (10).
It will be explained by exemplifying a case of finding two-electron integral (p*p*,s*s*), for example. Here, assume that four Gaussian functions contained in the two-electron integral are all included in the specific shell. At this time, because there exists three types of p functions of px, py and pz in accordance to its orbital quantum number vector as described before, there exist the following nine types of two-electron integrals to be found, (px*px*,s*s*), (px*py*,s*s*), (px*pz*,s*s*), (py*px*,s*s*), (py*py*,s*s*), (py*pz*,s*s*), (pz*px*,s*s*), (pz*py*,s*s*) and (pz*pz*,s*s*).
Here, a case of finding (px*py*,s*s*) among them will be explained. When it is expanded by the equations (8) and (9), the following relationships hold:
(px*py*, s*s*)=(px*py*,s*s*)(0)=P0(s*py*,s*s*)(0)+P1(s*py*,s*s*)(1)xe2x80x83xe2x80x83(29)
The auxiliary integral on the right side of the equation (29) takes a form in which only the second wave function is not s function. The similar expression with the equation (9) may be obtained for the auxiliary integral of this form by replacing the wave function as described above. Accordingly, the following two equations may be obtained by applying the auxiliary integral on the right side of the equation (29) again to the equation (9) and by expanding it:
(s*py*,s*s*)(0)=P0xe2x80x2(s*s*,s*s*)(0)+P1xe2x80x2(s*s*,s*s*)(1)xe2x80x83xe2x80x83(30)
(s*py*,s*s*)(1)=P0xe2x80x3(s*s*,s*s*)(1)+P1xe2x80x3(s*s*,s*s*)(2)xe2x80x83xe2x80x83(31)
Thus, all items could be expressed in the form of (s*s*, s*s*)(m). Next, (s*s*, s*s*)(0), (s*s*,s*s*)(1) and (s*s*,s*s*)(2) on the right side of the equations (30) and (31) are found by using the equation (10). Thereafter, (s*py*,s*s*)(0) and (s*py*,s*s*)(1) are found by substituting the value found at first to the right side of the equations (30) and (31) in the order opposite from the expansion. Further, it is applied to the right side of the equation (29) to obtain (px*py*,s*s*)=(px*py*,s*s*)(0) at last.
The remaining eight two-electron integrals may be found by expanding to the numerical expressions similar to the equations (29), (30) and (31). At this time, because (s*s*, s*s*)(0), (s*s*, s*s*)(1) and (s*s*, s*s*)(2) used in the equations (30) and (31) are always used, they may be utilized again without finding those values again. Still more, because the coordinates of the atomic nucleus and the value of xcex6 are the same in those two-electron integrals, the value of coefficient such as P0 and P1 used in the sum or products operation may be used almost in common.
When the four Gaussian functions contained in the two-electron integrals are thus contained in the specific shell, respectively, there exist many auxiliary integrals which can be utilized in common. Accordingly, the two-electron integrals using the Gaussian functions contained in the specific shell are normally calculated altogether. The value of the two-electron integral may be obtained by using the equations (8) through (10) in accordance to the above-mentioned procedure.
As described above, the calculation of two-electron integral is expressed by a repetition of arithmetic operation of sum of products of floating-point by which the value of (coefficientxc3x97auxiliary integral) is added to the value of the other auxiliary integral. At this time, the calculation of the error function in the equation (28) is expressed by Taylor expansion and the inverse number appearing in the equation (13) and others, the inverse number of square root appearing in the equation (25) and the calculation of exp appearing in the equation (27) may be also calculated by the known Newton method and Taylor expansion.
Because the Newton method and the Taylor expansion may be expressed by the repetition of the arithmetic operation of sum of products of floating-point, the calculation of the two-electron integral is a repetition of consecutive arithmetic operation of sum of products after all. Still more, because eight arithmetic operations of sum of products increase at maximum at one time of expansion by the equation (9), a number of arithmetic operation of sum of products required in calculating one two-electron integral also increases.
As described above, when the calculation of molecular orbit is carried out in accordance to the HF method, the two-electron integrals appearing in one time of repetition of the SCF calculation must be calculated by the arithmetic operation of sum of products of floating-point. An amount of this calculation is enormous from the aspects the number of two-electron integrals and of an amount of calculation required for calculating one two-electron integral.
Hitherto, there has been xe2x80x9cThe Architecture of a Molecular Orbital calculation Engine (MOE)xe2x80x9d(Shirakawa et al., Technological Report of IEICE, CPSY96-46 (1996-05) (Second Document) as an example of carrying out the calculation of molecular orbits by high speed. This is what parallelizes the calculation of a plurality of two-electron integrals by utilizing that the calculation of two-electron integrals can be carried out independently more or less.
In the example of the second document, a plurality of processor elements are provided in a system to cause the respective processor elements to share a portion of the calculation of two-electron integrals. While the calculation of two-electron integral is carried out by the arithmetic operation of sum of products of floating-point based on the method of the Ohara""s method, the process or element calculates up to grs among the Fock matrix expressed by the equation (2) and transmits its value to a host processor. Then, the host processor solve the characteristic problem expressed by the equation (1).
There is also one as disclosed in Japanese Patent Laid-Open No. Hei. 9-50428 as another example. In this example, there is provided a computer cluster composed of a plurality of mutually connected computers. Each computer calculates the two-electron integrals and the elements of the Fock matrix expressed by the equation (2) and sends its value to a vector computer. Then, the vector computer solve the characteristic problem expressed by the equation (1).
Because the enormous amount of arithmetic operation of sum of products of floating-points are carried out as described above in the calculation of the two-electron integrals and Fock matrix elements, errors occurring on the way of the arithmetic operation and the accuracy thereof are questioned. Then, the accuracy required for the calculation will be discussed at first.
Although it is difficult to uniformly determine the accuracy required for the calculation because the scale of the calculation of molecular orbits changes depending on the size of molecules and on the base system to be used, it is required to have an accuracy by which a relative error of values of the error function expressed by the equation (28) is smaller than about 10xe2x88x9215 as a criterion for having an enough accuracy in the result of the non-empirical calculation of molecular orbits according to the above-mentioned first document.
That is, a value of energy obtained by the calculation of molecular orbits using the error function becomes a practically enough accuracy in the end in carrying out the calculation of molecular orbits in a scale of specific range by calculating the error function by the accuracy of this range. Then, the accuracy will be estimated based on the criterion shown in the first document.
According to the floating-point representation of double-precision defined by xe2x80x9cIEEE Standard for Binary Floating-Point Arithmeticxe2x80x9d, the length of mantissa part is 52 bits and the value of 1 of the highest order is not contained among them, so that it is considered to have an accuracy of 53 bits. Because this bit length is 2xe2x88x9253=1.11xc3x9710xe2x88x9216 in terms of relative error, the double-precision floating-point satisfies the above-mentioned accuracy.
Then, an accuracy to be satisfied in the arithmetic operation of sum of products will be estimated under the condition that the error function obtained in accordance to the equation (28) has the accuracy wherein the mantissa part has the accuracy of the double-precision arithmetic operation of sum of products of floating-point of 53 bits.
At this time, it is supposed that the inverse number calculation by means of the Newton method is used in the divisions used in the equations (21) through (24) and the calculation by means of the Taylor expansion will be used in the calculation of the error function of the equation (28) as described above. Accordingly, the value of the error function will be calculated by repeating the arithmetic operation of sum of products from the values of A, B, C, D, xcex6a, xcex6b, xcex6c and xcex6d given in advance.
According to xe2x80x9cNomenclature of Mathematicsxe2x80x9d(Iwanami Shoten), errors may be categorized into three errors of (1) input error, (2) truncation error, and (3) rounding error in general. Calculated values of the error function are considered to be influenced by these three types of errors. Then, the influence by these three errors will be explained below as a preparation for estimating the accuracy required for the arithmetic operation of sum of products.
The input error is an error already existing in data given in advance and refers to A, B, C, D, xcex6a, xcex6b, xcex6c and (d (called as initial parameters) expressed by floating-point. Here, assume that the initial parameters are given by the double-precision floating-point. Accordingly, these values have the accuracy of more than the mantissa part of 53 bits and the input error also falls in around this range.
Next, the truncation error is an error originating from truncating a number of times of repetition and a number of items of expansion by a finite number of times in approximating the division by the Newton method or the value of the equation (28) by the Taylor expansion as described above. This error may be controlled by increasing the number of times of repetition of the Newton method, the number of items of expansion of the Taylor expansion and the accuracy of initial values and values of coefficients. Accordingly, it is assumed that the truncation error may be reduced fully and similarly to the input error.
The rounding error is an error occurring in rounding a numerical value obtained at each stage of the calculation to a limited number of digits. Here, the rounding error occurs by repeatedly using the arithmetic operation of sum of products of the finite length of multiplication, addition and subtraction.
At this time, because it is assumed that the input error and the truncation error are fully small and are less than the error of the double-precision floating-point, the influence of those two types of errors given to the value of the error function is small. Then, the accuracy required for the arithmetic operation of sum of products may be calculated under the condition that the value of the calculated error function still maintains the accuracy of the double-precision floating-point having the 53 bit mantissa part even if only the rounding error is considered in this analysis and the rounding error accumulated in the repetition of the arithmetic operation of sum of products is considered.
Next, an elementary error of calculation used in the calculation of the error function will be estimated before concretely analyzing the accuracy of the error function.
At first, the relative error occurring in the addition and multiplication contained in the arithmetic operation of sum of products will be explained. The addition and multiplication causing the rounding error by rounding by a finite length of digit number will be denoted as +r and xc3x97r for convenience and the addition and multiplication causing no rounding error by rounding by an infinite length of digit number will be denoted as +i and xc3x97i for convenience in the following explanation. Further, for simplification, it is assumed that the relative error occurs only by the worst relative error E which occurs due to the computation of finite length uniformly by the addition and multiplication of finite length.
At this time, the finite length addition and multiplication between certain parameters A and B are expressed like the following equations (32) and (33):
A+rB=(1+xcex5)(A+iB)xe2x80x83xe2x80x83(32)
Axc3x97rB=(1+xcex5)(Axc3x97iB)xe2x80x83xe2x80x83(33)
When parameters X and Y are expressed as follows by using parameters A, B, C and D, the finite length addition and multiplication of X and Y are expressed like the following equations (36) and (37):
X=A+rBxe2x80x83xe2x80x83(34)
Y=C+rDxe2x80x83xe2x80x83(35)
X+rY=(A+rB)+r(C+rD)=(1+xcex5)(A+iB)+r(1+xcex5)(C+iD)=(1+xcex5)2((A+iB)+i(C+iD))xe2x80x83xe2x80x83(36)
Xxc3x97rY=(A+rB)xc3x97r(C+rD)=(1+xcex5)(A+iB)xc3x97r(1+xcex5)(C+iD)=(1+xcex5)3((A+iB)xc3x97r(C+iD))xe2x80x83xe2x80x83(37)
Accordingly, when the finite length addition is carried out like the equation (36), the ideal result is raised to second power of the relative error and when the finite length multiplication is carried out like the equation (37), the ideal result is raised to third power of the relative error. Thus, how the relative error contained in the equation remains is different in case of the addition and of the multiplication in the computation among the equations containing the rounding error.
Further, in finding the accumulation of the parameters A through D, while the relative error is (1+)2 in the method of finding X and Y at first and then of adding them like the equation (36), equations of a method of adding the parameters A through D one after another are expanded as shown in the following equation (38):
((A+rB)+rC)+rD=(1+xcex5)3A+i(1+xcex5)3B+i(1+xcex5)2C+i(1+xcex5)Dxe2x80x83xe2x80x83(38)
Accordingly, because the relative error is (1+xcex5)3 to A and B, the relative error is considered to be (1+xcex5)3 to the ideal result with regard to the whole of this equation. Thus, the rounding error changes also with regard to the order of the addition.
Next, the rounding error caused by the Newton method for finding a value of inverse number will be analyzed. In steps of the Newton method, an inverse number of a certain value y may be obtained by normalizing the value y in a specific range to select a value of x close to the inverse number thereof and by normalizing a calculated value of x obtained by repeating the following equations (39) through (41):
y0=yxc3x97xxe2x80x83xe2x80x83(39)
r=2xe2x88x92y0xe2x80x83xe2x80x83(40)
x=rxc3x97xxe2x80x83xe2x80x83(41)
Because the number of times of repetition at this time depends on the difference between the value of x selected at first and the true value, the calculation depends on the accuracy of the initial value. Here, assume that a sufficient truncation error may be obtained when the number of times of repetition is 3. At this time, the relative error is multiplied with (1+xcex5) by one time each by carrying out the multiplication and addition of the equations (39) through (41) in finite length. Accordingly, the relative error occurring by the repetition of three times is (1+xcex5)9.
Because the Newton method is a method for approaching the value of x to (1/y) by the repetition of the equations (39) through (41) described above, the rounding error mixed on the way of the repetition is also actually compensated. Therefore, although the value of the relative error of (1+xcex5)9 described above is over-estimated, this value will be used here by assuming the worst case.
Further, the rounding error in applying the Taylor expansion to the equation (28) will be analyzed. Because an example of expansion of the seventh item is shown in the first document described above, the expansion of the seventh item will be assumed here also. The Taylor expansion at this time is expressed by the following equation (42):
Fm(T)=xcexa3[k=0 to 6]1/k?xcx9cCk(Txe2x88x92Txe2x80x2)xe2x80x83xe2x80x83(42)
Here, Txe2x80x2 is a boundary value closest to T when the range of the given T is divided by a predetermined number of division. It is assumed that the number of division is fully large so that the truncation error of the Taylor expansion becomes small as described above. Ck is the value of a coefficient determined corresponding to the value of Txe2x80x2. This Taylor expansion is actually calculated in the following procedure:
F6=(⅙)xc3x97(Txe2x88x92Txe2x80x2)xc3x97C6xe2x80x83xe2x80x83(43)
F5=(⅕)xc3x97(Txe2x88x92Txe2x80x2)xc3x97(F6+C5)xe2x80x83xe2x80x83(44)
F4=(xc2xc)xc3x97(Txe2x88x92Txe2x80x2)xc3x97(F5+C4)xe2x80x83xe2x80x83(45)
Fm=F0=(Txe2x88x92Txe2x80x2)xc3x97(F1+C0)xe2x80x83xe2x80x83(46)
When the equations (43) through (46) are calculated by using the finite length addition and multiplication, the relative error related to the equation (43) is (1+xcex5)3 and the relative error related to the equations (44) and thereafter is (1+xcex5)4. However, because the first multiplication (one/integer) may be calculated accurately to F4, F2, F1 and F0, the relative error is (1+xcex5)3. As a result, the relative error caused by the Taylor expansion turns out to be (1+xcex5)24.
Because the number of division is taken to be fully large in the actual Taylor expansion as described above, the value of (Txe2x88x92Txe2x80x2) in the equation (42) is fully small. Accordingly, there is a possibility that the coefficient C is fully greater than F with which (1/k) and (Txe2x88x92Txe2x80x2) are multiplied right before and the value of the rounding error mixed in the value of F is no problem in the addition of the value F found right before and the coefficient C like (F6+C5) contained in the equation (44) for example. Therefore, the value of relative error of (1+xcex5)24 is overestimated, this value will be used here by assuming the worst case.
Still more, (Txe2x88x92Txe2x80x2) is multiplied one time each in finding the equations (43) through (46). Therefore, although the relative error contained in T seems to be sixth power when mechanically analyzed, it is also over-estimated by the reason described above because this relative error is also mixed in the value of F in each stage of the repetition. However, the error contained in T is considered to be sixth power by assuming the worst case also here.
The rounding error occurring in the calculation of error function will be described while taking the relative error mixed in the elementary calculation of the error function described above into account. Although the over-estimated part is contained in the value of the rounding error emitted here, the value of estimated error will be used here as it is because the accuracy of the arithmetic operation of sum of products which is sufficient for causing the value of the error function to stay within a predetermined range of error is to be found here.
FIG. 7 shows a path for calculating an error function Fm(T) from the initial parameters A, B, C, D, xcex6a, xcex6b, xcex6c and xcex6d and a relative error occurring in each calculation. The figure shows the path for finding the error function from the initial parameters by sequentially using the arithmetic operation of sum of products by arrows and shows values of the relative error related to each arithmetic operation of sum of products on the right side of the equations.
The relative error contained in the value of T of the equation (26) will be discussed at first. Here, because T is expressed by multiplication of two values of xcfx81 and (Pxe2x88x92Q)2, the rounding error contained in those two values will be described here.
The relative error of (Pxe2x88x92Q)2 may be obtained by analyzing the path of Steps 1 through 5 in FIG. 7. The relative error of Px may be obtained by multiplying the relative errors in the respective Steps as they are and its value is (1+xcex5E)13.
Next, the value of Step 4 is in the form that the item containing Px is multiplied twice, the relative error is considered to be ((1+xcex5)13)2xc3x97(1+xcex5)3=(1+xcex5)29. Further, when the relative error in Step 5 is multiplied, the relative error turns out to be (1+xcex5)31. This value is the relative error of (Pxe2x88x92Q)2.
Next, the relative error of xcfx81 is taken into account by the path of Steps 1, 6, 7 and 8 and the path of Step 1 and 8. Because the relative error is (1+xcex5)11 in Steps 1, 6 and 7 and the relative error is a multiplication of the three in Step 8, the relative error turns out to be (1+xcex5)13 after all. This value is the relative error of r.
The relative error contained in T found in Step 9 is calculated to be (1+xcex5)31+13+1=(1+xcex5)45 from the relative errors of xcfx81 and of (Pxe2x88x92Q)2 found as described above.
The relative error contained in the error function in Step 10 is obtained from the relative error contained in T. When it is calculated based on the influence of the rounding error of the Taylor expansion described above, the relative error of the error function is (1+xcex5)45xc3x976+24=(1+xcex5)294.
Drop of digit caused by subtraction must be also taken into account in the actual analysis of errors. In this case, when a value of P is considerably close to a value of Q in a calculation of (Pxe2x88x92Q) contained in the equation (26), the effective number of digits in the result of subtraction decreases. However, when the drop of digit occurs in (Pxe2x88x92Q) the value of T also decreases in this case. As a result, the value of (Txe2x88x92Txe2x80x2) in the equation (42) is considered to be small and what influences the error function as a result is considered to be items which is not multiplied with (Txe2x88x92Txe2x80x2) containing CO in the equation (42). Therefore, because the influence of the error is considered to be reduced when the drop of digit occurs, the drop of digit will not be taken into account this time.
The relative error caused by the rounding error which is mixed in calculating the error function from the initial parameters turns out to be (1+xcex5)294 as described above. Because this relative error is required to be less than the relative error contained in the double-precision floating-point from the condition to the rounding error described above, the following equation holds to one time of relative error xcex5 by the arithmetic operation of sum of products:
(1+xcex5)294 less than (1+2xe2x88x9253)xe2x80x83xe2x80x83(47)
Here, because the relative error xcex5 is a number smaller than 2xe2x88x9253, xcex5 less than  less than 294. As a result, the left side of the equation (47) may be approximated as 1+294xcex5 and the equation (47) is deformed as xcex5xe2x89xa62xe2x88x9262 in the end. From this conditional expression, a number of digits of the mantissa part of the arithmetic operator of sum of products which is sufficient for the error function to have the accuracy around the double-precision floating-point is 62 bits.
A common arithmetic unit is used in the calculation of the error function and the arithmetic operation of sum of products used in the calculation of the two-electron integrals and the Fock matrix elements normally in the case as described in the above-mentioned second document. Accordingly, the arithmetic operation of sum of products in which the mantissa part is greater than 62 bits is used not only in calculating the error function but also in obtaining the two-electron integrals and the Fock matrix elements.
Because the error function has the accuracy which is shown as the criterion in the first document by calculating it by the arithmetic operation of sum of products greater than 62 bits, it is considered that it is possible to give an enough accuracy also in the result of the calculation of molecular orbits.
However, in the method disclosed in the Japanese Patent Laid-Open No. Hei. 9-50428, the calculation of molecular orbits is carried out by using a vector computer, the existing host, and a general purpose computers which compose a computer cluster. Therefore, there has been a problem that the accuracy of the calculation of the error function or of the arithmetic operation of the characteristic value is limited by the operational accuracy prepared in advance for the host vector computer and the computers composing the cluster.
A multiplier circuit and adder/subtracter circuit corresponding to the representation of the double-precision floating-point defined in the IEEE Standard 754 are provided in the normal computers in particular, the operational accuracy is insufficient for the calculation of the error function even though the operational accuracy of the characteristic value expressed by the equation (1) satisfies the above-mentioned requirement.
Although it is possible to use computers provided with highly accurate arithmetic operation such as quadruple-precision floating-point, its efficiency is bad because it considerably exceeds the specification of the sufficient accuracy of arithmetic operation for the calculation of the error function and it uses the hardware resources wastefully.
Further, it takes time in the conversion of formats of the double-precision floating-point and the quadruple-precision floating-point defined in the IEEE Standard 754. For instance, while a value obtained by adding a specific offset to the value of exponent is stored in the field indicating the exponent in the double-precision floating-point and the quadruple-precision floating-point, a process of implementing an adding/subtracting process to the exponent part is required between them because the value of offset to be added is different among the double-precision floating-point and the quadruple-precision floating-point.
Still more, although it is possible to carry out highly accurate calculation such as the quadruple-precision floating-point by software by using hardware corresponding to the double-precision floating-point, its operating speed drops because it requires extra number of steps for the calculation.
Thus, although ones described in Japanese Patent Laid-Open Nos. Hei. 2-171923, Hei. 6-301710 and Hei. 8-185309 may be cited as examples of calculating a format by using a circuit smaller than the target format, all of them have had a problem that it takes time for the arithmetic operation.
The present invention has been made in view of the above circumstances and provides an apparatus and a method which require a hardware scale which does not exceed the specification and which allow highly accurate parallel calculation while maintaining high computing speed.
In order to solve the above-mentioned problems, according to a first aspect of the invention, a parallel computer system has a host processor and at least one processor element connected to the host processor via a bus. The host processor and the processor element share and calculate a floating point arithmetic operation contained in a specific calculation process. The processor element has a floating-point input/output interface section for inputting/outputting floating-point data having a first form composed of a 1-bit sign part, an exponent part having a bit width of me bits and a mantissa part having a bit width of m0 bits, an input data converting section for converting the floating-point data having the first form inputted from the floating-point input/output interface section into floating-point data having a second form composed of a 1-bit sign part, an exponent part having a bit width of me bits and a mantissa part having a bit width of m1 bits which is greater than m0, a floating-point operating section for executing the floating-point arithmetic operation for the floating-point data having the second form from the input data converting section, and an output data converting section for converting the floating-point data having the second form computed by the floating-point operating section into the floating-point data having the first form and for supplying it to the floating-point input/output interface section.
According to a second aspect of the invention, in the parallel computer system as described in the first aspect, the input data converting section converts the sign part of the floating-point data having the first form into the sign part of the floating-point data having the second form, the exponent part of the floating-point data having the first form into the exponent part of the floating-point data having the second form, and the mantissa part of the floating-point data having the first form of m0 bits into the higher order m0 bits within the mantissa part of the floating-point data having the second form of m1 bits and the lower order m1xe2x88x92m0 bits of the mantissa part of the floating-point data having the second form into a predetermined numerical value. The output data converting section converts the sign part of the floating-point data having the second form into the sign part of the floating-point data having the first form, the exponent part of the floating-point data having the second form into the exponent part of the floating-point data having the first form, and the higher order m0 bit of the mantissa part of the floating-point data having the second form into the mantissa part of the floating-point data having the first form.
According to a third aspect of the invention, in the parallel computer system as described in the first aspect, the input data converting section converts the sign part of the floating-point data having the first form into the sign part of the floating-point data having the second form, the exponent part of the floating-point data having the first form into the exponent part of the floating-point data having the second form, and the highest order 1 bit in the mantissa part of the floating-point data having the second form of m1 bits into 1, m0 bit which is lower than the highest order into the mantissa part of the floating-point data having the first form and the m1xe2x88x92m0xe2x88x921 bit which is lower than that into a predetermined numerical value. The output data converting section converts the sign part of the floating-point data having the second form into the sign part of the floating-point data having the first form, the exponent part of the floating-point data having the second form into the exponent part of the floating-point data having the first form, and m0 bit from the second higher order bit of the mantissa part of the floating-point data having the second form into the mantissa part of the floating-point data having the first form of m0 bit.
According to a fourth aspect of the invention, in the parallel computer system as described in the first aspect, the floating-point data having the second form represents a signed infinite when the exponent part is a first predetermined value emax and represents zero when the exponent part is a second predetermined value emin. A floating-point multiplying unit contained in the floating-point operating section sets the exponent part of output data at the value emax when an overflow occurs, the exponent part of the output data at the value emin when an under-flow occurs, the exponent part of the output data at the value emax when either one exponent part of two pieces of the floating-point data having the second form to be inputted is the value emax, the exponent part of the output data at the value emin when the exponent parts of both of two pieces of the floating-point data having the second form to be inputted are not the value emax and when either one exponent part is the value emin. A floating-point adding/subtracting unit contained in the floating-point operating section sets the exponent part of output data at the value emax when an overflow occurs, the exponent part of the output data at the value emin when an under-flow occurs, and the exponent part of the output data at the value emax when either one exponent part of the two floating-point data having the second form to be inputted is the value emax.
According to a fifth aspect of the invention, in the parallel computer system as described in the first aspect, the floating-point multiplying unit contained in the floating-point operating section implements truncation in finding a mantissa part of the output data, and the floating-point adding/subtracting unit contained in the floating-point operating section implements truncation in finding a mantissa part of the output data.
According to a sixth aspect of the invention, a parallel computing method for sharing and calculating a floating-point arithmetic operation contained in a specific calculation process by a host processor and at least one processor elements connected to the host processor via a bus includes the steps of making the host processor execute an arithmetic operation of floating-point data having a first form composed of a 1-bit sign part, an exponent part having a bit width of me bits and a mantissa part having a bit width of m0 bits, making the processor element execute a floating-point arithmetic operation on floating-point data having a second form whose accuracy is higher than that having the first form and which is composed of a 1-bit sign part, an exponent part having a bit width of me bits and a mantissa part having a bit width of m1 bits which is greater than m0, converting the sign part of the floating-point data input having the first form into the sign part of the floating-point data having the second form via the bus; the exponent part of the floating-point data input having the first form into the exponent part of the floating-point data having the second form; and the mantissa part of the floating-point data input having the first form into higher order m0 bits within the mantissa part of the floating-point data having the second form of m1 bits and the lower order m1xe2x88x92m0 bits of the mantissa part of the floating-point data having the second form into a predetermined numerical value to convert the floating-point data input having the first form into the floating-point data having the second form, executing floating-point arithmetic operation on the floating-point data having the second form to obtain the floating-point data having the second form as a result of the arithmetic operation, and converting the sign part of the floating-point data having the second form as a result of the arithmetic operation into the sign part of the floating-point data having the first form; the exponent part of the floating-point data having the second form as a result of the arithmetic operation into the exponent part of the floating-point data having the first form; and the higher order m0 bit of the mantissa part of the floating-point data having the second form as a result of the arithmetic operation into the mantissa part of the floating-point data having the first form to convert the floating-point data having the second form as a result of the arithmetic operation into the floating-point data output having the first form and to output to the bus.
According to a seventh aspect of the invention, a parallel computing method for sharing and calculating a floating-point arithmetic operation contained in a specific calculation process by a host processor and at least one processor elements connected to the host processor via a bus includes the steps of making the host processor execute an arithmetic operation of floating-point data having a first form composed of a 1-bit sign part; an exponent part having a bit width of me bits; and a mantissa part having a bit width of m0 bits, making the processor element execute a floating-point arithmetic operation on floating-point data having a second form whose accuracy is higher than that having the first form and which is composed of a 1-bit sign part; an exponent part having a bit width of me bits; and a mantissa part having a bit width of m1 bits which is greater than m0, converting the sign part of the floating-point data input having the first form into the sign part of the floating-point data having the second form via the bus; the exponent part of the floating-point data input having the first form into the exponent part of the floating-point data having the second form; the highest order 1 bit in the mantissa part of the floating-point data having the second form of m1 bits into 1; and m0 bit which is lower than the highest order into the mantissa part of the floating-point data having the first form and the m1xe2x88x92m0xe2x88x921 bit which is lower than that into a predetermined numerical value to convert the floating-point data input having the first form into the floating-point data having the second form, executing a floating-point arithmetic operation on the floating-point data having the second form to obtain the floating-point data having the second form as a result of the arithmetic operation, and converting the sign part of the floating-point data having the second form as a result of the arithmetic operation into the sign part of the floating-point data having the first form; the exponent part of the floating-point data having the second form as a result of the arithmetic operation into the exponent part of the floating-point data having the first form; and the higher order m0 bit from the host second bit of the mantissa part of the floating-point data having the second form as a result of the arithmetic operation into the mantissa part of the floating-point data having the first form to convert the floating-point data having the second form as a result of the arithmetic operation into the floating-point data output having the first form and to output to the bus.
According to an eighth aspect of the invention, in the parallel computer system as described in the first aspect, the calculation process is a process based on the molecular orbital method.
According to a ninth aspect of the invention, in the parallel computing method as described in the sixth or the seventh aspect, the calculation process is a process based on the molecular orbital method.
According to the first aspect of the invention, the host processor executes the arithmetic operation on the first form, e.g., the double-precision floating-point data, and the processor element which shares and processes the floating-point arithmetic operation together with the host processor executes the floating-point arithmetic operation by using the floating-point data having the second form whose accuracy is higher than that of the floating-point data having the first form and only whose bit width of the mantissa part is different. Accordingly, it will not cause no overspecification which otherwise occurs when the quadruple-precision is used.
Then, the input data converting section converts the form of the floating-point data from the first to the second in the processor element in receiving input data from the host processor and the output data converting section converts the form of the floating-point data from the second form to the first form in outputting data from the processor element to the host processor.
The conversion of the form of the floating-point data requires conversion of no exponent part like the conversion among the quadruple-precision and the double-precision because only the mantissa part is different. Therefore, the operating speed drops less because the conversion process can be simple.
Then, although an amount of calculation in the processor element is greater than that of the host processor in general in the parallel computer system, the arithmetic operation of the large amount of calculation is implemented at high precision, thus enabling highly accurate calculation by the first aspect as configured as described above. According to the second aspect of the invention, the sign part and the exponent part may be totally identical in the conversion of the floating-point data having the first form and the floating-point data having the second form carried out by the input data converting section and the output data converting section. Accordingly, the scale of hardware required for the conversion may be reduced because the conversion may be simplified.
According to the third aspect of the invention, the mantissa part of the floating-point data having the second form is changed so as to be readily calculated by the floating-point multiplying unit while keeping simple the conversion between the floating-point data having the first form and the floating-point data having the second form carried out in the input data converting section and the output data converting section. Accordingly, the circuit scale of the floating-point multiplying unit may be reduced.
According to the fourth aspect of the invention, zero and infinite are expressed only by the exponent part in the floating-point data having the second form. Accordingly, the circuit scale for handling zero and infinite may be reduced by the floating-point multiplying unit and the floating-point adding/subtracting unit.
According to the fifth aspect of the invention, the rounding process is omitted and the truncation process is carried out in finding the mantissa part in the floating-point multiplying unit and the floating-point adding/subtracting unit.
In case of the fifth aspect of the invention, the bit width of the mantissa part of the floating-point data having the second form is greater than the bit width of the mantissa part of the floating-point data having the first form, so that it is possible to reduce the influence of the rounding process to the bit width of the mantissa part required as the floating-point data having the first form. Therefore, it is possible to maintain the high precision even when the rounding process is omitted and the circuit scale of the floating-point multiplying unit and the floating-point adding/subtracting unit may be reduced because the rounding process may be omitted. Therefore, the arithmetic operation may be carried out at high speed.
According to the sixth and seventh aspects of the invention, the quick conversion and the highly accurate floating-point arithmetic operation may be executed by using the processors generally used by the conversion step between the floating-point data having the first form and the floating-point data having the second form.
According to the eighth and ninth aspects of the invention, the calculation of molecular orbits may be executed at high precision with the same degree of speed of the past.