1. Field of the Invention
The present invention relates to a method and apparatus for parallel processing preferably used for calculation of massive matrix elements having certain symmetry and, more particularly, for processing calculation of Fock matrix elements at a high speed during simulation of a molecule using the ab initio molecular orbital method.
2. Description of the Related Art
Techniques for numerical analysis of the state and behavior of molecules in the field of chemistry include the molecular orbital method, molecular dynamics method and Monte Carlo method. Among such methods, ab initio molecular orbital calculation are quantum-mechanical calculations based on the first principle which are aimed at describing the behavior of electrons in a molecule. Therefore, this method is regarded as a basis for simulation of a molecule and is a technique of importance from the industrial point of view which is used for close analysis of structures of substances and chemical reactions.
According to the ab initio molecular orbital calculation, a basis function is the inverse of an exponential function whose exponent is obtained by multiplying an empirical coefficient by the square of the distance between the atomic nucleus of an atom forming a part of a molecule and an orbital electron or linear combination of them, and a plurality of such basis functions are prepared for one atom. A wave function of an orbital electron in a molecule, i.e., a molecular orbital is described by linearly combining those basis functions.
A primary process of the ab initio molecular orbital calculation is to determine linear combination coefficients for the basis functions of molecular orbitals, and the calculation necessitates computational complexity and storage capacity proportion to the biquadratic of the number of the basis functions. For this reason, ab initio molecular orbital calculations are only used for molecular systems on the scale of 100 atoms or so at present. In order to realize more practical analysis of biological and chemical phenomena from the viewpoint of molecular theory, it is essential to develop a calculation system dedicated to ab initio molecular orbital calculations which is applicable even to molecular systems on the order of several thousand of atoms.
[Summary of Ab initio Molecular Orbital Calculation]
In the ab initio molecular orbital calculation, a state of a molecule "psgr" is described using an electron orbital xcfx86xcexc which corresponds to a spatial orbital of an electron in the molecule. Here, xcexc is a suffix which indicates a xcexc-th orbital among plural molecular orbitals. A molecular orbital xcfx86xcexc is a linear combination of atomic orbitals "khgr"I which is approximated by Expression 1 in FIG. 23.
In Expression 1, I is a suffix which indicates an I-th atomic orbital among plural atomic orbitals. An atomic orbital is also referred to as xe2x80x9cbasis functionxe2x80x9d. In this specification, an atomic orbital is hereinafter referred to as xe2x80x9cbasis functionxe2x80x9d. CIxcexc in Expression 1 is a linear combination coefficient. The sum regarding I in Expression 1 is related to all basis functions that form a molecule to be calculated.
In order to describe molecular orbitals on a quantum mechanical basis, the states of electrons in the molecule must satisfy the well known Pauli exclusion principle. A Slater determinant like Expression 2 in FIG. 23 is used as an expression to describe a state "psgr" of the molecule with 2n electrons such that it satisfy the Pauli exclusion principle taking electron spins into consideration. xcex1(x) and xcex2(x) in Expression 2 respectively represent states in which an x-th electron spin is upward and downward.
The Hamiltonian H for the molecule with 2n electrons is in the form of the sum of a one-electron part H1 and two-electron part H2 and is expressed by Expressions 3 through 5.
The first term in the parenthesis on the right side of Expression 4 in FIG. 23 represents a kinetic energy of electrons p, and the second term represents an interaction between a p-th electron and an A-th atomic nucleus. In Expression 4, xcexa3p (xcexa3i represents sum regarding i throughout the present specification) represents the sum of all electrons; xcexa3A represents the sum of all atomic nuclei; ZA represents the charge of an atomic nucleus A; and rpA represents the distance between an electron p and the atomic nucleus A.
Expression 5 represents an interaction between electrons p and q. xcexa3pxcexa3q( greater than p) represents a sum of combinations of two electrons; and rpq represents the distance between electrons p and q.
By using the above-described Hamiltonian H and the Slater determinant in Expression 2, an expected value xcex5 of a molecular energy is expressed by Expressions 6 through 9 in FIG. 24.
In Expression 6, xcexa3xcexc and xcexa3xcexd represent a sum regarding n (n is a positive integer) molecular orbitals. Expression 7 is referred to as xe2x80x9ccore integralxe2x80x9d and describes a typical electron which is numbered xe2x80x9c1xe2x80x9d. Expressions 8 and 9 are respectively referred to as xe2x80x9cCoulomb integralxe2x80x9d and xe2x80x9cexchange integralxe2x80x9d and describe typical electrons 1 and 2.
Expression 6 can be changed using basis functions into Expressions 10 through 13 shown in FIG. 25. The integral expressed by Expression 13 is referred to as xe2x80x9celectron-electron repulsion integralxe2x80x9d or simply xe2x80x9celectron repulsion integralxe2x80x9d.
An expected value xcex5 of a molecular energy expressed by Expression 10 includes unknown coefficients CIxcexc, and no numerical value is obtained as it is. CIxcexc correspond to the linear combination coefficients in Expression 1, where xcexc represents integers 1 through n (the number of molecular orbitals); I represents integers 1 through N (N is the number of basis functions which is a positive integer). Hereinafter, an Nxc3x97n matrix C whose elements are CIxcexc is referred to as xe2x80x9ccoefficient matrixxe2x80x9d.
One method used to determine a coefficient matrix that minimizes the expected value xcex5 to obtain a wave function "psgr" in a ground state is the Hartree-Fock-Roothaan variational method (hereinafter referred to as xe2x80x9cHFR methodxe2x80x9d. Expressions 14 through 18 in FIG. 26 are Expressions obtained by the HFR method, although the process of derivation is omitted.
FIJ and PKL are respectively referred to as xe2x80x9cFock matrix elementsxe2x80x9d and xe2x80x9cdensity matrix elementsxe2x80x9d. In the following description, those elements are sometimes expressed by F(I, J) and P(K, L). They have numerical values corresponding to I and J or K and L which are integers 1 through N, and are each expressed in the form of an Nxc3x97N matrix.
A coefficient matrix is obtained by solving Expression 14. Expression 14 is a system of nxc3x97N expressions because it applies to all xcexc""s from 1 through n and all I""s from 1 through N.
A density matrix P is used to calculate a coefficient matrix C obtained by solving Expression 14. A density matrix P is calculated from a coefficient matrix C as indicated by Expression 18. Referring to the calculation procedure specifically, an appropriate coefficient matrix C is initially provided; a Fock matrix F is calculated from Expression 15 using a density matrix P calculated using the same; and a new coefficient matrix C is obtained by solving the simultaneous equations of Expression 14. The above-described calculations are iterated until the difference between the primitive coefficient matrix C from which the density matrix P is obtained and the resultant coefficient matrix C becomes sufficiently small, i.e., until it becomes self-consistent. Such iterative calculations are referred to as xe2x80x9cself-consistent field calculationxe2x80x9d (hereinafter abbreviated to read as xe2x80x9cSCF calculationxe2x80x9d).
The most time-consuming calculation in practice is the calculation of Fock matrix elements FIJ of Expression 15. The reason is that Expression 15 must be calculated for each combination of I and J and that the sum regarding K and L of the density matrix elements PKL must be calculated for each combination of I and J.
There are two methods for SCF calculations. One is a method referred to as xe2x80x9cdisk storage SCF methodxe2x80x9d in which the values of all electron repulsion integrals obtained by a first SCF calculation are stored in a disk and in which required electron repulsion integrals are fetched from the disk to be used in a second and subsequent calculations. The other is a method referred to as xe2x80x9cdirect SCF methodxe2x80x9d in which calculations of electron repulsion integrals are redone each time an SCF calculation is carried out.
In present, the direct SCF method has become the main stream for reasons including limitations on disk capacities and the length of access time. When molecular orbitals are calculated using the direct SCF method, electron repulsion integrals in a quantity substantially proportionate to N4 must be calculated for each SCF calculation. Therefore, an increase in the speed of the calculation of electron repulsion integrals directly results in an increase in the speed of molecular orbital calculations.
Symmetry of electron repulsion integrals G(I, J, K, L), a density matrix P(K, L) and a Fock matrix F(I, J) will now be briefly described.
As apparent from Expression 13, electron repulsion integrals have symmetry as shown in Expression 19 in FIG. 26. Therefore, if a numerical value is obtained for one integral in Expression 19, numerical values are obtained for the other seven integrals.
Expression 18 in FIG. 26 indicates that P(K, L)=P(L, K), and Expression 15 in FIG. 26 and Expression 11 in FIG. 25 indicate that F(I, J)=F(J, I).
[Contracted Basis Function and Primitive Basis Function]
In general, basis functions as shown in Expression 20 in FIG. 27 are used in ab initio molecular orbital methods. In Expression 20, r, n and R represent vectors whose components are indicated by suffixes x, y and z. r represents the coordinate of an electron; n represents an angular momentum of an electron; and R represents the coordinate of an atomic nucleus.
nx+ny+nz=xcex represents the magnitude of an angular momentum which is also referred to as xe2x80x9corbital quantum numberxe2x80x9d. An orbital is referred to as xe2x80x9cs-orbitalxe2x80x9d if its orbital quantum number xcex is 0, as xe2x80x9cp-orbitalxe2x80x9d if the number xcex is 1 and as xe2x80x9cd-orbitalxe2x80x9d if the number xcex is 2.
xcex6m represents an orbital exponent which indicates spatial expansion of orbitals. One basis function may be represented by a linear combination of plural orbitals having different orbital exponents. A basis function represented in such a manner is referred to as xe2x80x9ccontracted basis function, and a linear combination coefficient dm is referred to as xe2x80x9ccontraction coefficientxe2x80x9d. An function xcfx86 in the form of Expression 21 in FIG. 27 which has not been subjected to linear combination yet is referred to as xe2x80x9cprimitive basis functionxe2x80x9d.
It is a common practice to number contracted basis functions "khgr" with capitals such as I, J, K, and L, and to number primitive basis functions xcfx86 with small letters i, j, k and l, and the present specification follows the same.
[Contracted Shells and Primitive Shells]
When the orbital quantum number is 1, three contracted basis functions, i.e., n=(1, 0, 0), n=(0, 1, 0) and n=(0, 0, 1) exist. Similarly, when the orbital quantum number is 2, six or five contracted basis functions exist depending on how to configure the basis functions.
A set of plural contracted basis functions which share a common part in Expression 20 as shown in Expression 22 of FIG. 27 is referred to as xe2x80x9ccontracted shellxe2x80x9d. A contracted shell in a p-orbital is constituted by three contracted basis functions, and a contracted shell in a d-orbital is constituted by six (or five) contracted basis functions. For simplicity, one set of contracted basis functions in an s-orbit is also referred to as xe2x80x9ccontracted shellxe2x80x9d.
A set of primitive basis functions which share the part of xe2x80x9cexp[-xcex6(rxe2x88x92R)2]xe2x80x9d in Expression 21 is similarly referred to as xe2x80x9cprimitive shellxe2x80x9d. It is a common practice to number contracted shells with capitals such as R, S, T and U and to number primitive shells with small letters such as r, s, t and u, and the present specification follows the same.
When molecular orbital calculations are carried out, plural contracted shells having different orbital quantum numbers are prepared for each atom that forms a part of the molecule to be calculated, and a collection of all of such shells is used as a set of basis functions. One contracted shell can be represented by a combination (R, xcex) of a nuclear coordinate R and an orbital quantum number xcex.
[Expression of Electron Repulsion Integral]
An electron repulsion integral G(I, J, K, L) expressed by contracted basis functions is expressed as shown in Expression 23 in FIG. 27 when the primitive basis functions are used. g(i, j, k, l) can be expressed as in Expression 24 in FIG. 27.
While G(I, J, K, L) is referred to as xe2x80x9celectron repulsion integral represented by contracted basis functionsxe2x80x9d and g(i, j, k, l) is referred to as xe2x80x9celectron repulsion integral represented by primitive basis functionsxe2x80x9d, both of them may be simply referred to as xe2x80x9celectron repulsion integralxe2x80x9d in the following description. g(i, j, k, l) is also symmetrical as indicated by Expression 25 in FIG. 27.
A premitive basis function xcfx86 can be uniquely represented by a combination of an angular momentum n, an orbital exponent xcex6 and a nuclear coordinate R. It is assumed here that i-th, j-th, k-th and l-th primitive basis functions have angular momenta, orbital exponents and nuclear coordinates shown in Table 1 of FIG. 19.
For simplicity, an electron repulsion integral is expressed by, for example, [ab, cd] in which angular momenta a, b, c and d stand for the numbers i, j, k and l for primitive basis functions.
An efficient method for calculating an electron repulsion integral using a set of basis functions prepared as described above will now be described with reference to an article 1 by S. Obara and A. Saika, xe2x80x9cEfficient recursive computation of molecular integrals over Cartesian Gaussian functionsxe2x80x9d, JCP, Vol. 84, No. 7, p. 3964, 1986.
First, let us assume that all of the angular momenta a, b, c and d are the s-orbital, i.e., a=0a=(0, 0, 0); b=b=(0, 0, 0); c=0c=(0, 0, 0); and d=0d=(0, 0, 0). Then, the electron repulsion integral of Expression 24 is calculated as expressed by Expressions 26 through 34 in FIG. 28.
In Expression 26, [.., ..](m) is an auxiliary integral, and m is an auxiliary index, which will be described later. The range of integration of Expression 27 is from 0 to 1.
If any of a, b, c and d is an orbital other than the s-orbital, the calculation is carried out using the recurrence formulae shown in Expressions 35 and 36 in FIG. 29.
The suffix i in Expression 35 indicates an x-, y- or z-component. 1i represents a vector in which only the i-component is 1 and the other components are 0. Ni(n) represents the value of an i-component of an angular momentum n. Expression 35 is characterized in that one of angular momenta seen in the auxiliary integral on the left-hand side always decreases on the right-hand side by 1 or more and in that the auxiliary index of the auxiliary integral on the left-hand side remains the same or increases by 1 on the right-hand side.
The auxiliary integral [.., ..](m) exactly coincides with an electron repulsion integral [.., ..] when the auxiliary index m is 0 and supports calculation of an electron repulsion integral [.., ..]. An auxiliary integral whose angular momenta are all (0, 0, 0) can be finally obtained by applying Expression 35 to reduce the angular momenta of an electron repulsion integral repeatedly, however angular momenta of the basis functions included in the electron repulsion integral are large. Since the auxiliary integral whose angular momenta are all (0, 0, 0) can be calculated using Expression 26, the value of the electron repulsion integral can be obtained by multiplying those values by appropriate coefficients and by adding the products.
In practice, the calculation is carried out as follows.
First, an electron repulsion integral is expressed in a form utilizing eight or less auxiliary integrals according to Expression 35. Expression 35 is further applied to the resultant auxiliary integrals. The sequence to reach auxiliary integrals whose angular momenta are all (0, 0, 0) by repeating such a procedure is recorded as a calculation procedure.
Next, Expression 26 is used to calculate auxiliary integrals whose angular momenta are all (0, 0, 0). Then, the above-described calculation procedure is carried out to calculate the numerical values of the auxiliary integrals, and a desired electron repulsion integral is finally obtained.
Another important characteristic of Expression 35 is that the same calculation procedure as described above can be used to any electron repulsion integral as long as it has the same combination of four angular momenta even if the orbital exponents and nuclear coordinates are combined differently. When the calculation is carried out, it is only required to change the coefficients by which the auxiliary integrals are multiplied depending on the orbital exponents and nuclear coordinates.
[Cut-off]
As described above, the number of electron repulsion integrals represented by contracted basis functions to be calculated is N4 where N represents the number of the contracted basis functions. In practice, numerical values must be obtained for electron repulsion integrals which are presented by primitive basis functions, the total number of which ranges from several times to several tens times the number of electron repulsion integrals represented by contracted basis functions (depending on the number of the primitive basis functions which constitute the contracted basis functions, i.e., the number of contractions).
A first possible method for reducing the number is to take advantage of symmetry as seen in Expression 19 or 25. However, this method can reduce the amount of calculations of electron repulsion integrals only by a factor of eight even when the highest efficiency is achieved.
Another method is to actively eliminate calculations for electron repulsion integrals which can be regarded unnecessary from the viewpoint of computational accuracy. Such unnecessary electron repulsion integrals can be determined as follows.
As described above, the numerical values of all electron repulsion integrals are calculated based on the numerical values of auxiliary integrals [00, 00](m) whose angular momenta are all (0, 0, 0) as shown in Expression 26. It is therefore possible to determine from the numerical values of [00, 00](m) whether the contribution of the numerical value of an electron repulsion integral to the numerical values of Fock matrix elements is at a degree within a computational tolerance. Further, the magnitude of the numerical values of [00, 00](m) can be determined from the value of the function K(xcex6, xcex6xe2x80x2, R, Rxe2x80x2) shown in Expression 29 which can be in turn determined from the magnitude of Expression 37 in FIG. 29.
It is therefore possible to determine whether it is necessary to calculate electron repulsion integrals [ab, **] by estimating the magnitude of the first function K in Expression 25 from a combination of numerical values (xcex6a, A, xcex6b, B). It is also possible to determine whether it is necessary to calculate electron repulsion integrals [**, cd] by estimating the magnitude of the second function K in Expression 26 from a combination of numerical values (xcex6c, C, xcex6d, D).
Such elimination of calculations for unnecessary electron repulsion integrals is referred to xe2x80x9ccut-offxe2x80x9d. In the above example, a cut-off performed based on determination of only information of a and b may be referred to as xe2x80x9cab-based cut-offxe2x80x9d, and a cut-off performed based on determination of only information of c and d may be referred to as xe2x80x9ccd-based cut-offxe2x80x9d. Whether to perform ab- or cd-based cut-off can be determined because Expression 37 in FIG. 29 has a maximum value of 1 and a lower limit value of 0. Cut-off in such a manner reduces the number of electron repulsion integrals to be calculated to a value substantially proportionate to N2, which makes it possible to reduce the amount of calculations significantly.
The above description indicates that the effect of reducing the amount of calculations based on cut-off is incomparably greater than the effect achievable using symmetry of electron repulsion integrals when the number N is great. By taking advantage of this fact, the processing time of Fock matrix calculations in the ab initio calculation of molecular orbitals can be significantly reduced.
[Example of Molecular Orbital Calculation System]
An example of a system for calculating Fock matrix elements at a high speed using parallel computers is disclosed in an article 2 (Shirakawa et al., xe2x80x9cThe Architecture of a Molecular Orbital calculation Engine (MOE)xe2x80x9d, IEICE Technical Report, Vol. CPSY96-46, No. 5, pp. 45-50, 1996).
The article 2 discloses a system in which plural processor elements are connected to a host computer through a bus to perform parallel processing. In the course of examination of an architecture of a parallel processing system having such a configuration, a total amount of calculation and a memory capacity required for processor elements were estimated for various methods employed for routing a quadruple loop formed by four indices R, S, T and U and employed for a part which implements parallel processing.
Each of the processor elements of the parallel processing system disclosed in the article 2 has a high arithmetic capability, and the entire system can be provided at a low cost. It is therefore possible to provide a computer system having high cost performance. However, the article 2 has made no mention to methods that take the above-described cut-off into account and a specific methods for controlling the loop, and a question therefore remains as to whether efficient processing is possible or not.
[Method of I. Foster et al.]
Algorithm to efficiently calculate Fock matrix elements using a parallel computer is disclosed in an article 3 (I. T. Foster, et al., xe2x80x9cToward High-Performance Computational Chemistry: I. Scalable Fock Matrix Construction Algorithmsxe2x80x9d, Journal of Computational Chemistry, Vol. 17, No. 1, pp. 109-123, 1996).
In the article 3, analysis has been made on the amount of calculation for several kinds of algorithm for calculating Fock matrix elements and on the amount of communication between a host computer and plural processor elements. The details of the analysis will be described below.
First algorithm is the simplest algorithm referred to as xe2x80x9ccanonical ordering methodxe2x80x9d. In the method, four contracted basis functions I, J, K and L and six density matrix elements PIJ, PIK, PIL, PJK, PJL and PKL which satisfy Expression 38 shown in FIG. 30 are passed to one processor element which is caused to calculate electron repulsion integrals and a part of Fock matrix elements FIJ, FIK, FIL, FJK, FJL and FKL according to Expression 39 in FIG. 30.
Let us assume that the number of matrix elements communicated between the host computer and processor element during the calculation of one electron repulsion integral is counted using a unit xe2x80x9cperERIxe2x80x9d. Then, the number of communicated items of data is 12[perERI].
Second algorithm is algorithm referred to as xe2x80x9ctriple sort methodxe2x80x9d. Four contracted basis functions I, J, K and L and six density matrix elements PIJ, PIK, PIL, PJK, PJL and PKL which satisfy Expression 40 shown in FIG. 31 are passed to one processor element which is caused to calculate three electron repulsion integrals G(I, J, K, L), G(I, K, J, L) and G(I, L, J, K) and a part of Fock matrix elements FIJ, FIK, FIL, FJK, FJL, and FKL according to Expression 41 in FIG. 31.
In this case, since six density matrix elements and six Fock matrix elements must be transferred during the calculation of the three electron repulsion integrals, the number of communicated item of data is 4[perERI]. This method is regarded better than the canonical ordering method in terms of the number of communicated items of data.
However, if it is assumed here that the time for calculation of an electron repulsion integral represented by primitive basis functions in a processor element is, for example, 2 microseconds (=10xe2x88x926 seconds. Hereinafter, microsecond is denoted by xe2x80x9cxcexcsxe2x80x9d); the average number of contractions of the contracted basis function is 1.5; and the density matrix elements and Fock matrix elements have a data size of a double precision floating point number, i.e., 64 bits, the time required for calculation of an electron repulsion integral represented by one contracted basis function is approximately 10 xcexcs. As a result, communication performance of 25.6 Mbps (25.6xc3x97106 bits per second) per processor element is required between the host computer and processor elements.
For example, when the number M of the processor elements is 100 in order to improve calculation performance, overall communication performance of 2560 Mbps is required. It is not easy to achieve such communication performance with the present techniques.
For example, an inexpensive communication device, e.g., serial communication according to IEEE 1394 bus standard specifications can achieve communication performance of 200 Mbps. However, when parallel calculation of Fock matrix elements is carried out according to the triple sort method using the same, the entire processing time is dominated by the communication time, which eliminates the effect of reducing calculation time utilizing symmetry of a matrix.
Third algorithm is a method referred to as xe2x80x9csimple blocked methodxe2x80x9d. This method can be further classified into a version based on the canonical ordering method and a version based on the triple sort method. The third algorithm is a method in which blocks of contracted basis functions are formed to improve utilization efficiency of density matrix elements and Fock matrix elements, thereby reducing the amount of communication.
This method will now be described based on the triple sort method.
First, N contracted basis functions are divided into n (=N÷IC) blocks each of which is therefore formed by IC functions. Let us assume here that the blocks are numbered by IB, JB, KB and LB. Then, four contracted basis function blocks IB, JB, KB and LB and six density matrix element blocks P(IB, JB), P(IB, KB), P(IB, LB), P(JB, KB), P(JB, LB) and P(KB, LB) that satisfy Expression 42 in FIG. 31 are passed to one processor element. The number of the passed density matrix elements is 6IC2.
The processor element calculates 3IC4 electron repulsion integrals which correspond to G(I, J, K, L), G(I, K, J, L) and G(I, L, J, K), and the processor element calculates Fock matrix element blocks F(IB, JB), F(IB, KB), F(IB, LB), F(JB, KB), F(JB, LB) and F(KB, LB) similarly to the above-described Expression 41 and returns the results to the host computer.
The number of the Fock matrix elements returned this time is also 6IC2. As a result, the number of communicated items of data is 12IC2÷3IC4=4/IC2[perERI]. That is, the utilization efficiency of the density matrix elements and Fock matrix elements becomes higher to reduce the amount of communication, the greater the number of contracted basis functions in a block.
In this case of the canonical ordering methods, it uses the expression 43 instead of the exparession 42.
A fourth method for reducing the amount of communication is a row-blocked method. This method is a version of the simple blocked method in which one processor element is assigned all of calculations of IB, JB, KB and LB where IB, JB, KB are the same combinations and only LB is different. FIG. 20 shows a flow chart of this method. The arrows in dotted lines in FIG. 20 indicate that the execution of the processes subsequent to the parts indicated by the arrows is not made wait by the preceding processes and that the processes wait for input of information from another processing system.
The row-blocked method can be also classified into a version based on the canonical ordering method and a version based on the triple sort method. The version based on the triple sort method will now be described with reference to the flow chart in FIG. 20.
First, similarly to the simple blocked method, N contracted basis functions are divided into n (=N÷IC) blocks each of which is therefore formed by IC functions (step S1) Next, the host processor determines a combination (IB, JB, KB) of contracted basis function blocks IB, JB, KB to be assigned to a particular processor element (step S2).
Next, the host computer transmits density matrix element blocks P(IB, JB), P(IB, KB) and P(JB, KB) each having ICxc3x97IC elements and rows of density matrix elements P(IB, L), P(JB, L) and P(KB, L) each having KBxc3x97ICxc3x97IC elements associated with three contracted basis function blocks IB, JB and KB that satisfy the above-described Expression 42 to the processor element (steps S3 and S4). L represents the entire range of KBxc3x97IC.
Upon receipt of the blocks at steps S11 and S12, the processor element activates an internal loop associated with L""s to calculate electron repulsion integrals and Fock matrix elements similarly to the simple blocked method (step S13). When the calculation is completed for all L""s, the processor element returns rows of Fock matrix element blocks F(IB, L), F(JB, L) and F(KB, L) each having KBxc3x97ICxc3x97IC elements and Fock matrix elements F(IB, JB), F(IB, KB) and F(JB, KB) each having ICxc3x97IC elements to the host computer (steps S14 and S15).
The host computer receives them at steps S5 and S6. The process then returns to step S2 from which the above-described process is repeated.
Such a configuration further improves the utilization efficiency of density matrix elements and Fock matrix elements arrangement and reduces the number of communicated item of data per electron repulsion integral to 2/NIC+2/IC2 [perERI] which is about one half the number of data in the case of the simple blocked method if it is assumed that N greater than  greater than IC.
When an update of combinations of IB, JB and KB on the processor element results in only a change associated with KB, the F(JB, LB) and F(KB, LB) similarly to the above-described Expression 41 and returns the results to the host computer, the matrix element blocks P(IB, JB) and F(IB, JB) and the row of matrix element blocks P(IB, L(, P(JB, L(, F(IB, L) and F(JB, L) can be left as they are on the processor element and can be reused. This further reduces the number of communicated items of data to 4/3NIC+2/3IC2 [perERI].
The article 3 has introduced a clustering method as a fifth method for further improving the utilization efficiency of density matrix elements and Fock matrix elements based on the same principle as that of the row-blocked method. However, this method will not be described here because it is regarded as an unpreferable method from the viewpoint of load distribution and scalability.
[Problems Associated with Cut-off]
Even the fourth algorithm which is the most preferable among the methods disclosed in the above-described article 3 has a problem when cut-off is taken into consideration. An example of such a problem will be described below. Let us assume here that xcex1 represents the ratio of survivors from a cut-off; M represents the number of processor elements; Teri(xcexcs) represents time required for calculation of one electron repulsion integral G(I, J, K, L); and the data length of a matrix element is 64 bits.
In the case of the canonical ordering method, the minimum amount of communication per job assigned to one processor element (i.e., the amount of communication in the case that a change occurs only in KB when combinations are updated) is:
2(2IC2+KBIC2)xc3x9764 (bits)
Since the number of electron repulsion integrals calculated in the processor element is xcex12KBIC4, required overall communication performance is as expressed by Expression 44 in FIG. 32.
In the case of the triple sort method, the minimum amount of communication per job assigned to one processor element (i.e., the amount of communication in the case that a change occurs only in KB when combinations are updated) is:
2(2IC2+KBIC2)xc3x9764 (bits)
Since the number of electron repulsion integrals calculated in the processor element is 3xcex12KBIC4, required overall communication performance is as expressed by Expression 45 in FIG. 32.
When the number M of processor elements for parallel processing is 100; time required for calculating an electron repulsion integral g(i, j, k, l) represented by primitive basis functions in a processor element is 2 xcexcs; the average number of contractions of contracted basis functions is 1.5 (therefore, the calculation time Teri per electron repulsion integral G(I, J, K, L) is 10 xcexcs); and the ratio xcex1 of survivors from a cut-off is 0.05, the dependence of communication performance required between the host computer and processor elements on the block sizes IC is calculated as shown in FIG. 21 in the case of the canonical ordering method and is calculated as shown in FIG. 22 in the case of the triple sort method.
While the required communication performance changes depending on the value of KB in both of the canonical method and triple sort method, the amount of change is small when KB greater than 100 and, for example, the block size IC may be set at 50 in the case of the canonical ordering method and at 30 in the case of the triple sort method depending on the actual communication performance.
While calculations can be carried out using such methods in a system utilizing many workstations having a sufficiently large data storage capacity which is assumed in the article 3, an enormous cost is required to configure such a system.
In a computing system utilizing inexpensive dedicated processor elements to which memories having a small capacity on the order of several tens Mbits are connected as disclosed in the article 2, the number of rows of matrix elements that can be stored is 10 or less, i.e., the permitted block size is about 2 or 3.
In this case, it is not possible to perform efficient processing whose speed is not determined by communication performance considering the fact apparent from FIG. 21 or 22 that the use of low-cost serial communication according to IEEE 1394 bus standard specifications as a communication device provides performance of only 200 Mbps.
The present invention has been made in view of the above circumstances and provides a method for parallel processing which allows efficient calculations of matrix elements even in parallel calculations with an inexpensive communication device and plural processor elements having a memory of a small capacity free of domination of the performance of communication between the host computer and the processor elements in terms of processing speed.
According to a first aspect of the present invention, there is provided a method for parallel processing utilizing a parallel processing apparatus having a host computer and at least one processor element to obtain all elements of a matrix F whose elements are a sum F(R, S)=F1(R, S)+F2(R, S) where F1(R, S) is a sum regarding all of variables T and U within a range from 1 to N (N is a positive integer) in product A1xc2x7P(T, U)xc2x7G(R, S, T, U) of functional values G(R, S, T, U) of a function G satisfying a relationship expressed using four integral type variables R, S, T and U within the same range from 1 to N (N is a positive integer) satisfying a relationship G(R, S, T, U)=G(R, S, U, T)=G(S, R, T, U)=G(S, R, U, T)=G(T, U, R, S)=G(T, U, S, R)=G(U, T, R, S)=G(U, T, S, R), elements P(T, U) of a matrix P satisfying a relationship P(T, U)=P(U, T) and a coefficient A1 and where F2(R, S) is a sum of all of the variables T and U within the range regarding product A2xc2x7P(T, U)xc2x7G(R, U, T, S) of the functional values G(R, U, T, S), matrix elements P(T, U) and coefficient A2. The method includes the steps of forming a triple loop associated with the variables R, S, T and U, making the outermost loop of the triple loop associated with combinations of the variables R and T satisfying relationships Rxe2x89xa6N and Txe2x89xa6R, making the second loop inside the outermost loop associated with the variable S while making the third loop inside the second loop associated with the variable U, or alternatively making the second loop associated with the variable U while making the third loop associated with the variable T, setting the value of the variable S within the range from 1 to R, setting the value of the variable U within the range from 1 to R, and calculating predetermined one among the functional values G(R, S, T, U) inside the third loop and calculating a predetermined part of the matrix elements F using the result of the calculation. The second and third loops are combined to form one job unit, and the plural processor elements performs processing on the basis of the job units.
According to a second aspect of the invention, there is provided a method for parallel processing in which molecular orbital calculation for calculating the energy of a molecule represented using N (N is a positive integer) contracted shells is carried out using a parallel processing apparatus having a host computer and at least one processor element. The method includes the step of obtaining all matrix elements F(I, J) of a Fock matrix F represented ba sum regarding all primitive basis functions included in contracted basis functions I and J one of whose elements is primitive basis functions i and j respectively included in a sum f(I, J)=f1(I, J)+f2(I, J) where f1(I, J) is a sum regarding all of contracted basis functions in product A1xc2x7P(K, L)xc2x7g(i, j, k, l) of functional values g(i, j, k, l) of an electron repulsion integral function g represented using primitive basis functions i, j, k and l which are components of respective primitive shells r, s, t and u included in respective contracted shells R, S, T and U as indices, elements P(K, L) of a density matrix P represented using a contracted basis function K one of whose component is the primitive basis function k and a contracted basis function L one of whose component is the primitive basis function l as indices and a coefficient A1 and where f2(I, J) is a sum regarding all contracted basis functions in product A2xc2x7P(K, L)xc2x7g(i, k, j, l) of functional values g(i, j, k, l) of the electron repulsion integral function, the elements P(K, L) of the density matrix P and a coefficient A2. The calculation includes the steps of forming an outermost loop which is a loop associated with combinations of the contracted shells R and T satisfying relationships Rxe2x89xa6N and Txe2x89xa6R, forming a second loop inside the outermost loop as a loop associated with the contracted shell S and a third loop inside the second loop as a loop associated with the contracted shell U or alternatively forming the second loop as a loop associated with the contracted shell U and the third loop as a loop associated with the contracted shell T, setting the value of the contracted shell S within the range from 1 to R, setting the value of the contracted shell U within the range from 1 to R, and calculating predetermined one among the electron repulsion integrals inside the third loop and calculating a predetermined part of the Fock matrix elements using the result of the calculation. The second and third loops are combined to form one job unit, and the plural processor elements are assigned processes which are in the job units.