This invention relates to a simulation parameter determination method, in particular, relates to a method of determining parameters for predicting the experimental value by simulation calculation.
With recent advances in computer technology, computer simulation has increasingly been utilized in many disciplines, particularly in fields of science. Simulation of molecular level chemical reactions occurring in the body is one example thereof. For instance, computer simulation of binding reactions between proteins and low molecular weight compounds is commonly performed in drug discovery research. Because water occupies about 60% of body composition, in binding reactions of molecules in the body, in addition to the molecules directly participating in binding, a water solvent present around the molecules largely contributes to binding energy. Therefore, a highly precise calculation of the binding energy in a water solvent is required for accurate computer simulation of binding reactions of molecules in the body. Where computer simulation is utilized in drug discovery research, it is desired that calculation accuracy for the binding energy of a water solvent be such that the calculated value relative to the experimental value is within 1.4 kcal/mol. This value corresponds to 10 times the error in binding energy that is typical in experimental measurement results of binding between proteins and low molecular weight compounds.
To obtain a highly precise calculation of solvation energy, which is defined as the difference in energy before and after the introduction of the effects of water, the use of a calculation model that provides high theoretical exactitude is necessary. Explanation will be given herein describing a polarizable continuum model based on the first principle quantum chemistry calculation, which is one example of a model that provides high theoretical exactitude, and the parameters introduced to the polarizable continuum model, with reference to FIG. 1. This model explicitly accounts for the atomic nucleus and electrons of the calculation object molecule 101 but addresses the water solvent 102 as dielectrics including the calculation object molecule 101. In addition, as shown by an outline zone in the drawing, the model requires a definition of a boundary surface 104 between the space 103 occupied by a molecule and another region, the space occupied by dielectrics. This boundary surface can be defined as the surface formed by overlap of a sphere 105 that is arranged on approximately all the atoms constituting the molecule. No arrangement of a sphere on a hydrogen atom, however, may be selected. Solvation energy can be calculated based on the charge present on the boundary surface as a result of interaction between the molecule and the water solvent. Therefore, the definition of the boundary surface has a direct influence on accuracy in calculation. Accordingly, a determination of the value of the radius 106 of a sphere arranged on an atom, as a parameter specifying the boundary surface between a solute and a solvent, is capable of providing for a highly accurate calculation of solvation energy. Hereafter, this radius will be referred to as “an atomic parameter”.
As one example of an atomic parameter determination method, a method that uses a gradient matrix is provided. Explanation will be herein on an atomic parameter determination procedure that uses a gradient matrix with reference to FIG. 2 and FIG. 3.
FIG. 2 is a flowchart of the atomic parameter determination method that uses a gradient matrix. Initially, an energy convergence threshold τ2011, an the experimental values 2012 of three-dimensional coordinate information and the solvation energy of a molecule, and an atomic type 2013, which is a classification based on the chemical similarity of each atom constituting the molecule, are input from an input apparatus 201. Subsequently, in an atomic parameter initialization step 202, values of atomic parameters are initialized according to each atomic type. Then, in the matrix generation step 203, a matrix that stores the solvation energies of all molecules and a solvation energy gradient of all molecules relative to the change in each atomic parameter, is generated. Next, an updated value vector x is determined in a matrix-equation solving step 204 so as to minimize the objective function (expression 1), in which A represents a matrix generated; x represents the updated value vector of the atomic parameter; and b represents an the experimental value vector of solvation energy.g=Σ|Ax−b|2  (Expression 1)Next, in an atomic parameters update step 205, the atomic parameter value is updated based on the update value vector. Next, in an atomic parameter calculation termination judging step 206, a determination of whether the amount of change in a residual sum of squares of solvation energy is below τ is made. In the case where the amount of change is equal to or larger than τ, the matrix generation step 203, the matrix-equation solving step 204 and the atomic parameters update step 205 are repeated. In the case where the amount of change of the atomic parameter is below τ, the converged atomic parameter value 2071 is output by the output apparatus at step 207.
FIG. 3 illustrates the details of the processes executed in the matrix-generating step, the matrix-equation solving step, and the atomic parameters update step of the atomic parameter determination method that uses a gradient matrix described with reference to FIG. 2. The processes 302 that are executed in the matrix generation step 301 involve calculation of the solvation energies 3021 of all the molecules and calculation of the solvation energy gradient 3022 of all analysis objects relative to the change in each atomic parameter. The calculation results are stored in the matrix 3023. The actual calculation that is performed for each matrix element 3024, shown by open circle marks in the matrix, involves a one-time-evaluation of solvation energy using the polarizable continuum model. The process 304 that is executed in the matrix-equation solving step 303 is a minimization calculation of an objective function g. Objective function g can be defined by a matrix-equation involving the matrix 3041 in which the number of lines generated in the matrix generation step 301 is the number of molecules and the number of columns is the total number of atomic types plus one; the solution vector 3042 having a number of elements equal to the number of atomic types plus one; and where the solvation energy the experimental value vector 3043 ahs the same number of elements as the number of molecules. The matrix 3041 and the solvation energy the experimental value vector 3043 are known, and the solution vector 3042 is an unknown amount that is the object to be solved. The process 306 to be executed in the atomic parameters update step 305 is to update the new atomic parameter vector 3061 by summing the atomic parameter vector 3062 before update and a vector 3063 corresponds to the solution vector 3042 obtained in the matrix equation solving step 303 in which the top element has been deleted. Each of the elements of the vector 3063 represents the amount of change in the atomic parameter value in the next interation. Because updating the atomic parameter value results in a change in the solvation energy or the solvation energy gradient relative to the change in the atomic parameter, the calculations in FIG. 3 are repeated from the matrix generation step to the atomic parameters update step until g<τ is satisfied.
This atomic parameter determination method, however, has two problems.
The first problem is the significant amount of calculation required in the matrix generation step 301 for calculating the solvation energies 3021 of all molecules and for calculating the solvation energy gradients 3022 of all molecules relative to the change in each atomic parameter. This is because molecular integration is calculated in the first principle quantum chemistry calculation, which increases the amount of calculation to the order of the mth-power of atomic number (for example, m=3, 4, 5, - - - , depending on calculation methods) for expressing inter-electron interactions. When the amount of calculation time required for calculating the solvation energy of one molecule is 1, the amount of calculation time required for calculation of the solvation energy gradient of one molecule using a finite difference method is also 1; therefore, the total amount of calculation time required for matrix generation is directly proportional to the number of the matrix elements 3024 shown by open circle marks in the matrix 3023. Because the matrix generation step is repeated until convergence of solvation energy, the total amount of calculation time that is required for parameter determination is directly proportional to the number of matrix elements multiplied by the total number of iterations executed.
More generally, in the expressions provided below, N represents the number of molecules used in atomic parameter determination, A(n) represents the number of atoms included in the molecule n, and f(A(n)) represents the amount of calculation time required for solvation energy calculation of the molecule n. As described above, f(A(n)) is a power function of A(n). In addition, p(n) represents the number of atomic types included in the molecule n; and T represents the amount of calculation time.
In prior art methods that use a gradient matrix, the calculations are repeated until the amount of change in the atomic parameter value is below the energy convergence threshold. When the number of iterations executed is expressed by I, then the amount of calculation time is expressed by expression 2:
                                                                        T                conventional                            =                              I                ⁢                                                      ∑                                          n                      =                      1                                        N                                    ⁢                                                            (                                              1                        +                                                  p                          ⁡                                                      (                            n                            )                                                                                              )                                        ⁢                                          f                      ⁡                                              (                                                  A                          ⁡                                                      (                            n                            )                                                                          )                                                                                                                                                                    =                                                ∑                                      n                    =                    1                                    N                                ⁢                                                      (                                          I                      +                                              I                        ·                                                  p                          ⁡                                                      (                            n                            )                                                                                                                )                                    ⁢                                      f                    ⁡                                          (                                              A                        ⁡                                                  (                          n                          )                                                                    )                                                                                                                              (                  Expression          ⁢                                          ⁢          2                )            
In expression 2, the subsequent term after the summation symbol at the right-hand side of the first line is the amount of calculation time required for matrix generation, the “1” in the summation symbol corresponds to solvation energy calculation time, and p(n) corresponds to the gradient calculation of solvation energy relative to the change in the atomic parameter value.
Supplementary explanation will be provided on the first problem discussed above using particular examples. FIG. 4 is an example of application of the atomic parameter determination method that uses a gradient matrix for 10 molecules. In FIG. 4, an atomic type 401, a resultant atomic parameter value 402, and an atomic type 403 for each molecule contains are shown. For the atomic type 401, the following 6 types were defined: CH3(C), CH3(N), CH2(CC), CH2(CN), NH2(C), and NH2(N). The atom(s) shown in the parenthesis are bonding partner atom(s). For example, the atomic type CH2(CN) represents a carbon atom of CH2 having a bond with one carbon atom and one nitrogen atom. The atomic type(s) 403 corresponding to each molecule are shown in FIG. 4 as open circle marks. For example, for CH3—CH2—NH2 (the sixth molecule), CH3 belongs to the CH3(C) atomic type, because it bonds to CH2; CH2 belongs to the CH2(CN) atomic type, because it bonds to CH3 and NH2; and NH2 belongs to the NH2(C) atomic type, because it bonds to CH2. As a result, three open circle marks are provided for this molecule. As an exception in the present atomic parameter determination method, for the non-methyl carbon atom of (CH3)4—C (the ninth molecule), a fixed value of 1.7 is used. This is because this atom is surrounded by 4 methyl groups and, as a result, is difficult to contact with water molecules. Therefore it is believed that the contribution of the water molecules to solvation energy for this atom is small.
For initial values of atomic parameters, 1.7 was used for 4 kinds of CH3(C), CH3(N), CH2(CC) and CH2(CN), and 1.4 for 2 kinds of NH2(C) and NH2(N). In an example implementation used to calculate the atomic parameters, a solution vector was approximately determined every iteration time using a steepest gradient method to be described below to update atomic parameters. In addition, to accelerate atomic parameter convergence, atomic parameters were updated using the solutions from previous iterations once every four iterations. Details of this example implementation will now be provided below.
1) Update Method Executed for Every Iteration
1-1) For each atom “a” that is not equivalent in view of symmetry composing molecule n, a Taylor expansion of solvation energy En is terminated at the first order term, and based on the resultant equation and the energy residue Zn, the atomic parameter, rn,a,t, and a weight thereof, Wn,a,t, are calculated according to Expression 3 and Expression 4 as provided below.
                              r                      n            ,            a            ,            t                          =                              r                          n              ,              a              ,                              t                ⁡                                  (                  old                  )                                                              -                                    z              n                                                                        ∂                                      E                    n                                                                    ∂                                      r                                          n                      ,                      a                      ,                                                                                  ·                              w                                  n                  ,                  a                  ,                  t                                                                                        (                  Expression          ⁢                                          ⁢          3                )                                          w                      n            ,            a            ,            t                          =                                            (                                                ∂                                      E                    n                                                                    ∂                                      r                                          n                      ,                      a                      ,                      t                                                                                  )                        2                                              ∑              t                        ⁢                                          (                                                      ∂                                          E                      n                                                                            ∂                                          r                                              n                        ,                        a                        ,                        t                                                                                            )                            2                                                          (                  Expression          ⁢                                          ⁢          4                )            1-2) The updated value rt of the atomic parameter is calculated as the weighted average of rn,a,t obtained for each molecule n.
                              r          t                =                                            ∑                              n                ,                a                                      ⁢                                          r                                  n                  ,                  a                  ,                  t                                            ·                              w                                  n                  ,                  a                  ,                  t                                                                                        ∑                              n                ,                a                                      ⁢                          w                              n                ,                a                ,                t                                                                        (                  Expression          ⁢                                          ⁢          5                )            2) The Update Method Executed Once Every Four Iterations2-1) For each atom “a” that is not equivalent in view of symmetry composing molecule n, a straight line zn=A rn,a,t+B, which approximates hysteresis of the atomic parameters rn,a,t(I), and a residue zn,(I) at I=4i−3, 4i−2, 4i−1, and 4i, are determined for calculating the atomic parameter value in which the residue becomes 0 as rn,a,t=−B/A, assuming that I is the number of iterations executed and I is an integer.2-2) As for each atomic type t, an updated value rt of the atomic parameter is calculated, as provided below in Expression 6, as the average of the resultant rn,a,t. The sum of the delta function of the denominator is the number of atoms belonging to atomic type t of the atoms that are not equivalent in view of symmetry composing molecule n.
                              r          t                =                                            ∑                              n                ,                a                                      ⁢                          r                              n                ,                a                ,                t                                                                        ∑                              n                ,                a                                      ⁢                          δ                                                t                                      n                    ,                    a                                                  ,                t                                                                        (                  Expression          ⁢                                          ⁢          6                )            
FIG. 5 shows the relationship between the amount of calculation time and calculation error obtained by the application of the atomic parameter determination method using the above-explained gradient matrix. In this drawing, the abscissa axis is the amount of calculation time, defined as the calculation time 501 required for atomic parameter determination using one computer, and the longitudinal axis is the calculation error 502, defined as the average value of the absolute calculation errors of each of the molecules. The lozenge 503 of the plotted marks shows the relationship between the amount of calculation time and calculation error. Because this method repeats execution of a matrix-equation, the calculation error becomes smaller as the number of iterations increases. According to the data of the example in FIG. 5, to attain a calculation error of 0.1 kcal/mol, a calculation time of about 500 minutes is required. As shown in this example, the amount of calculation time required is vast, even in atomic parameter determination for a relatively small number of molecules.
The second problem encountered with the atomic parameter method described above is that calculation accuracy is unpredictable when the resultant atomic parameter is applied to predict solvation energy. In general, calculation accuracy is considered to depend on definition of an atomic type. Therefore, in the case where the calculation accuracy is below a desired accuracy, re-definition of the atomic type, and re-execution of the atomic parameter determination procedure are required. In this case, iteration of entire atomic parameter determination procedure itself is required, which further expands the amount of calculation time.
This atomic parameter determination method described above can be generalized using the following substitutions: substituting “analysis objects” for molecules; substituting “elements composing analysis objects” for atoms composing molecules; substituting “elemental parameters” for atomic parameters; substituting “the experimental values” for the experimental values of solvation energy; and substituting “calculated values” for calculated values of solvation energy.
FIG. 6 is a flowchart of the parameter determination method using a gradient matrix. Initially, the convergence threshold τ6011, the necessary information and the experimental values 6012 for calculation of the analysis objects, and the elemental type 6013 for each element composing each analysis object are input by the input apparatus 601. Subsequently, in the elemental parameter initialization step 602, the elemental parameters are set according to elemental type. Then, in the matrix generation step 603, a matrix, which stores the calculated value of all analysis objects and the gradient of the calculated value of all analysis objects relative to change in each elemental parameter, is generated. Then in the matrix-equation solving step 604, such an updated value vector x, the objective function g=Σ|Ax−b|2 is solved for an updated value vector x that minimizes the objective function, where that A represents a matrix generated; x represents the updated elemental parameter value vector; and b represents the experimental value vector. Next, in the elemental parameters update step 605, the values of the elemental parameters are updated based on the update value vector x. Next, in the elemental parameter calculation termination judging step 606, a determination of whether the amount of change in the residual sum of squares is below τ is made. In the case where the amount of change is equal to or larger than τ, the matrix generation step 603, the matrix-equation solving step 604 and the elemental parameters update step 605 are repeated. In the case where the amount of change of the elemental parameter is below τ, the converged elemental parameter 6071 is output by the output apparatus 607.
FIG. 7 shows particular details of the processes executed the matrix generation step, the matrix-equation solving step and the elemental parameters update step in the parameter determination method using a gradient matrix. The process 702 to be executed in the matrix generation step 701 is the calculation of the calculated value 7021 of all analysis objects and the gradients 7022 of the calculated values of all analysis objects relative to change in each elemental parameter. The calculation results are stored in the matrix 7023. The amount of calculation time required for each of the matrix elements 7024, shown by open circle marks in the matrix, corresponds to a one-time evaluation of the calculated values. The process 704 to be executed in the matrix-equation solving step 703 is the calculation time for minimizing the objective function g, which can be defined by a matrix-equation composed of the matrix 7041, which has the number of analysis objects as the number of rows and the number of elemental types plus 1 as the number of columns, generated in the matrix generation step 701 and the solution vector 7042, which has the number of elemental types plus 1 as the number of elements, and the experimental value vector 7043, which has the number of analysis objects as the number of elements. The matrix 7041 and the experimental value vector 7043 are known, and the solution vector 7042 is an unknown object having values to be solved. While a plurality of methods are present for solving the matrix-equation, because these methods are premised on the step of determining the calculation rate being the matrix generation step and not the matrix-equation solving step, therefore, the use of a particular method for solving may be arbitrary here. The process 706 to be executed in the elemental parameters update step 705 is to obtained the updated elemental parameter vector 7061 according to the sum of the elemental parameter vector 7062 prior to the and the vector 7063, which corresponds to the solution vector 7042 obtained in the matrix-equation solving step 703 with the top element having been deleted. Each element of the vector 7063 represents the amount of change in the next iteration of elemental parameter value calculation. When the elemental parameter vector is updated, the calculation values or the gradient of the calculation values relative to elemental parameters will change, and, therefore, the calculations executed in the matrix generation step are again repeated.
In the generalized parameter determination method, where the matrix element calculations occupy a large portion of the total calculation time, a problem arises that the amount of calculation time required for repeatedly solving the matrix-equation becomes vast. The amount of calculation time is given by expression 2. However, the function form of f(A(n)) differs depending on the analysis objects. In addition, in the case where the parameter determination result does not satisfy a desired accuracy provision, re-definition of the parameters and iteration of the entire parameter determination procedure itself are required, which further expands the amount of calculation time required.
To solve the above first problem, it is an object of this invention to reduce the amount of calculation time required for parameter determination.
In addition, it is a second object of this invention to reduce the amount of calculation time required for obtaining a parameter determination result that satisfies a desired accuracy provision.