Molecular dynamics (MD) simulations have been successfully applied over the past 15 years to study the structure, dynamics, and thermodynamics of complex molecular systems consisting of many thousands of atoms, Typical applications include: docking, diffusion, reaction pathway, phase transition, ant protein folding studies. Researchers in the pharmaceutical, polymer, and chemical industries are beginning to routinely use these techniques to design new drugs or materials.
User acceptance of these tools is based on the development of powerful computers and highly optimized commercially available dynamics software that can efficiently carry out the simulation of large solvated molecular systems. The traditional approaches to molecular dynamics are based on molecular mechanics or "force field" methods that use empirical approximations to compute the potential energy as a function of the Cartesian coordinates of the atoms of a molecule. An atomistic equation of motion (EOM) is derived for the molecule from Newton's third law, where the independent degrees of freedom (DOF) are defined by Cartesian coordinates. The forcing functions for the molecule are defined by gradients of the assumed potential energy terms that represent strain in internal coordinates (e.g., bonds, angles, and dihedral angles), as well nonbonded terms (e.g., hydrogen bonds, Van der Waal interactions and charge interactions).
The EOM has the form: EQU m.sub.i R.sub.j =F.sub.j (R.sub.1, . . . , R.sub.N);j=1, . . . , N.sub.P
where m.sub.j denotes the mass of the j-th atom, R.sub.s =(x.sub.s,y.sub.s, z.sub.s) denotes the positions vector for the s-th atom, N.sub.p denotes the number of atoms, and F.sub.j (*) denotes the position dependent sum of all the forces on the j-th particle. For small molecules, these equations are easily set up and solved. There are no approximations introduced other than those involved in setting up the empirical expressions and the parameter values used in defining the force field.
There are two sources of computational complexity for MD simulations. First, the energy function requires N.sup.2 nonbonded interactions, where N is the number of atoms. This problem arises because every atom is affected by every other atom by way of long range interaction effects. For many applications the computation of the nonbonded interactions represent .gtoreq.90% of the computer time.
Second, small integration step sizes must be used (i.e., 0.5 femptoseconds (fs)) in the integration of the EOM. The limitation in the size of the step size arises because of the following inequality: ##EQU1## where P.sub.min(v) denotes the period of the highest frequency response present in the system behavior and v denotes the wave-number. The high frequency motion must be eliminated in order to take larger step sizes.
In constrained dynamics approaches, constraints can be used to reduce the high frequency motion. Such constraints can be introduced into the atomic EOM by adding a force-like term consisting of the gradient of a position-dependent constraint equation times an unknown Lagrange Multiplier (LM), as follows: ##EQU2## subject to ##EQU3## where .gradient..sub.i denotes the gradient with respect to the j-th atom position coordinates, C.sub.r (*) denotes the functional form the r-th constraint function, and .lambda..sub.r denotes the r-th LM. The solution for the LM is obtained by differentiating the gradient of the constraint equation and introducing the EOM. Straightforward manipulations lead to a LM solution defined by a linear matrix-vector algebraic equation of dimension N.sub.c .times.N.sub.c, where N.sub.c denotes the number of constraints. As N.sub.c becomes large, the computational burden associated with the inversion matrix becomes prohibitive (e.g., O(N.sup.3)).
There are two alternative classes of methods available for solving this problem. The first class of methods involves the construction of a set of generalized coordinates EQU [M(q)]q=F(q,q)
where [M(q)] denotes the N.times.N mass matrix, q denotes the generalized coordinate vector, and F(*) denotes the force vector. This approach eliminates the constraint variables completely in the problem formulation. The generalized coordinate method is completely general. However, it requires the inversion of a N.times.N matrix [M(q)], which is an O(N.sup.3) algorithm; that is, the inversion computation scales by the cube of the number of constraints imposed on the system. In addition, the equations are very cumbersome to implement and it is difficult to modify the equations once they are formulated.
Generalized coordinates have been used in MD for many years, though the applications have tended to be limited to short polymers (Pear et al. (1979) J Chem Phys 71:212; and Katz et al. (1979) Comput Chem 3:25). General-purpose formulations have not appeared previously for several reasons. First, classical analytical dynamics method, such as Lagrange's, involve the formulation of kinetic and potential energy expressions that require first and second partial derivatives with respect to the generalized coordinates (Ryckaert et al. (1985) Mol Phys 55:549). This procedure is well defined but rapidly becomes unwieldy even for relatively low order problems (i.e., .gtoreq.30 independent variables).
The second class of methods involves calculating approximate solutions to the constrained EOM. These methods use an iterative approach to solve for the Lagrange multipliers and, typically, only need a few iterations if the corrections required are small. The most popular method of this type, SHAKE (Ryckaert et al. (1977) J Comput Phys 23:327; and Van Gunsteren et al. (1977) Mol Phys 34:1311) easy to implement and scales as O(N) as the number of constraints increases. Therefore, the method is applicable to macromolecules. An alternative method, RATTLE (Anderson (1983) J Comput Phys 52:24) is based on the velocity version of the Verlet algorithm. Like SHAKE, RATTLE is an iterative algorithm.
Both of these approximate methods offer the ability to easily add bond constraints to a flexible system and allow a step size as large as 3-4 rs. However, adding any other types of constraints, even bond angle constraints, can greatly slow the convergence of SHAKE and limit the maximum step size.
Recent work has resulted in extensions of the SHAKE algorithm that allow for arbitrary geometrical constraints. The methods no longer have the convergence problems associated with the original SHAKE algorithm. These methods are also iterative and straightforward to implement. However, the maximum step size used appears to still be limited to about 3 rs. The systems used to test the methods were small (&lt;50 atoms), and it is not clear how the methods would perform for larger systems with many thousands of constraints (Tobias et al. (1988) J Chem Phys 89:5115).
The shortcomings of these iterative methods are that they are not exact and that they are still limited to a relatively small step size. Even with the availability of supercomputers and highly optimized molecular dynamics packages, researchers are only able to observe relatively small atomic fluctuations that occur on the picosecond to nanosecond time scale. Longer time scale motion, such as helix motion, domain, and subunit motion, cannot yet be thoroughly looked at in detail with all atom simulations.
Most of the detailed studies of large molecule systems have involved approximations of the motion by using rigid bodies. Though it has been shown that these rigid body motions can sometimes dominate the atomic fluctuations, it cannot be known a priori that these approximations are valid. If the rigid groups are picked incorrectly the calculated motion will not represent reality. Even if the rigid groups are picked correctly, the barriers for motion between the groups will be artificially high because the atoms cannot make small displacements that will relieve much of the nonbonded repulsion.
To study large systems these motions, as well as even larger time scale motions, such as folding and unfolding transitions, a new class of dynamics algorithms is required that will reduce the degrees of freedom in a complex molecule system down to a manageable size, as well as scale up linearly for a complex system so that protein conformational changes can be studied at any desired level of detail.
Accordingly, is an object of the present invention to provide an improved method for simulating the dynamics of molecules.