General Background
In the fields of biomedicine and chemistry, many useful molecules gain their utility from their ability to tightly bind specific other molecules, for example:                1 Pharmaceutical Industry                    1.1 Penicillin is a molecule that kills harmful bacteria by binding and inhibiting proteins which the bacteria require for survival. The proteins are enzymes which normally would construct the bacterial cell wall by chemically linking small chemical units into larger ones that form the cell wall.            1.2 One of the most important treatments for HIV/AIDS is a protease inhibitor, a molecule that specifically binds enzymes belonging to the AIDS virus and prevents them from carrying out their normal function. See, for example, U.S. Pat. No. 6,071,916.            1.3 Among the most valuable and widely used medications today are molecules that prevent heart attacks by preventing the human body from synthesizing cholesterol. See, for example, U.S. Pat. No. 6,180,660. These molecules include Pfizer's drug atorvastatin (Lipitor®) and Merck's drug simvastatin (Zocor®). These molecules bind and inhibit a protein known as HMG-CoA reductase. This protein is an enzyme that is involved in the synthesis of cholesterol.            1.4 Sometimes a molecule with the potential to be used as a valuable medication, such as one of the examples above, is chemically unstable or has other problems that make it difficult to prepare, store or administer. One way to solve these problems is to mix the drug molecule with another molecule, known as a carrier, which binds it and thereby improves its properties as a therapeutic agent. See, for example, U.S. Pat. No. 4,596,795.                        2 Chemical Industry                    2.1 Much work in the chemical industries involves converting one type of molecule to another type of molecule. For example, huge chemical plants convert petroleum into plastics, pigments, detergents, etc. Forcing these conversions to occur often requires very hot temperatures, strong acids, etc. However, a few conversions can be carried out at room temperature without harsh chemicals by using enzymes, which work by binding the molecules to be converted and thereby catalyzing their chemical conversion. Such “green chemistry” saves energy and produces less pollution. Enzymes that bind and act upon specific targeted molecules also are used in the production of food ingredients, such as high-fructose corn syrup. See, for example, U.S. Pat. No. 4,077,842 and references therein.            2.2 The first steps of chemical (and pharmaceutical) production almost always yield a mixture of molecules that include the specific molecule of interest. A widely used technique for purifying the desired molecule out of this mixture is to flow the mixture over a material that has molecular parts which bind the molecule of interest and hold it back, while the undesired molecules continue to flow past.                        3 Nanotechnology.                    Extremely small objects and devices hold great promise in applications such as medicine, measurement and detection, and computers, but building such small devices is very difficult. One approach under investigation is to coat two parts that need to be assembled together with molecules that bind each other. Then simply mixing the parts will lead to assembly due to the mutual attraction of the molecules with which they are coated. See, for example, U.S. Pat. No. 6,962,747.                        
The binding of a small molecule (typically molecular weight less than 1000 Daltons) to a macromolecule like a protein is of central importance in many of these practical applications, and many other useful applications involving binding interactions between a small molecule and a macromolecule could be mentioned. However, it is known to those skilled in the art that discovering or inventing a small molecule that binds a targeted protein, or alternatively modifying a protein so that it binds a targeted small molecule, are difficult challenges. Success in such endeavors is not achieved by a reliable, stepwise process of molecular design, but instead requires a large component of trial and error.
Over recent decades, computer models have begun to play a role in the design of molecules intended to bind a specific protein and thus to be useful as medications, as in examples 1.1-1.3 above. Such computer models aim to predict whether a given molecule will bind the targeted protein and, if so, how tightly it will bind. If such a model were accurate, then it could be used to test a proposed molecule computationally, rather than experimentally, and thereby save the time and money required to procure or synthesize various molecules and test them experimentally.
Unfortunately, existing computer models suffer from one or both of two major problems. First, these models are notoriously unreliable and inaccurate. See, for example Warren et al, A Critical Assessment of Docking Programs and Scoring Functions, J Med Chem, 2006, 49, 5912-5931. As a consequence, they do not avoid the requirement for extensive experimental work in order to arrive at a molecule that binds the targeted protein tightly enough to have potential as a medication. Second, those computer models which are considered to be most accurate are so complex and demanding computationally that they require weeks or more of computer time and are therefore impractical for most real-world projects. For example, a recent research dissertation reported using 20 fast computers for a week to make a prediction for a single molecule and protein (http://www.columbia.edu/˜mrs2144/MRSdissert.pdf).
The current state of the art can be explained by scenarios common in early-stage drug discovery. In one important early stage of drug-discovery, the research team knows that a specific protein plays a role in a disease, and the team seeks a molecule that will bind and inhibit the protein's function in order to treat the disease. A widely used initial approach to discover such a molecule is to employ a computer modeling technique in order to computationally screen a long list of available molecules that could be purchased and studied. This computational screening process often begins with the detailed three-dimensional structure of the targeted protein. This structure may have been obtained by X-ray crystallography or other experimental or computational methods. A computer model is used to spatially fit each candidate molecule into a binding site of the targeted protein, and then to use the fitted structure to assess the likelihood that each molecule will in fact bind the protein. The molecules are then ranked according to the likelihood that they will bind the targeted protein, and molecules at the top of the list are procured and tested experimentally. The list of available molecules may stretch into the millions, so a fast computer model is required. However, it is known that the computer models used at this stage of the project are not accurate enough to ensure that the top-ranked molecules will in reality bind the targeted protein, so many of the experimental tests do not yield positive results. Conversely, existing computer models often assign poor ranks to molecules which in fact bind the target tightly, and thus can lead the research team to miss molecules that would have been useful for the project. These considerations highlight the need for a more accurate computer model that can rapidly predict whether an available compound will bind a targeted protein.
In another important stage of drug-discovery, the research team has already identified a molecule, known as a lead compound, that binds the targeted protein, but not as tightly as is required for a drug. This lead compound may have been obtained by the screening process described in the previous paragraph, or by some other method. The next stage of the project, called lead optimization, constitutes an effort to devise a chemical variant of the lead compound that binds the targeted protein with high enough affinity to be potentially useful as a drug. Chemical variants of the lead compound are proposed, custom-synthesized by medicinal chemists, and tested for affinity to the protein. This stage is time-consuming and costly because of the challenge of chemically synthesizing each variant of the lead compound. The time required to work through this stage is especially costly because delays postpone the date at which the company will see revenues from the new drug, as well as the date when patients suffering from the disease will be able to use the new therapy. In addition, drug companies compete with each other and there is great value in arriving at a new medication before the competition. Unfortunately, even the most highly regarded computer models rarely provide medicinal chemists with accurate and timely guidance regarding what variants of the lead compound will bind with highest affinity, as is known to those skilled in the art.
These scenarios of early stage drug-discovery highlight the critical need for a more accurate computer model to predict how tightly a proposed variant of a lead compound will bind a targeted protein. Such a model will reduce costs and thus speed the discovery of new drugs important to the pharmaceutical industry and to people requiring treatment.
As discussed above, there are many other uses for a computer model of binding in addition to drug-discovery. The shortcomings of existing computer models, just discussed in the context of drug-discovery, have equally negative consequences in these other areas of application. For example, the inaccuracy of computer models of binding makes it difficult and expensive to modify the binding site of an existing enzyme so that it will bind and act upon a different chemical substrate. This makes it difficult to arrive at enzymes useful to the chemical and food industries.
Technical Background
The binding affinity of two molecules, receptor R and ligand L forming complex RL, is specified by a binding constant Kb, or a dissociation constant Kd, defined by:
                              K          b                =                              K            d                          -              1                                ≡                                    (                                                                    C                    RL                                    ⁢                                      C                    o                                                                                        C                    R                                    ⁢                                      C                    L                                                              )                        equilibrium                                              Equation        ⁢                                  ⁢        1            
where CR, CL and CRL are the concentrations of molecule R, molecule L and their complex RL, respectively, Co is the standard concentration (typically 1 mole per liter (M)) and the quantity in parentheses is the ratio of concentrations at equilibrium, typically in units of moles per liter (M). The standard binding free energy ΔGo can then be written asΔGo≡−RgT ln(Kb)  Equation 2
where Rg is the gas constant and T is absolute temperature. The more negative this quantity, the greater the binding affinity of molecules R and L for each other. The central challenge in the present field is to compute the standard free energy of two specified molecules for each other.
In order to understand the principles underlying any method of computing ΔGo, it is important to be aware that small molecules and proteins are all flexible to some degree at least, so they can adopt various three dimensional configurations or conformations. Their motions are governed by various interatomic forces, such as the forces of covalent bonding, van der Waals interactions, electrostatic interactions, hydrogen bonds, etc. If a molecule is immersed in a solvent, such as water or chloroform, then the solvent exerts additional forces on the molecule. For example, water tends to pull apart charged atoms, and drive together nonpolar atoms. The various forces that act upon a molecule determine what conformations it tends to adopt. More specifically, statistical thermodynamics states that the forces experienced by a molecule tend to drive it into conformations of low energy, where the energy is defined as the molecule's own potential energy U plus an additional solvation energy contribution W from the solvent in which it is immersed. Similarly, molecules tend to bind each other if they can form a complex of low energy, relative to the energies they possess when they are not bound to each other.
Mathematically, the conformation of a molecule or complex can be specified by a set of atomic coordinates symbolized as a vector or a sequence of vectors r. By way of non-limiting example, when Cartesian coordinates are used, r=(xk, yk, zk) where k≦3Natoms−6 and Natoms is the number of atoms in the molecule. A sequence of conformations i may be represented by (ri), where each ri is a conformation. Potential and solvation energies can then be written as functions of r, respectively U(r) and W(r). A molecule or a complex is stable to the extent that it has many conformations where U(r)+W(r) is low. The standard free energy of binding can be written as
                              Δ          ⁢                                          ⁢                      G            o                          =                              -                          R              g                                ⁢          T          ⁢                                          ⁢                      ln            ⁡                          (                                                                    C                    o                                                        8                    ⁢                                                                                  ⁢                                          π                      2                                                                      ⁢                                                      Z                    RL                                                                              Z                      R                                        ⁢                                          Z                      L                                                                                  )                                                          Equation        ⁢                                  ⁢        3            
where Co is the standard concentration (normally 1 M) and the quantities Z are configuration integrals for the subscripted complex RL and the separate molecules R and L:
                    Z        ≡                              ∫                          -              ∞                        ∞                    ⁢                                    ⅇ                                                -                                      (                                                                  U                        ⁡                                                  (                          r                          )                                                                    +                                              W                        ⁡                                                  (                          r                          )                                                                                      )                                                  ⁢                                  R                  g                                ⁢                T                                      ⁢                                                  ⁢                          ⅆ              r                                                          Equation        ⁢                                  ⁢        4            
These formulae indicate that the more low-energy conformations the complex RL can adopt relative to the unbound molecules R and L, the greater will be their binding affinity. The equations presented above provide the theoretical underpinnings of most or all of the computational models of binding in use today, either explicitly or implicitly.
These equations also can be used to explain why it is difficult to devise an accurate method of computing the standard free energy of binding. One challenge has been to simply arrive at equations suitable for computing the standard free energy of binding. A number of computational methods have proven to be based upon erroneous or incomplete theoretical foundations.
A second challenge has been to develop a method of computing the potential energy of a molecule as a function of conformation; i.e., to compute U(r). One approach is to rely upon first principles quantum mechanical methods which account in detail for the electronic structure of a molecule or complex. (See, for example, the use of such methods in U.S. Pat. No. 6,106,562.) However, first principles quantum mechanical calculations are too slow to allow energies to be computed for the many molecular conformations required to evaluate the configuration integrals Z with useful accuracy. A second approach is to use what are known as empirical force fields. (See, for example, U.S. Pat. No. 6,785,665, which describes a method of constructing such a force field, and U.S. Pat. No. 5,424,963 for a previous application of the empirical force field associated with the AMBER computer program.) Empirical force fields are approximations, so they are not perfectly accurate.
A third challenge has been to arrive at a method of computing the solvation energy as a function of molecular conformation W(r). An important early approach was to carry out computer simulations in which many water molecules, often thousands of them, were treated explicitly, surrounding the molecules of interest and vibrating and diffusing due to heat energy. However, this approach is highly time-consuming. More recently, it has been discovered that the solvation energy can be computed to useful accuracy by what are called implicit solvent models. These avoid explicitly modeling many water molecules, and instead estimate the solvation energy via simpler and computationally faster physical models. The most widely accepted class of implicit solvent models treat the solvent as a high-dielectric medium that follows the laws of classical electrostatics. Within this approach, the most accurate methods solve the classical electrostatic equations (i.e., the Poisson-Boltzmann equation) via numerical methods such as the method of finite differences. These numerical methods provide a good estimate of the electrostatic part of the solvation energy of a molecule in a given conformation within typically seconds to a few minutes of computer time. This makes them much faster than explicit solvent models, but they are still too slow for some applications. A more rapid electrostatic approximation is the generalized Born approximation. (See, for example, J. Comput.-Aided Molec. Design 5:5-20, 1991 and U.S. Pat. No. 5,420,805.) The generalized Born approximation is considerably faster, but is widely regarded by those skilled in the art as being significantly less accurate than numerical solutions of the Poisson-Boltzmann equation. In addition to the electrostatic part just discussed, the salvation energy also includes a nonpolar part which is often approximated as proportional to molecular surface area, with an empirically derived coefficient.
A fourth challenge in the calculation of binding affinities is evaluation of the ratio of configuration integrals found in Equation 3 and defined in Equation 4. In one general approach, computer simulations are used to compute the thermodynamic work of bringing the two molecules together to form a bound complex. This approach can be accurate, but is notoriously time-consuming computationally, primarily because the simulations must sample a very large number of molecular conformations during the binding process, and simulations typically sample conformations essentially randomly. On the other hand, less rigorous approaches have not yielded accurate results. A new class of approaches which began to emerge in the late 1990s sought to achieve the rigor of the simulation methods while achieving much greater computational speed (Gilson, et al., Chem. Biol. 4:87-92, 1997). This approach, here termed Local Integration, uses the fact that the largest contributions to a configuration integral come from low-energy conformations of the molecule or complex, because that is where the integrand is largest (Equation 4). The configuration integral thus can be approximated as a sum of local contributions from local energy wells:
                    Z        ≈                              ∑                          i              =              1                        M                    ⁢                      Z            i                          ≡                              ∑                          i              =              1                        M                    ⁢                                    ∫                              i                =                1                                      ⁢                                          ⅇ                                                                            -                                              (                                                                              U                            ⁡                                                          (                              r                              )                                                                                +                                                      W                            ⁡                                                          (                              r                              )                                                                                                      )                                                              /                                          R                      g                                                        ⁢                  T                                            ⁢                                                          ⁢                              ⅆ                r                                                                        Equation        ⁢                                  ⁢        5            
Here i indexes M energy wells and Zi is a configuration integral local to energy well i, as indicated by the subscript i on the integral sign. Methods in this class can take advantage of powerful optimization algorithms to discover low-energy conformations i much more rapidly than is possible in the course of a simulation. However, the methods in this class vary significantly in how they evaluate the local configuration integrals Zi.
The apparently earliest Local Integration method, Mining Minima (Head et al, J. Phys. Chem. A 101:1609-1618, 1997), evaluates a local configuration integral Zi via unbiased Monte Carlo sampling in a region with hypervolume Vi in the vicinity of energy minimum i located at internal coordinates ri:
                                          Z            i                    ≈                                    V              i                        ⁢                          1              n                        ⁢                                          ∑                                  j                  =                  1                                n                            ⁢                              ⅇ                                                                            -                                              (                                                                              U                            ⁡                                                          (                                                              r                                j                                                            )                                                                                +                                                      W                            ⁡                                                          (                                                              r                                j                                                            )                                                                                                      )                                                              /                                          R                      g                                                        ⁢                  T                                                                    ⁢                                                      Equation        ⁢                                  ⁢        6            
This sum runs over n conformations rj generated computationally by adding small, uniformly distributed random numbers to ri, and thereby sampling conformations rj in the vicinity of ri and within hypervolume Vi. The range of the random numbers is gradually expanded in the course of the calculation until the value of Zi for energy well i converges to within a preselected tolerance. This method significantly advanced the state of the art, but had two important limitations. First, it could not efficiently include sampling of bond-lengths and bond-angles, but instead included only bond-torsions as degrees of freedom. Second, the calculations are excessively time-consuming for many molecular systems of interest because the number of samples n required to achieve convergence is large.
The second method in the class of Local Integration methods, MINTA (U.S. Pat. No. 6,178,384), sought to enable effective sampling over not only torsion angles but also bond-lengths and bond-angles. This was accomplished by using a parabolic (harmonic) representation of the energy in the vicinity of each energy minimum i to guide the selection of conformational samples in the energy minimum. However, this method still possesses significant drawbacks. First, it has been found (Potter & Gilson, J. Phys. Chem. A 106:563-566, 2002) to rely upon an incorrect expression for the configuration integral, different from those listed above, and therefore can yield results with significant errors. Second, it still requires a large number of conformational samples; that is, n is still large. As a consequence, the method is still too slow for many applications of interest. For example, it was reported to require 10,000 conformations per energy minimum for a small bimolecular complex (U.S. Pat. No. 6,178,384). Far more samples would be required for a protein complex with a small molecule, because even more samples would need to be generated to integrate within each energy well. Even more importantly, a protein has far too many local energy minima i to permit the minima to be discovered computationally and integrated by a current computer within an acceptable amount of time. Thousands of years or more might be required to complete such a calculation. As a consequence, this method is at best difficult and in many cases impossible to apply to many of the molecular systems for which the ability to compute binding affinities would be most valuable.
More recently, a more efficient Local Integration method has been described and shown to provide good accuracy and computational speed for binding reactions involve a small molecule and a small receptor. The method, M2 (Chang et al., J. Phys. Chem. B 107:1048-1055, 2003; Chang & Gilson, J. Am. Chem. Soc. 126:13156-13164, 2004), estimates the configuration integral Zi in a given energy well i as follows. First, the matrix of second derivatives of the energy is computed with respect to atomic coordinates expressed in terms of bond-lengths, bond-angles, and bond-torsions. This matrix is then diagonalized to yield 3Natoms−6 eigenvalues and 3Natoms−6 corresponding eigenvectors, where Natoms is the number of atoms in the molecule or complex. The energy in the vicinity of the energy minimum can then be approximated by forming a multidimensional Taylor series expansion and using its second order (quadratic) term to define a multidimensional parabola with principal axes lying along the eigenvectors and corresponding coefficients (or force constants) equal to the corresponding eigenvectors. (Implementations to date have not employed higher-order terms in the Taylor series expansion because this would increase the complexity of the calculations and it has not been clear that the improvement in accuracy would be substantial.) This approximation, often termed the harmonic approximation, allows the integrand of the local configuration integral to be approximated as a multidimensional Gaussian function with the same principal axes as the parabola and with corresponding standard deviations that can readily computed from the eigenvalues by formulae well known to those skilled in the art. This multidimensional Gaussian can then readily be evaluated as the sum of contributions along each principal axis (eigenvector). Typically, the integral along eigenvector j is restricted to a domain determined by the standard deviations of the Gaussian along this eigenvector and/or by a maximum allowed excursion of any torsion angle as the molecule is distorted along the eigenvector. These limits prevent the integration range from extending into any neighboring energy wells. When the integration method is based purely upon this Gaussian approximation, it is termed the harmonic approximation.
In a further improvement, the eigenvectors with the smallest eigenvalues are subjected to numerical integrals by a process in which the molecule or complex is distorted stepwise along the eigenvector; at each step, the integrand exp(−(U(r)+W(r))/RgT) is computed after each step, and these values are used to compute a numerical estimate of the integral along the eigenvector. If the integral from this numerical scan differs significantly from the analytic integral of the Gaussian, then the numerical result is used in place of the analytic one. Based upon this methodology, a preferred form for the resulting expression for the local configuration integral becomes:
                              Z          i                ≈                                            ⅇ                                                                    -                                          (                                                                        U                          ⁡                                                      (                                                          r                              i                                                        )                                                                          +                                                  W                          ⁡                                                      (                                                          r                              i                                                        )                                                                                              )                                                        /                                      R                    g                                                  ⁢                T                                      ⁡                          (                                                b                  2                  2                                ⁢                                                      ∏                                          l                      =                      3                                                                                      N                        atoms                                            -                      2                                                        ⁢                                                                          ⁢                                                            b                      l                      2                                        ⁢                    sin                    ⁢                                                                                  ⁢                                          θ                      l                                                                                  )                                ⁢                      (                                          ∏                                  j                  =                  3                                                  N                  scan                                            ⁢                              S                j                                      )                    ⁢                      (                                          ∏                                  k                  =                  1                                                  N                  Gaussian                                            ⁢                                                          ⁢                              [                                  (                                                                                    (                                                                              2                            ⁢                                                                                                                  ⁢                            π                            ⁢                                                                                                                  ⁢                                                          R                              g                                                        ⁢                            T                                                                                K                            k                                                                          )                                                                    1                        2                                                              ⁢                                          erf                      (                                                                                                    w                            k                                                    ⁡                                                      (                                                                                          2                                ⁢                                                                                                                                  ⁢                                π                                ⁢                                                                                                                                  ⁢                                                                  R                                  g                                                                ⁢                                T                                                                                            K                                k                                                                                      )                                                                                                    -                                                      1                            2                                                                                              )                                                        )                                ]                                      )                                              Equation        ⁢                                  ⁢        7            
Here b2 is the bond-length associated with atom 2, b1 and θ1 are the bond-length and bond-angle associated with atom l, the first quantity in parentheses is the Jacobian determinant due to the conversion from Cartesian coordinates to bond-angle-torsion coordinates, and the index 1 ranges over all but 2 atoms, as previously detailed (Chang et al., J. Phys. Chem. B 107:1048-1055, 2003); T is the absolute temperature; Nscan numerical integrals are included and Sj is the numerical integral along eigenvector j; Ngaussian analytic integrals are included and the quantity inside square brackets is the analytic integral along eigenvector k; Kk is the eigenvalue associated with eigenvector k, erf is the error function, and wk is the integration range used along eigenvector k. The integration method associated with Equation 7 is termed harmonic approximation/mode scanning (HA/MS). In the special case where Nscan=0, then all modes would be integrated harmonically via integrals of Gaussians, and the calculation would use only the harmonic approximation, without any mode scanning. Note that a different system of internal coordinates would require a different Jacobian determinant.
Another advantage of the M2 method derives from its treatment of the solvation energy W(r). As background, it is remarked that the MINTA method uses a generalized Born model (see above) for the electrostatic part of the solvation energy, although this solvation model is known by those skilled in the art to be less accurate than more time-consuming numerical solutions of the Poisson-Boltzmann equation, as discussed above. The M2 method also uses a generalized Born model during its most computationally demanding steps, in order to maximnize efficiency, generating a provisional value of the configuration integral in each energy well Zi,GB. However, M2 calculations are then corrected toward the full Poisson-Boltzmann model by carrying out a single Poisson-Boltzmann calculation of the solvation energy WPB(ri) at each energy minimum ri, as well as a single additional generalized Born calculation of the solvation energy WGB(ri), and using these data to generate a corrected value of the configuration integral. This strategy takes advantage of the speed of the approximate generalized Born model for most of the calculation, and then makes sparing but effective use of the accurate Poisson-Boltzmann model.
This M2 method has been found to combine excellent accuracy with far greater computational speed than the previous two Local Integration methods. Its accuracy derives from: the empirical fact that energy wells are quite closely approximated by parabolas, and their deviations from parabolic are readily captured by scanning along only the eigenvectors with the lowest eigenvalues; the use of bond-angle-torsion coordinates, rather than Cartesian coordinates; and the correction toward a more accurate solvation model. The speed derives from the fact that no random conformational sampling is involved: the method requires nothing more than systematic scanning along eigenvectors with low eigenvalues. Moreover, in many cases, it is found empirically that accurate results are obtained even when this numerical scanning technique is not used; i.e., for the harmonic approximation. Also, the treatment of solvation is both efficient and accurate. For these reasons, the method provides a much better combination of speed and accuracy than prior methods in this class of binding models.
The M2 method, described in the previous paragraphs, has many advantages over its predecessors, and has been found to give good agreement with experimental binding affinities for molecular systems in which both molecules are much smaller than proteins (Chang & Gilson, J. Am. Chem. Soc. 126:13156-13164, 2004; Chen et al., Biophys. J. 87:3035-3049, 2004; J. Am. Chem. Soc. 128:4675-4684, 2006). However, the method is not applicable to molecules as large as proteins. The chief reason is that a protein has far too many local energy minima i to permit them all to be discovered computationally and integrated by a current computer within an acceptable amount of time. As a consequence, although the M2 method is faster than other Local Integration methods, it still cannot be used for drug-discovery, enzyme engineering, or any other purpose involving a large molecule like a protein. It is an object of the present invention to enable calculation of affinities involving macromolecules.
As discussed above, the energy provided by an empirical force field approximates the energy of more time-consuming and rigorous first-principles quantum mechanics calculations. Empirical force fields are known by those skilled in the art to be particularly inaccurate for certain types of molecules and complexes, notably those which include metal ions, such as the Class B beta-lactamases responsible for many instances of antibiotic resistance in pathogenic microbes. (The Class B beta-lactamases are discussed in, for example, U.S. Pat. No. 5,955,604.) Another object of the present invention is to enable calculations of affinities involving interactions with metal ions.
Some ligands, such as “suicide inhibitors” (for example, U.S. Pat. No. 4,745,109), bind their receptors by forming not only noncovalent interactions but also by forming a covalent bond. In such cases, there is still a need for a computational method of predicting whether one candidate ligand will bind the receptor more tightly than another candidate ligand. However, none of the Local Integration methods previously devised provide a means of predicting the relative affinities of two such covalent ligands. It is yet another object of the present invention to enable accurate calculations of the relative affinity for two candidate ligands that covalently bind a targeted receptor.
In many important cases, the ligand that binds a protein receptor may itself be a large molecule, such as another protein. By way of non-limiting example, the receptor may be an enzyme, the ligand may be another protein that the enzyme cleaves catalytically, and the aim may be to modify the enzyme so that it cleaves a specific protein. Another object of the present invention is to enable the calculation of an affinity when the ligand itself is a macromolecule.
In other important cases, the receptor and/or the ligand is itself a molecular complex consisting of more than one molecule and held together by noncovalent and/or noncovalent forces. Yet an additional object of the present invention is to enable the calculation of affinities when the receptor and/or the ligand is itself a molecular complex.