1. Field of the Invention
The present invention relates to protein-ligand interactions. More particularly, the present invention provides a quantum mechanics-based method for scoring protein-ligand interactions and binding affinity predictions for computational drug design.
2. Description of Related Art
The study of protein-ligand interactions has been an area of active research with widespread implications, especially its potential impact on rational drug discovery and design. In particular, the prediction of free energy of binding (ΔGbind), referred to as “scoring,” via computational methods based on a description of the energetic components of binding, has proven to be a major challenge. In spite of all the recent developments in this area, a physically based model that is robust enough to satisfactorily evaluate the binding of ligands to proteins has been elusive.
The lifecycle for the discovery of a new drug typically spans ten to fifteen years. Worldwide research and development (R&D) expenditures by the pharmaceutical industry have been increasing steadily during the past thirty years. The annual R&D outlays for United States pharmaceutical companies reached twenty-six billion dollars in the year 2000. Of the estimated $802 million it now costs to discover a drug and bring it to market, approximately 75% of the overall R&D cost is attributed to failures. Thus, the ability to employ a scoring function that avoids the synthesis of false positives or eliminates the synthesis of false negatives could have a major impact on accelerating the discovery of low molecular weight ligands that potently and specifically bind to a target. Additionally, the ability to rank analog series could facilitate the identification of alternative compounds in the presence of constraints imposed by pharmacokinetics and toxicity, thus reducing the likelihood of costly late stage failures.
An ideal drug is an exquisite balance of potency, selectivity, pharmacokinetics and toxicity. More importantly, the efficient identification of potent low molecular weight inhibitors would dramatically reduce the number of years it takes to bring important new therapeutics to market for the benefit of patients.
Thus, because of the relentless pressure on the pharmaceutical industry to reduce drug discovery costs, the development of in silico structure-based screening has become a very attractive and cost-effective alternative to traditional medicinal chemistry approaches. Additionally, the development of faster computer hardware, coupled with the development of ever more efficient computational algorithms, has greatly facilitated the screening of a large number and broad range of virtual compounds.
In particular, in silico structure-based screening methods can “dock” a low molecular weight molecule into a biological receptor. Docking methods can be placed into two categories: stochastic methods such as AUTODOCK (Goodsell, D. S. et al., Proteins, 8:195-202, 1990), GOLD (Jones, G. et al., J. Mol. Biol. 267:727-748, 1997), and SAS; or combinatorial methods such as DOCK (Meng, E. C. et al., J. Comput. Chem., 13:380-397, 1992), FlexX (Kramer, B. et al., Proteins: Struct., Funct., Genet., 37:228-241, 1999), Hammerhead and LibDock. Docking is based on the principles of molecular recognition, and the above programs are capable of sampling the conformational space of a low molecular weight molecule and “posing” it in the active site of a protein.
Docking and scoring are interrelated in computational drug design and the success of one depends on the other. Simplistic docking potentials have been reasonably successful in lead identification, library enrichment and virtual high throughput screening (vHTS). Docking of compounds into a receptor binding site and then scoring the docked “pose” is a commonly used methodology for identifying lead compounds. A critical part of drug discovery is lead optimization, and it is here that the accuracy of scoring functions is more important than the speed of the scoring. Docking algorithms have been reasonably successful in predicting binding modes. However, scoring of the poses in order to predict the binding free energy has proven much more challenging. This is because the pose that is closest to the experimental state is often not ranked energetically as the most favorable pose within a set of decoys. It is believed that docking algorithms can sample conformations reasonably well; however, docking functions are not reliable enough to separate native-like answers from alternative solutions.
Current generation scoring functions (SFs) can be grouped into three categories: empirical scoring functions (ESFs), knowledge-based potentials (KBPs) and force-field (FF) methods. ESFs, e.g., LUDI, piecewise linear potential (PLP), VALIDATE and ProteusScore, use empirically derived energetic contributions related to enthalpy (ΔH) and entropy (ΔS) and regression methods to fit them to a set of experimental observations. Cross-validation is done on test sets to test the accuracy of prediction using these functions. The problem with empirical methods is that they can only be as discriminating as the contributions in the scoring function. Usually, the contributions are calculated using simplistic functions, for example, a clash potential for no overlap and a simple geometric function for hydrogen bond interaction. Also, the diversity of a training set is an important factor in the development of such scoring functions (Bohm, H. J. Mol. Des., 12:309-323, 1998).
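The regression step behind an empirical scoring function can be sketched as follows. This is only an illustration of the general idea, not the fitting procedure of any program named above: the descriptor columns (intercept, hydrogen-bond count, lipophilic contact area, rotatable-bond count), the weights, and the data are all hypothetical, and the "affinities" are synthetic values generated from known weights so that the fit can be checked.

```python
def score(weights, terms):
    """Empirical score: weighted sum of per-complex energetic terms."""
    return sum(w * t for w, t in zip(weights, terms))


def fit_weights(descriptors, affinities):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n_terms = len(descriptors[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(row[i] * row[j] for row in descriptors) for j in range(n_terms)]
        + [sum(row[i] * y for row, y in zip(descriptors, affinities))]
        for i in range(n_terms)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_terms):
        pivot = max(range(col, n_terms), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n_terms):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n_terms] / aug[i][i] for i in range(n_terms)]


# Hypothetical descriptors per complex:
# [intercept, hydrogen bonds, lipophilic contact area, rotatable bonds]
DESCRIPTORS = [
    [1.0, 2.0, 150.0, 3.0],
    [1.0, 4.0, 200.0, 5.0],
    [1.0, 1.0, 100.0, 2.0],
    [1.0, 3.0, 300.0, 6.0],
    [1.0, 5.0, 250.0, 1.0],
    [1.0, 0.0, 120.0, 4.0],
]
# Hypothetical "true" weights used only to generate synthetic training data.
TRUE_WEIGHTS = [-1.0, -1.2, -0.02, 0.3]
AFFINITIES = [score(TRUE_WEIGHTS, row) for row in DESCRIPTORS]

fitted = fit_weights(DESCRIPTORS, AFFINITIES)
```

With noise-free synthetic data the fit simply recovers the generating weights. With real measurements the fitted weights depend on the composition of the training set, which is why training-set diversity and cross-validation matter for this class of scoring function.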
Knowledge-based methods or potentials (KBPs), such as PMF, SMoG, BLEEP, and DrugScore, are based on statistical mechanics and are borrowed from protein folding studies. The free energy of binding (ΔGbind) is represented as a potential of mean force calculated from frequencies of interatomic contacts from a database of structures (Muegge, I. et al., J. Med. Chem., 42:791-804, 1999). KBPs have been shown to be successful in calculating ΔGbind; however, their accuracy depends on the proper definition of the reference state and on the number of structures available to build the potential (Ishchenko, A. V. et al., J. Med. Chem., 43:2770-2780, 2002). Additionally, it is believed that a detailed consideration of steric interactions with a careful treatment of dispersive forces is needed even while using statistical potentials derived from a structural database.
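The conversion of contact frequencies into a potential of mean force can be sketched with the standard inverse Boltzmann relation, w(r) = -kT ln[p_obs(r) / p_ref(r)]. The sketch below is a generic illustration and does not reproduce the specific formulation of any program named above. The distance bins, the counts, and the choice of reference distribution are hypothetical, and the value of kT assumes a temperature of roughly 298 K.

```python
import math

K_T = 0.593  # kcal/mol, approximately k_B * T at 298 K (assumed temperature)


def potential_of_mean_force(observed_counts, reference_counts, kt=K_T):
    """Inverse Boltzmann: w = -kT * ln(p_observed / p_reference) per distance bin.

    Bins with a zero count in either histogram are returned as None because
    the logarithm is undefined there; sparse data is one reason the number of
    available structures limits the accuracy of such potentials.
    """
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    pmf = []
    for obs, ref in zip(observed_counts, reference_counts):
        if obs == 0 or ref == 0:
            pmf.append(None)
            continue
        ratio = (obs / obs_total) / (ref / ref_total)
        pmf.append(-kt * math.log(ratio))
    return pmf


# Hypothetical contact counts for one atom-type pair in three distance bins.
observed = [5, 30, 15]    # counts seen in a structural database
reference = [20, 10, 20]  # counts expected under the chosen reference state

pmf = potential_of_mean_force(observed, reference)
```

Bins where a contact is seen more often than the reference state predicts receive a negative (favorable) energy, and under-represented bins receive a positive one. Changing the reference counts changes every value, which illustrates why the definition of the reference state is critical.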
Force-field (FF) or molecular mechanics-based scoring functions use all-atom force-field potentials such as AMBER, CHARMM, OPLS, and MMFF to score poses. They have been used frequently in free-energy perturbation (FEP) methods, which are implemented in computer programs to evaluate relative ΔGbind (Kollman, P. A. Chem. Rev., 7:2395-2417, 1993). FEP methods employ molecular dynamics simulations, are computationally expensive and can handle only congeneric ligands that differ in one or two functional groups. Although FEP methods are physically based, there is a problem of transferability of force-field parameters to protein-ligand interactions. FF models have been shown to be extremely powerful in modeling biological systems but generally use simple electrostatic models (Coulomb potentials), which limit their ability to model electrostatic energies accurately. Furthermore, in certain cases, use of a Lennard-Jones 6-12 potential to evaluate non-bonded interactions in ligands that bind very tightly to substrates may lead to artefactual interaction energies. In such cases, a softer potential is better suited for the calculation of non-bonded interactions.
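The two non-bonded terms mentioned above, a Coulomb potential between fixed point charges and a Lennard-Jones 6-12 potential, can be sketched as follows. This is a minimal illustration of the textbook functional forms, not the implementation of any particular force field. The parameter values are hypothetical, and the conversion constant of about 332 kcal·Å/(mol·e²) is the value conventionally used when charges are in units of e and distances in Å.

```python
COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2), conventional conversion factor


def lennard_jones(r, epsilon, sigma):
    """6-12 Lennard-Jones energy: 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, dielectric=1.0):
    """Monopole-monopole (point charge) electrostatic energy."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)


def pair_energy(r, q_i, q_j, epsilon, sigma, dielectric=1.0):
    """Non-bonded energy of one atom pair: electrostatics plus van der Waals."""
    return coulomb(r, q_i, q_j, dielectric) + lennard_jones(r, epsilon, sigma)


# Hypothetical parameters for a single atom pair.
EPSILON = 0.15  # well depth, kcal/mol
SIGMA = 3.4     # distance at which the Lennard-Jones energy is zero, Angstrom

at_minimum = lennard_jones(SIGMA * 2 ** (1 / 6), EPSILON, SIGMA)
compressed = lennard_jones(0.8 * SIGMA, EPSILON, SIGMA)
```

The energy at the minimum equals the negative of the well depth, while compressing the pair to 80% of sigma already gives a repulsion many times larger than the well depth. This steep r⁻¹² wall is why slightly overlapping atoms in a tightly bound pose can dominate the score, and why a softer repulsive term may be preferred in such cases.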
Thus, while the pose generation problem has been solved for the most part, scoring of the poses, i.e., prediction of ΔGbind, has been a major challenge.
Quantum mechanics (QM), although not new to the field of molecular interaction, has until recently only been used to study smaller biochemical systems because of the exorbitant computational costs associated with such QM analysis. This is because the numerical effort of conventional electronic structure methods scales as N³ or higher, where N is the number of electrons. For example, both Kohn-Sham density functional theory (DFT) (Kohn, W. et al., Phys. Rev., 140:A1133, 1965) and semiempirical molecular orbital (MO) theory (Dewar, M. J. S., The Molecular Orbital Theory of Organic Chemistry, McGraw-Hill, New York, 1969) scale as N³; Hartree-Fock (H-F) (Hehre, W. J. et al., Ab Initio Molecular Orbital Theory, Wiley, N.Y., 1986) calculations scale according to N⁴; and post-Hartree-Fock treatments, such as configuration interaction (Hehre, W. J. et al., Ab Initio Molecular Orbital Theory, Wiley, N.Y., 1986) and Moller-Plesset theory (Moller, C. et al., Phys. Rev., 46:618, 1934) can exhibit N⁵ scaling and higher. This nonlinear behavior ultimately places severe limitations on the maximum number of atoms that may be considered in any one system.
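The practical consequence of these scaling laws can be illustrated with simple arithmetic. The sketch below takes only the exponents quoted above, ignores prefactors and all implementation details, and adds an exponent of 1 as an idealized linear-scaling reference for comparison. The method labels and function names are illustrative.

```python
# Formal scaling exponents as quoted in the text. The last entry is an
# idealized reference for comparison, not a conventional method.
SCALING_EXPONENTS = {
    "Kohn-Sham DFT / semiempirical MO": 3,
    "Hartree-Fock": 4,
    "post-Hartree-Fock (lower bound)": 5,
    "idealized linear scaling": 1,
}


def relative_cost(size_factor, exponent):
    """Factor by which the cost grows when the system grows by size_factor."""
    return size_factor ** exponent


def cost_table(size_factor):
    """Relative cost growth for each method at a given growth in system size."""
    return {
        method: relative_cost(size_factor, exponent)
        for method, exponent in SCALING_EXPONENTS.items()
    }


# Growing from a small model system to one ten times larger.
growth = cost_table(10)
for method, factor in growth.items():
    print(f"{method}: cost grows by a factor of {factor}")
```

Under these assumptions a tenfold increase in system size raises the cost of an N³ method a thousandfold and that of an N⁵ method a hundred-thousandfold, whereas a method whose cost is proportional to system size becomes only ten times more expensive.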
Thus, much effort has been devoted to the development of linear scaling quantum calculations, i.e., methods that require computational effort proportional to the size of the system. Yang (Phys. Rev. Lett., 66:438, 1991) first proposed a divide-and-conquer (D&C) approach and demonstrated that it is possible to attain linear scaling by localizing the electronic degrees of freedom; Galli et al. (Phys. Rev. Lett., 69:3547, 1992) suggested a linear scaling algorithm and applied tight-binding Hamiltonians; Li et al. (Phys. Rev. B, 47:10891, 1993) introduced a variational method for obtaining the density matrix with cutoff in real space and showed linear scaling in computational effort; and Lee et al. implemented a density matrix D&C approach into linear-scaling semiempirical quantum calculations for large molecules of over 9000 atoms on a typical workstation (J. Chem. Phys., 105(7):2744-2750, 1996).
The use of quantum mechanics avoids the need for force-field-based methods, especially when evaluating electrostatic interactions, which monopole-monopole interactions can approximate only loosely. Moreover, force-field-based methods are inherently incapable of treating quantum mechanical effects, such as polarization and charge transfer.
Until now, however, there has not been a physically sound, accurate and efficient tool for predicting the free energy of binding, i.e., scoring, of protein-ligand interactions, a tool which would have widespread application in the field of computational drug design.