This application claims priority under 35 U.S.C. 119 to Singapore Application Number 9904404-2 filed on Sep. 6, 1999.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyrights whatsoever.
1. Field of the Invention
This invention relates to methods and apparatus for detecting protein and nucleic acid targets of a drug molecule. Aspects of the invention relate to selecting, from a biomolecular cavity database, proteins and nucleic acids to which a drug can bind both geometrically and chemically. This general field is known as “Molecular Modeling” (MM) and “Computer Assisted Molecular Design” (CAMD). When used for pharmaceutical discovery, this field is referred to as “Computer-Aided Drug Design” (CADD).
2. Description of the Related Art and Summary
The Need for a Computer Drug Target Prediction Method
A number of strategies have been proposed and used for rational discovery of new drugs. These include combinatorial chemistry, which is described in “Directed combinatorial chemistry”, J. C. Hogan Jr. Nature 384 suppl., 17-19 (1996); high-throughput screening, which is described in “High-throughput screening for drug discovery”, J. R. Broach and J. Thorner. Nature 384 suppl., 14-16 (1996); QSAR, which is described in “Strategies for indirect computer-aided drug design”, Loew G H, Villar H O, Alkorta I, Pharm Res 10, 475-86 (1993); and structure-based molecular design, which is described in “Structure based drug design”, Blundell. Nature 384 suppl., 23-26 (1996).
All of these strategies focus on the design of lead compounds (drug precursors) having biological activity against a defined protein or nucleic acid target. Possible interactions of these compounds with other proteins and nucleic acids are not accounted for by these methods. These interactions have implications for secondary therapeutic effects, unwanted side effects and toxicity. Although good biological activity is a necessity, a drug candidate needs to pass additional tests and clinical trials for side effects, toxicity, bioavailability and efficacy. These tests and clinical trials are very costly and time consuming.
Computer methods and apparatus for identifying protein and nucleic acid targets can facilitate rapid prediction of drug-protein and drug-nucleic acid interactions that might have implications for possible side effects, toxicity and other unwanted effects, without costly and time-consuming tests and clinical trials. This is particularly important given that much of the $350 million and 12 years spent on average to develop a commercial drug has been squandered on drug candidates that never reached the market. Surveys of drug discovery costs and times can be found in “Strategic choices facing the pharmaceutical industry: A case for innovation”, J. Drews, Drug Discov. Today 2, 72-78 (1997); “New drug development in the United States from 1963 to 1990”, J. A. DiMasi, N. R. Bryant, and L. Lasagna, Clin Pharmacol Ther. 50, 471-486 (1991).
The molecular targets of a number of natural product drugs, particularly those from traditional medicinal plants, are unknown. This hinders the effort to design new drugs based on the molecular mechanism of these drugs. Experimental determination of the molecular targets of a natural product is often slow and costly. A computer method for drug target identification would offer a fast and low-cost approach to searching for possible drug targets.
Feasibility of Computer Drug Target Identification
A computer drug target identification strategy is feasible if: (1) a sufficiently diverse set of protein and nucleic acid 3D structures is available, and (2) a sufficiently fast and accurate drug target identification algorithm is available that runs on currently available and affordable computer systems. In addition, prediction of therapeutic effects, side effects and toxicity requires knowledge of protein functions. As explained below, these conditions are being met.
At present, the 3D structures of 11,346 proteins, 557 protein/nucleic acid complexes and 857 nucleic acids have been released in the Protein Databank (PDB), and the number increases at a rate of more than 100 per month, as described on the PDB home page http://www.rcsb.org/pdb/. About 17% of the proteins in PDB have unique sequences, as described in “Bridging the protein sequence structure gap by structure predictions”, B. Rost, C. Sander. Annu Rev Biophys Biomol Struct 25, 113-136 (1996). Thus, the number of proteins has reached a meaningful level to cover therapeutic, metabolic, side effect, and toxicity targets. The introduction of high-throughput analysis methods is expected to enable the determination of 10,000 proteins with unique sequences within 5 years, as discussed in “100,000 protein structures for the biologist”, A. Sali, Nat Struct Biol. 5, 1029-1032 (1998).
Thus, a sufficiently diverse set of proteins and nucleic acids in PDB is expected in a few years. New advances in functional genomics and proteomics are providing information useful for predicting therapeutic effects, side effects and toxicity. Functional genomics is described in “Functional genomics: It's all how you read it”, P. Hieter and M. Boguski, Science 278, 601-602 (1997). Proteomics is described in “Proteomics. An ambitious drug development platform attempts to link gene sequence to expressed phenotype under various physiological states”, A. Persidis. Nature Biotech. 16, 393-394 (1998).
All drugs appear to bind to cavities of proteins or nucleic acids. Thus, a biomolecular cavity database can be introduced to facilitate computer drug target identification. A method is disclosed for computer automated generation of a biomolecular cavity database from entries of a biomolecule 3D structure database. This database can include all proteins and nucleic acids in PDB and it contains information about geometric and chemical features of cavities along with the 3D structure and chemical properties of the host biomolecules.
High-speed drug target identification can be achieved by a disclosed flexible ligand-biomolecule inverse docking algorithm. This algorithm searches a biomolecular cavity database to find proteins and nucleic acids to which a given drug or ligand can bind, whether strongly or weakly. Testing results show that the average CPU time is 14-20 days for searching a cavity database containing a few thousand proteins and nucleic acids.
Existing Computer Methods are not Capable of Drug Target Identification
Existing ligand-protein docking algorithms are useful only for cavities of a limited size. Human intervention is generally required to locate a binding site or to derive a reduced model of a large cavity. Thus it is impractical to use existing methods for automated docking of a drug to an arbitrarily chosen protein or nucleic acid in a database. By comparison, the disclosed vector-vector matching algorithm is capable of dealing with cavities significantly larger than those handled by existing methods.
Moreover, existing methods for generating protein cavity profiles are not suitable for development of a biomolecular cavity database. These methods normally require human intervention to either locate a binding site or to generate a specific cavity profile, which makes it impractical to generate cavity profiles for thousands of proteins and nucleic acids.
Existing methods are described in “Structure-based strategies for drug design and discovery”, I. D. Kuntz, Science 257, 1078-1082 (1992); “Hammerhead: Fast, fully automated docking of flexible ligands to protein binding sites”, W. Welch, J. Ruppert, and A. N. Jain. Chem. Biol. 3, 449-462 (1996); “Characterization of receptors with a new negative image: Use in molecular docking and lead optimization”, C. M. Oshiro and I. D. Kuntz. Proteins Struct. Func. Genet. 30, 311-336 (1998).
Potential Applications of the Disclosed Computer Drug Target Identification Method
The disclosed methods provide a unique mechanism for fast and low-cost computer identification of proteins and nucleic acids to which a drug can bind. Subsequent analysis of the function of the identified proteins and nucleic acids, coupled with consideration of the feasibility of drug delivery to the site of action, can then facilitate the prediction of unknown targets, secondary therapeutic targets, possible side effects and toxicity. Thus, the invention has potential applications in:
Facilitating the determination of unknown molecular mechanisms of drugs (such as natural products).
Facilitating the determination of secondary therapeutic effects of drugs and drug lead compounds.
Facilitating the prediction of possible unwanted side effects and toxicity of drug lead compounds at an early stage of drug development.
Given ongoing rapid advances in structural and functional genomics, information for an increasingly diverse set of proteins and nucleic acids is becoming available. The disclosed method is therefore likely to find increasingly wide application in drug design.
It is an object of this invention to provide a new method for automated computer identification of possible protein and nucleic acid targets of drugs.
It is a further object of this invention to provide a new algorithm for docking a ligand to a cavity of a protein or nucleic acid.
It is a further object of this invention to provide a new method for automated computer generation of a biomolecular cavity database from entries of a biomolecule 3D structure database.
This invention relates to a method for identifying protein or nucleic acid targets of a drug by means of a ligand-biomolecule inverse docking strategy. This strategy performs successive docking of a ligand, in single or multiple conformations, to multiple protein and nucleic acid entries in a biomolecular cavity database by the vector-vector matching algorithm described below. If a particular conformation of the ligand can be fitted to a cavity (steric clash is allowed at this stage), an energy minimization is conducted to relieve possible steric clashes and to optimize the conformation of the drug and that of the side chains of the amino acids or nucleotides at the binding site. Energy minimization is conducted using published algorithms and parameters similar to those used in the software AMBER. AMBER stands for Assisted Model Building with Energy Refinement; it is a package developed by researchers at the University of California, San Francisco. A reference for AMBER can be found in “A second generation force field for the simulation of proteins and nucleic acids”, Cornell, W D, Cieplak P, Bayly C I, Gould I R, Merz K M Jr, Ferguson D M, Spellmeyer D C, Fox T, Caldwell J W and Kollman P A. Journal of the American Chemical Society 117, 5179-5197 (1995). Docking evaluation is performed by examining the ligand-biomolecule interaction energy computed from molecular mechanics energy functions and parameters similar to those given in AMBER. A modification is made to replace the AMBER hydrogen bond function with a Morse potential function. The Morse potential function has been shown to give fairly accurate hydrogen bond energies in biomolecular systems, and its use saves considerable computing time. Application of the Morse potential function to biomolecules is described in “Premelting base pair opening probability and drug binding constant of a daunomycin-Poly d(GCAT)-Poly d(ATGC) complex”, Y. Z. Chen and E. W. Prohofsky, Biophys. J. 66, 820 (1994). Proteins and nucleic acids are selected as molecular targets of the ligand if the interaction energy is below a certain value, which is a function of the number of non-hydrogen atoms in the drug.
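By way of illustration only, the hydrogen bond term and the size-dependent selection criterion described above might be sketched as follows. The Morse form shown is the standard one, shifted so that the energy is zero at infinite separation. The function names, the parameters `depth`, `width` and `r0`, and in particular the linear form of the threshold and its `per_atom` coefficient are assumptions made for this sketch; the disclosure states only that the threshold is a function of the number of non-hydrogen atoms in the drug.

```python
import math


def morse_energy(r, depth, width, r0):
    """Morse potential, shifted so E(r0) = -depth and E -> 0 as r -> infinity.

    r      : donor-acceptor distance
    depth  : well depth (hypothetical parameter name)
    width  : controls how narrow the well is (hypothetical parameter name)
    r0     : equilibrium distance
    """
    x = 1.0 - math.exp(-width * (r - r0))
    return depth * (x * x - 1.0)


def energy_threshold(n_heavy_atoms, per_atom=-1.0):
    """ASSUMED linear threshold: per_atom times the non-hydrogen atom count.

    The actual functional form is not given in the text above.
    """
    return per_atom * n_heavy_atoms


def is_putative_target(interaction_energy, n_heavy_atoms, per_atom=-1.0):
    """Select a biomolecule when the interaction energy is below the threshold."""
    return interaction_energy < energy_threshold(n_heavy_atoms, per_atom)
```

With these placeholder values, a ligand of 20 non-hydrogen atoms would be retained for a cavity only if its computed interaction energy fell below -20 energy units.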
Ligand binding is competitive in nature. A drug is unlikely to be effective if it cannot compete for binding against natural ligands and, to some extent, other drugs that bind to the same receptor site. This binding competitiveness may be partially taken into consideration for those cavities known to be ligand bound in at least one PDB entry. In selecting putative protein targets, in addition to scoring based on the above energy threshold, the computed energy is required to be comparable to that of the corresponding PDB ligand. Ligand-protein interaction energies for ligands found in PDB entries can be pre-computed and stored in a biomolecular cavity database (by the method described as a further aspect of this invention below).
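The two-part selection described in the preceding paragraph might be sketched as below. This is an illustrative sketch only: the text does not define “comparable”, so the fractional `tolerance` used here, like the function names, is an assumption. Energies are taken to be negative for favorable binding.

```python
def is_competitive(candidate_energy, pdb_ligand_energy, tolerance=0.2):
    """Check whether a candidate's energy is comparable to the PDB ligand's.

    tolerance is a HYPOTHETICAL fraction: the candidate may be weaker than
    the PDB ligand by at most this fraction of the PDB ligand's energy.
    """
    if pdb_ligand_energy is None:
        # Cavity not known to be ligand bound: no competitor to compare with.
        return True
    return candidate_energy <= pdb_ligand_energy + tolerance * abs(pdb_ligand_energy)


def select_target(candidate_energy, threshold, pdb_ligand_energy=None, tolerance=0.2):
    """Apply the energy threshold first, then the competitiveness check."""
    return (candidate_energy < threshold
            and is_competitive(candidate_energy, pdb_ligand_energy, tolerance))
```

Passing `pdb_ligand_energy=None` corresponds to a cavity with no bound ligand in any PDB entry, in which case only the energy threshold applies.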
In a further aspect of this invention, a vector-vector matching algorithm is introduced to efficiently place a ligand in a particular conformation into a cavity in a biomolecule. A ligand is composed of a group of atoms, and a cavity is represented by a cluster of spheres that fill it. A vector represents the relative position (distance and orientation) of an atom or a sphere with respect to the origin of a reference coordinate system on the ligand or on the sphere cluster, respectively. In one embodiment, a coordinate system is defined for a ligand in a particular conformation based on the three atoms of largest separation. Then, sets of three spheres matching the positions of these three atoms are selected to define corresponding coordinate systems in a sphere group. Atom and sphere positions, expressed as vectors (xyz-coordinates) in the respective new coordinate systems, can then be directly compared to dock a molecule into a cavity. The algorithm matches a ligand to a cavity by comparing each of the ligand vectors with the sphere vectors. A ligand is considered to be successfully placed into a cavity if every ligand vector matches at least one sphere vector.
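A minimal sketch of the vector-vector matching idea, under stated assumptions, is given below. It is not the disclosed implementation. The rule for picking the three anchor atoms (farthest pair, then the atom with the largest summed distance to that pair), the single distance tolerance `tol` used both for selecting sphere triples and for deciding that two vectors “match”, and all function names are assumptions introduced for illustration.

```python
import itertools
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def local_frame(p0, p1, p2):
    """Right-handed frame: origin p0, x along p0->p1, z normal to the plane.

    Returns None when the three points are (nearly) collinear.
    """
    ex = unit(sub(p1, p0))
    normal = cross(ex, sub(p2, p0))
    if dot(normal, normal) < 1e-12:
        return None
    ez = unit(normal)
    return p0, (ex, cross(ez, ex), ez)


def to_frame(points, frame):
    """Express points as vectors (xyz-coordinates) in the given frame."""
    origin, axes = frame
    return [tuple(dot(sub(p, origin), e) for e in axes) for p in points]


def anchor_atoms(atoms):
    """ASSUMED heuristic for 'three atoms of largest separation'."""
    i, j = max(itertools.combinations(range(len(atoms)), 2),
               key=lambda ij: math.dist(atoms[ij[0]], atoms[ij[1]]))
    k = max((n for n in range(len(atoms)) if n not in (i, j)),
            key=lambda n: math.dist(atoms[n], atoms[i]) + math.dist(atoms[n], atoms[j]))
    return i, j, k


def fits_cavity(atoms, spheres, tol=1.0):
    """True if every ligand vector matches at least one sphere vector."""
    if len(atoms) < 3:
        raise ValueError("need at least three atoms")
    i, j, k = anchor_atoms(atoms)
    lig_frame = local_frame(atoms[i], atoms[j], atoms[k])
    if lig_frame is None:
        raise ValueError("anchor atoms are collinear")
    lig = to_frame(atoms, lig_frame)
    d_ij = math.dist(atoms[i], atoms[j])
    d_ik = math.dist(atoms[i], atoms[k])
    d_jk = math.dist(atoms[j], atoms[k])
    # Try every ordered sphere triple whose geometry matches the anchor atoms.
    for a, b, c in itertools.permutations(spheres, 3):
        if (abs(math.dist(a, b) - d_ij) > tol or abs(math.dist(a, c) - d_ik) > tol
                or abs(math.dist(b, c) - d_jk) > tol):
            continue
        frame = local_frame(a, b, c)
        if frame is None:
            continue
        sph = to_frame(spheres, frame)
        if all(any(math.dist(v, s) <= tol for s in sph) for v in lig):
            return True
    return False
```

Because both coordinate systems are right-handed, a cavity that is the mirror image of the ligand shape is not accepted by this sketch. The exhaustive loop over sphere triples is kept for clarity rather than speed.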
In a further aspect of this invention, a method is introduced for computer automated generation of a biomolecular cavity database from entries of a protein or nucleic acid 3D structure database. This cavity database contains two sets of entries. The first set consists of cavity entries containing information about geometric and chemical features of cavities. The second set consists of host entries containing information about the 3D structure and chemical properties of the host biomolecule. The minimum required information for each cavity entry is: (1) the position and radius of each sphere of a sphere cluster representing a cavity, (2) which spheres are less than 3.5 Å away from a hydrogen bond donor or acceptor atom of the host biomolecule, and (3) the extent to which each sphere is covered by atoms of the host biomolecule. The minimum required information for each host biomolecule entry is: (1) positions of atoms, (2) hydrogen bond donors and acceptors, (3) partial electrostatic charges of atoms, and (4) Van der Waals parameters. Van der Waals parameters describe the steric interaction properties of atoms. These parameters and partial electrostatic charges are described in “A second generation force field for the simulation of proteins and nucleic acids”, Cornell, W D, Cieplak P, Bayly C I, Gould I R, Merz K M Jr, Ferguson D M, Spellmeyer D C, Fox T, Caldwell J W and Kollman P A. Journal of the American Chemical Society 117, 5179-5197 (1995).
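One possible in-memory layout for the two sets of entries is sketched below, for illustration only. All class and field names are hypothetical; they simply mirror the minimum required information listed above. The sketch also assumes that the 3.5 Å criterion is measured from the sphere center, which the text does not specify.

```python
import math
from dataclasses import dataclass

HBOND_CUTOFF = 3.5  # angstroms, from the cavity entry requirements above


@dataclass
class Sphere:
    center: tuple
    radius: float
    near_hbond_site: bool = False   # item (2) of the cavity entry
    covered_fraction: float = 0.0   # item (3): extent covered by host atoms


@dataclass
class HostAtom:
    position: tuple
    is_hbond_donor: bool
    is_hbond_acceptor: bool
    partial_charge: float
    vdw_radius: float
    vdw_well_depth: float


@dataclass
class CavityEntry:
    host_id: str
    spheres: list


@dataclass
class HostEntry:
    host_id: str
    atoms: list


def flag_hbond_spheres(cavity, host, cutoff=HBOND_CUTOFF):
    """Mark spheres whose center lies within cutoff of a donor or acceptor atom.

    ASSUMPTION: distance is measured from the sphere center.
    """
    sites = [a.position for a in host.atoms
             if a.is_hbond_donor or a.is_hbond_acceptor]
    for sphere in cavity.spheres:
        sphere.near_hbond_site = any(
            math.dist(sphere.center, p) < cutoff for p in sites)
    return cavity
```

Keeping cavity entries and host entries in separate records, linked by a shared identifier, follows the two-set organization described in the text.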
In one embodiment of the invention, each cavity entry is generated by the following procedure: (1) Computation of the molecular surface of a protein or nucleic acid by using, for example, a custom designed computer program or the DMS program of the software suite MidasPlus. (2) Generation of sphere groups covering cavities and surfaces of the protein or nucleic acid by using, for example, a custom designed computer program or SPHGEN of the DOCK suite of software. (3) Selection of sphere clusters, each inside a cavity. (4) Output of the positions and chemical properties (such as hydrogen bonding site, polar or non-polar site, etc.) of the selected cavity clusters. The preferred output format is one compatible with the SPHGEN and CLUSTER output formats. MidasPlus is described in “An Affordable Approach to Interactive Desktop Molecular Modeling”, T. E. Ferrin et al., J. Mol. Graphics, 9, 27-32, 37-38 (1991). SPHGEN is described in “Using shape complementarity as an initial screen in designing ligands for a receptor binding site of known three-dimensional structure”, R. L. DesJarlais et al., J. Med. Chem. 31, 722-729 (1988). CLUSTER is described at the DOCK web page: http://www.cmpharm.ucsf.edu/kuntz/dock.html.
Each host entry is generated by the following procedure: (1) Determining hydrogen bond donors and acceptors. (2) Assigning to each atom AMBER partial electrostatic charges and Van der Waals parameters. (3) Assigning to each atom atomic solvation parameters. (4) Outputting the quantities, with the preferred output format being compatible with the PDB format. Atomic solvation parameters measure the solvation effect of atoms, and they are described in “Solvation energy in protein folding and binding”, D. Eisenberg and A. D. McLachlan. Nature 319, 199-203 (1986).
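As one hedged illustration of a PDB-compatible output for step (4), a host atom could be written as a fixed-column ATOM record. The column positions for the serial number, atom name, residue name, chain, residue number and coordinates follow the standard PDB ATOM record layout. Placing the partial charge and Van der Waals radius in the columns normally used for occupancy and temperature factor is an assumption of this sketch, not something stated in the disclosure, and the function name is hypothetical.

```python
def format_host_atom(serial, name, res_name, chain, res_seq, xyz, charge, vdw_radius):
    """Format one host atom as a 66-character PDB-style ATOM record.

    ASSUMPTION: charge is written in columns 55-60 and vdw_radius in
    columns 61-66 (the occupancy and temperature-factor fields).
    """
    x, y, z = xyz
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{charge:6.3f}{vdw_radius:6.3f}")


def parse_host_atom(line):
    """Read back the coordinates, charge and radius written by format_host_atom."""
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return xyz, float(line[54:60]), float(line[60:66])
```

Atomic solvation parameters and donor/acceptor flags would need additional columns or a companion record, which this sketch leaves out.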
In a further aspect of this invention, to facilitate rapid scoring of binding competitiveness against other ligands that bind to the same receptor site, the ligand-protein interaction energy for the ligand found in each cavity entry is pre-computed and stored in the protein cavity database. The preferred method for computing the ligand-protein interaction energy for PDB ligands uses molecular mechanics energy functions and parameters similar to those given in AMBER. A modification is made to replace the AMBER hydrogen bond function with a Morse potential function.