This specification includes a microfiche appendix containing a listing of the computer programs of this invention, this appendix comprising 2 microfiche of 101 total frames.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by any one of the patent disclosure, as it appears in the Patent and Trademark Office patent files and records, but otherwise reserves all copyright rights whatsoever.
The field of this invention is computer assisted methods of drug design. More particularly, the field of this invention is computer implemented smart Monte Carlo methods which utilize NMR and binders to a target of interest as inputs to determine highly accurate molecular structures that must be possessed by a drug in order to achieve an effect of interest. Illustrative U.S. Patents are U.S. Pat. No. 5,331,573 to Balaji et al., U.S. Pat. No. 5,307,287 to Cramer, III et al., U.S. Pat. No. 5,241,470 to Lee et al., and U.S. Pat. No. 5,265,030 to Skolnick et al.
Protein interactions have recently emerged as a fundamental target for pharmacological intervention. For example, the top two major uncured diseases in the United States are atherosclerosis (the principal cause of heart attack and stroke) and cancer. These diseases are responsible for greater than 50% of all U.S. mortality and cost the U.S. economy over $200 billion per year. A consistent picture of these diseases, which has gradually emerged during the past ten years of molecular biological and medical research, views both as triggered by disordering of specific molecular recognition events that take place among sets of proteins present in both the normal and disease states.
Hierarchical, organized patterns of protein-protein interactions are often referred to as "pathways" or "cascades." At the molecular level, cancers have been determined to be the deregulation of pathways of interacting proteins responsible for guiding cellular growth and differentiation. During the past year, individual cellular events have been organized into nearly complete mechanistic explanations of how a cell's behavior is controlled by its environment and how communication pathway errors lead to uncontrolled proliferation and cancer. Disruptions in similar pathways are responsible for the proliferation of blood vessel walls marking the atherosclerotic disease state (Cook et al., 1994, Nature 369:361-362; Hall, 1994, Science 264:1413-1414; Ross, 1993, Nature 362:801-809; Zhang et al., 1993, Nature 364:308-313).
Particular protein-substrate interactions have long been known drug targets, whether for inhibition or for stimulation. Many important anti-hypertensives, neurotransmitter analogues, antibiotics, and chemotherapeutic agents act in this fashion. Captopril, an antihypertensive drug, was designed based on its ability to antagonize a focal blood-pressure-regulating enzyme.
Proteins involved in biological processes, either as part of protein-protein pathways or as enzymes, are composed of domains (Campbell et al., 1994, Trend. BioTech. 12:168-172; Rothberg et al., 1992, J. Mol. Biol. 227:367-370). Domains, or regions of the protein of stable three dimensional (secondary and tertiary) structures, play several major roles, including providing on their surface small regions ("examples of targets"), where proteins and substrates are able to bind and interact, and functioning as structural units holding other domains together as part of a large protein (tertiary and quaternary structure). The interaction surface of a domain or target is fundamental to determining binding specificity. Targets are often small enough that the principal contribution to the binding energy is short range, highly localized to several amino acids (Wells, 1994, Curr. Op. Cell Biol. 6:163-174). The functional specificity of targets and domains, responsible for the incredible diversity of cellular function, ultimately rests with the arrangement of amino acid side chains forming their interaction surfaces, or targets (Marengere et al., 1994, Nature 369:502-505).
It can be appreciated, therefore, that pharmacological intervention affecting the specific protein-protein and protein-substrate recognition events occurring at protein targets is of fundamental importance, particularly for effective drug design.
However, achieving desired pharmacological interventions in a predictable manner remains as elusive as ever. Early approaches to drug design depended on the chance observation of biological effects of a known compound or the screening of large numbers of exotic compounds, usually derived from natural sources, for any biological effects. The nature of the actual protein target was usually unknown.
2.1. TARGET STRUCTURE-BASED APPROACHES TO DRUG DESIGN
Rational approaches to drug design have met with only limited success. Current rational approaches are based on first determining the entire structure of the proteins involved in particular interactions, examining this structure for the possible targets, and then predicting possible drug molecules likely to bind to the possible target. Thus the location of each of the thousands of atoms in a protein must be accurately determined before drug design can begin. Direct experimental and indirect computational methods for protein structure determination are in current use. However, none of these methods appears to be sufficiently accurate for drug design purposes according to current rational approaches.
The primary direct experimental methods for determining the structure of proteins involved in particular interactions are X-ray crystallography, relying on the interaction of electron clouds with X-rays, and liquid nuclear magnetic resonance (NMR), relying on correlations between polarized nuclear spins interacting via indirect dipole-dipole interactions. X-ray methods provide information on the location of every heavy atom in a crystal of interest accurate to 0.5-2.0 Å (1 Å = 10^-8 cm). Drawbacks of X-ray methods include difficulties in obtaining high-quality crystals, expense and time associated with the crystallization process, and difficulties in resolving whether or not the structure of the crystalline forms is representative of the in vivo conformation (Clore et al., 1991, J. Mol. Biol. 221:47; Shaanan et al., 1992, Science 227:961-964). High resolution, multidimensional, liquid phase NMR techniques represent an attractive alternative, to the extent that they can be applied in situ (i.e., in aqueous environment) to the study of small protein domains (Yu et al., 1994, Cell 76:933-945). However, the analysis of the various mutual correlations is complex and time consuming, and the correlations (primarily from the nuclear Overhauser effect) provide no better accuracy than X-ray methods. Isotopic enrichment of proteins with 13C and 15N reduces the time associated with analysis, but at a great expense (Anglister et al., 1993, Frontiers of NMR in Biology III LZ011).
Protein structures determined by any of these current methods do not predict success in subsequent drug design. Resolution obtainable either by measurement or computation, generally 0.5-2 Å, has often been found to be inadequate for effective direct drug design, or for selection of a lead compound from organic compound libraries. The resolution required to understand both drug affinity and drug specificity, although not precisely known, is probably measured in fractions of an Å, down to 0.1 Å (MacArthur et al., 1994, Trend. BioTech. 12:149-153). This accuracy appears to be beyond the capabilities of many current methodologies.
Prior research has identified tools which, although promising, cannot be used in a coordinated manner for drug design. One promising measurement approach with speed, simplicity, accuracy, and the ability to carefully control the measurement environment is rotational echo double resonance (REDOR) NMR, a type of solid state NMR (Gullion and Schaefer, 1989, J. Magnetic Resonance 81:196; Holl et al., 1990, J. Magnetic Resonance 81:620-626 and McWherter, 1993, J. Am. Chem. Soc. 115:238-244). REDOR accuracy can be below the 0.1 Å believed to be sufficient for direct drug design. However, since REDOR measures only a few selected distances, it is not usable in drug design methods which depend on the initial determination of the complete structure of the protein containing the target of interest.
Once a target's structure is determined by the above methods, most rational drug design paradigms call for the prediction of small drug structures that will bind (or dock) to the target. This prediction is generally done by computational methods, of which several are in current use. Most seek to predict the position of all the thousands of atoms in a drug structure. Purely ab initio computational approaches to high resolution structure analysis, such as quantum statistical mechanics and molecular dynamics, require prohibitive computing resources. To apply either approach, the potential energy, or Hamiltonian, of the entire system must be known. Statistical mechanics provides an expression for the probability of any given protein configuration as a ratio of partition functions. Proper quantum statistical mechanics required for an exact evaluation of full protein partition functions is not currently computationally feasible, as it would involve many thousands of atoms including the target, the protein, and the aqueous environment. The application of even simple, approximate quantum statistical mechanics to simple systems in aqueous environments is currently a non-trivial task (Chandler, 1991, in Liquids, Freezing, and Glass Transitions, Elsevier, N.Y., p. 195). Molecular dynamics computes the dynamics of a molecule's motion in time. Computing the atomic dynamics of all the perhaps thousands of atoms of a protein is an extreme computational burden. Only picoseconds, or at most a few nanoseconds, of molecular time can be simulated, which is insufficient to determine a high resolution, equilibrium structure (Smit et al., 1994, J. Phys. Chem. 98:8442-8452). In any case, most of the information determined is wasted, since only the structure of the protein binding target is of interest in drug design.
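By way of illustration only, the statistical-mechanical probability referred to above can be sketched in its simplest classical form, in which a handful of discrete configurations with known energies stand in for the full system. The function name, the example energies, and the use of kcal/mol are assumptions made for this sketch and do not appear in the original disclosure; a true quantum treatment of a solvated protein is far more involved.

```python
import math

def boltzmann_probabilities(energies, kT):
    """Return p_i = exp(-E_i/kT) / Z for a discrete set of configurations.

    Classical, discrete stand-in for the partition-function ratio;
    'energies' and 'kT' must share the same units (assumed kcal/mol here).
    """
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)       # partition function (up to the constant shift)
    return [w / z for w in weights]

# Hypothetical configuration energies in kcal/mol; kT is about 0.593 kcal/mol near 298 K.
example_energies = [0.0, 0.5, 2.0]
example_probs = boltzmann_probabilities(example_energies, kT=0.593)
```

Even in this toy form, every configuration's energy must be known before any probability can be evaluated, which is the requirement that becomes prohibitive for systems of many thousands of atoms.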
Further, current approximate computational techniques for protein structure determination are in need of greater accuracy or efficiency. The most common techniques depend on Molecular Dynamics or Monte Carlo methods (Nikiforovich, 1994, Int. J. Peptide Protein Res. 44:513-531; Brünger and Karplus, 1991, Acc. Chem. Res. 24:54-61). These methods randomly alter initial molecular structures by generating simulated thermal perturbations, and then average the ensemble of results to determine a final structure. The generated perturbation must preserve all structural constraints and be energetically favorable. If both conditions are not met, the perturbation will be discarded. Current Monte Carlo methods applied to constrained protein structure determinations productively use only approximately 1 out of 10^5 perturbed structures generated (Siepmann et al., 1993, Nature 365:330-332). This extreme waste of computer resources results in time consuming, low resolution structure determinations.
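The perturb, test, and average cycle described above can be sketched, under simplifying assumptions, as a conventional Metropolis Monte Carlo loop on a single coordinate. This is a generic textbook scheme and not the smart Monte Carlo method of the invention; the function names, the harmonic energy, and the bounds used as a structural constraint are all hypothetical choices made for illustration.

```python
import math
import random

def metropolis_constrained(energy, satisfies_constraints, x0,
                           step=0.1, kT=1.0, n_steps=10000, seed=0):
    """Sample one coordinate; return (ensemble mean, fraction of moves accepted)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    accepted = 0
    total = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)  # simulated thermal perturbation
        if satisfies_constraints(trial):      # discard constraint violations
            e_trial = energy(trial)
            # Metropolis criterion: always accept downhill moves,
            # accept uphill moves with Boltzmann probability.
            if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
                x, e = trial, e_trial
                accepted += 1
        total += x  # a rejected move recounts the current state
    return total / n_steps, accepted / n_steps

# Hypothetical example: harmonic well centred at 1.0, constrained to [0.5, 1.5].
def example_energy(x):
    return 50.0 * (x - 1.0) ** 2

def example_constraint(x):
    return 0.5 <= x <= 1.5

mean_x, acceptance = metropolis_constrained(example_energy, example_constraint, x0=1.0)
```

The returned acceptance fraction is the quantity at issue in the passage above: when constraints are tight and perturbations are generated blindly, nearly every trial is discarded and the computation spent generating it is wasted.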
To summarize, existing rational drug design methods based on identification of target structure fail to reliably yield drug molecules due to experimental structure determination difficulties and computational difficulties associated with predicting drug structures with ill-defined Hamiltonians.
2.2. DIVERSITY-BASED APPROACHES TO DRUG DESIGN
Another method for exploring protein target interactions utilizes "recognition systems" which comprise huge libraries of related molecules (Clarkson et al., 1994, Trend. BioTech. 12:173-184). From such a library only those members binding to the target of interest are selected. Such recognition systems must encompass the structural diversity of protein targets while being amenable to serve for the selection of lead compounds for drug design. Antibodies are one classic example of such a system that certainly meets the recognition requirement. Unfortunately, there is a need to determine the antibody structures needed for lead compound selection more rapidly and accurately. While about 2000 recognition regions have been sequenced, only about 23 in the Brookhaven Protein Structural Database have structures determined to even within 2 Å (Rees et al., 1994, Trends in Biotech. 12:199-206).
Promising recognition systems at the opposite extreme comprise huge libraries of small peptides. The small peptides must be sufficiently diverse so that they attain a level of affinity and specificity similar to that obtained by protein domains. Given the role peptides play in nature, this condition can be met by surprisingly small structures, with 6 to 12 amino acids. However, linear peptides are either unstructured or weakly structured at room temperature in aqueous solutions (Alberg et al., 1993, Science 262:248; Skalicky et al., 1993, Protein Science 10:1591-1603). From a practical viewpoint, linear peptides must be constrained to reduce their degrees of freedom (reduced conformational entropy) and to increase their chances for strongly binding. These constraints, or scaffolds, limit the range of stable conformations and make determining the bound structure more straightforward (Olivera et al., 1990, Science 249:259; Tidor et al., 1993, Proteins: Structure Function and Genetics 15:71).
Methods are now available to create such libraries and to select library members that recognize a specific protein target. The production of constrained peptide diversity libraries requires synthesizing oligonucleotides with the desired degeneracy to code for the peptides and ligating them into selection vectors (Goldman et al., 1994, Bio/Tech. 10:1557-1561). Once a constrained structured diversity library is created, it is a source from which to select specific members that bind to a target of interest. Beginning with a known pathway involving specific domain-domain or protein-substrate interactions at a target, molecular biological methods can be used to identify in a matter of days small ensembles of highly constrained peptides from these huge libraries that bind to these domains with high affinity and specificity.
While this field has been exploding in the last few years and showing great potential, it is severely limited by its use in isolation without the benefit of integrated structural analysis needed both to derive the high resolution structures of binding peptides and also to direct the construction of additional structured libraries. Drug design is not aided by having library members recognizing the protein target of interest but without any understanding of why the recognition occurs. This is entirely similar to the random screening methods of early fortuitous drug design efforts.
Unfortunately, rational drug design according to current approaches (target structure-based) remains an inefficient, laborious process with a disproportionately high lead-compound failure rate. Presently, about 90% of lead compounds fail to emerge successfully from clinical trials (Trends in U.S. Pharmaceutical Sales and Research and Development, Pharmaceutical Manufacturing Association, Washington, D.C., 1993).
It is becoming clear that low-resolution structures of an entire protein or target (at 0.5-2 Å), or an uncharacterized lead, such as produced by chemical diversity methods, leave much to be desired for use in drug design.
If the limitations of prior art methods were overcome and a sufficiently accurate structure needed by a molecule to bind to a target of interest could be determined, existing chemical libraries could be searched for highly targeted lead compounds with similar structure (Martin, 1992, J. Medicinal Chem. 35:2145-2154). This database search can be based not only on chemical and electronic properties, but also on geometric information. Such searches that have high resolution (better than 0.25 Å), would provide a vast improvement over the prior art, as lower resolutions lead to an exponentially increasing number of potential leads.
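A geometric database search of the kind contemplated above may be sketched as follows, purely as an illustration. The sketch assumes each library compound is reduced to one labeled point per chemical group, with coordinates in Å, and compares inter-group distances against a query within a tolerance. The labels, coordinates, compound names, and function names are hypothetical and are not taken from any actual chemical library.

```python
from itertools import combinations
import math

def pairwise_distances(coords):
    """Map each unordered pair of group labels to its distance (Å)."""
    return {
        tuple(sorted((a, b))): math.dist(coords[a], coords[b])
        for a, b in combinations(coords, 2)
    }

def matches(query, candidate, tol):
    """True if every query distance is reproduced by the candidate within tol."""
    q = pairwise_distances(query)
    c = pairwise_distances(candidate)
    return all(pair in c and abs(c[pair] - d) <= tol for pair, d in q.items())

def search(query, library, tol=0.25):
    """Return the names of library compounds matching the query geometry."""
    return [name for name, coords in library.items() if matches(query, coords, tol)]

# Hypothetical three-group query and two-compound library.
query = {"A": (0.0, 0.0, 0.0), "B": (3.0, 0.0, 0.0), "C": (0.0, 4.0, 0.0)}
library = {
    "compound_1": {"A": (0.0, 0.0, 0.0), "B": (3.1, 0.0, 0.0), "C": (0.0, 4.1, 0.0)},
    "compound_2": {"A": (0.0, 0.0, 0.0), "B": (4.0, 0.0, 0.0), "C": (0.0, 5.0, 0.0)},
}
tight_hits = search(query, library, tol=0.25)
loose_hits = search(query, library, tol=2.0)
```

In this toy library the 0.25 Å tolerance retains only the first compound, while a 2 Å tolerance admits both, which illustrates on a small scale why lower resolution enlarges the set of candidate leads.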
Computational methods to determine high resolution drug structures from recognition system binding information or NMR partial distance measurements are not currently available. No current structure determination method uses such additional information to make more efficient or more accurate determination of high resolution structures (Holzman, 1994, Amer. Sci. 872:267).
Citation of a reference or discussion hereinabove shall not be construed as an admission that such is prior art to the present invention.
It is a broad object of this invention to address the prior art problems of drug design by providing a method of rational design of drugs that achieve their effect by binding to a target molecule or molecular complex of interest. Importantly, this object is achieved without requiring determination of the structure of the molecule or molecular complex ("target molecule") bearing the target or even of the target itself. The method is target structure independent. The method of the invention uses an interdisciplinary combination of computational modeling and simulation, experimental distance constraints, and molecular biology.
In an important aspect, the invention provides a computer implemented modeling and simulation method to determine a highly accurate consensus structure for the pharmacophore and a structure for the remainder of the molecule from diversity library members that bind to the protein target of interest. Where prior structure determination methods focused on the structure of the target molecule or of the target, the method of this invention is uniquely adapted to focus instead on the structures of molecules that bind to the target. Such structural information is directly applicable to drug design since it defines the structure a drug must possess to bind to the target of interest. Also, this structural information is much easier to determine by use of the present invention, since it concerns molecules with many fewer atoms than the target molecule. The method of the invention achieves accuracy by improving upon the accuracy and utility of the input structural information. In a further embodiment of the invention, the method employed for structural determination is a smart Monte Carlo technique adapted to small constrained molecules.
The structure determination method of the invention allows one to take maximum advantage of the information obtained from the molecular biological selection of the diversity library members that tightly and specifically bind to the target molecule of interest. The selected library members must share some common structure to bind to the same target molecule. The smart Monte Carlo computer method of this invention specifically seeks and provides this common structure.
The invention also provides a method of performing REDOR NMR measurements of molecules on a solid phase substrate. In a preferred embodiment, the substrate is a solid phase on which the molecule (e.g., peptide) has been synthesized, with a high degree of purity. In another preferred embodiment, performing REDOR measurements of such a molecule on a substrate can be done in a dry nitrogen atmosphere, under hydrated conditions, and when the molecule is either free or bound to a target. In a specific embodiment, the REDOR measurements are accurate to better than 0.05 Å from 0 to 4 Å, and to better than 0.1 Å from 4 to 8 Å. In an advantageous aspect of the invention, the structure determination method makes maximum use of these highly accurate internuclear distance measurements to constrain the determined common structure for the binding library members.
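One way such distance measurements might serve as constraints is sketched below, for illustration only. The two accuracy figures are those stated above; treating them as hard tolerances, assigning the 4 Å boundary to the tighter range, and the function names are assumptions of this sketch and are not the constraint handling of the invention.

```python
def redor_tolerance(measured):
    """Return the assumed tolerance (Å) for a REDOR-measured distance (Å)."""
    if 0.0 <= measured <= 4.0:
        return 0.05  # stated accuracy from 0 to 4 Å
    if 4.0 < measured <= 8.0:
        return 0.10  # stated accuracy from 4 to 8 Å
    raise ValueError("distance outside the 0 to 8 Å range stated for REDOR")

def violates(model_distance, measured_distance):
    """True if a trial structure's distance disagrees with the measurement."""
    return abs(model_distance - measured_distance) > redor_tolerance(measured_distance)

# Hypothetical check of a trial structure against one measured distance of 3.50 Å.
trial_ok = not violates(3.52, 3.50)
trial_bad = violates(3.60, 3.50)
```

A trial structure generated during a simulation could be screened with such a check before any energy evaluation, so that only structures consistent with every measured distance are carried forward.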
The invention also provides methods of identifying a compound that specifically binds to a target molecule, by first screening a diversity library, and then using a genetic selection method for screening the compounds identified from the diversity library.
In broad aspects, the invention provides a method and apparatus for rational and predictable design of new and/or improved drugs that achieve their effect by binding to a specified target molecule. More particularly, the invention is directed to a method for the rational selection of highly specific lead compounds for such drug design, including the computer implemented step of highly accurate determination of the structure responsible for this target binding by the highly accurate, consensus, configurational bias Monte Carlo method.
A lead compound serves as a starting point for drug development both because it specifically binds to the protein target of interest, achieving the biological effect of interest, and because it has or can be modified to have good pharmacokinetics and medicinal applicability. A final drug may be the lead compound or may be derived therefrom by modifying the lead to maximize beneficial effects and minimize harmful side-effects. Although any lead compound is useful, a lead that tightly and specifically binds to the target molecule of interest in a known manner, such as can be provided by the invention, is of great use. Knowledge of the high resolution structures in a lead compound responsible for its binding and activity provides a more focused and efficient drug development process.
The methods of the invention improve lead compound determination, by determining the "pharmacophore", the precise structural characteristics needed for a lead compound to specifically bind to a target of interest. The most fundamental specification of a pharmacophore is in terms of the electronic properties necessary for a molecule to specifically bind to the surface of a target molecule. These properties may be fundamentally represented by requirements on the ground and low lying excited state wave functions of a pharmacophore, such as, for example, by specifying requirements on the well known multipole expansion of these wave functions.
The preferred pharmacophore specification according to the invention is in terms of both the chemical groups making up the pharmacophore and determining its electronic properties and also the geometric relationships of these groups. This chemical representation is not the only possible representation of the pharmacophore. Several chemical arrangements may have similar electronic properties. For example, if a pharmacophore specification included an -OH group at a particular position, a substantially equivalent specification might include an -SH group at the same position. Equivalent chemical groups that may be substituted in a pharmacophore specification without substantially changing its nature are called "homologous".
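A pharmacophore specification of this chemical and geometric kind, together with the notion of homologous groups, might be represented as in the following sketch. The data structure, the names, and the example coordinates are hypothetical; the only homologous pair included is the -OH and -SH pair named above, and positions are compared exactly for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    group: str       # chemical group label, e.g. "OH"
    position: tuple  # (x, y, z) in Å, in an assumed common reference frame

# Sets of groups treated as interchangeable; only the pair named in the text.
HOMOLOGY_CLASSES = [frozenset({"OH", "SH"})]

def homologous(a, b):
    """True if two group labels are identical or share a homology class."""
    return a == b or any(a in cls and b in cls for cls in HOMOLOGY_CLASSES)

def equivalent_specs(spec_a, spec_b):
    """True if both specifications place identical or homologous groups
    at the same positions (positions assumed unique within a specification)."""
    if len(spec_a) != len(spec_b):
        return False
    by_position = {f.position: f.group for f in spec_b}
    return all(
        f.position in by_position and homologous(f.group, by_position[f.position])
        for f in spec_a
    )

# Hypothetical two-group specifications differing only in the first group.
spec_oh = [Feature("OH", (0.0, 0.0, 0.0)), Feature("COOH", (3.0, 0.0, 0.0))]
spec_sh = [Feature("SH", (0.0, 0.0, 0.0)), Feature("COOH", (3.0, 0.0, 0.0))]
spec_nh2 = [Feature("NH2", (0.0, 0.0, 0.0)), Feature("COOH", (3.0, 0.0, 0.0))]
```

Under these assumptions the -OH and -SH specifications compare as equivalent, while the -NH2 variant does not, because no homology class containing it has been declared.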
In particular embodiments, therefore, this invention provides a method and apparatus for the highly accurate determination of the pharmacophore needed to specifically bind to the target molecule of interest, by a specification of the geometric relationships of the important chemical groups. The pharmacophore is preferably determined by a smart Monte Carlo method from molecular biological input specifying molecules (preferably selected from among diversity libraries) that specifically bind to the target molecule and also preferably from REDOR NMR data specifying a few highly accurate distances in these selected molecules.
An important advantage provided by the invention is the ability to make a pharmacophore structure determination without relying on any knowledge of the structure of the target molecule or target. Where the target molecule is a protein, conventional prior art methods have sought to sequence and determine the structure of the protein containing the target, hoping thereby to determine active sites by examination of the structure. A further important advantage of the invention is that this structure determination can be made by use of a relatively small number of actual physical position measurements. In contrast, conventional methods using X-ray crystallography and liquid NMR require determination of positions of all atoms in the molecule ("binder") that specifically binds to the target, and the target. An additional advantage provided by the invention is that, in a preferred embodiment wherein REDOR structural measurements provide input information, the accuracy of the pharmacophore structure determination can be at least approximately 0.25-0.50 Å or better. This accuracy is provided by the combination of an efficient Monte Carlo technique for structure determination with a few highly accurate distance determinations.