Discovery and development of drugs is a lengthy, expensive, and often unsuccessful process. Typically, it takes 12 to 16 years and more than $500 million to go from the original concept to the market introduction of a new drug. Numerous software packages have been developed to assist in the development of new drugs. Unfortunately, all of these programs must make tradeoffs between completeness of their searching functions and the speed of their computations. Consequently, drug design software that includes more efficient computational methods has a better chance of identifying new, useful drugs.
As is known by those skilled in the art, the vast majority of drugs are small molecules designed to bind, interact, and modulate the activity of specific biological receptors. Receptors are proteins that bind and interact with other molecules to perform the numerous functions required for the maintenance of life. They include an immense array of cell-surface receptors (hormone receptors, cell-signaling receptors, neurotransmitter receptors, etc.), enzymes, and other functional proteins. Due to genetic abnormalities, physiologic stresses, or some combination thereof, the number, structure, or function of specific receptors and enzymes may become altered to the point that our well-being is diminished. These alterations may manifest as minor physical symptoms, as in the case of a runny nose due to allergies, or as life threatening and debilitating events, such as sepsis or depression. The role of drugs is to modulate the number, structure, or activity of these receptors to remedy the resulting medical condition.
Enzymes are a subset of receptor-like proteins that are directly responsible for catalyzing the biochemical reactions that sustain life. For example, digestive enzymes act to break down the nutrients of our diet. DNA polymerase and related enzymes are crucial for cell division and replication. Enzymes are genetically programmed to be absolutely specific for their appropriate molecular targets. Any errors could have grave consequences. For example, it would likely be fatal if blood-clotting enzymes began activating throughout the body, or if our immune system began attacking our own tissues. Enzymes ensure the specificity of their targets by forming a molecular environment that excludes interactions with inappropriate molecules. The analogy most often mentioned is that of a lock and key. The enzyme is a molecular lock, which contains a keyhole that exhibits a very specific and consistent size and shape. This molecular keyhole is termed the active site of the enzyme and allows interaction with only the appropriate molecular targets. Just as a typical lock is much bigger than the keyhole, the receptor is usually much larger than the active site. The receptor, as specified by our DNA, is a folded protein whose major purpose is to form and maintain the size and shape of the active site. This is illustrated in FIG. 1 using the structure of the HIV-1 protease, indicated generally at 100. The active site is indicated at 110.
The most important aspect of drug design relates to the mechanism by which the active site of a receptor selectively restricts the binding of inappropriate structures. Any potential molecule that can bind to a receptor is called a “ligand.” In order for a ligand to bind, it must contain a specific combination of atoms that presents the correct size, shape, and charge composition in order to bind and interact with the receptor. The ligand must possess the molecular “key” that binds the receptor lock.
FIG. 2 schematically shows a typical ligand-receptor binding interaction. The ligand is indicated at 200, and the walls of the active site 110 are shown at 210. For ligand-receptor interaction to occur, the ligand 200 must be complementary in size and shape to the receptor active site 110. This is known as “steric complementarity.” The closer the fit between the ligand and receptor, the tighter the interaction becomes. If a molecule varies from a functioning ligand by even a single atom in the wrong place, it may not fit properly, and therefore may not interact with the receptor, or may not interact strongly enough. Note that, although the schematic illustration of FIG. 2 is two-dimensional, both ligand 200 and active site 110 are three-dimensional.
In addition to steric complementarity, electrostatic interactions influence ligand binding. Charged receptor atoms often surround the active site 110, imparting a localized charge in specific regions of the active site. In FIG. 2, regions of relative negative charge are indicated at 220, while regions of relative positive charge are shown at 230. It will be appreciated by those skilled in the art that opposite charges attract while similar charges repel. Electrostatic complementarity further restricts the binding of inappropriate molecules, since the ligand 200 must contain correctly placed complementary charged atoms for it to interact with the active site 110.
It will be appreciated by those skilled in the art that the strongest driving force for ligand and receptor binding is “hydrophobic interaction.” Nearly two-thirds of the body is water, and this aqueous milieu surrounds all our cells. In order for ligand and receptor to interact, there must be a driving force that compels the ligand to leave the water and bind to the receptor. The hydrophobicity of a ligand is what causes this. Hydrophobicity is a measure of how “greasy” a compound is. It can be roughly approximated by the percentage of hydrogen and carbon in the molecule. This force can be demonstrated by placing a few drops of oil in a cup of water. The oil is composed of hydrocarbon chains and is highly hydrophobic. The oil droplets will rapidly coalesce into a single globule in order to avoid the water, which is highly polar. As shown in FIG. 2, the active site may contain a mixture of hydrophobic pockets and regions that are more polar. Since the hydrophobic portions of the ligand and receptor prefer to be juxtaposed, the arrangement of hydrophobic surfaces provides yet another way that receptors can limit the binding of inappropriate targets.
As discussed above, there are numerous potential interactions between ligand and receptor. Depending upon the size of the active site, there may be a myriad steric, electrostatic, and hydrophobic contacts. However, some are more important than others. The specific interactions that are crucial for ligand recognition and binding by the receptor are called the “pharmacophore.” Usually, these are the interactions that directly factor into the structural integrity of a receptor or are involved in the mechanism of its action. Only a molecule that presents the pharmacophore to the receptor properly interacts with the active site. This is crucial to the design of pharmaceuticals since any successful drug must incorporate the appropriate chemical structures and present the pharmacophore to the receptor.
This is illustrated in FIG. 3. A first molecule 310 is a native ligand bound within the active site. Assume that through biochemical investigation, we determine that the phenyl ring 322 and the carboxylic acid group 324 are vital to receptor interaction. Thus, we deduce that these two groups must be the pharmacophore 320 that a ligand must present to the receptor for binding. In future drugs that we develop to mimic the native ligand 310, we must include these two pharmacophoric elements for successful binding to occur. For example, the first derivative compound 330 in which a bicyclic group has been substituted maintains the pharmacophore and retains its complementary size and shape. The derivative compound 330 therefore has a reasonable chance of successfully binding. However, any drug that we develop which lacks a complete pharmacophore, such as the second derivative compound 340 shown in FIG. 3, may not interact with the receptor target.
When a medical condition exists where a drug could be beneficial, extensive scientific study must first be done in order to determine the biological and biochemical problems that underlie the disease process. This often takes years of study in order to characterize the targets for a potential drug. The reason is that nearly all biological processes in the human body are tightly interconnected. Altering the behavior of select receptors or enzymes may have detrimental effects on other systems. These are the side effects that occur with nearly all drugs. Furthermore, the human body is a homeostatic machine, and always attempts to achieve equilibrium. As a result, the body will attempt to counteract any pharmacotherapeutic intervention.
Once a receptor target has been established and well characterized, the process of ligand design begins. The designed ligand must complement the active site of the receptor target. Steric, electrostatic, and hydrophobic complementarity must be established, as discussed above. The pharmacophore must be presented to the receptor in order for recognition and binding to occur. Otherwise, the designed ligand will have no chance of interacting with the receptor.
In addition to adequately binding the receptor, the biochemical mechanism of the receptor target must be taken into consideration. FIGS. 4A and 4B schematically represent the biochemical mechanism of a protease 400. A protease is an enzyme that cleaves proteins and peptides. FIG. 4A shows that a protease 400 recognizes a specific group of atoms 410, that is, a peptide bond in a ligand 450. If the peptide bond 410 is present at a specific position in the active site when the ligand 450 binds, it is cleaved by the protease with the addition of water (H2O) to form two separate fragments 420. If the goal is to inactivate this protease, any designed ligand must not possess this peptide bond at the same position. Otherwise, it will simply be cleaved by the protease 400, and the protease 400 will continue to function unperturbed. However, the ligand 450 can be modified to produce a different ligand 455, in which the peptide bond 410 is no longer present, as shown in FIG. 4B. If the ligand 455 is bound by the enzyme 400, the enzyme 400 will not be able to cleave it. The enzyme 400 would therefore be inactivated, as the ligand 455 remains lodged in the active site 110.
Once the active site region 110 and the mechanism of action of the target receptor have been characterized, a suitable ligand must be designed. This is typically the most demanding task of the entire drug design process. The optimal combination of atoms and functional groups to complement the receptor is often the natural ligand of the receptor. This is usually an unacceptable candidate for a drug. This may be, for example, because the natural ligand is inactivated by the receptor, as described above, or because it is not feasible to commercially manufacture the natural ligand. Therefore, alternative combinations of chemical structures must be devised.
Those skilled in the art will appreciate that the design of novel ligands is often restricted by what chemists are physically able to synthesize. It is of no use to design the ultimate drug if it cannot be manufactured. Each atom type has a specific size, charge, and geometry with respect to the number and types of neighboring atoms that it can be joined to. The entire field of chemistry is predicated on the establishment of synthetic rules for the construction and manipulation of various combinations of atoms and functional groups. These chemical rules govern the design and synthesis of postulated ligand candidates. Within these rules, the drug developer must creatively propose suitable chemical structures that satisfy the requirements discussed above.
Finally, there are biological considerations to the development of new drugs. For example, the liver is the major organ of detoxification in the human body. Any drug that is taken undergoes a number of chemical reactions in the liver as the body attempts to neutralize foreign substances. This set of reactions is well characterized, and a great deal of knowledge exists as to how drugs are modified as the body eliminates them. As another, even more important example, various chemical structures are highly toxic to biological systems, and these are also well characterized. These constraints must also be taken into consideration as novel drugs are developed.
As discussed above, the development of any potential drug begins with scientific study to determine the biochemistry behind a medical problem for which pharmaceutical intervention is possible. This allows the determination of specific receptor targets that must be modulated to alter their activity in some way. Once these targets have been identified, compounds must be found that will interact with the receptors in some fashion. At this initial stage of drug development, it does not matter what effect the compounds have on the targets. The goal is simply to find anything that binds to the receptor in any fashion.
A typical drug-discovery pipeline is outlined in FIG. 5, shown generally at 500. The first step 520 is to use biological data 510 to determine an “assay” for the receptor. An assay is a chemical or biological test that turns positive when a suitable binding agent interacts with the receptor. Usually, this test is some form of colorimetric assay, in which an indicator turns a specific color when complementary ligands are present. This assay is then used in mass screening 530, which is a technique whereby hundreds of thousands of compounds can be tested in a matter of days to weeks. Typically, a pharmaceutical company will first screen their entire corporate database of known compounds. The reason is that if a successful match is found, the database compound is usually very well characterized. Furthermore, synthetic methods will be known for this compound. This enables the company to rapidly prototype a candidate ligand whose chemistry is well known.
If a successful match is found, the initial hit is called a “lead compound” 540. The lead compound 540 is usually a weakly binding ligand with minimal receptor activity. The binding of this structure to the receptor is then studied at 550 to determine the interactions that foster the ligand-receptor association. If the receptor is water soluble, there is a chance that x-ray crystallographic analysis can be employed to determine the three-dimensional structure of the ligand bound to the receptor at the atomic level. This is a very powerful tool because it allows scientists to directly visualize a snapshot of the individual atoms of the ligand as they reside within the receptor. This snapshot is referred to as the “crystal structure” of the ligand-receptor complex. Unfortunately, not all complexes can be analyzed in this manner. However, if a crystal structure can be determined, a strategy can then be developed based upon this characterization to improve and optimize the binding of the lead compound. From this point onward, a cycle of iterative chemical refinement and testing continues at 560 until a clinically active compound 570 is found. The techniques most often used to refine drugs at 560 are combinatorial chemistry and structure-based design. The clinically active compound 570 is then tested with patients in clinical trials at 580.
Combinatorial chemistry is one technique that aids in the refinement of the lead compound 540. Combinatorial chemistry is a synthetic tool that can rapidly generate many thousands of lead compound 540 derivatives for testing. A scaffold is employed that contains a portion of the ligand 540 that remains constant. Sites on the scaffold are then designated for derivatization, that is, designated for the addition of substituent functional groups from carefully designed chemical libraries, in a combinatorial fashion. As a result, a multitude of derivative structures, each with different substituent groups, may be rapidly generated in an automated fashion. For example, if a scaffold contains three derivatization sites and the library contains ten groups per site, theoretically 1000 different combinations are possible. By carefully selecting libraries based upon the study of the active site, the derivatization process can be targeted towards optimizing ligand-receptor interaction.
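The counting argument in the example above can be illustrated with a short sketch. The scaffold name and the substituent labels below are hypothetical placeholders and do not represent real chemical groups; the sketch only shows how three sites with ten choices each yield 1000 derivatives.

```python
from itertools import product


def enumerate_derivatives(scaffold, site_libraries):
    """Yield one derivative per combination, taking one substituent per site."""
    for substituents in product(*site_libraries):
        yield (scaffold, substituents)


# Hypothetical libraries: three derivatization sites, ten placeholder groups each.
site_libraries = [[f"R{site}_{k}" for k in range(10)] for site in range(1, 4)]

derivatives = list(enumerate_derivatives("scaffold", site_libraries))
print(len(derivatives))  # 1000
```

Because the count is the product of the library sizes, adding a fourth site with ten groups would multiply the total by ten again.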
Structure-based design (also called rational drug design), on the other hand, is much more focused than combinatorial chemistry. Biochemical laws of ligand-receptor association discussed above are used to postulate ligand refinements to improve binding. For example, as discussed above, steric complementarity is vital to tight receptor binding. Using the crystal structure of the complex, regions of the ligand that fit poorly within the active site can be identified, and chemical changes to improve complementarity with the receptor can be postulated. In a similar fashion, functional groups on the ligand can be changed in order to augment electrostatic complementarity with the receptor. However, the danger in altering any portion of the ligand is the effect on the remaining ligand structures. Modifying even a single atom in the middle of the ligand can drastically change the shape of the overall structure. Even though complementarity in one portion of the ligand might be improved by the chemical revision, the overall binding might be severely compromised. This is an important shortcoming of rational design procedures.
Computer-aided drug design generally follows one of two strategies: de novo design and drug optimization. De novo design refers to construction of virtual lead compounds entirely through computer simulation. For the most part, de novo design has been unsuccessful. In order to make the calculations that simulate ligand construction and receptor-binding affinity run in a finite period of time, assumptions, significant approximations, and numerous algorithmic shortcuts are generally required. This greatly diminishes the accuracy of any calculated ligand-receptor interaction. Thus, de novo design can postulate numerous chemical structures that can potentially complement the active site; however, the calculated binding affinity has little or no correlation with reality. Furthermore, de novo design often generates undesired structures, such as toxic or chemically unstable structures. Therefore, a large fraction of the potential ligands identified by de novo design are useless as commercial drugs.
Computer-aided drug optimization, however, is an important tool in drug research. Drug optimization begins with a lead compound 540, which may have been identified by mass screening, through combinatorial chemistry, by x-ray crystallography, or some other means. Small modifications are then made to generate derivative compounds using structure-based design to improve binding affinity. Because the changes are relatively small, the validity of the computed binding affinities of the derivatives is relatively high. The best of the derivatives can be tested to verify the accuracy of the calculated binding affinities. The process continues iteratively until satisfactory binding ligands are produced.
Prior art computer-aided drug design packages generally fall into one of three main genres: scanners, builders, and hybrids.
All database search programs fall into the scanner category. Scanner type programs are typically used for lead compound screening. FIG. 6A illustrates how these programs are used. A lead compound 540 whose binding structure has been determined resides within an active site. From biochemical analysis of the ligand-receptor interaction, the pharmacophore is determined. For example, in the lead compound 540 shown in FIG. 6A, it might be determined that three ligand groups make up the pharmacophore 620: a phenyl ring 612, an amide hydrogen 614, and a hydroxyl group 616. The pharmacophore 620 is transformed into a query 630 that specifies the three-dimensional relationship between the functional groups of the pharmacophore 620.
FIG. 6B illustrates the process by which a scanner package identifies potential new drugs. The scanner package requires a database 650 of compounds whose three-dimensional structures are known. The query 630 is then used to search the database 650 for compounds that mimic the pharmacophore 620 and can potentially bind to the receptor target. The scanner package then outputs a set of candidates 660.
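By way of illustration only, a scanner-style search might be sketched as follows. The sketch assumes that the query is expressed as pairwise distances between labelled pharmacophore features, that each database compound stores one 3-D position per feature, and that a fixed distance tolerance is acceptable. The feature labels, coordinates, tolerance, and compound names are all hypothetical; actual scanner packages may encode queries differently.

```python
from itertools import combinations
from math import dist


def build_query(pharmacophore):
    """Convert labelled 3-D feature positions into pairwise distance constraints."""
    return {
        (a, b): dist(pharmacophore[a], pharmacophore[b])
        for a, b in combinations(sorted(pharmacophore), 2)
    }


def matches(features, query, tolerance=0.5):
    """True if the compound has every feature at the required spacing."""
    for (a, b), target in query.items():
        if a not in features or b not in features:
            return False
        if abs(dist(features[a], features[b]) - target) > tolerance:
            return False
    return True


def scan(database, query, tolerance=0.5):
    """Return the names of database compounds that satisfy the query."""
    return [name for name, feats in database.items() if matches(feats, query, tolerance)]


# Hypothetical pharmacophore and database (coordinates in arbitrary units).
pharmacophore = {"phenyl": (0, 0, 0), "amide_H": (3, 0, 0), "hydroxyl": (0, 4, 0)}
database = {
    "cmpd_A": {"phenyl": (1, 1, 1), "amide_H": (4, 1, 1), "hydroxyl": (1, 5, 1)},
    "cmpd_B": {"phenyl": (0, 0, 0), "amide_H": (3, 0, 0), "hydroxyl": (0, 8, 0)},
    "cmpd_C": {"phenyl": (0, 0, 0), "amide_H": (3, 0, 0)},
}

candidates = scan(database, build_query(pharmacophore))
print(candidates)  # ['cmpd_A']
```

Comparing distances rather than raw coordinates makes the match independent of how each stored structure happens to be positioned and oriented, which is why the translated copy in cmpd_A is still retrieved.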
Scanners have a number of advantages. In database search programs the user has complete control over the query specifications. This allows for the retrieval of structures that meet the requirements of the pharmacophore 620 and have a better opportunity to complement the receptor. Furthermore, because these programs use a database 650 of known compounds, synthetic feasibility is assured. These programs are typically highly optimized for speed, which allows for the rapid determination of potential binding ligands. Furthermore, since compounds are retrieved that mirror the query, no scoring functions that estimate receptor-binding affinity are required.
However, the scanner relies on the assumption that the three-dimensional structure stored in the database is representative of biological reality. Although this can be true of small molecules, larger structures are often too flexible for the assumption to hold true. Thus, scanners may miss important potential lead compounds that can flex to form a structure with a high binding affinity. Furthermore, scanners cannot generate new lead compounds; they are completely dependent upon the database 650 of structures with which they are supplied. Therefore, scanners cannot identify new structures, and their potential solutions are biased by the database they employ. Furthermore, generating a large database may require a great deal of manpower and funding, imposing a burden on commercial companies and potentially rendering scanner-type software useless to academic institutions.
Builder-type programs may be used for de novo ligand design if a substantial portion of the ligand is modified. However, they are best used for the optimization of lead compounds. Like scanners, builder programs use a database of structures. However, a builder's database contains fragments and chemical building blocks instead of complete compounds. In order to optimize a lead compound with a builder, areas of the compound that poorly complement the corresponding receptor region must be identified, as shown in FIG. 7. The lead compound 710 contains a stable, tight-binding region 712, and a phenyl ring 714 that should be replaced to improve receptor complementarity. Builder-type programs require the attachment point of the weak-binding portion as input, shown at 716 on the example lead compound 710. The software then removes the offending ligand region and uses the attachment point 716 to create a population of derivatives by adding, deleting, and substituting fragments 718 chosen from the builder's component database to fill the active site. The binding energies of the resulting derivative ligands are then calculated. Those structures that augment binding are retained while those that do not are discarded. This process repeats as the new population of structures is then processed to generate the next round of derivatives. By making incremental changes iteratively, these programs generate a set of ligands with improved receptor complementarity over time.
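The iterative cycle described above can be sketched, under heavy simplification, as follows. The sketch assumes a single attachment point, represents each derivative as an ordered tuple of fragment names, and uses a placeholder scoring function in place of a calculated binding energy. The fragment names, energies, and penalty are hypothetical and chosen only so that the loop can be run.

```python
def build(fragments, score, rounds=3, keep=2):
    """Iteratively grow a substituent chain at a single attachment point.

    Each population member is a tuple of fragment names attached, in order,
    at the attachment point. Lower scores mean tighter (better) binding.
    """
    population = [()]  # start with nothing attached to the retained core
    for _ in range(rounds):
        candidates = set(population)
        for member in population:
            for frag in fragments:
                candidates.add(member + (frag,))           # add a fragment
                if member:
                    candidates.add(member[:-1] + (frag,))  # substitute the last one
            if member:
                candidates.add(member[:-1])                # delete the last one
        # Retain the best-scoring structures; ties are broken by name.
        population = sorted(candidates, key=lambda m: (score(m), m))[:keep]
    return population


# Placeholder energies (arbitrary units); a real builder would calculate these.
FRAGMENT_ENERGY = {"methyl": -1.0, "hydroxyl": -2.0, "phenyl": -1.5}


def toy_score(member):
    """Sum of fragment energies plus a penalty once the pocket is overfilled."""
    overfill = max(0, len(member) - 2)
    return sum(FRAGMENT_ENERGY[f] for f in member) + 3.0 * overfill


best = build(list(FRAGMENT_ENERGY), toy_score)[0]
print(best, toy_score(best))
```

With these placeholder values the loop settles on two hydroxyl fragments, since a third fragment incurs the overfill penalty. The quality of any such loop depends entirely on the scoring function it is given.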
Builder programs require less investment to use than scanners because the database is easier to generate. Furthermore, the component database is often built into the software itself. The combinatorial addition of fragments offers a vast number of potential derivative structures. Because components from numerous chemical classes are typically included, builder programs can automatically generate a diverse set of chemical solutions, which contributes to the creation of novel ligands. In addition, builders can also be used to optimize the hits that result from mass screening.
Unfortunately, the combinatorial attachment of such diverse chemical components also leads to the generation of synthetically unfeasible and chemically unstable structures. Also, although a diverse set of chemical building blocks is used, the manner in which they are attached is typically up to the developer of the software. Decisions such as when a particular component is selected and where it is attached greatly affect the generated structures. These choices reflect the bias of the program developers. Furthermore, the ability of builder programs to generate improved ligands is limited by the inability to accurately calculate the receptor-binding structure and binding affinity, as discussed above.
As with scanner packages, builder-type programs are also limited by their ability to deal with ligand flexibility. Builders attempt to deal with this limitation with “conformational searching.” It will be appreciated that a molecule is actually composed of rigid chemical groups separated by rotatable bonds, as defined by the laws of chemistry. These rotatable bonds give a ligand inherent flexibility, so that it can adopt numerous configurations as it attempts to bind within the active site. A snapshot structure of the ligand at any instant in time is called a “conformation,” and is defined by the set of torsion angles between rigid groups. The task of conformational searching is to determine the most complementary binding structure from all the permutations of potential shapes the ligand can assume. Because of the combinatorial nature of the problem, searching all the rotatable bond configurations a ligand can adopt is extremely demanding in terms of computer resources. Even with yearly exponential increases in processor speed, the complexity of this problem remains one of the most arduous tasks in computational chemistry.
For example, builders typically employ the “odometer” algorithm to find the best-binding shape of a ligand that has several rotatable bonds, each of which can potentially spin 360 degrees. The odometer algorithm is a systematic sampling of all possible torsion angle combinations. Like an odometer, the first bond is fully rotated 360 degrees before the second bond is incremented. When the second bond is incremented, the first bond is reset and then fully scanned again. This continues until the second bond is fully rotated, at which time the third bond is incremented. Searching continues in this manner until all rotatable bond combinations are eventually sampled. During the conformational search, acceptable torsion angle ranges must be determined for each rotatable bond. When a rotatable bond is incremented, the atoms attached to the “swing arm” are checked against all receptor atoms and ligand groups within the vicinity. If contact exists, then that particular conformer is eliminated, so that only valid ligand conformations that conform to the active site are generated.
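A minimal sketch of the odometer-style enumeration described above is given below. It assumes a uniform angular increment for every bond and replaces the geometric contact check with a caller-supplied predicate; the function names and the example predicate are hypothetical.

```python
def odometer(n_bonds, step_deg):
    """Yield every torsion-angle combination; the first bond spins fastest."""
    steps = 360 // step_deg
    counters = [0] * n_bonds
    while True:
        yield tuple(c * step_deg for c in counters)
        i = 0
        while i < n_bonds:
            counters[i] += 1
            if counters[i] < steps:
                break
            counters[i] = 0  # this bond has completed a full turn: reset and carry
            i += 1
        else:
            return  # every bond has rolled over, so the scan is complete


def valid_conformers(n_bonds, step_deg, clashes):
    """Keep only conformers for which the (hypothetical) contact check passes."""
    return [angles for angles in odometer(n_bonds, step_deg) if not clashes(angles)]


conformers = list(odometer(2, 120))
print(conformers[:4])  # [(0, 0), (120, 0), (240, 0), (0, 120)]
```

In this sketch the contact check is applied to complete conformers after they are generated. A check performed as each bond is incremented, as described above, could additionally skip whole branches of the scan.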
The combinatorial nature of this problem leads to an exponential rise in the number of conformations that must be calculated. A ligand with four rotatable bonds that is sampled at ten-degree increments requires evaluation of 1,679,616 different conformations. A ligand with five rotatable bonds sampled at ten-degree increments requires evaluation of over 60 million conformations. A search at ten-degree increments is relatively coarse, and may well overlook crucial ligand conformers that optimally interact with the receptor, especially if the active site is convoluted or forces the ligand to adopt a particular conformation when bound. Ideally, sampling at sub-1-degree increments would ensure a more thorough search. In addition, allowing receptor side-chain flexibility and backbone motion would enable the determination of optimal ligand binding modes.
Since drugs typically contain 10-15 rotatable bonds, conformational searching can easily overwhelm even the fastest computers, despite the development of algorithms that reduce the computational burden of conformational searching by orders of magnitude. Consequently, some builder packages do not implement conformational flexibility at all, or use other short-cuts that severely limit their ability to determine adequate ligand binding conformations. Others use rudimentary, pre-calculated torsion angle scans that lack the resolution to tightly dock compounds within the active site.
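The growth in the number of conformations can be checked with simple arithmetic. The sketch below assumes an exhaustive scan in which every bond is sampled at the same increment over a full 360 degrees, so that the count is the number of samples per bond raised to the number of bonds; the function name is illustrative.

```python
def conformer_count(n_bonds, step_deg):
    """Number of torsion-angle combinations in an exhaustive scan."""
    samples_per_bond = 360 // step_deg
    return samples_per_bond ** n_bonds


print(conformer_count(4, 10))  # 1679616
print(conformer_count(5, 10))  # 60466176

for n_bonds in (10, 15):
    print(n_bonds, conformer_count(n_bonds, 10))
```

At ten-degree increments, ten rotatable bonds already give on the order of 10^15 combinations, and fifteen give on the order of 10^23.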
Hybrid programs are typically employed in de novo ligand generation. FIG. 8 illustrates the operation of a typical hybrid program. A given active site 810 has three distinct regions 812, 814, and 816. The goal of the hybrid program is to generate a complete ligand that complements the active site 810. To do so, the program employs a combination of scanner and builder algorithms. The program first utilizes a scanner strategy to find components that will complement individual subsites within the active site 810 volume, such as the sets of components 822, 824, and 826 shown in FIG. 8. Individual components are then docked into their respective regions within the active site, as, for example, shown in FIG. 8 with the components 832, 834, and 836. Splicing fragments are then used to join the individual components 832, 834, and 836 into one or more complete ligands 850. Because numerous possible fragments may exist that complement the various active site regions, a potentially large number of ligands may be generated by combinatorially linking the various components.
The strength of hybrid programs is in their ability to generate a large number of diverse potential hits. However, they suffer the same shortcomings as all de novo design packages described above: their performance is restricted by the inability to accurately calculate ligand-receptor binding affinity; the combinatorial nature of the algorithm often leads to the generation of chemical structures that violate the laws of physics, are unstable, or are synthetically difficult; and the developer of the software may bias the generation of compounds.
As discussed above, molecular mechanics equations quantitatively determine the potential energy of a ligand-receptor system as a function of its atomic coordinates. Using both derivative and non-derivative approaches, minimization algorithms can be applied to these equations to identify system geometries that correspond to minimum points on the energy surface. The potential energy of a chemical system can be thought of as a multi-dimensional surface, where each dimension represents a quantified measurable parameter—atomic coordinates, bond lengths, torsion angles, and so on. Any grid point on this surface represents the potential energy of a specific set of ligand-receptor atom coordinates. Minimization can be used to mathematically determine the lowest local point “downhill” on the energy surface from the original coordinates. This point should correspond to the coordinates of the nearest local minimum.
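As an illustration only, a derivative-based minimization of the kind referred to above can be sketched as steepest descent. The two-coordinate energy surface, step size, and tolerance below are hypothetical stand-ins; an actual molecular mechanics energy function has many more coordinates and terms, and practical minimizers typically use more sophisticated methods.

```python
def steepest_descent(gradient, coords, step=0.1, tol=1e-8, max_iter=10_000):
    """Walk downhill along the negative gradient until the slope vanishes."""
    coords = list(coords)
    for _ in range(max_iter):
        g = gradient(coords)
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break  # gradient is effectively zero: a stationary point was reached
        coords = [c - step * gi for c, gi in zip(coords, g)]
    return coords


def toy_gradient(coords):
    """Gradient of the toy surface E(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = coords
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum = steepest_descent(toy_gradient, [4.0, 3.0])
print([round(c, 6) for c in minimum])
```

Starting from (4, 3), the walk converges on the single minimum of this toy surface at (1, -2). On a surface with several wells, the same procedure would stop in whichever local minimum lies downhill from the starting coordinates.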
The use of minimization to improve sampling efficiency can be understood by reference to a simple, four-atom system shown in FIG. 16A, which has a single rotatable bond. Torsion angles of 0° and 360° give the highest potential energy, since the terminal carbons are aligned. Conversely, when the torsion angle is 180°, the internal energy is minimal, as the terminal carbons are maximally separated. The energy profile for this single rotatable bond is thus a “U”-shaped well, with the minimum energy at the bottom of the well, as shown in FIG. 16B.
Without minimization, a conformational search must employ very small angle increments to ensure that the bottom of the well is thoroughly explored. The multiple arrows in FIG. 16B illustrate this. However, if minimization is employed, a much coarser search can be used, since the minimum energy conformer can be readily determined from any structure that falls within the well. As such, the combinatorial impact of the conformer search can be lessened while improving its accuracy. The importance of this is amplified when considering the myriad poses a ligand can potentially adopt within the active site.
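The benefit of combining a coarse scan with minimization can be sketched on a single torsion angle. The cosine energy profile below is an assumed stand-in for the well of FIG. 16B, with its maximum at 0° and 360° and its minimum at 180°; the 100-degree grid, the descent rate, and the function names are hypothetical.

```python
import math


def energy(theta_deg):
    """Toy torsion profile: highest at 0/360 degrees, lowest at 180 degrees."""
    return 1.0 + math.cos(math.radians(theta_deg))


def relax(theta_deg, rate=0.5, tol=1e-10, max_iter=1000):
    """One-dimensional steepest descent on the toy profile."""
    theta = math.radians(theta_deg)
    for _ in range(max_iter):
        gradient = -math.sin(theta)  # dE/dtheta for E = 1 + cos(theta)
        if abs(gradient) < tol:
            break
        theta -= rate * gradient
    return math.degrees(theta)


coarse_grid = range(0, 360, 100)             # 0, 100, 200, 300 degrees
best_on_grid = min(coarse_grid, key=energy)  # the grid alone misses the bottom
refined = relax(best_on_grid)

print(best_on_grid, round(refined, 3))
```

The grid by itself stops at 200°, twenty degrees away from the bottom of the well, while relaxing that grid point recovers the minimum near 180°. Four energy evaluations plus one short minimization thus take the place of a fine scan.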
Use of minimization involves some unique complexities. For any system with N atoms, the energy is a function of 3N Cartesian coordinates. Even the most basic force fields contain terms describing bond-stretching, angle-bending, torsion, non-bonded, and Coulomb potential energies. Calculating derivatives for all the energy terms and locating minima numerically is CPU intensive in its own right. In addition, minimizing the receptor typically adds ten-fold more atoms to manage. As such, even the fastest of commercially available minimizers currently requires several minutes to relax a typical ligand-receptor structure. This nullifies any computational advantage gained from employing a rough conformational search. For most real-time applications, including docking, automated ligand refinement, and virtual screening, this timeframe is impractical.
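For reference, one commonly used textbook form of such a force field is written out below. It is offered only as an example of how the five kinds of terms named above may be combined; no particular force field is specified here, and the symbols (force constants k_b and k_θ, reference values b_0 and θ_0, torsion parameters V_n, n, and γ, non-bonded coefficients A_ij and B_ij, partial charges q_i, and dielectric constant ε) are assumptions of this example.

```latex
E(\mathbf{r}_1, \dots, \mathbf{r}_N) =
    \sum_{\text{bonds}} k_b \, (b - b_0)^2
  + \sum_{\text{angles}} k_\theta \, (\theta - \theta_0)^2
  + \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
  + \sum_{i<j} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
  + \sum_{i<j} \frac{q_i \, q_j}{\varepsilon \, r_{ij}}
```

The last two sums run over pairs of atoms, so their cost grows roughly with the square of the number of atoms, which is one reason that including the receptor makes minimization so much more expensive.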
Thus, an improved system and method for computer-aided drug design is needed, in which rapid minimization is used to improve adaptive sampling.