An explanation of conventional drug discovery processes and their limitations is useful for understanding the present invention.
Discovering a new drug to treat or cure some biological condition, is a lengthy and expensive process, typically taking on average 12 years and $800 million per drug, and taking possibly up to 15 years or more and $1 billion to complete in some cases.
A goal of a drug discovery process is to identify and characterize a chemical compound or ligand biomolecule, i.e., binder that affects the function of one or more other biomolecules (i.e., a drug “target”) in an organism, usually a biopolymer, via a potential molecular interaction or combination. Herein the term biopolymer refers to a macromolecule that comprises one or more of a protein, nucleic acid (DNA or RNA), peptide or nucleotide sequence or any portions or fragments thereof. Herein the term biomolecule refers to a chemical entity that comprises one or more of a biopolymer, carbohydrate, hormone, or other molecule or chemical compound, either inorganic or organic, including, but not limited to, synthetic, medicinal, drug-like, or natural compounds, or any portions or fragments thereof.
The target molecule is typically what is known as a disease-related target protein or nucleic acid for which it is desired to affect a change in function, structure, and/or chemical activity in order to aid in the treatment of a patient disease or other disorder. In other cases, the target is a biomolecule found in a disease-causing organism, such as a virus, bacteria, or parasite, that when affected by the drug will affect the survival or activity of the infectious organism. In yet other cases, the target is a biomolecule of a defective or harmful cell such as a cancer cell. In yet other cases the target is an antigen or other environmental chemical agent that may induce an allergic reaction or other undesired immunological or biological response.
The ligand is typically a small molecule drug or chemical compound with desired drug-like properties in terms of potency, low toxicity, membrane permeability, solubility, chemical/metabolic stability, etc. In other cases, the ligand may be biologic such as an injected protein-based or peptide-based drug or even another full-fledged protein. In yet other cases the ligand may be a chemical substrate of a target enzyme. The ligand may even be covalently bound to the target or may in fact be a portion of the protein, e.g., protein secondary structure component, protein domain containing or near an active site, protein subunit of an appropriate protein quaternary structure, etc.
Throughout the remainder of the background discussion, unless otherwise specifically differentiated, a (potential) molecular combination will feature one ligand and one target, the ligand and target will be separate chemical entities, and the ligand will be assumed to be a chemical compound while the target will typically be a biological protein (mutant or wild type). Note that the frequency of nucleic acids (both DNA/RNA) as targets will likely increase in coming years as advances in gene therapy and pathogenic microbiology progress. Also the term “molecular complex” will refer to the bound state between the target and ligand when interacting with one another in the midst of a suitable (often aqueous) environment. A “potential” molecular complex refers to a bound state that may occur albeit with low probability and therefore may or may not actually form under normal conditions.
The drug discovery process itself typically includes four different subprocesses: (1) target validation; (2) lead generation/optimization; (3) preclinical testing; and (4) clinical trials and approval.
Target validation includes determination of one or more targets that have disease relevance and usually takes two-and-a-half years to complete. Results of the target validation phase might include a determination that the presence or action of the target molecule in an organism causes or influences some effect that initiates, exacerbates, or contributes to a disease for which a cure or treatment is sought. In some cases a natural binder or substrate for the target may also be determined via experimental methods.
Lead generation typically involves the identification of lead compounds, i.e., ligands, that can bind to the target molecule and that may alter the effects of the target through either activation, deactivation, catalysis, or inhibition of the function of the target, in which case the lead would be a viewed as a suitable candidate ligand to be used in the drug application process. Lead optimization involves the chemical and structural refinement of lead candidates into drug precursors in order to improve binding affinity to the desired target, increase selectivity, and also to address basic issues of toxicity, solubility, and metabolism. Together lead generation and lead optimization typically take about three years to complete and might result in one or more chemically distinct leads for further consideration.
In preclinical testing, biochemical assays and animal models are used to test the selected leads for various pharmacokinetic factors related to drug absorption, distribution, metabolism, excretion, toxicity, side effects, and required dosages. This preclinical testing takes approximately one year. After the preclinical testing period, clinical trials and approval take another six to eight or more years during which the drug candidates are tested on human subjects for safety and efficacy.
Rational drug design generally uses structural information about drug targets (structure-based) and/or their natural ligands (ligand-based) as a basis for the design of effective lead candidate generation and optimization. Structure-based rational drug design generally utilizes a three-dimensional model of the structure for the target. For target proteins or nucleic acids such structures may be as the result of X-ray crystallography/NMR or other measurement procedures or may result from homology modeling, analysis of protein motifs and conserved domains, and/or computational modeling of protein folding or the nucleic acid equivalent. Model-built structures are often all that is available when considering many membrane-associated target proteins, e.g., GPCRs and ion channels. The structure of a ligand may be generated in a similar manner or may instead be constructed ab initio from a known 2-D chemical representation using fundamental physics and chemistry principles, provided the ligand is not a biopolymer.
Rational drug design may incorporate the use of any of a number of computational components ranging from computational modeling of target-ligand molecular interactions and combinations to lead optimization to computational prediction of desired drug-like biological properties. The use of computational modeling in the context of rational drug design has been largely motivated by a desire both to reduce the required time and to improve the focus and efficiency of drug research and development, by avoiding often time consuming and costly efforts in biological “wet” lab testing and the like.
Computational modeling of target-ligand molecular combinations in the context of lead generation may involve the large-scale in-silico screening of compound libraries (i.e., library screening), whether the libraries are virtually generated and stored as one or more compound structural databases or constructed via combinatorial chemistry and organic synthesis, using computational methods to rank a selected subset of ligands based on computational prediction of bioactivity (or an equivalent measure) with respect to the intended target molecule.
Throughout the text, the term “binding mode” refers to the 3-D molecular structure of a potential molecular complex in a bound state at or near a minimum of the binding energy (i.e., maximum of the binding affinity), where the term “binding energy” (sometimes interchanged with “binding free energy” or with its conceptually antipodal counterpart “binding affinity”) refers to the change in free energy of a molecular system upon formation of a potential molecular complex, i.e., the transition from an unbound to a (potential) bound state for the ligand and target. The term “system pose” is also sometimes used to refer to the binding mode. Here the term free energy generally refers to both enthalpic and entropic effects as the result of physical interactions between the constituent atoms and bonds of the molecules between themselves (i.e., both intermolecular and intramolecular interactions) and with their surrounding environment. Examples of the free energy are the Gibbs free energy encountered in the canonical or grand canonical ensembles of equilibrium statistical mechanics.
In general, the optimal binding free energy of a given target-ligand pair directly correlates to the likelihood of combination or formation of a potential molecular complex between the two molecules in chemical equilibrium, though, in truth, the binding free energy describes an ensemble of (putative) complexed structures and not one single binding mode. However, in computational modeling it is usually assumed that the change in free energy is dominated by a single structure corresponding to a minimal energy. This is certainly true for tight binders (pK˜0.1 to 10 nanomolar) but questionable for weak ones (pK˜10 to 100 micromolar). The dominating structure is usually taken to be the binding mode. In some cases it may be necessary to consider more than one alternative binding mode when the associated system states are nearly degenerate in terms of energy.
Binding affinity is of direct interest to drug discovery and rational drug design because the interaction of two molecules, such as a protein that is part of a biological process or pathway and a drug candidate sought for targeting a modification of the biological process or pathway, often helps indicate how well the drug candidate will serve its purpose. Furthermore, where the binding mode is determinable, the action of the drug on the target can be better understood. Such understanding may be useful when, for example, it is desirable to further modify one or more characteristics of the ligand so as to improve its potency (with respect to the target), binding specificity (with respect to other target biopolymers), or other chemical and metabolic properties.
A number of laboratory methods exist for measuring or estimating affinity between a target molecule and a ligand. Often the target might be first isolated and then mixed with the ligand in vitro and the molecular interaction assessed experimentally such as in the myriad biochemical and functional assays associated with high throughput screening. However, such methods are most useful where the target is simple to isolate, the ligand is simple to manufacture and the molecular interaction easily measured, but is more problematic when the target cannot be easily isolated, isolation interferes with the biological process or disease pathway, the ligand is difficult to synthesize in sufficient quantity, or where the particular target or ligand is not well characterized ahead of time. In the latter case, many thousands or millions of experiments might be needed for all possible combinations of the target and ligands, making the use of laboratory methods unfeasible.
While a number of attempts have been made to resolve this bottleneck by first using specialized knowledge of various chemical and biological properties of the target (or even related targets such as protein family members) and/or one or more already known natural binders or substrates to the target, to reduce the number of combinations required for lab processing, this is still impractical and too expensive in most cases. Instead of actually combining molecules in a laboratory setting and measuring experimental results, another approach is to use computers to simulate or characterize molecular interactions between two or more molecules (i.e., molecular combinations modeled in silico). The use of computational methods to assess molecular combinations and interactions is usually associated with one or more stages of rational drug design, whether structure-based, ligand-based, or both.
When computationally modeling the nature and/or likelihood of a potential molecular combination for a given target-ligand pair, the actual computational prediction of binding mode and affinity is customarily accomplished in two parts: (a) “docking”, in which the computational system attempts to predict the optimal binding mode for the ligand and the target and (b) “scoring”, in which the computational system attempts to refine the estimate of the binding affinity associated with the computed binding mode. During library screening, scoring may also be used to predict a relative binding affinity for one ligand vs. another ligand with respect to the target molecule and thereby rank prioritize the ligands or assign a probability for binding.
Docking may involve a search or function optimization algorithm, whether deterministic or stochastic in nature, with the intent to find one or more system poses that have favorable affinity. Scoring may involve a more refined estimation of an affinity function, where the affinity is represented in terms of a combination of one or more empirical, molecular-mechanics-based, quantum mechanics-based, or knowledge-based expressions, i.e., a scoring function. Individuals scoring functions may themselves be combined to form a more robust consensus-scoring scheme using a variety of formulations. In practice, there are many different docking strategies and scoring schemes employed in the context of today's computational drug design.
Whatever the choice of computational method there are inherent trade-offs between the computational complexity of both the underlying molecular models and the intrinsic numerical algorithms, and the amount of computing resources (time, number of CPUs, number of simulations) that must be allocated to process each molecular combination. For example, while highly sophisticated molecular dynamics simulations (MD) of the two molecules surrounded by explicit water molecules and evolved over trillions of time steps may lead to higher accuracy in modeling the potential molecular combination, the resultant computational cost (i.e., time and computing power) is so enormous that such simulations are intractable for use with more than just a few molecular combinations. On the other hand, the use of more primitive models for representing molecular interactions, in conjunction with multiple, and often error-prone, modeling shortcuts and approximations, may result in more acceptable computational cost but will invariably cause significant performance degradation in terms of modeling accuracy and predictive power. Currently, even the process of checking a library of drug candidates against a target protein takes too long for the required accuracy using current computational systems.
In general, the present invention relates to the efficient and accurate determination or characterization of molecular interactions via computational methods. Here the determination or characterization of molecular interactions (of which computational docking and scoring methods are only a subset) may involve the prediction of likelihood of formation of a potential molecular complex, the estimation of the binding affinity or binding energy of two (or more) molecules, the prediction of the binding mode (or even additional alternative modes) for the target-ligand pair, or the rank prioritization of a set of ligands based on predicted bioactivity with the target molecule. Throughout the remainder of the text, the binding affinity (or equivalent) will in general be modeled as an objective mathematical function (i.e., an ‘affinity’ function) that approximately characterizes the underlying physics and chemistry of the appropriate molecular interactions between the target and ligand molecules, though other possible embodiments exist (some of which will be discussed in the detailed description) wherein the affinity function may be one of a variety of qualitative or quantitative measures associated with the molecular interactions.
In summary, it is desirable in the drug discovery process to identify quickly and efficiently the optimal states or configurations, i.e., binding modes and binding energy, of two molecules or parts of molecules. Efficiency is especially relevant in the lead generation and lead optimization stages for a drug discovery pipeline, where it may be desirable to accurately predict the binding mode and binding affinity for possibly millions of potential target-ligand molecular combinations, before submitting promising candidates to further analysis. There is a clear need then to have more efficient systems and methods for computational modeling of the molecular combinations with reasonable accuracy.