1. Field of the Invention
The invention is generally related to methods for the prediction of the behavior of molecules, including the querying for compounds that have multiple or similar properties, and in particular to pharmacophore analysis and the generation and mining of pharmacophore databases for drug definition and repurposing.
2. Description of the Related Art
Despite steady and significant increases in research and development spending, the number of new drug applications and approvals has been, at best, flat. The low productivity of current target-driven approaches to drug discovery has been ascribed to a number of causes, including a narrow focus on a single target and undesirable effects, such as toxicity and low efficacy, that are discovered too late in the discovery process (see, e.g. Sams-Dodd, F. “Target-based drug discovery: is something wrong?” Drug Discov. Today 2005, 10:139-147). As a result, current interest is shifting towards evaluating biological properties at the outset, and attempting to gain a global understanding of the binding activity between compounds and targets (see, e.g. Jenkins, J. L., et al. “In silico target fishing: Predicting biological targets from chemical structure.” Drug Discov. Today: Technol. 2006, 3: 413-421; Rishton, G. M. “Reactive compounds and in vitro false positives in HTS.” Drug Discov. Today 1997, 2: 382-384).
There have been a number of attempts to understand the relationship between drug chemical structures and target proteins. In one such study, Yamanishi et al. (see, e.g. Yamanishi, Y., et al. “Prediction of drug-target interaction networks from the integration of chemical and genomic spaces.” Bioinformatics 2008, 24: 232-240) develop a supervised method to infer unknown drug-target interactions by integrating chemical space and genomic space. The authors make predictions for four classes of important drug-target interactions involving enzymes, ion channels, GPCRs, and nuclear receptors. The method measures chemical similarity in the graph domain by considering the size of the largest common subgraph between two compounds. Keiser et al. (see, e.g. Keiser, M. J., et al. “Relating protein pharmacology by ligand chemistry.” Nat. Biotechnol. 2007, 25: 197-206) compare protein families based on the chemical structure (Tanimoto coefficient) of the sets of ligands that bind to them. Yildirim et al. (see, e.g. Yildirim, M. A., et al. “Drug-target network.” Nat. Biotechnol. 2007, 25: 1119-1126) synthesize a global drug-target network consisting of different protein classes with a bipartite graph representation, but the authors do not use the chemical structure information in this analysis.
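As an illustration of the Tanimoto coefficient mentioned above, the following minimal sketch computes it for two compounds represented as sets of "on" fingerprint bits. The fingerprints `fp_x` and `fp_y` are hypothetical, and the value returned for two empty fingerprints is a convention assumed here. It is not taken from Keiser et al.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: indices of substructure bits set for each compound.
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8, 9, 12}

similarity = tanimoto(fp_x, fp_y)  # 3 shared bits out of 6 distinct bits
```

A coefficient of 1 means the two fingerprints are identical, and 0 means they share no bits.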
A number of computational approaches have also been developed to analyze and predict compound-protein interactions. A commonly used method is docking (see, e.g. Cheng, A. C., et al. “Structure-based maximal affinity model predicts small-molecule druggability.” Nat. Biotechnol. 2007, 25: 71-75; Rarey, M. “A fast flexible docking method using an incremental construction algorithm.” J. Mol. Biol. 1996, 261: 470-489). However, docking requires 3D structures of proteins, and so cannot be used on a large scale. Wale and Karypis (see, e.g. Wale, N., et al. “Target Fishing for Chemical Compounds Using Target-Ligand Activity Data and Ranking Based Methods.” J. Chem. Inf. Model. 2009, 49: 2190-2201) develop a technique for “target fishing” (finding all possible targets for a given compound) by analyzing the target-ligand activity matrix using Support Vector Machines (SVM) and perceptrons. Here, each chemical compound is represented by a frequency vector of topological descriptors. Other techniques for such prediction have used nearest-neighbors (see, e.g. Nettles, J. H. “Bridging chemical and biological space: ‘target fishing’ using 2D and 3D molecular descriptors.” J. Med. Chem. 2006, 49: 6802-6810), Bayesian models (see, e.g. Nidhi, et al. “Prediction of Biological Targets for Compounds Using Multiple-Category Bayesian Models Trained on Chemogenomics Databases.” J. Chem. Inf. Model. 2006, 46: 1124-1133), and neural networks (see, e.g. Niwa, T. “Prediction of biological targets using probabilistic neural networks and atom-type descriptors.” J. Med. Chem. 2004, 47: 2645-2650).
Closer to drug discovery, structure-activity relationships (SAR) have been used to guide the iterative optimization of drug leads. Recently, scientists have focused on improving SAR models by considering additional information besides the known ligands of the target under consideration. These approaches include an iterative SVM where training examples at the decision boundary are added to the training set (see, e.g. Warmuth, M. K., et al. “Active learning with support vector machines in the drug discovery process.” J. Chem. Inf. Comput. Sci. 2003, 43: 2003), and techniques that refine the SAR score using neighboring protein-ligand pairs in the joint space (see, e.g. Klabunde, T. “Chemogenomic approaches to drug discovery: similar receptors bind similar ligands.” Br. J. Pharmacol. 2007, 152: 5-7; Jacob, L., et al. “Protein-ligand interaction prediction: an improved chemogenomics approach.” Bioinformatics 2008, 24: 2149-2156). The latter group of chemogenomics techniques differ from one another in the descriptors they use to represent the target, the ligand, or the complex, or in the machine learning method used for prediction (see, e.g. Bock, J. R. “Virtual Screen for Ligands of Orphan G Protein-Coupled Receptors.” J. Chem. Inf. Model. 2005, 45: 1402-1414; Deng, Z., et al. “Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein-Ligand Binding Interactions.” J. Med. Chem. 2004, 47: 337-344; Erhan, D., et al. “Collaborative Filtering on a Family of Biological Targets.” J. Chem. Inf. Model. 2006, 46: 626-635; Geppert, H., et al. “Ligand Prediction from Protein Sequence and Small Molecule Information Using Support Vector Machines and Fingerprint Descriptors.” J. Chem. Inf. Model. 2009, 49: 767-779; Lapinsh, M., et al. “Improved approach for proteochemometrics modeling: application to organic compound-amine G protein-coupled receptor interactions.” Bioinformatics 2005, 21: 4289-4296; Lindstrom, A., et al. “Hierarchical PLS Modeling for Predicting the Binding of a Comprehensive Set of Structurally Diverse Protein-Ligand Complexes.” J. Chem. Inf. Model. 2006, 46: 1154-1167; Ning, X., et al. “Multi-Assay-Based Structure-Activity Relationship Models: Improving Structure-Activity Relationship Models by Incorporating Activity Information from Related Targets.” J. Chem. Inf. Model. 2009, 49: 2444-2456; Strombergsson, H., et al. “Interaction Model Based on Local Protein Substructures Generalizes to the Entire Structural Enzyme-Ligand Space.” J. Chem. Inf. Model. 2008, 48: 2278-2288; Weill, N., et al. “Development and Validation of a Novel Protein-Ligand Fingerprint To Mine Chemogenomic Space: Application to G Protein-Coupled Receptors and Their Ligands.” J. Chem. Inf. Model. 2009, 49: 1049-1062). This thread of research again considers global information.
Pharmacophore-based screening has also witnessed significant activity in computer-aided drug design. A pharmacophore is a spatial arrangement of chemical features that defines a pattern essential for biological activity. Chemical features taken into account in defining pharmacophores usually include hydrogen bond donors and acceptors, charge, hydrophobicity, and aromaticity. The geometry of the arrangement of pharmacophoric features is responsible for binding between compounds and targets, as well as for properties of compounds such as Blood Brain Barrier (BBB) permeability (see, e.g. Zhao, Y. H., et al. “Predicting penetration across the blood-brain barrier from simple descriptors and fragmentation schemes.” J. Chem. Inf. Model. 2007, 47: 170-175) and toxicity. A number of excellent tools, including Phase (see, e.g. Dixon, S. L., et al. “PHASE: a novel approach to pharmacophore modeling and 3D database searching.” Chem. Biol. Drug Des. 2006, 67: 370-372; Dixon, S. L., et al. “PHASE: a new engine for pharmacophore perception, 3D QSAR model development, and 3D database screening: 1. Methodology and preliminary results.” J. Comput. Aided Mol. Des. 2006, 20: 647-671), Catalyst (see, e.g. Kurogi, Y., et al. “Pharmacophore modeling and three-dimensional database searching for drug design using catalyst.” Curr. Med. Chem. 2001, 8: 1035-1055; Guner, O., et al. “Pharmacophore modeling and three dimensional database searching for drug design using catalyst: recent advances.” Curr. Med. Chem. 2004, 11: 2991-3005), LigandScout (see, e.g. Wolber, G., et al. “LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters.” J. Chem. Inf. Model. 2005, 45: 160-169), and MOE (see, e.g. Molecular Operating Environment (MOE), www.chemcomp.com/index.htm), are available for discovering pharmacophores based on a set of actives (and inactives) against a target (usually with an unknown structure), and for searching a database for compounds matching the pharmacophore.
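To make the notion of "searching a database for compounds matching the pharmacophore" concrete, the following minimal sketch treats a pharmacophore as a list of typed 3D feature points and tests whether a compound conformer contains the same feature types at the same pairwise distances. It does not reproduce the algorithm of Phase, Catalyst, LigandScout, or MOE. The feature names, the coordinates, the brute-force search, and the tolerance `tol` are all assumptions made for illustration.

```python
from itertools import permutations
from math import dist

# A feature is (feature_type, (x, y, z)); coordinates are in angstroms (assumed).


def matches(query, conformer, tol=1.0):
    """Return True if some same-typed assignment of query features to conformer
    features preserves every pairwise distance to within `tol`."""
    n = len(query)
    for cand in permutations(conformer, n):
        # Each query feature must map to a conformer feature of the same type.
        if any(q[0] != c[0] for q, c in zip(query, cand)):
            continue
        # All pairwise distances must agree within the tolerance.
        if all(
            abs(dist(query[i][1], query[j][1]) - dist(cand[i][1], cand[j][1])) <= tol
            for i in range(n)
            for j in range(i + 1, n)
        ):
            return True
    return False


# Hypothetical three-point query pharmacophore.
query = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aromatic", (0, 4, 0))]

# Hypothetical conformers: the first contains the query shifted in space,
# the second has its acceptor too far from the donor.
hit = [("aromatic", (1, 5, 1)), ("donor", (1, 1, 1)),
       ("acceptor", (4, 1, 1)), ("hydrophobic", (6, 6, 6))]
miss = [("donor", (0, 0, 0)), ("acceptor", (8, 0, 0)), ("aromatic", (0, 4, 0))]

hit_matches = matches(query, hit)
miss_matches = matches(query, miss)
```

Comparing distances alone makes the test independent of translation and rotation, but it cannot distinguish mirror-image arrangements. The brute-force search is practical only for small feature sets.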
However, existing pharmacophore-based techniques suffer from two key weaknesses. First, they are able to analyze compounds only on a target-by-target basis, aimed at extracting and optimizing a specific pharmacophore. Such an approach is limited in terms of the search space it can investigate in the drug discovery process. Often, multiple pharmacophoric targets need to be analyzed in the search for drugs against diseases such as cancer or AIDS. Second, the majority of pharmacophore-based querying and searching techniques assume that some knowledge is available on the geometric properties of the binding pockets in the receptors [Brint, A. T., Willett, P., J. Mol. Graph., 5:49, 1987.], [Alladin.], [Jakes, S. E., and Willett, P., J. Mol. Graph., 4:12, 1986.], [Sheridan, K. P. et al., J. Chem. Inf. Comp. Sci., 29:255, 1989.], [Kuntz, I. D., et al., J. Mol. Biol., 161: 269, 1982.], [Des Jarlais et al., J. Med. Chem., 29: 2149, 1986.]. Based on this knowledge, 3D databases of molecular conformations are scanned to identify potential ligands. Screening of molecular databases is typically done based on some underlying model such as the lock-and-key mechanism. The lock-and-key model assumes that for a molecule to be active, its steric characteristics should perfectly complement the shape of the receptor. The quality of the prediction therefore depends critically on the accuracy of the underlying binding model and of the assumptions about the geometries of the binding pockets. Furthermore, if gathering information on the binding pockets is expensive in terms of time or cost, then the utility of the entire searching pipeline is hampered.
The proposed technique addresses both of these weaknesses. To drastically increase the search space, the unique concept of the joint pharmacophore space is first defined. The joint pharmacophore space is a database of pharmacophores based on the geometric arrangements of pharmacophoric features of both the actives and inactives against a higher-level biological goal. In our technique, this space is directly mined to understand diversity, binding affinities, and biological properties of the actives against a particular disease. Our technique does not assume any knowledge of the geometries of the binding pockets or depend on any underlying binding model. Rather, these geometries are learned from the pharmacophoric space of the training set, as long as the set of compounds changes in a consistent way while binding to protein targets.
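One way to picture a joint pharmacophore space is as a table keyed by geometric arrangements of features, recording how often each arrangement occurs in actives and in inactives. The sketch below is an illustrative assumption only, not the construction claimed by the invention. The choice of three-feature arrangements, the distance binning with `bin_width`, the feature names, and the example compounds are all hypothetical.

```python
from itertools import combinations, permutations
from math import dist


def triplet_keys(features, bin_width=1.0):
    """Canonical keys for every three-feature arrangement in one compound.

    Each key is (feature types, binned pairwise distances), chosen so that it
    does not depend on the order in which the features are listed.
    """
    keys = set()
    for trio in combinations(features, 3):
        variants = []
        for a, b, c in permutations(trio):
            types = (a[0], b[0], c[0])
            bins = tuple(
                int(dist(p[1], q[1]) // bin_width)
                for p, q in ((a, b), (a, c), (b, c))
            )
            variants.append((types, bins))
        keys.add(min(variants))  # order-independent representative
    return keys


def build_joint_space(compounds, bin_width=1.0):
    """Map each key to [number of actives, number of inactives] containing it."""
    space = {}
    for features, is_active in compounds:
        for key in triplet_keys(features, bin_width):
            counts = space.setdefault(key, [0, 0])
            counts[0 if is_active else 1] += 1
    return space


# Hypothetical training set: (typed 3D features, active against the goal?).
compounds = [
    ([("donor", (0, 0, 0)), ("acceptor", (3.5, 0, 0)), ("aromatic", (0, 4.5, 0))], True),
    ([("aromatic", (1, 5.5, 1)), ("donor", (1, 1, 1)), ("acceptor", (4.5, 1, 1))], True),
    ([("donor", (0, 0, 0)), ("acceptor", (8.5, 0, 0)), ("aromatic", (0, 4.5, 0))], False),
]

space = build_joint_space(compounds)
```

In this toy example the two actives share one arrangement that the inactive lacks, so that arrangement would stand out when the table is mined. No binding-pocket geometry is supplied at any point; only the compounds and their activity labels are used.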