During the course of drug discovery, optimization, and development, researchers are faced with many important decisions. These decisions frequently determine the success or failure of the drug discovery and optimization program, or at the very least directly affect the number and duration of optimization cycles necessary to develop a treatment for the target disease.
Deciding which lead series of chemical molecules to optimize is of critical importance, complicated by the fact that modern high throughput screening often places 5-10 such options before the discovery team. Decisions are also made within a lead series about which molecules deserve further optimization, and about which molecules receive detailed in vivo evaluation. Several of these decisions commit dozens of research workers to experimental programs that may last 1-3 years, so any suboptimal decision can easily result in substantial financial and opportunity costs. Further, in many cases patents will have already been filed and/or granted covering the molecules about which the decisions are being made, so poor decisions can delay a drug's arrival on the market by several years and thus shorten the effective patent protection for the drug. An erroneous selection of a compound for final optimization or for detailed in vivo examination can therefore be a very costly error. These key decisions are significant and deserve to be supported with detailed, accurate, and topical experimental findings directed toward fully understanding a candidate's toxicology and mechanism of action.
In current practice many of these critical decisions are made based on an experienced drug developer's best judgment of likely toxicological and off-target effect profiles. This judgment and intuition can be improved and supplemented by chemogenomic annotation of the candidate and comparison of the candidate's profile with a large database of chemogenomic annotations.
“Chemogenomic annotation” is the process of determining the transcriptional and pharmacological response of one or more genes to exposure to a particular chemical, and defining and interpreting such responses in terms of the classes of chemicals with which they interact. A comprehensive library of chemogenomic annotations would enable one to design and optimize new pharmaceutical lead compounds based on the probable transcriptional and biomolecular profile of a hypothetical compound with certain characteristics. Additionally, one can use chemogenomic annotations to determine relationships between genes (for example, as members of a signaling pathway or protein-protein interaction pair), and to aid in determining the causes of side effects and the like. Finally, presenting the drug design researcher with a body of chemogenomic annotation information will generate research hypotheses that will stimulate follow-on experiments and may stimulate changes in the researcher's drug design plan, including development or inclusion of additional counter screens, or may stimulate the selection and elaboration of an alternate lead series that the chemogenomic library reveals to have preferable characteristics.
Several genomic database models have been disclosed. Sabatini et al., U.S. Pat. No. 5,966,712, disclosed a database and system for storing, comparing and analyzing genomic data. Maslyn et al., U.S. Pat. No. 5,953,727, disclosed a relational database for storing genomic data. Kohler et al., U.S. Pat. No. 5,523,208, disclosed a database and method for comparing polynucleotide sequences and the predicted functions of their encoded proteins. Fujiyama et al., U.S. Pat. No. 5,706,498, disclosed a database and retrieval system for identifying genes of similar sequence.
Sabry et al., WO00/70528 disclosed methods for analyzing compounds for drug discovery using a cellular informatics database. The system photographs cells that have been manipulated or exposed to test compounds, converting the resulting data into a database. Sabry further describes constructing a database of “cellular fingerprints” comprising descriptors of cell-compound interactions, where the descriptors are a collection of identified data/phenotype variations that characterize the interaction with compounds of known action, constructing a phylogenetic tree from the descriptors, and determining the statistical significance of each descriptor. The descriptor for a new compound can be compared to the phylogenetic tree to determine its most likely mode of action.
Winslow et al., WO00/65523, disclosed a system comprising a database containing biological information which is used to generate a data structure having at least one associated attribute, a user interface, an equation generation engine operative to generate at least one mathematical equation from at least one hierarchical description, and a computational engine operative on the mathematical equation to model dynamic subcellular and cellular behavior. The system is intended to access and tabulate genetic information contained within proprietary and nonproprietary databases, combine that data with functional information regarding the biochemical and biophysical role of gene products, and based on this information formulate, solve and analyze computational models of genetic, biochemical and biophysical processes within cells.
Gould-Rothberg et al., WO00/63435, disclosed a method for identifying hepatotoxic agents by providing a test cell population comprising a cell capable of expressing one or more nucleic acid sequences responsive to troglitazone (an anti-diabetes drug discovered to cause liver damage in some patients during phase III trials), contacting the test cell population with the test agent, and comparing the expression of the nucleic acid sequences in the test cell population with their expression in a reference cell population. An alteration in expression of the nucleic acid sequences in the test cell population compared to their expression in the reference cell population indicates that the agent is hepatotoxic. Gould-Rothberg et al., WO00/37685, disclosed a method for identifying psychoactive agents that lack motor involvement, by identifying genes transcriptionally activated in rat brain striatum in response to haloperidol. Compounds that do not induce these genes are believed not to result in side effects.
Friend et al., U.S. Pat. No. 6,203,987, disclosed a method for comparing array profiles by grouping genes into co-regulated sets (“genesets”). Friend et al. disclosed an embodiment in which the expression profile obtained in response to a drug is projected into a geneset, and compared with other genesets to determine the biological pathways affected by the drug. In another embodiment, the projected profiles of drug candidates are compared with the profiles of known drugs to identify possible replacements for existing drugs.
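The general idea of projecting an expression profile onto genesets and comparing it with the profiles of known drugs can be illustrated with a minimal sketch. It is not the method of the cited patent: the averaging used as the projection, the cosine similarity used for comparison, and all function, gene, geneset, and drug names are assumptions chosen for illustration only.

```python
from math import sqrt


def project_onto_genesets(profile, genesets):
    """Reduce a per-gene profile to one value per geneset.

    Assumption: the projection is the mean log-ratio of the member genes.
    Genes missing from the profile are skipped.
    """
    projected = {}
    for name, genes in genesets.items():
        values = [profile[g] for g in genes if g in profile]
        projected[name] = sum(values) / len(values) if values else 0.0
    return projected


def cosine_similarity(a, b):
    """Cosine similarity of two projected profiles over their shared genesets."""
    keys = sorted(set(a) & set(b))
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = sqrt(sum(b[k] ** 2 for k in keys))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_known_drugs(candidate, known_profiles, genesets):
    """Rank known drugs by similarity of their projected profiles to the candidate."""
    cand = project_onto_genesets(candidate, genesets)
    scores = {
        drug: cosine_similarity(cand, project_onto_genesets(p, genesets))
        for drug, p in known_profiles.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical genesets and log-ratio profiles, for illustration only.
    genesets = {"A": ["g1", "g2"], "B": ["g3", "g4"]}
    candidate = {"g1": 1.0, "g2": 3.0, "g3": -1.0, "g4": -1.0}
    known = {
        "drug_x": {"g1": 2.0, "g2": 2.0, "g3": -1.0, "g4": -1.0},
        "drug_y": {"g1": -1.0, "g2": -1.0, "g3": 2.0, "g4": 2.0},
    }
    for drug, score in rank_known_drugs(candidate, known, genesets):
        print(drug, round(score, 3))
```

Working at the geneset level rather than the gene level reduces the number of values compared, which is one plausible reason for grouping co-regulated genes before comparing profiles.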
Tamayo et al., EP 1037158, disclosed a method for organizing genomic data using Self Organizing Maps to cluster gene expression data into similar sets. The method was assumed to identify drug targets by identifying which genes move from their expression clusters after a test cell is exposed to a given compound.
Tryon et al., WO01/25473, disclosed a method for constructing expression profiles of genes in response to a drug. In this method, a number of genes are selected on the basis of their expected interaction with the drug or condition to be examined, and their expression in cell culture is measured in response to administration of the drug.
The level of total cholesterol in serum can be modulated by a variety of treatments. Treatments with fibrates activate the PPARα nuclear receptor. In rats, the activated receptor is a transcription factor that induces the fatty acid beta oxidation (FABO) and the peroxisome proliferation pathways. The mechanism by which fibrates lower LDL or raise HDL in humans is unclear (Grundy et al., Am J Med (1987) 83:9-20). The mechanism probably involves activation of PPARα as in rats (Kersten et al., Nature (2000) 405:421-24). Some of the downstream effects of PPARα activation are increased fatty acid oxidation, increased expression of apoA-I and apoA-II resulting in increased levels of HDL (Staels and Auwerx, Atherosclerosis (1998) 137:S19-S23), a decrease in LDL mediated by an increase in lipoprotein lipase, a decrease in apoC-III, and possibly an increase in SREBP. The mechanism of action of statins is inhibition of HMG-CoA reductase, the key regulated step in cholesterol biosynthesis (Alberts et al., Proc Natl Acad Sci USA (1980) 77:3957-61). Low levels of endogenous cholesterol in hepatic cells result in induction of the LDL receptor, and the larger number of LDL receptors on the surface of hepatocytes results in increased removal of LDL from the blood (Bilheimer et al., Proc Natl Acad Sci USA (1983) 80:4124-28). Estrogenic compounds are also known to lower blood cholesterol. Premenopausal women have a lower risk of cardiovascular disease due in part to lower cholesterol levels resulting from an increase in LDL catabolism (Walsh et al., N Engl J Med (1991) 325(17):1196-204; Bush et al., Circulation (1987) 75(6):1102-09). Other cardioprotective effects of estrogen involve the induction of nitric oxide synthase (Holm et al., J Clin Invest (1997) 100:821-28). A large family of drugs known as selective estrogen receptor modulators (SERMs), raloxifene for example, have also been shown to lower cholesterol in ovariectomized female rats (Kauffman et al., J Pharmacol Exp Ther (1997) 280:146-53).
Given that we have assembled a large gene expression database in rats treated with multiple members of all these families of compounds, we have investigated the possibility of generating a gene expression signature predictive of a reduction of total blood cholesterol.