2.1 Quantitative Measurement of Cellular Constituents
There is currently an explosive increase in the generation of quantitative measurements of the levels of “cellular constituents”. Cellular constituents include gene expression levels, abundance of mRNA encoding specific genes, and protein expression levels in a biological system. Levels of various constituents of a cell, such as mRNA encoding genes and/or protein expression levels, are known to change in response to drug treatments and other perturbations of the cell's biological state. Measurements of a plurality of such “cellular constituents” therefore contain a wealth of information about the effect of perturbations on the cell's biological state. The collection of such measurements is generally referred to as the “profile” of the cell's biological state.
There may be on the order of 100,000 different cellular constituents for mammalian cells. Consequently, the profile of a particular cell is typically complex. The profile of any given state of a biological system is often measured after the biological system has been subjected to a perturbation. Such perturbations include experimental or environmental condition(s) associated with a biological system, such as exposure of the system to a drug candidate, the introduction of an exogenous gene, the deletion of a gene from the system, or changes in culture conditions. Comprehensive measurements of cellular constituents, or profiles of gene and protein expression and their response to perturbations in the cell, therefore have a wide range of utility, including the ability to compare and understand the effects of drugs, diagnose disease, and optimize patient drug regimens. They also have application in basic life science research.
Within the past decade, several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques make it possible to monitor the expression levels of a large number of transcripts simultaneously (see, e.g., Schena et al., 1995, Quantitative monitoring of gene expression patterns with a complementary DNA micro-array, Science 270: 467-470; Lockhart et al., 1996, Expression monitoring by hybridization to high-density oligonucleotide arrays, Nature Biotechnology 14: 1675-1680; Blanchard et al., 1996, Sequence to array: Probing the genome's secrets, Nature Biotechnology 14: 1649; U.S. Pat. No. 5,569,588, issued Oct. 29, 1996 to Ashby et al. entitled “Methods for Drug Screening”). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. For other organisms, such as humans, whose genomes are increasingly well characterized, it is possible to monitor large numbers of genes within the cell simultaneously.
On another front, the direct measurement of protein abundance has been improved by the use of microcolumn reversed-phase liquid chromatography electrospray ionization tandem mass spectrometry (LC/MS/MS) to directly identify proteins contained in mixtures. This technology promises to extend the dynamic range over which protein abundance can be measured in a biological system. Using LC/MS/MS, McCormack et al. have demonstrated that proteins present in mixtures can be readily identified across a 30-fold difference in molar quantity, that the identifications are reproducible, and that proteins within the mixture can be identified at low femtomole levels. McCormack et al., 1997, Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level, Anal. Chem. 69: 767-776. In a review of tandem mass spectrometry, Chait points out that an additional advantage of this technology is that it is orders of magnitude faster than more conventional approaches such as Edman sequencing. Chait, 1996, Trawling for proteins in the post-genome era, Nat. Biotech. 14: 1544.
Other technological advances have made it possible to perturb biological systems specifically with individual genetic mutations. For example, Mortensen et al. describe a method for producing embryonic stem (ES) cell lines in which both alleles are inactivated by homologous recombination. Using the methods of Mortensen et al., it is possible to obtain homozygous mutant cells, i.e., double-knockout ES cell lines. Mortensen et al. propose that their method may be generally applicable to other genes and to cell lines other than ES cells. Mortensen et al., 1992, Production of homozygous mutant ES cells with a single targeting construct, Mol. Cell. Biol. 12: 2391-2395.
In another promising technology, Wach et al. describe a dominant resistance module, consisting entirely of heterologous DNA, for selecting S. cerevisiae transformants. The module can also be used for PCR-based gene disruptions. Wach et al., 1994, New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae, Yeast 10: 1793-1808.
Technological advances, such as the use of microarrays, are already being used in drug discovery (see, e.g., Marton et al., 1998, Drug target validation and identification of secondary drug target effects using Microarrays, Nature Medicine, in press; Gray et al., 1998, Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors, Science 281: 533-538).
Comparison of profiles with other profiles in a database (see, e.g., U.S. Pat. No. 5,777,888, issued Jul. 7, 1998 to Rine et al. entitled “Systems for generating and analyzing stimulus-response output signal matrices”) or clustering of profiles by similarity can give clues to the molecular targets of drugs and related functions, efficacy and toxicity of drug candidates and/or pharmacological agents. Such comparisons may also be used to derive consensus profiles representative of ideal drug activities or disease states. Profile comparison can also help detect diseases in a patient at an early stage and provide improved clinical outcome projections for a patient diagnosed with a disease.
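Comparing a profile against a database of profiles can be sketched as a nearest-neighbor search by similarity. The example below is a minimal sketch, assuming each profile is a vector of log10 expression ratios over the same ordered set of cellular constituents and using Pearson correlation as the similarity measure; the function name `rank_by_similarity` and the toy profiles are hypothetical and are not the method of Rine et al. or of this document.

```python
import numpy as np

def rank_by_similarity(query, database):
    """Rank database profiles by Pearson correlation with a query profile.

    query: 1-D sequence of log10 ratios, one entry per cellular constituent.
    database: dict mapping a (hypothetical) profile name to a sequence of
    the same length, with constituents in the same order.
    Returns (name, correlation) pairs, most similar first.
    """
    query = np.asarray(query, dtype=float)
    scores = []
    for name, profile in database.items():
        profile = np.asarray(profile, dtype=float)
        if profile.shape != query.shape:
            raise ValueError(f"profile {name!r} has the wrong length")
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r.
        r = np.corrcoef(query, profile)[0, 1]
        scores.append((name, float(r)))
    # Highest correlation first.
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical toy profiles over five constituents.
database = {
    "drug_A": [0.8, -0.5, 0.0, 0.1, 0.0],
    "drug_B": [-0.6, 0.7, 0.1, 0.0, -0.2],
}
query = [0.7, -0.4, 0.1, 0.0, 0.05]
ranking = rank_by_similarity(query, database)
print(ranking[0][0])  # drug_A
```

Here the query closely tracks `drug_A` and runs roughly opposite to `drug_B`, so a strongly negative correlation can itself be informative (an opposing effect). Clustering profiles by similarity would use the same kind of pairwise measure.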
2.2 Fluorophore Bias
The use of two fluorophores has been described by Shalon et al., 1996, “A microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization,” Genome Research 6:629-645. The problem with the approach put forth by Shalon et al. is that each species of mRNA molecule has a bias in its measured color ratio, due to interaction of the fluorescent labeling molecule with the reverse transcription of the mRNA, with the hybridization efficiency, or with both. Without an error correction scheme to account for this bias, the data from a single microarray experiment, or even from a plurality of nominal repeats of a microarray experiment whose results are averaged, will have an unacceptable error rate. As used herein, the term nominal repeat or nominally repeated experiment refers to experiments that are run under essentially the same or similar experimental conditions, such that it would be useful to combine the results of the repeated experiments.
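Why averaging nominal repeats does not remove this bias can be illustrated with a small simulation. The sketch below assumes a simple additive model in which each measured log10 ratio equals the true log ratio plus a fixed, gene-specific dye bias plus independent zero-mean Gaussian noise; the model, the parameter values, and the function name `simulate_repeats` are illustrative assumptions, not Shalon's data or an error model taken from this document.

```python
import numpy as np

def simulate_repeats(true_log_ratio, dye_bias, noise_sd, n_repeats, seed=0):
    """Simulate measured log10 ratios for one gene over nominal repeats.

    Assumed model: measured = true + dye_bias + noise, where dye_bias is
    identical in every repeat (same labeling scheme each time) and noise is
    independent Gaussian with standard deviation noise_sd.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_sd, size=n_repeats)
    return true_log_ratio + dye_bias + noise

# A gene that is truly unchanged (log ratio 0) but carries a 0.15 dye bias.
for n in (1, 4, 100, 10_000):
    mean = simulate_repeats(0.0, 0.15, 0.2, n).mean()
    print(f"{n:>6} repeats: mean log ratio = {mean:+.3f}")
```

As the number of repeats grows, the random noise averages away, but under this model the mean converges to the bias (about +0.15, an apparent change of roughly 1.4-fold) rather than to zero.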
2.3 Inherent Error Rates of Cellular Constituent Quantitative Measurement Experiments
While these technological advances have made it possible to generate quantitative measurements of the levels of cellular constituents, the experiments are expensive. A single microarray experiment, or a single gel electrophoresis plate, can cost in the neighborhood of $100-$1000 or more. Moreover, it became apparent only after many initial attempts to apply the data to actual commercial needs that individual experiments suffer from high rates of false positives, in the sense of declaring significance where there really is none. Because of the expense involved and the high rate of false positives, the prior art provides no description of robust methods for repeating and statistically combining multiple, nominally identical experiments for the express purpose of improving data quality.
The power of genome-wide cell profiling with microarrays lies in its ability to survey the response to known perturbations across essentially the entire set of cellular mechanisms. However, in any given experiment typically only a small number of cellular constituents change dramatically in abundance, while the vast majority are unchanged. There are exceptions, but cells have specific, biologically fairly insulated responses to stimuli, so most profiles involve a large set of constituents with ‘no-change’ and a much smaller set that are either up- or down-regulated. For this reason, even a small false alarm rate in the measurements can severely compromise their utility. For example, suppose that one percent of cellular constituents actually respond in a typical experiment, that the measurement resolution is twofold, and that measurement errors exceed twofold one percent of the time. Then there will be roughly as many false alarms as true detections above a twofold threshold.
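The arithmetic in this example can be checked directly. The sketch below assumes, for simplicity, that every truly responding constituent is detected above the twofold threshold and that each non-responding constituent independently crosses the threshold with the stated error rate; the function name `expected_calls` and the 10,000-constituent array size are hypothetical.

```python
def expected_calls(n_constituents, response_rate, false_alarm_rate):
    """Expected true detections and false alarms above a detection threshold.

    Assumes every true responder is detected (perfect sensitivity) and that
    each non-responder exceeds the threshold with probability false_alarm_rate.
    """
    true_responders = n_constituents * response_rate
    non_responders = n_constituents - true_responders
    false_alarms = non_responders * false_alarm_rate
    return true_responders, false_alarms

detections, false_alarms = expected_calls(10_000, 0.01, 0.01)
print(detections, false_alarms)  # 100.0 99.0
```

Under these assumptions about half of all constituents called as changed are false alarms, and the proportion worsens further if fewer constituents truly respond or if sensitivity is below 100%.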
In general, the art has underappreciated the extent of the errors present in individual cellular constituent quantification experiments, such as microarray or protein gel experiments. In addition to the difficulty posed by the fairly insulated response of biological systems to any given perturbation, a substantial amount of error is present in any nominal microarray experiment due to artifacts such as: unevenly printed DNA probe spots on the microarray; scratches, dust, and other artifacts on the microarray; unevenness in signal brightness across the microarray due to nonuniform DNA hybridization; and color stripes due to fluorophore-specific biases of the fluorophores used in the microarray process.
One way to reduce the effects of these serious errors is to repeat the experiment under identical conditions and average the data. However, simple averaging of the data, without any consideration of the nature of the underlying experimental errors, does not adequately solve the problems those errors introduce. With simple averaging alone, an excessive number of nominal repeats would be required to reduce the effects of error to an acceptable level, and because each cellular constituent quantification experiment is expensive, this is not a feasible solution. Accordingly, what is needed in the art are robust methods for combining the results of repeated cellular constituent quantification experiments so that a minimal number of nominal repeats can provide an acceptable error rate.
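The cost of relying on simple averaging follows from the standard error of a mean. Under the idealized assumption of independent, identically distributed, zero-mean errors with standard deviation σ in each repeat, the error of an n-fold average shrinks only as σ/√n, so reducing the error by a factor k requires k² repeats; systematic errors such as the fluorophore bias described earlier do not shrink at all. The helper `repeats_needed` below is a hypothetical illustration of that scaling, not a method from this document.

```python
import math

def repeats_needed(per_experiment_sd, target_sd):
    """Smallest n with per_experiment_sd / sqrt(n) <= target_sd.

    Assumes independent, identically distributed, zero-mean errors, so the
    standard error of the simple average of n repeats is sd / sqrt(n).
    Systematic (non-random) errors are not reduced by averaging and are
    ignored here.
    """
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    n = math.ceil((per_experiment_sd / target_sd) ** 2)
    return max(n, 1)

# Halving the error takes 4 repeats; a tenfold reduction takes 100.
print(repeats_needed(1.0, 0.5), repeats_needed(1.0, 0.1))
```

At the per-experiment cost of $100-$1000 cited above, a hundredfold increase in the number of repeats quickly becomes impractical, which is why more robust ways of combining nominal repeats are needed.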
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.