There are numerous biotechnology applications in which the researcher is interested in changes in the expression of a moderate set of genes across many hundreds or thousands of biological samples. Over the last decade, gene expression analysis has proven to be an extremely valuable tool for monitoring the state of cells and specific pathway responses to different stimuli and environments. This ability both to survey cellular activities broadly and to track differential and dynamic responses means that expression tools have provided significant insight into cancer and other disease genetics. The current state of the art in gene expression analysis is represented by two very different technologies: microarray analysis and real-time rtPCR. Each offers major targeted benefits. Microarrays enable large-scale surveys of thousands of genes for small sets of samples, while real-time rtPCR provides high-sensitivity, high-accuracy measurements of small sets of genes for hundreds to thousands of samples. There is, however, a technological gap that neither technology fully serves.
Multiple experimental applications require screening moderate sets of genes, e.g., 20 to 100 genes, for hundreds to thousands of samples. For example, fully capturing the activities of functional pathways such as apoptosis or angiogenesis requires tracking between 50 and 100 genes. Indeed, linear and nonlinear statistical techniques have been applied successfully to microarray data, and correlation and cluster analyses generally collapse the responses of thousands of genes into a much smaller set of representative genes and response types. Thomas et al. (2001) Molecular Pharmacology 60:1189-1194, for instance, used this approach to identify 12 key transcripts out of 1200 that predictively track 5 major toxicological responses. Van't Veer et al. (2002) Nature 415:530-536 demonstrated that a set of 70 genes, out of 25,000 tested, could provide a prognostic signature for metastases in breast cancer patients, and that this expression profile outperformed other clinical parameters used to predict disease outcome.
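The idea of collapsing many correlated genes into a few representatives can be illustrated with a simple greedy correlation grouping. This is a generic stand-in technique, not the procedure used by Thomas et al. or Van't Veer et al.; the function names, the 0.9 threshold, and the toy expression profiles are hypothetical. Absolute correlation is used so that strongly anti-correlated genes, which carry essentially the same information, fall into the same group.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (sx * sy)


def pick_representatives(profiles, threshold=0.9):
    """Hypothetical greedy grouping: each gene joins the first group whose
    representative it tracks with |r| >= threshold; otherwise it founds a
    new group and becomes that group's representative."""
    groups = {}  # representative gene -> member genes
    for gene, profile in profiles.items():
        for rep in groups:
            if abs(pearson(profile, profiles[rep])) >= threshold:
                groups[rep].append(gene)
                break
        else:
            groups[gene] = [gene]
    return groups


# Toy profiles across four samples (hypothetical data).
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],  # perfectly correlated with geneA
    "geneC": [4, 3, 2, 1],  # perfectly anti-correlated with geneA
    "geneD": [1, 3, 2, 4],  # r = 0.8 with geneA, below the threshold
}
groups = pick_representatives(profiles)
print(groups)  # {'geneA': ['geneA', 'geneB', 'geneC'], 'geneD': ['geneD']}
```

Because the first gene encountered becomes a group's representative, the result depends on input order; hierarchical clustering would be a more robust choice for real data.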
Another major area of interest for a high throughput gene expression assay is compound library screening. The pharmaceutical drug discovery process has traditionally been dominated by biochemical and enzymatic studies of a designated pathway. Although this approach has been productive, it is very laborious and time-consuming, and is generally targeted to a single gene or defined pathway. Today, the predominant screening assay formats fall into two categories: gene specific and phenotypic. Gene-specific screens, such as protein binding assays and reporter gene assays, focus on capturing the effects of a given compound on a single gene or protein endpoint, while phenotypic screens typically capture gross cellular changes, such as apoptosis, cell proliferation, or ion flux. Both of these screening approaches have significant value, but they are not optimal for screening compounds with respect to their effects on a multiplicity of genes involved in a complex disease, such as cancer. Gene-specific screens are too focused and cannot observe multigenic responses to perturbations. Cell-based phenotypic screens are too broad and cannot be used to differentiate the multiple pathways that can be altered to produce a phenotypic response, nor can they effectively be used to optimize and direct compound development toward specific mechanisms of action. Molecular biology and the development of gene cloning have dramatically expanded the number of genes that are potential drug targets, and this process is accelerating rapidly as a result of the progress made, e.g., in sequencing the human genome. In addition to the growing set of available genes, techniques such as the synthesis of combinatorial chemical libraries have created daunting numbers of candidate drugs for screening. In order to capitalize on these available materials, methods are needed that are capable of extremely fast and inexpensive analysis of gene expression levels. 
A screen that measures a multiplicity of genes in parallel, e.g., 5 to 100 genes, can overcome the deficits of these other screening approaches.
Automated, high-throughput rtPCR is one efficient approach to gene expression analysis. It involves isolating RNA from cells, performing multiplexed rtPCR, and then separating the amplification products on a capillary electrophoresis unit. For example, when a compound or chemical library of 10,000 compounds is screened in a cell-based assay that measures the relative expression levels of 20 genes, the established process involves several steps: culturing the experimental cells, typically in microtiter-plate format; isolating RNA from these cells; selectively amplifying the targets by rtPCR in sets of 10 to 20 genes per amplification reaction; and analyzing the amplification products by capillary electrophoresis.
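The scale of this example screen can be worked out directly. The sketch below assumes one sample per culture well and ignores replicates and control wells; the function name and the 96-well default are illustrative choices, not part of the described process.

```python
import math


def screen_workload(n_samples, n_genes, genes_per_reaction, wells_per_plate=96):
    """Rough workload of a multiplexed rtPCR screen.

    Assumes one sample per well and no replicates or control wells.
    """
    # Each sample needs enough multiplexed reactions to cover all target genes.
    reactions_per_sample = math.ceil(n_genes / genes_per_reaction)
    return {
        "data_points": n_samples * n_genes,
        "reactions": n_samples * reactions_per_sample,
        "culture_plates": math.ceil(n_samples / wells_per_plate),
    }


# 10,000 compounds, 20 genes, 10 genes per amplification reaction.
print(screen_workload(10_000, 20, 10))
# {'data_points': 200000, 'reactions': 20000, 'culture_plates': 105}
```

With 20 genes per reaction the reaction count drops to 10,000, and in 384-well plates the culture step needs 27 plates; either way, every one of those reactions must pass through the electrophoresis readout.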
This process is robust and incorporates an amplification scheme that couples the use of gene-specific and universal primers to lock in the relative gene ratios for all of the genes being amplified. The method also takes advantage of the newest generation of automated, high-resolution capillary electrophoresis instruments. However, these instruments are capable of analyzing only a moderate set of samples in a given run.
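An idealized calculation shows why amplification with a shared universal primer pair can preserve relative gene ratios. Assume, as a simplification not stated in the process description, that after the initial gene-specific cycles every amplicon carries the same universal primer sites and is amplified with a common per-cycle efficiency E (between 0 and 1), and let N_i(0) be the number of copies of gene i's amplicon at the start of universal amplification:

```latex
N_i(c) = N_i(0)\,(1+E)^{c}
\qquad\Longrightarrow\qquad
\frac{N_i(c)}{N_j(c)} = \frac{N_i(0)\,(1+E)^{c}}{N_j(0)\,(1+E)^{c}} = \frac{N_i(0)}{N_j(0)}
```

The ratio is independent of the cycle number c only while all amplicons share the same efficiency and remain in the exponential phase; efficiency differences or plateau effects break this idealization.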
Nucleic acid microarrays offer the benefit of assaying sample hybridization to a large number of probes in a highly parallel fashion. They can be used to quantitate mRNA expression levels and dramatically surpass the above-mentioned techniques in multiplexing capability. These arrays comprise DNA probes, such as PCR products, oligonucleotides, or cDNA products, fixed onto a solid surface, which can then be used in a hybridization reaction with a target sample, generally a whole cell extract, cellular RNA sample, or cDNA sample corresponding to cellular RNAs (see, for example, U.S. Pat. Nos. 5,143,854 and 5,807,522; Fodor et al. (1991) Science 251:767-773; and Schena et al. (1995) Science 270:467-470). Microarrays can measure the expression levels of several thousand genes simultaneously, generating a gene expression profile of the entire genome of relatively simple organisms. Each reaction, however, is performed with a single biological sample against a very large number of gene probes. As a consequence, microarray technology does not facilitate high-throughput analysis of very large numbers of unique samples against an array of known probes. While both microarrays and real-time rtPCR can be pressed into service in these important experimental areas, neither method can do this work both cost-efficiently and with limited amounts of sample. As demand for gene expression data increases, it is desirable to further reduce the cost per expression data point while increasing throughput. The scientific focus of the process should nonetheless remain the same: accurate analysis of moderate sets of genes (tens to hundreds) for many thousands of samples.
Described herein are strategies for screening compound libraries that carry the rtPCR approach to a new level of throughput while reducing the cost per data point. The approach replaces capillary electrophoresis readouts with microarray-format readouts. The advantages of the method are multiple and include (1) the ability to run thousands of samples at high throughput, e.g., in hours rather than weeks; (2) the possibility of working with very small amounts of RNA, e.g., sub-nanogram amounts, opening the door to multiplexed gene expression analysis of very small amounts of tissue (such as can be obtained using laser capture microdissection); and (3) the potential to run at a very low cost per data point, e.g., one to a few cents per gene. This conversion of readout format can be integrated directly into the current rtPCR process, enabling a smooth transition to the higher-throughput platform. The change in methodology also adapts the existing platform for further advances based on parallelizing sample processing in the microarray format, which can lead to greater economies in reagent usage, time, and labor while maintaining the focus on measuring gene expression responses for moderate sets of genes across numerous biological samples.
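The cost claim in (3) rests on simple arithmetic: per-sample costs are shared by every gene multiplexed into that sample's assay. A minimal sketch follows; the function names are hypothetical, and the 40-cents-per-sample figure in the usage lines is an illustrative assumption, not a measured value.

```python
def cents_per_data_point(cents_per_sample, n_genes):
    """Cost of one expression measurement (one gene in one sample), in cents.

    Per-sample costs (RNA isolation, rtPCR, array readout) are assumed to be
    shared by every gene multiplexed into that sample's assay.
    """
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return cents_per_sample / n_genes


def total_screen_dollars(n_samples, cents_per_sample):
    """Total per-sample cost of a screen, in dollars."""
    return n_samples * cents_per_sample / 100


# Hypothetical figures: 40 cents per sample, 20 genes per sample.
print(cents_per_data_point(40, 20))      # 2.0 cents per gene
print(total_screen_dollars(10_000, 40))  # 4000.0 dollars for 10,000 samples
```

Under these assumptions, doubling the number of genes multiplexed per sample halves the cost per data point, which illustrates why measuring moderate gene sets in a single multiplexed assay matters for cost.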