An increasing number of genomes from diverse and important organisms, including humans and pathogenic microbes, have been or are now being sequenced. As the nucleic acid sequence of each new genome is resolved, the need for methods to determine the expression profile and function of each gene becomes ever more pressing.
Transcriptional machinery is largely conserved among species ranging from yeast to human, reflecting the fundamental nature of the transcriptional process. Indeed, more than 100 proteins known to be involved in the transcriptional process show good sequence conservation. The enzyme that synthesizes mRNA, RNA polymerase II, is a large protein complex composed of multiple subunits. This complex binds to a DNA sequence near the start of transcription, known as the TATA box because of the linear arrangement of its nucleotides. However, RNA polymerase II does not bind to the TATA box without the prior association of several other proteins, including transcription factors TFIID, TFIIA and others, with this DNA region. These proteins interact with one another, forming a complex to which RNA polymerase II can bind. This scaffolding of protein interactions at the TATA box forms the transcriptional apparatus. This basic transcriptional complex is very similar for all genes in all cells of an organism, yet it is clear that transcription of selected genes can be switched on or off, as well as differentially regulated, in distinct cell types to yield different amounts of mRNA. This process of transcriptional regulation is multifaceted and involves the association of several additional proteins in particular arrangements with the transcriptional complex. The arrangement and identity of these transcriptional accessory proteins, also called transcription factors, can be unique for individual genes.
Two methods are predominantly used to determine the function of a gene. The sequence approach identifies sequence motifs encoding structural elements, such as nucleic acid-binding domains, that can be used to postulate the function of a gene having these motifs. The drawback to this method is that without prior knowledge of the function of a motif, the sequence approach is not useful. Thus, if a new gene is discovered, but contains no known motifs, the sequence approach fails to provide any clues as to the function of that gene.
The second method explores the pattern of expression of a particular gene. The pattern of expression can illuminate the function of the gene when changes in its expression are correlated with the stimuli that produced them. Accumulated expression data can then provide insight into the function of the gene and the polypeptide it encodes.
A number of methods have been devised for detecting and quantifying gene expression levels, such as northern blots (Alwine et al., 1977, Proc. Natl. Acad. Sci. USA 74: 5350-5354), differential display (Liang and Pardee, 1992, Science 257: 967-971), S1 nuclease protection (Berk and Sharp, 1977, Cell 12: 721-732), sequencing of cDNA libraries (Adams et al., 1991, Science 252: 1651-1656; Okubo et al., 1992, Nature Genet. 2: 173-179), serial analysis of gene expression (SAGE) (Velculescu et al., 1995, Science 270: 484-487), and cDNA arrays and oligonucleotide arrays (Schena et al., 1995, Science 270: 467-470; Schena et al., 1996, Proc. Natl. Acad. Sci. USA 93: 10614-10619; Lockhart et al., 1996, Nature Biotechnol. 14: 1675-1680). The common theme among these various methods of analyzing gene expression is the highly sensitive and highly specific interaction of complementary nucleic acids. Most gene expression applications employ a single, labeled oligonucleotide and a mixture of cell- or tissue-derived RNA species. The exquisite selectivity of nucleic acid hybridization between the labeled probe and the unknown target RNAs provides information regarding the abundance of a particular RNA in each pool of targets. From this, gene expression data can be obtained.
cDNA microarrays represent a significant improvement over these methods because they allow the specific nucleotide-nucleotide interaction to occur on a massive scale: many gene-specific polynucleotides derived from RNA transcripts are fixed on a support and are then exposed to an even larger number of fluorescent- or radio-labeled cDNAs derived from the total RNA pool of a test cell or tissue. The signal generated by hybridization between the fixed probes and the labeled targets allows determination of the relative amount of each transcript in the cDNA pool that corresponds to a probe on the microarray. The effect of a stimulus on a cell or tissue is then determined by comparing a test cell or tissue with a control cell or tissue.
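By way of illustration only, the test-versus-control comparison described above is commonly summarized as a per-gene log2 ratio of hybridization signals. The following is a minimal sketch under stated assumptions, not a description of any particular microarray platform: the function name, the gene names, the signal values and the pseudocount are all hypothetical, and background subtraction and normalization are omitted.

```python
import math


def log2_ratios(test_signal, control_signal, pseudocount=1.0):
    """Return log2(test / control) for each gene present in both inputs.

    test_signal and control_signal map a gene name to a hybridization
    signal intensity. The pseudocount (an assumed value) avoids division
    by zero and log of zero for spots with no signal.
    """
    ratios = {}
    for gene, test_value in test_signal.items():
        if gene not in control_signal:
            continue  # no control measurement, so no comparison is possible
        control_value = control_signal[gene]
        ratios[gene] = math.log2(
            (test_value + pseudocount) / (control_value + pseudocount)
        )
    return ratios


# Hypothetical signal intensities for three spots on an array.
test = {"geneA": 799.0, "geneB": 99.0, "geneC": 49.0}
control = {"geneA": 199.0, "geneB": 99.0, "geneC": 199.0}

result = log2_ratios(test, control)
for gene, ratio in sorted(result.items()):
    print(f"{gene}: log2 ratio = {ratio:+.2f}")
```

In this sketch a positive value indicates a higher signal in the test sample than in the control, a negative value indicates a lower signal, and a value near zero indicates no apparent change.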
Methods for analyzing gene expression, and microarrays in particular, have proven to be powerful tools in the analysis of gene function. The variance in gene expression between two divergent tissues derived from the same primordial cell, the effect of a toxic chemical on a cell or tissue, the difference in gene expression between a healthy tissue and one afflicted by disease, the molecular basis of tumorigenicity, the metabolic shift from anaerobic to aerobic respiration in yeast, and the basis of the difference in virulence between a non-pathogenic and a pathogenic strain of the same species have all been investigated using gene expression analysis, and microarrays in particular. However, current methods of gene expression analysis all require lysis of the cell or tissue, isolation of RNA, and in vitro detection of transcription and translation events. Biological phenomena are not accurately represented in the sterile atmosphere of a glass microarray chip, but rather in the intracellular milieu that influences gene expression and protein translation. Further, the gene expression analysis methods currently used require that detection of transcription and translation events take place over a series of discrete time points, rather than in real time, and those time points may not accurately reflect the actual workings of a biological system. Thus, microarrays offer only an approximation of actual gene expression because they require in vitro detection spaced out over an arbitrary timeframe unlikely to correspond to actual biological events.
There exists a long-felt need for in vivo, real-time detection and analysis of gene expression and function. The present invention meets this need.