Most prokaryotic genes are encoded by continuous DNA sequences that are not interrupted by introns. In contrast, most genes in higher eukaryotes are interrupted, i.e., protein coding sequences, the exons, are separated by noncoding, often much longer, sequences, the introns. For example, a typical mammalian gene has a size of about 16 kb with about 7-8 exons, whereas a typical mRNA has a size of only about 2.2 kb (Lewin, Genes V, Oxford University Press, Oxford, 1994). Protein production from such an interrupted gene involves the transcription of the entire length of such gene, including all exons and introns, into a primary transcript or pre-mRNA and the subsequent removal of the intron sequences by RNA splicing to produce a mature mRNA that encodes the protein.
In addition to alternatively spliced mRNAs derived from different combinations of exons, a number of other mechanisms may also lead to varied mRNA structures. For example, different 5′ termini may be present because of multiple promoter elements. Similarly, alternative 3′ processing may result in variable sites of polyA addition. Methods that can be used for detecting alternative exon splicing can also be applied for detecting these alternative mRNAs except that sequence rules for these mechanisms rather than sequence rules for splice junction sequences are used. Certain RNA editing processes could depress the hybridization signal from a genomic region. In addition, RNA trans-splicing events may join sequences encoded from unlinked genomic regions into an RNA or duplicate genomic sequences to produce enhanced signals (see e.g., Caudevilla et al., 1998, Proc. Natl. Acad. Sci. U.S.A. 95:12185-12190). Therefore, in eukaryotes the sequences of mRNAs do not correspond directly to genomic sequences of the genes.
The interrupted gene structure in eukaryotes offers an important mechanism for generating multiple proteins from a single gene. For example, a pre-mRNA can be spliced in different ways in a process called alternative splicing, thereby allowing production of different protein isoforms with different functions from a single gene. Alternative splicing thus permits fine modulation of gene expression so that proteins can be expressed in the proper spatiotemporal context (Reyes et al., 1991, Molecular and Cellular Biology 11:1654-1661). It is estimated that more than 35% of human mRNAs contain possible alternative splice forms (Mironov et al., 1999, Genome Research 9:1288-1293; Brett et al., 2000, FEBS Lett. 474:83-86). Alternative splicing has also been implicated in various diseases, including various cancers. For example, alternative splicing of the pre-mRNA encoding CD44 has been suggested as being important in a number of human cancers (Stickeler et al., 1999, Oncogene 18:3574-3582).
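The relationship between a pre-mRNA and its alternatively spliced products can be illustrated with a minimal sketch. The segment names and sequences below are hypothetical toy data, and the `splice` helper is an illustrative assumption rather than a model of the actual splicing machinery; it simply joins the selected exons in transcript order and discards introns and skipped exons.

```python
# Toy pre-mRNA laid out in transcript order. Sequences are hypothetical:
# exons are written in upper case, introns in lower case.
PRE_MRNA_SEGMENTS = [
    ("exon1", "AUGGCU"),
    ("intron1", "guaaguuucag"),
    ("exon2", "GAUCCA"),
    ("intron2", "guaugucuag"),
    ("exon3", "UUCUAA"),
]


def splice(segments, included_exons):
    """Join the named exons in transcript order.

    Introns and any exons not listed in `included_exons` are dropped.
    """
    return "".join(seq for name, seq in segments if name in included_exons)


# Mature mRNA containing every exon.
full_length = splice(PRE_MRNA_SEGMENTS, {"exon1", "exon2", "exon3"})

# Alternative splice form in which exon2 is skipped.
exon2_skipped = splice(PRE_MRNA_SEGMENTS, {"exon1", "exon3"})

print(full_length)
print(exon2_skipped)
```

Both products derive from the same pre-mRNA, yet they differ in length and sequence, which is why an mRNA sequence does not map one-to-one onto the genomic sequence of its gene.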
The nuclear RNA splicing reaction, i.e., the excision of introns and ligation of exons, requires a complex nuclear machinery, the spliceosome, which is formed by a large number of splicing factors, including various proteins and ribonucleoproteins. Any variation in the relative levels of such splicing factors may affect gene expression through alternative splicing pathways. For example, it has been found that overexpression of antagonistic splicing factors SF2/ASF affects alternative splicing in vivo (Caceres et al., 1994, Science 265:1706-1709).
It is therefore of both fundamental and practical importance to monitor the expression profiles of exons, i.e., the expression levels of a plurality of exons in a plurality of genes in the genome of an organism, in cell samples, preferably on a genomic scale. On the fundamental side, this would offer an important means to link genomic sequence to protein production, and therefore phenotype. On the practical side, such exon expression profiles may be used to determine the transcriptional state of a cell or cell type. An exon expression profile and its correlation with the expression pattern of different mRNA transcripts may also be used to determine the response of a cell or cell type to external perturbations on the exon level. Therefore, there exists a need for methods for simultaneously monitoring the expression of exons of genes in a cell or a cell type. There also exists a need for methods for monitoring on the exon level the response of a cell or cell type to external perturbations.
Current methods for analysis of the expression of exons in a gene are tedious and labor-intensive. These methods, such as methods using Northern blotting and DNA sequencing, can only be applied to a single gene at a time. They are therefore not suitable for analysis of the expression of exons in a plurality of genes in a cell sample.
DNA array technologies have made it possible to monitor the expression level of a large number of genetic transcripts at any one time (see, e.g., Schena et al., 1995, Science 270:467-470; Lockhart et al., 1996, Nature Biotechnology 14:1675-1680; Blanchard et al., 1996, Nature Biotechnology 14:1649; Ashby et al., U.S. Pat. No. 5,569,588, issued Oct. 29, 1996). Of the two main formats of DNA arrays, spotted cDNA arrays are prepared by depositing PCR products of cDNA fragments with sizes ranging from about 0.6 to 2.4 kb, from full length cDNAs, ESTs, etc., onto a suitable surface (see, e.g., DeRisi et al., 1996, Nature Genetics 14:457-460; Shalon et al., 1996, Genome Res. 6:639-645; Schena et al., 1995, Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286; and Duggan et al., Nature Genetics Supplement 21:10-14). Alternatively, high-density oligonucleotide arrays containing thousands of oligonucleotides complementary to defined sequences at defined locations on a surface are synthesized in situ on the surface by, for example, photolithographic techniques (see, e.g., Fodor et al., 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et al., 1996, Nature Biotechnology 14:1675; McGall et al., 1996, Proc. Natl. Acad. Sci. U.S.A. 93:13555-13560; U.S. Pat. Nos. 5,578,832; 5,556,752; 5,510,270; and 6,040,138). Methods for generating arrays using inkjet technology for in situ oligonucleotide synthesis are also known in the art (see, e.g., Blanchard, International Patent Publication WO 98/41531, published Sep. 24, 1998; Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J. K. Setlow, Ed., Plenum Press, New York at pages 111-123).
Efforts to further increase the information capacity of DNA arrays range from reducing feature size, so as to increase the number of probes in a given surface area, to sensitivity- and specificity-based probe design and selection. The latter aims to reduce the number of redundant probes needed for the detection of each target nucleic acid, thereby increasing the number of target nucleic acids monitored without increasing probe density (see, e.g., Friend et al., U.S. patent application Ser. No. 09/364,751, filed on Jul. 30, 1999, now abandoned; and Friend et al., U.S. patent application Ser. No. 09/561,487, filed on Apr. 28, 2000, now U.S. Pat. No. 7,013,221 B1).
By simultaneously monitoring tens of thousands of genes, DNA array technologies have allowed, inter alia, genome-wide analysis of mRNA expression in a cell or a cell type or any biological sample. Aided by sophisticated data management and analysis methodologies, the transcriptional state of a cell or cell type as well as changes of the transcriptional state in response to external perturbations, including but not limited to drug perturbations, can be characterized on the mRNA level (see, e.g., Stoughton et al., International Publication No. WO 00/39336, published Jul. 6, 2000; Friend et al., International Publication No. WO 00/24936, published May 4, 2000). Applications of such technologies include, for example, identification of genes which are up regulated or down regulated in various physiological states, particularly diseased states. Additional exemplary uses for DNA arrays include the analyses of members of signaling pathways, and the identification of targets for various drugs. See, e.g., Friend and Hartwell, International Publication No. WO 98/38329 (published Sep. 3, 1998); Stoughton, International Publication No. WO 99/66067 (published Dec. 23, 1999); Stoughton and Friend, International Publication No. WO 99/58708 (published Nov. 18, 1999); Friend and Stoughton, International Publication No. WO 99/59037 (published Nov. 18, 1999); Friend et al., U.S. patent application Ser. No. 09/334,328 (filed on Jun. 16, 1999, now U.S. Pat. No. 6,218,122 B1).
However, current DNA array technologies typically monitor the 3′ ends of mRNA molecules in a cell, rather than the expression levels of individual exons that make up the mRNAs. For example, probes used in cDNA arrays typically range in size from about 0.6 to 2.4 kb (Duggan et al., Nature Genetics Supplement 21:10-14), and are generally complementary to the 3′ ends of the mRNA molecules. Probes used in cDNA arrays are biased to the 3′ end because labeling methods typically rely on d(T) primed reverse transcription. Expression analysis using high density oligonucleotide arrays has been described that requires scoring and averaging of as many as 20 oligonucleotide probes on an array, chosen from various locations of the coding sequence of a gene, to determine the transcript level of the corresponding mRNA (see, e.g., Lockhart et al., 1996, Nature Biotechnology 14:1675; U.S. Pat. Nos. 5,578,832; 5,556,752; 5,510,270; and 6,040,138; Lipshutz et al., 1999, Nature Genetics Supplement 21:20-24). Again, these probes are placed near the 3′ ends of mRNA molecules and the probe intensities are averaged to a single value; such measurements thus do not provide information on the expression of individual exons across the genes. In addition, it has been found that the majority of splicing events occur in 5′ untranslated regions, which leads to the generation of additional protein domains rather than alternating domains (Mironov et al., 1999, Genome Research 9:1288-1293). It has also been found that alternative exon-intron structures, i.e., with different end points, exist in many exons, which leads to expressed exons of different lengths (Mironov et al., 1999, Genome Research 9:1288-1293). Thus, there exists a need to design DNA arrays that measure the expression levels and the lengths of a plurality of exons for each of a plurality of genes in the genome of an organism. There also exists a need for methods for quantitatively monitoring alternative splicing on a genome-wide scale.
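The loss of exon-level information when probe intensities are averaged to a single value can be illustrated with a minimal sketch. The intensities, sample names, and the helper functions `gene_level` and `exon_level` are hypothetical and chosen only for illustration; the sketch shows the effect of averaging alone and does not model the 3′ placement bias of the probes.

```python
from statistics import mean

# Hypothetical probe intensities (arbitrary units) for one gene in two samples,
# grouped by the exon each probe is assumed to target.
sample_a = {"exon1": [100, 110], "exon2": [100, 90], "exon3": [100, 100]}
sample_b = {"exon1": [150, 160], "exon2": [0, 10], "exon3": [140, 140]}


def gene_level(probes_by_exon):
    """Average every probe for the gene into a single value."""
    all_probes = [v for values in probes_by_exon.values() for v in values]
    return mean(all_probes)


def exon_level(probes_by_exon):
    """Average probes separately for each exon."""
    return {exon: mean(values) for exon, values in probes_by_exon.items()}


# The single averaged value is identical for both samples ...
print(gene_level(sample_a), gene_level(sample_b))

# ... although the per-exon view shows exon2 nearly absent in sample_b.
print(exon_level(sample_a))
print(exon_level(sample_b))
```

In this toy data the two samples are indistinguishable at the gene level, while the per-exon averages differ sharply for exon2, the kind of difference an exon-skipping event could produce.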
Discussion or citation of a reference herein shall not be construed as an admission that such reference is prior art to the present invention.