Determination of the genomic sequence of higher organisms, including humans, is now a real and attainable goal. However, this analysis represents only one level of genetic complexity. The ordered and timely expression of genes represents another level of complexity that is equally important to the definition and biology of the organism.
The role of sequencing complementary DNA (cDNA), reverse transcribed from mRNA, as part of the human genome project has been debated. Proponents of genomic sequencing have argued that it is difficult to find every mRNA expressed in all tissues, cell types, and developmental stages, and have pointed out that much valuable information from intronic and intergenic regions, including control and regulatory sequences, will be missed by cDNA sequencing (Report of the Committee on Mapping and Sequencing the Human Genome, National Academy Press, Washington, D.C., 1988). Sequencing of transcribed regions of the genome using cDNA libraries has heretofore been considered unsatisfactory. Libraries of cDNA are believed to be dominated by repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes comprising common or housekeeping sequences. It is believed that cDNA libraries do not provide all sequences corresponding to structural and regulatory polypeptides or peptides (Putney, et al., Nature, 302:718, 1983).
Another drawback of standard cDNA cloning is that some mRNAs are abundant while others are rare. The cellular quantities of mRNA from various genes can vary by several orders of magnitude.
Techniques based on cDNA subtraction or differential display can be quite useful for comparing gene expression differences between two cell types (Hedrick, et al., Nature, 308:149, 1984; Liang and Pardee, Science, 257:967, 1992), but provide only a partial analysis, with no direct information regarding the abundance of messenger RNA. The expressed sequence tag (EST) approach has been shown to be a valuable tool for gene discovery (Adams, et al., Science, 252:1656, 1991; Adams, et al., Nature, 355:632, 1992; Okubo, et al., Nature Genetics, 2:173, 1992), but, like Northern blotting, RNase protection, and reverse transcriptase-polymerase chain reaction (RT-PCR) analysis (Alwine, et al., Proc. Natl. Acad. Sci. U.S.A., 74:5350, 1977; Zinn, et al., Cell, 34:865, 1983; Veres, et al., Science, 237:415, 1987), it evaluates only a limited number of genes at a time. In addition, the EST approach preferably employs nucleotide sequences of 150 base pairs or longer for similarity searches and mapping.
Sequence tagged sites (STSs) (Olson, et al., Science, 245:1434, 1989) have also been utilized to identify genomic markers for the physical mapping of the genome. These short sequences from physically mapped clones represent uniquely identified map positions in the genome. In contrast, the identification of expressed genes relies on expressed sequence tags, which are markers for those genes actually transcribed and expressed in vivo.
There is a need for an improved method that allows rapid, detailed analysis of thousands of expressed genes for a variety of biological applications, particularly for establishing the overall pattern of gene expression in different cell types or in the same cell type under different physiologic or pathologic conditions. Identification of different patterns of expression has several utilities, including identifying appropriate therapeutic targets, identifying candidate genes for gene therapy (e.g., gene replacement), tissue typing, forensic identification, mapping the locations of disease-associated genes, and identifying diagnostic and prognostic indicator genes.