The desire to decode the human genome and to understand the genetic basis of disease and a host of other physiological states associated with differential gene expression has been a key driving force in the development of improved methods for analyzing and sequencing DNA, Adams et al, Editors, Automated DNA Sequencing and Analysis (Academic Press, New York, 1994). Current genome sequencing projects use Sanger-based sequencing technologies, which enable the sequencing and assembly of genomes in the size range of 2-4 megabases with about 24 man-months of effort, e.g. Fleischmann et al, Science, 269: 496-512 (1995). Such a genome is about 0.005 the size of the human genome, which is estimated to contain about 10^5 genes, 15% of which (or about 3 megabases) are active in any given tissue. The large number of expressed genes makes it difficult to track changes in expression patterns by direct sequence analysis. More commonly, expression patterns are analyzed by lower resolution techniques, such as differential display, indexing, subtraction hybridization, or one of the numerous DNA fingerprinting techniques, e.g. Liang et al, Science, 257: 967-971 (1992); Erlander et al, International patent application PCT/US94/13041; McClelland et al, U.S. Pat. No. 5,437,975; Unrau et al, Gene, 145: 163-169 (1994); Sagerstrom et al, Annu. Rev. Biochem. 66: 751-783 (1997); and the like. For the techniques that result in the isolation of a subset of DNA sequences, sequencing of randomly selected clones is typically carried out using conventional Sanger sequencing; thus, the scale of the analysis is limited.
Recently, several higher resolution techniques have been reported that attempt to provide direct sequence information for analyzing patterns of gene expression on a large scale: Schena et al, Science, 270: 467-469 (1995), and DeRisi et al, Science, 278: 680-686 (1997), report the hybridization of mRNAs to a collection of cDNAs arrayed on a glass slide; Velculescu et al, Science, 270: 484-486 (1995), report the excision and concatenation of short segments of sequence adjacent to type IIs restriction sites from members of a cDNA library, followed by Sanger sequencing of the concatenated segments to give a profile of sequences in the library; and Wodicka et al, Nature Biotechnology, 15: 1359-1367 (1997), report genome-wide expression monitoring of yeast under different growth conditions using high density oligonucleotide arrays containing hybridization sites for each of the more than 6000 genes of the organism. While these techniques represent tremendous progress in expression analysis, they still have drawbacks which limit their widespread application to many expression monitoring problems. For example, in the techniques of both Schena and Wodicka, the sequences being monitored must be known beforehand, and in the case of Wodicka preferably the entire complement of an organism's genes must be known. In the technique of Schena, there are significant problems in constructing arrays containing a substantial portion, e.g. ten thousand or more, of the genes whose expression may be relevant, because the cDNAs of each gene are separately prepared and applied to an array. In addition, currently available arrays are typically not re-usable, which leads to standardization and quality control issues when multiple measurements over time are desired.
In the technique of Velculescu, even though the sequencing burden is reduced, abundant non-differentially expressed genes are sequenced repeatedly, as with any random sequencing approach, at the expense of obtaining expression information on differentially regulated genes. Moreover, it is not clear from the reported data whether the technique is capable of providing sample sizes sufficiently large to permit the reliable expression profiling of genes that are expressed at very low levels, e.g. Kollner et al, Genomics, 23: 185-191 (1994).
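The sample-size concern for random sequencing approaches can be illustrated with a short calculation. The following is a minimal sketch, not taken from any of the cited reports: it assumes each sequenced clone or tag is an independent random draw from the expressed pool, so that a transcript present at fractional abundance f is seen at least once in N draws with probability 1 - (1 - f)^N. The function names and the example abundance of 1 in 100,000 are hypothetical and chosen only for illustration.

```python
import math


def detection_probability(abundance: float, n_sampled: int) -> float:
    """Chance of seeing a transcript at least once in n_sampled random draws.

    Assumes independent draws (an illustrative simplification).
    `abundance` is the transcript's fraction of the expressed pool.
    """
    return 1.0 - (1.0 - abundance) ** n_sampled


def draws_needed(abundance: float, confidence: float = 0.95) -> int:
    """Smallest number of random draws giving at least `confidence`
    probability of observing the transcript once or more."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


if __name__ == "__main__":
    rare = 1e-5  # hypothetical abundance: 1 transcript in 100,000
    for n in (1_000, 10_000, 100_000):
        print(n, round(detection_probability(rare, n), 4))
    print("draws for 95% detection:", draws_needed(rare))
```

Under these assumptions, a transcript at 1 part in 100,000 requires on the order of 300,000 random draws to be observed even once with 95% probability, and most of those draws fall on abundant transcripts that have already been seen. Reliable quantitation, as opposed to mere detection, would require more draws still.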
In view of the above, it would be highly desirable if a technique were available for monitoring differential gene expression that had the capability of massively parallel analysis of all or a substantial fraction of expressed genes, but was free of the shortcomings of current techniques.