The desire to decode the human genome and to understand the genetic basis of disease and a host of other physiological states associated with differential gene expression has been a key driving force in the development of improved methods for analyzing and sequencing DNA. However, the large number of expressed genes in the human genome makes it difficult to track changes in expression patterns by direct sequence analysis. More commonly, expression patterns are analyzed by lower resolution techniques, such as differential display, indexing, subtraction hybridization, or one of the numerous DNA fingerprinting techniques (e.g. Lingo et al., Science 257: 967-971 (1992); McClelland et al., U.S. Pat. No. 5,437,975; Unrau et al., Gene 145: 163-169 (1994); Sagerstrom et al., Ann. Rev. Biochem. 66: 751-783 (1997)). For techniques that result in the isolation of a subset of DNA sequences, sequencing of randomly selected clones is typically carried out using conventional Sanger sequencing; thus, the scale of the analysis is limited.
Recently, several higher resolution techniques have been reported that attempt to provide direct sequence information for analyzing patterns of gene expression on a large scale: Schena et al., Science 270: 467-469 (1995), and DeRisi et al., Science 278: 680-686 (1997), report the hybridization of mRNAs to a collection of cDNAs arrayed on a glass slide; Velculescu et al., Science 270: 484-486 (1995), report the excision and concatenation of short segments of sequence adjacent to type IIs restriction sites from members of a cDNA library, followed by Sanger sequencing of the concatenated segments to give a profile of sequences in the library; and Wodicka et al., Nature Biotechnology 15: 1359-1367 (1997), report genome-wide expression monitoring of yeast under different growth conditions using high density oligonucleotide arrays containing hybridization sites for each of the more than 6000 genes of the organism. While these techniques represent tremendous progress in expression analysis, they still have drawbacks which limit their widespread application to many expression monitoring problems. For example, in the techniques of both Schena and Wodicka, the sequences being monitored must be known beforehand, and in the case of Wodicka, preferably the entire complement of an organism's genes must be known. In the technique of Schena, there are significant problems in constructing arrays containing a substantial portion (e.g. ten thousand or more) of the genes whose expression may be relevant, because cDNAs of each gene are separately prepared and applied to an array. In addition, currently available arrays are typically not re-usable, leading to standardization and quality control issues when multiple measurements over time are desired.
In the technique of Velculescu, even though the sequencing burden is reduced, abundant non-differentially expressed genes are sequenced repeatedly, as with any random sequencing approach, at the expense of obtaining expression information on differentially regulated genes. In addition, it is not clear from the reported data whether the technique is capable of providing sample sizes sufficiently large to permit the reliable expression profiling of genes that are expressed at very low levels (e.g. Kollner et al., Genomics 23: 185-191 (1994)).
Co-owned U.S. Pat. No. 6,265,163 provides a method of massive parallel analysis of all or a substantial fraction of expressed genes, allowing selection of differentially expressed genes from non-differentially expressed genes, without requiring prior knowledge of the differentially expressed sequences being monitored. More generally, the method allows detection and isolation of differentially represented nucleic acids from any two nucleic acid populations.
In accordance with this method, also described in Brenner et al., PNAS 97: 1665-70 (2000), differently labeled populations of DNAs from sources to be compared are competitively hybridized with reference DNA cloned on solid phase supports, e.g. microparticles, to provide a differential expression library which, in the preferred embodiment, is manipulated by fluorescence-activated cell sorting (FACS). Monitoring the relative signal intensity of the different fluorescent labels on the microparticles permits quantitative analysis of relative expression levels between the different sources. An illustration of the process is given in Example 4 herein. Populations of microparticles having relative signal intensities of interest are isolated by FACS, and the attached DNAs are identified by sequencing, such as with massively parallel signature sequencing (MPSS), or with conventional DNA sequencing protocols. Such methods can also be used for identifying differentially represented variations in genomic DNA, e.g. SNPs, deletions, or duplications.
In FACS sorting as applied to these methods, the original ratio of probes in the compared sources is reflected by the ratio of probes hybridized to the target DNA beads and, hence, the ratio of the two fluorescence signals of the beads. Beads with different ratios of fluorescence signals are detected and are sorted from each other according to preset gate(s). See, for example, FIGS. 1A-1B.
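By way of illustration only, the ratio-based gating described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and does not represent the instrument logic of any FACS system or the method of U.S. Pat. No. 6,265,163: the function names (`log_ratio`, `sort_beads`), the symmetric gate of one log2 unit, and the signal values are all hypothetical and chosen only to show how a two-color signal ratio maps a bead to a sorting bin.

```python
import math

def log_ratio(signal_a, signal_b):
    """Return the log2 ratio of two fluorescence signals measured on one bead."""
    return math.log2(signal_a / signal_b)

def sort_beads(beads, gate=1.0):
    """Assign each bead to a bin according to a preset gate.

    beads: iterable of (bead_id, signal_a, signal_b) tuples (hypothetical data).
    gate:  assumed threshold on |log2 ratio|; a real gate is set by the operator.
    """
    bins = {"a_enriched": [], "b_enriched": [], "unchanged": []}
    for bead_id, signal_a, signal_b in beads:
        ratio = log_ratio(signal_a, signal_b)
        if ratio >= gate:
            bins["a_enriched"].append(bead_id)   # more probe from source A
        elif ratio <= -gate:
            bins["b_enriched"].append(bead_id)   # more probe from source B
        else:
            bins["unchanged"].append(bead_id)    # roughly equal representation
    return bins

# Hypothetical beads: a 3:1 bead, a 1:1 bead, and a 1:3 bead.
beads = [
    ("bead1", 300.0, 100.0),
    ("bead2", 100.0, 100.0),
    ("bead3", 100.0, 300.0),
]
bins = sort_beads(beads)
print(bins)
```

In this idealized sketch a gate is simply a threshold on the log ratio. In practice, measurement noise broadens each bead population, so populations whose ratios lie close to 1:1 overlap with the unchanged population and are harder to separate.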
For a model system of two equally sized populations of beads, a bead population having hybridized probes at a molar ratio as low as 3:1 could be sorted from a bead population with a 1:1 molar ratio of the two probes, using the methods described in U.S. Pat. No. 6,265,163. However, limitations in FACS sorting prevented accurate sorting of beads having lower ratios of the two probes from the much greater population of beads having DNAs equally represented in the two populations. See, for example, FIGS. 5A-E in U.S. Pat. No. 6,265,163, reproduced as FIGS. 11A-E herein. Methods of distinguishing and sorting beads having probes at these lower ratios were desired. Accordingly, the present invention provides methods of improving the resolution of such sorting.