The invention relates generally to methods for identifying differentially expressed genes, and more particularly, to a method of competitively hybridizing differentially expressed DNAs with reference DNA sequences cloned on solid phase supports to provide a differential expression library which can be physically manipulated, e.g. by fluorescence-activated flow sorting.
The desire to decode the human genome and to understand the genetic basis of disease and a host of other physiological states associated with differential gene expression has been a key driving force in the development of improved methods for analyzing and sequencing DNA, Adams et al., Editors, Automated DNA Sequencing and Analysis (Academic Press, New York, 1994). The human genome is estimated to contain about 10^5 genes, 15-30% of which, or about 20-40 megabases, are active in any given tissue. Such large numbers of expressed genes make it difficult to track changes in expression patterns by available techniques, especially in view of the large number of genes that are expressed at relatively low levels: It has been estimated that as much as 30% of mRNA consists of many thousands of distinct species each making up less than 0.5% of the total, and typically averaging less than 14 copies per cell, Sambrook et al., Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory Press, New York, 1989). Even substantial changes in expression among such low abundance mRNAs can be difficult to detect in the presence of overwhelming quantities of abundant sequences.
A variety of techniques are available for analyzing gene expression that differ widely in convenience, expense, and sensitivity. Commonly used low resolution techniques include differential display, indexing, subtraction hybridization, and numerous DNA fingerprinting techniques, e.g. Vos et al., Nucleic Acids Research, 23: 4407-4414 (1995); Hubank et al., Nucleic Acids Research, 22: 5640-5648 (1994); Liang et al., Science, 257: 967-971 (1992); Erlander et al., International patent application PCT/US94/13041; McClelland et al., U.S. Pat. No. 5,437,975; Unrau et al., Gene, 145: 163-169 (1994); and the like. Higher resolution techniques include analysis of expressed sequence tags (ESTs), e.g. Adams et al. (cited above); analysis of concatenated fragments of expressed sequences (SAGE), e.g. Velculescu et al., Science, 270: 484-486 (1995); Zhang et al., Science, 276: 1268-1272 (1997); Velculescu et al., Cell, 88: 243-251 (1997); and the use of microarrays of oligonucleotides or polynucleotides for capturing complementary polynucleotides from expressed genes, e.g. Schena et al., Science, 270: 467-469 (1995); DeRisi et al., Science, 278: 680-686 (1997); Chee et al., Science, 274: 610-614 (1996); and the like.
The latter two high resolution techniques have shown promise as potentially robust systems for analyzing gene expression; however, there are still technical issues that need to be addressed with both approaches. In microarray systems, genes to be monitored must be known and isolated beforehand, which means different microarrays, or "DNA chips," have to be manufactured for each specialized use and for every different type of organism or species examined. With respect to microarrays constructed from fluid-delivered cDNAs, a significant degree of variability, e.g. 2-5 fold, exists in the signals generated under the same hybridization conditions, Atlas™ cDNA Expression System Users Manual (Clontech Laboratories, Palo Alto, 1998), and the systems are not readily re-usable. With respect to microarrays of synthetic oligonucleotides, a significant set-up cost for manufacturing such arrays and expensive chip-reading instruments put such systems beyond the financial capability of many potential users. In sequence tag systems, although no special instrumentation is necessary, as an extensive installed base of DNA sequencers may be used, even routine expression analysis requires a significant sequencing effort, e.g. several thousand sequencing reactions or more; the selection of type IIs tag-generating enzymes is limited; and the length (nine nucleotides) of the sequence tag in current protocols severely limits the number of cDNAs that can be uniquely labeled. It can be shown that for organisms expressing large sets of genes, such as mammalian cells, the likelihood of nine-nucleotide tags being distinct for all expressed genes is extremely low, e.g. Feller, An Introduction to Probability Theory and Its Applications, Second Edition, Vol. I (John Wiley and Sons, New York, 1971).
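The limit on nine-nucleotide tags can be illustrated with a birthday-problem calculation. The following is a minimal sketch, assuming that each expressed gene receives a tag drawn independently and uniformly from all 4^9 possible sequences; that uniformity is a simplifying assumption made here for illustration and is not stated in the cited protocols. The function name and the example gene counts are likewise hypothetical.

```python
import math


def prob_all_tags_distinct(num_genes: int, tag_length: int = 9) -> float:
    """Probability that num_genes random tags of tag_length nucleotides are all distinct.

    Assumes tags are independent and uniform over the 4**tag_length possible
    sequences (an illustrative simplification, not a property of real cDNAs).
    """
    num_tags = 4 ** tag_length  # 262,144 possible tags when tag_length is 9
    if num_genes > num_tags:
        return 0.0  # pigeonhole: at least two genes must share a tag
    # P = prod_{i=0}^{n-1} (1 - i/N), accumulated in log space for stability
    log_p = sum(math.log1p(-i / num_tags) for i in range(num_genes))
    return math.exp(log_p)


if __name__ == "__main__":
    for n in (100, 1_000, 15_000):
        print(f"{n:>6} expressed genes: P(all tags distinct) = "
              f"{prob_all_tags_distinct(n):.3g}")
```

Under these assumptions the probability is already only about 0.15 for 1,000 expressed genes and is vanishingly small for 15,000, the lower end of the 15-30% of about 10^5 genes estimated above to be active in a given tissue.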
It is clear from the above that there is a need for a convenient and sensitive technique for analyzing gene expression that permits the analysis of either known or unknown genes from any source. The availability of such a technique would find immediate application not only in medical and scientific research, but also in a host of applied fields, such as crop and livestock development, pest management, drug development, diagnostics, disease management, and the like.
Accordingly, objects of our invention include, but are not limited to, providing a method for identifying and isolating differentially expressed genes; providing a method of identifying and isolating polynucleotides on the basis of labels that generate different optical signals; providing a method for profiling gene expression of large numbers of genes simultaneously; providing a method of identifying and separating genes in accordance with whether their expression is increased or decreased under any given conditions; providing a method for identifying rare genes; and providing a method for massively parallel signature sequencing of large numbers of genes isolated according to their expression.
Our invention accomplishes these and other objects by providing differently labeled populations of polynucleotides from cell or tissue sources whose gene expression is to be compared. In comparing gene expression, differently labeled polynucleotides of a plurality of populations are competitively hybridized with reference DNA cloned on solid phase supports. Preferably, the solid phase supports are microparticles which, after such competitive hybridization, provide a differential expression library which may be manipulated by fluorescence-activated cell sorting (FACS), or other sorting means responsive to optical signals generated by labeled polynucleotides on the microparticles. Monitoring the relative signal intensity of the different labels on the microparticles permits quantification of the relative expression of particular genes in the different populations.
In one aspect of the invention, populations of microparticles having relative signal intensities of interest are isolated by FACS and the attached polynucleotides are sequenced to determine the identities of the rare or differentially expressed genes.
Preferably, the method of the invention is carried out by the following steps: a) providing a reference population of nucleic acid sequences attached to separate solid phase supports in clonal subpopulations; b) providing a population of polynucleotides of expressed genes from each of the plurality of different cells or tissues, the polynucleotides of expressed genes from different cells or tissues having a different light-generating label; c) competitively hybridizing the populations of polynucleotides of expressed genes from each of the plurality of different cells or tissues with the reference population to form duplexes between the sequences of the reference population and polynucleotides of each of the different cells or tissues such that the polynucleotides are present in duplexes on each of the solid phase supports in ratios directly related to the relative expression of their corresponding genes in the different cells or tissues; and d) detecting, for each solid phase support, a relative optical signal generated by the light-generating labels of the duplexes attached thereto. In further preference, the method includes the step of sorting each solid phase support according to the relative optical signal detected. Preferably, the reference population of nucleic acids is derived from genes of the plurality of different cells or tissues being analyzed. As used herein, the phrase "polynucleotides of expressed genes" is meant to include any RNA produced by transcription, including in particular mRNA, and DNA produced by reverse transcription of any RNA, including in particular cDNA produced by reverse transcription of mRNA.
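The detecting and sorting steps can be pictured as assigning each solid phase support to a sorting gate according to the ratio of its two label signals. The following is a minimal sketch for two differently labeled populations; the names Bead, log2_ratio and sort_gate, the two-fold threshold, and the pseudocount are all hypothetical choices made for illustration, and in practice gates would be set on the sorting instrument rather than computed this way.

```python
import math
from typing import NamedTuple


class Bead(NamedTuple):
    """One solid phase support carrying a clonal reference sequence (hypothetical type)."""
    bead_id: str
    signal_a: float  # intensity of the label on polynucleotides from source A
    signal_b: float  # intensity of the label on polynucleotides from source B


def log2_ratio(bead: Bead, pseudocount: float = 1.0) -> float:
    """Log2 of the A/B signal ratio; the pseudocount avoids division by zero."""
    return math.log2((bead.signal_a + pseudocount) / (bead.signal_b + pseudocount))


def sort_gate(bead: Bead, fold_threshold: float = 2.0) -> str:
    """Assign a bead to a gate based on its relative optical signal.

    fold_threshold is an assumed cutoff, not a value taken from the method.
    """
    cutoff = math.log2(fold_threshold)
    ratio = log2_ratio(bead)
    if ratio >= cutoff:
        return "higher_in_A"
    if ratio <= -cutoff:
        return "higher_in_B"
    return "unchanged"


if __name__ == "__main__":
    beads = [
        Bead("bead-1", signal_a=999.0, signal_b=99.0),
        Bead("bead-2", signal_a=500.0, signal_b=500.0),
        Bead("bead-3", signal_a=0.0, signal_b=799.0),
    ]
    for bead in beads:
        print(bead.bead_id, sort_gate(bead))
```

Working with the logarithm of the ratio treats increased and decreased expression symmetrically, so a single threshold defines both gates.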
The present invention overcomes shortcomings in the art by providing compositions, methods, and kits for separating and identifying genes that are differentially expressed without requiring any previous analysis or knowledge of the sequences. The invention also permits differentially regulated genes to be separated from unregulated genes for analysis, thereby eliminating the need to analyze large numbers of unregulated genes in order to obtain information on the genes of interest.