1. Field of Invention
The present invention relates generally to the field of detecting genes and gene expression from biological and medical samples, and more particularly to improving both sensitivity and quantification in comparative multi-analyte detection formats such as cDNA chips and expression microarrays.
2. Description of Related Art
Genetic analysis of an organism or tissue involves two major fields of study: the determination of existing genes and mutations as reflected in genomic DNA sequences, and the evaluation of functional gene activity as reflected in the expression of messenger RNA (mRNA) transcripts or their resulting protein products. Since there are no reasonable means to detect all or most protein products separately and simultaneously, global comparisons of gene expression have generally focused on mRNA analysis, because such transcripts can be isolated and detected more simply, either by virtue of their specific sequences and/or by virtue of the common presence of a poly-A tail on their 3′ end. These poly-A tails allow the entire pool of mRNAs to be copied simultaneously with a single poly-T primer and the enzyme reverse transcriptase (RT), making a single antisense strand of cDNA from each mRNA transcript in a sample. Consequently, most methods for gene expression analysis have been based primarily on assessing the relative number of RNA transcripts produced by different genes and on comparing the timing of such gene activity. The most important goal of these methods is therefore to determine the comparative frequency of each transcript in different cells and tissues, as well as to detect any expression changes that occur in response to various stimuli, physiological conditions, and pathologic states. Furthermore, such quantitative methods should have broad utility for genetics research in general and for a variety of biomedical applications, including tissue typing and forensic analysis; the diagnosis and prognosis of various pathologies, conditions, and responses to therapy; and the identification of new or refined targets for pharmaceutical therapy or gene therapy.
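By way of illustration only, the poly-T-primed copying step described above can be modeled as a sequence operation: a transcript is copied only if it carries a poly-A tail, and the first-strand cDNA is the reverse complement of the mRNA. The following Python sketch is a minimal model under stated assumptions; the helper names and the 12-nucleotide minimum tail length are hypothetical, and enzyme behavior such as incomplete extension is ignored.

```python
from typing import Optional

# DNA base that pairs with each RNA base (U pairs with A; A pairs with T in DNA).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def has_poly_a_tail(mrna: str, min_tail: int = 12) -> bool:
    """Return True if the transcript ends in at least `min_tail` adenosines.

    The 12-nt default is a hypothetical threshold chosen for illustration.
    """
    return mrna.upper().endswith("A" * min_tail)


def first_strand_cdna(mrna: str, min_tail: int = 12) -> Optional[str]:
    """Return the antisense first-strand cDNA (written 5'->3'), or None.

    In this model a poly-T primer can anneal only to a poly-A tail, so a
    transcript without one is not copied.
    """
    mrna = mrna.upper()
    if not has_poly_a_tail(mrna, min_tail):
        return None
    # Reverse complement: read the mRNA 3'->5' and pair each base.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


# Usage: a toy transcript with a 15-nt poly-A tail, and one without a tail.
print(first_strand_cdna("AUGGCU" + "A" * 15))  # TTTTTTTTTTTTTTTAGCCAT
print(first_strand_cdna("AUGGCUGG"))            # None
```

Because one generic primer serves every tailed transcript, the whole mRNA pool is copied in a single reaction, which is the property the paragraph above relies on.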
Current art has provided few methods to globally explore gene expression differences between cells and tissues, and most studies have employed differential display or cDNA subtraction analysis, which provide only partial, non-quantitative information [Hedrick et al., Nature 308: 149 (1984); Liang et al., Science 257: 967 (1992)]. Similarly, expression analysis by Northern blotting, RNase protection assays, or reverse transcriptase polymerase chain reaction (RT-PCR) is generally useful only for evaluating a very limited number of genes per analysis [Alwine, et al., Proc. Natl. Acad. Sci., 74: 5350, (1977); Zinn, et al., Cell, 34: 865 (1983); Veres, et al., Science, 237: 415 (1987)]. Several methods have been devised to extract cDNA copies of the 3′ ends of mRNA transcripts and then characterize those fragments by restriction digests [Ivanova et al., Nucleic Acids Res. 23: 2954 (1995); Prashar et al., Proc. Natl. Acad. Sci., 93: 659 (1996); Kato, Nucleic Acids Res. 23: 3685 (1995); Kato, U.S. Pat. No. 5,707,807 (1998); Weissman et al., U.S. Pat. No. 5,712,126 (1998)]. While these methods expand the number of expression products that can be studied, they also remain limited in scope. Taking a different approach, Kinzler et al. [U.S. Pat. No. 5,695,937 (1998)] have devised a more comprehensive method for measuring mRNA transcripts quantitatively by extracting and slicing out a tiny segment of the cDNA copied from the 3′ end of each mRNA transcript and then creating composite concatemers of those segments from different transcripts. The representative 9- or 10-base segments are then counted by sequencing analysis to determine the frequency of the original transcripts. However, this method involves considerable complexity, and the sequencing steps are very time-consuming and expensive.
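The final counting step of the concatemer method described above reduces to splitting sequenced concatemers into tags and tallying each tag's share of the total. The Python sketch below illustrates only that step, under a simplifying assumption: tags of a fixed, hypothetical length are joined end to end with no spacer, which does not model the actual procedure's tag boundaries. The function name is illustrative.

```python
from collections import Counter
from typing import Dict


def tag_frequencies(concatemer: str, tag_length: int = 10) -> Dict[str, float]:
    """Split a concatemer of back-to-back tags and return relative frequencies.

    Assumes (hypothetically) that every tag has exactly `tag_length` bases and
    that tags are joined with no spacer sequence between them.
    """
    if tag_length <= 0 or len(concatemer) % tag_length:
        raise ValueError("concatemer length must be a positive multiple of tag_length")
    tags = [concatemer[i:i + tag_length] for i in range(0, len(concatemer), tag_length)]
    counts = Counter(tags)
    # Each tag's share of all tags estimates its transcript's relative frequency.
    return {tag: n / len(tags) for tag, n in counts.items()}


# Usage: one tag appears twice among four, so it represents half the sample.
reads = "ACGTACGTAC" * 2 + "GGGCCCAAAT" + "TTTTTAAAAA"
print(tag_frequencies(reads))
```

The sketch also makes the cost argument concrete: every tag must be sequenced before it can be counted, so the frequency estimate for a rare transcript improves only as more sequence is read.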
The development of cDNA-based gene expression microarrays provides a ready means to simultaneously assess the relative expression of hundreds or thousands of different genes from tissue or cellular samples [Schena et al., Science, 270: 467-470 (1995); Schena, et al., Proc. Natl. Acad. Sci., 93:10614-9 (1996); Shalon et al., Genome Res., 6: 639-45 (1996); DeRisi et al., Nature Genetics, 14: 457-60, (1996); Heller et al., Proc. Natl. Acad. Sci., 94: 2150-5, (1997); Khan et al., Cancer Res., 58: 5009-13 (1998); Khan et al., Electrophoresis, 20: 223-9 (1999)]. These analyses are accomplished by first preparing miniature grids or arrays on membranes or coated glass substrates, wherein small but dense cDNA samples of individual genes are robotically spotted in a two-dimensional pattern. Then, a total RNA or mRNA sample is copied and labeled using reverse transcriptase and a poly-T primer to create a pool of cDNA probes that reflect the mRNA expression transcripts. These labeled probes are then hybridized to their respective gene spots in the microarray in order to detect and determine the relative frequency of each transcript in the original sample. These gene expression arrays, which are commonly called expression microarrays, DNA chips, cDNA chips, or biochips, were first manufactured from gene-specific synthetic oligonucleotides that are likewise created or distributed on the array in a two-dimensional pattern and that can capture and detect labeled expression products in a somewhat similar manner, provided the products are first fragmented into smaller pieces [Fodor et al., U.S. Pat. No. 5,445,934 (1995); Fodor et al., U.S. Pat. No. 5,800,992 (1998)]. These commercial oligo-based DNA chips are called GENECHIPS.
It should be noted that microarrays generally refer to miniature arrays on coated glass substrates; however, larger-scale arrays on membrane formats employ similar chemistries and target configurations and thus are suitable for, and similarly improved by, application of the present invention.
While the development of expression microarrays allows a greatly expanded overview and assessment of the relative frequency of different gene transcripts in a sample, current methods are limited by significant deficiencies in both quantification and sensitivity [Duggan et al., Nature Genetics, 21: 10-14 (1999); DeRisi et al., Nature Genetics, 14: 457-460 (1996); Rajeevan et al., Jour. Histochem. Cytochem., 47: 337-42 (1999)]. First, quantification is falsely biased because labeling is proportional to probe length; thus, short genes give less signal per probe than long genes. Second, even long genes provide limited signaling with cDNA chips when compared to the signaling provided by the far longer segments that are typically used for mapping genes to chromosomes or nuclei. In addition, labeling is limited for expression microarrays because the fluorescent compounds commonly employed for comparative two-color labeling, such as Cy3 and Cy5, are poorly incorporated by reverse transcriptase. Moreover, current methods are especially limited in sensitivity when individual genes of interest have been down-regulated or are weakly expressed, or when the total sample available for study is quite small. In either case, specific or multiple gene transcripts of interest may produce too few labeled probes to be detected. Thus, current cDNA chip methods are generally poor or inadequate for detecting specific mRNA transcripts that are expressed at frequencies of fewer than 10 copies per cell, or for analyzing samples comprising: a) less than 0.5 milligrams of tissue, b) less than 50 micrograms of total RNA, c) less than 0.5 micrograms of poly-A mRNA, or d) fewer than 5 million cells [Duggan et al., Nature Genetics, 21: 10-14 (1999)]. The conjunction of these deficiencies in both quantification and sensitivity creates further problems.
Thus, short genes may falsely appear inactive or weakly expressed relative to longer genes in the same sample, and longer genes will falsely appear to be expressed more abundantly relative to shorter genes. Consequently, more accurate and sensitive detection methods are needed.
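The length bias described above can be made concrete with simple arithmetic. The Python sketch below assumes, for illustration only, that labels are incorporated at a constant rate along the whole probe, so that a spot's signal is proportional to copy number times transcript length; the rate constant and function names are hypothetical. It also assumes both dyes are incorporated equally, which idealizes away the Cy3/Cy5 incorporation problem noted above.

```python
import math

# Hypothetical labeling rate; any constant cancels out of the ratios below.
LABELS_PER_KB = 20.0


def signal(copies: float, length_nt: float) -> float:
    """Assumed model: labels are incorporated along the whole probe, so the
    spot signal scales with copy number times transcript length."""
    return copies * (length_nt / 1000.0) * LABELS_PER_KB


def within_sample_ratio(copies_a: float, len_a: float,
                        copies_b: float, len_b: float) -> float:
    """Apparent abundance of gene A relative to gene B on the same array."""
    return signal(copies_a, len_a) / signal(copies_b, len_b)


def two_color_log2_ratio(copies_test: float, copies_ref: float,
                         length_nt: float) -> float:
    """log2(test/reference) for one gene labeled in two colors.

    Assumes both dyes are incorporated equally, so transcript length cancels."""
    return math.log2(signal(copies_test, length_nt) / signal(copies_ref, length_nt))


# Equal copy numbers, but a 5,000-nt transcript vs a 500-nt transcript:
print(within_sample_ratio(100, 5000, 100, 500))  # 10.0
# Twofold induction of the short gene is still read correctly across samples:
print(two_color_log2_ratio(200, 100, 500))        # 1.0
```

Under these assumptions, comparing the same gene between two samples cancels length, but comparing different genes within one sample does not: two equally expressed genes appear tenfold apart simply because one transcript is ten times longer.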
One approach to improving chip detection would be to amplify mRNA-derived probes by the polymerase chain reaction (PCR) or related enzymatic methods. However, commonly available PCR procedures, such as RT-PCR and multiplex PCR, have been used successfully to amplify only a limited number of the gene products in a sample, since effective multi-analyte amplification typically requires the provision of at least one unique primer for each type of gene product amplified [Sutcliffe et al., U.S. Pat. No. 5,807,680 (1998)]. In related art, such as differential display or other older procedures for exploring expression differences, global amplification methods have been employed that use simple arbitrary primers, hexamers, or various random primer constructs instead of unique primers to amplify DNA or RNA. The inconsistency of such methods, however, has made them useful only for identifying unusual or novel gene expression products, and they have not been devised or employed for use with expression microarrays or DNA chip analyses [Welsh et al., Nucleic Acids Res., 18: 7213-18 (1990); Pardee et al., U.S. Pat. No. 5,262,311 (1993) and U.S. Pat. No. 5,665,547 (1997); Liang et al., Nucleic Acids Res., 21: 3269 (1993); Mou et al., Biochem. Biophys. Res. Comm., 199: 564-569 (1994); Villeponteau et al., U.S. Pat. No. 5,580,726 (1996); Silver et al., U.S. Pat. No. 5,104,792 (1992); Tavtigian et al., U.S. Pat. No. 5,789,206 (1998); Shuber, U.S. Pat. No. 5,882,856 (1999)]. The prime difficulty with many of these methods derives from their use of short arbitrary or random primers, which can give variable results under different temperature and hybridization conditions, making them unsuitable for diagnostic analyses.
Even RT-PCR or multiplex PCR methods, which employ unique primers, can produce semi-quantitative rather than quantitative results, since different primer sets vary considerably in efficiency and since kinetic factors favor copying the smaller and more abundant products. Therefore, some products may not amplify well, and rare or down-regulated transcripts may be under-represented [Khan et al., Electrophoresis, 20: 223-9 (1999)]. Additionally, mammalian mRNA samples include very large gene transcripts, 6 to 12 thousand nucleotides long, that cannot be amplified reliably by routine PCR methods. Consequently, global PCR amplification of a pool of mRNA-derived cDNA probes has not been attempted or successfully accomplished for DNA chip or expression microarray analyses, and for the above reasons it has been scientific dogma that exponential amplification methods cannot be validly applied to multi-analyte gene expression arrays. Nonetheless, less robust linear amplification has been developed and employed for chip analyses by adding an RNA polymerase promoter to the end of the poly-T primer used for RT. However, such amplification is incremental and finite, with a typical yield of only 20-60 copies, and the amplified products it produces are antisense RNAs, which are susceptible to degradation [Phillips et al., Methods, 10: 283-288 (1996); Kondo et al., U.S. Pat. No. 5,972,607; VanGelder et al., U.S. Pat. No. 5,716,785 (1998)]. In related art, Wang et al. [U.S. Pat. No. 5,932,451 (1999)] refined such methods to allow asymmetrical PCR amplification of double-stranded (ds) cDNA made from an mRNA sample. However, this amplification method is similarly limited in the number of copies that can reasonably be made from the original sample (68-fold amplification demonstrated).
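Why modest differences in primer efficiency yield only semi-quantitative results can be shown with the idealized exponential model, in which each cycle multiplies the product by (1 + E) for an efficiency E between 0 and 1. The Python sketch below is illustrative only: the efficiencies and cycle counts are hypothetical, and it does not model the kinetic preference for smaller or more abundant products, plateau effects, or the failure to amplify very long transcripts.

```python
def pcr_yield(initial_copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential model: each cycle multiplies product by (1 + E).

    `efficiency` E runs from 0 (no copying) to 1 (perfect doubling)."""
    return initial_copies * (1.0 + efficiency) ** cycles


def apparent_ratio(eff_a: float, eff_b: float, cycles: int,
                   copies_a: float = 1.0, copies_b: float = 1.0) -> float:
    """Product ratio A:B after amplification, given per-primer-set efficiencies."""
    return pcr_yield(copies_a, eff_a, cycles) / pcr_yield(copies_b, eff_b, cycles)


# Two transcripts start at equal abundance; their primer sets differ in efficiency.
for cycles in (10, 20, 30):
    print(cycles, round(apparent_ratio(0.95, 0.80, cycles), 1))
```

With hypothetical efficiencies of 0.95 and 0.80, two equally abundant transcripts appear roughly 2-, 5-, and 11-fold apart after 10, 20, and 30 cycles, because the per-cycle discrepancy compounds exponentially.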
More importantly, because these amplification methods copy full-length probes, they cannot overcome the signaling bias of current methods, since the number of labels incorporated per probe is a large variable dictated by the transcript size of different genes, and in common mammalian species, including humans, transcripts vary from several hundred bases to twelve thousand bases or more. These problems therefore suggested that improved detection might be better achieved by amplifying signaling rather than the target sample.
As described in PCT/US99/16242 (WIPO Publication WO 00-04192), corresponding to U.S. patent application Ser. No. 09/744,097 filed Jan. 16, 2001, entitled “Methods for Detecting and Mapping Genes, Mutations and Variant Polynucleotide Sequences,” which is hereby incorporated by reference herein for all purposes, methods and compositions were disclosed for modular probe and reporter systems that improve the specific detection of genes and mutations and that amplify signaling. These disclosed compositions and methods include:
1. Probe methods, known as WRAP-PROBEs, that are manufactured from synthetic DNAs, from PCR (polymerase chain reaction) products, or from cloning products, wherein the probes have a central, target-specific sequence that is helically wrapped around the target strand, and wherein they have one or more generic linkers at one or both ends that bind one or more reporters. Because separate reporters are bound to the ends of the probes after the probes are coiled around the target, the reporters are more effectively tethered, and they thereby provide far more effective signaling than is achieved with simple labeled probes. Indeed, this method can provide multi-fold signal amplification if dual chains or arrays of long labeled reporters are bound to a short WRAP-PROBE of this configuration. The WRAP-PROBE composition also provides an economic advantage: its generic linkers can interchangeably bind either different reporters to the same probe or different probes to the same reporter, so that a series of generic reporters varying in both the type and the intensity of signaling may be applied.
2. Generic reporter methods and compositions, such as GENE-TAGs and TINKER-TAGs. These reporters include linear segments of double-stranded DNA or chained and joined polynucleotides with single-stranded linkers at one or both ends that can join together in arrays and can join to the linkers of WRAP-PROBEs or related probes to provide amplified signaling.
3. DNA-based connectors called Multi-LINKERs, including singular or composite polynucleotide structures that join to the linker of a probe and provide two or more secondary linkers in order to bind multiple reporters to a probe.
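The modular arrangement in the list above, in which a probe's end linkers bind reporters directly or through Multi-LINKERs that fan out to further reporters, can be pictured as a small tree whose total signal is the sum over its leaves. The Python sketch below is a toy accounting model, not a description of the chemistry: the class names, the one-unit-per-reporter signal, and the chain lengths are hypothetical. In this model the total depends only on the reporter configuration and not on the length of the target-specific sequence.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ReporterChain:
    """Hypothetical array of `length` reporters joined end to end."""
    length: int
    units_per_reporter: float = 1.0  # arbitrary signal units per reporter

    def signal(self) -> float:
        return self.length * self.units_per_reporter


@dataclass
class MultiLinker:
    """Binds one probe linker and exposes several secondary linkers."""
    secondary: List["Attachment"] = field(default_factory=list)

    def signal(self) -> float:
        # Signal fans out: the sum over everything bound to secondary linkers.
        return sum(a.signal() for a in self.secondary)


# Anything that can occupy a linker: a reporter chain or another Multi-LINKER.
Attachment = Union[ReporterChain, MultiLinker]


@dataclass
class WrapProbe:
    """Target-specific core with generic linkers at one or both ends."""
    end_linkers: List[Attachment]

    def __post_init__(self) -> None:
        if len(self.end_linkers) > 2:
            raise ValueError("a linear probe has at most two ends")

    def signal(self) -> float:
        return sum(a.signal() for a in self.end_linkers)


# Usage: a single reporter vs a Multi-LINKER with three chains plus a chain at the other end.
plain = WrapProbe([ReporterChain(1)])
branched = WrapProbe([
    MultiLinker([ReporterChain(5), ReporterChain(5), ReporterChain(5)]),
    ReporterChain(5),
])
print(plain.signal(), branched.signal())  # 1.0 20.0
```

Representing Multi-LINKERs as nodes that may hold other attachments reflects the "singular or composite" structures mentioned in item 3; how far such nesting is practical is not addressed here.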
The related WRAP-PROBE methods and compositions are suitable for making targeted probes that amplify signaling and that more efficiently map or detect a specific gene sequence in a variety of detection formats, such as in situ gene mapping and dot blots. In those formats, the target or targets are on the substrate, and a small number of labeled probes are individually manufactured in excess quantity to find and label those specific targets. The object is simply to put label on the target, thereby mapping or counting the targets. However, those methods are not suited to DNA chip or microarray gene expression formats, where the chip substrate is in fact a set of capture probes and the probes applied to the chip are the true targets of the assay. Thus, the object of an expression assay is to determine the relative frequency of the mRNA transcripts in the original tissue sample, and the array is merely a device to capture and count a labeled probe set derived from the sample. This probe set must therefore maintain its relative frequencies, accurately representing the thousands of different gene transcripts in the original tissue. Consequently, WRAP-PROBEs for expression array analysis cannot be individually manufactured and separately tailored to specific genes in the way that prior WRAP-PROBEs were.