The degree of differentiation or physiological state of a cell, a tissue or an organism is characterized by a specific expression status, i.e., the degree of transcriptional activation of all genes or particular groups of genes. The molecular basis for numerous biological processes that result in a change in this state is the coordinated transcriptional activation or inactivation of particular genes or groups of genes in a cell, an organ or an organism. Characterization of this expression status is of key importance for answering many biological questions. Changes in gene expression in response to a stimulus, a developmental stage, a pathological state or a physiological state are important in determining the nature and mechanism of the change and in finding cures that could reverse a pathological condition. Patterns of gene expression are also expected to be useful in the diagnosis of pathological conditions and, for example, may provide a basis for the subclassification of functionally different subtypes of cancerous conditions.
Several methods that can analyze the expression status of genes are known in the art. Differential display RT-PCR™ (DDRT) is one method for analyzing differential gene expression in which subpopulations of complementary DNA (cDNA) are generated by reverse transcription of mRNA by using a cDNA primer with a 3′ extension (preferably two bases). Random 10 base primers are then used to generate PCR™ products of transcript-specific lengths. If the number of primer combinations used is large enough, it is statistically possible to detect almost all transcripts present in any given sample. PCR™ products obtained from two or more samples are then electrophoresed next to one another on a gel and differences in expression are directly compared. Differentially expressed bands can be cut out of the gel, reamplified and cloned for filter analysis.
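The statistical argument above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each primer combination amplifies a given transcript independently and with the same probability; neither the model nor the value 0.02 comes from the cited references, and the function name is hypothetical.

```python
def detection_probability(p_single: float, n_combinations: int) -> float:
    """Chance that a transcript is amplified by at least one primer combination.

    Assumption (not from the cited work): every primer combination hits the
    transcript independently with the same probability p_single.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if n_combinations < 0:
        raise ValueError("n_combinations must not be negative")
    # Probability of being missed by all combinations is (1 - p)^n.
    return 1.0 - (1.0 - p_single) ** n_combinations


if __name__ == "__main__":
    # Illustrative per-combination hit rate of 2% (an assumed value).
    for n in (10, 50, 100, 300):
        print(n, round(detection_probability(0.02, n), 3))
```

Under this simplified model the detection probability approaches, but never reaches, 1 as the number of primer combinations grows, which is consistent with the statement that almost all transcripts can be detected.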
It is possible to enrich the PCR™ (polymerase chain reaction) amplification products for a particular subgroup of all mRNA molecules, e.g., members of a particular gene family, by using one primer which has a sequence specific for a gene family in combination with one of the 10 base random primers. The DDRT technique has been described previously (Liang and Pardee, 1992; Liang et al., 1993; Bauer et al., 1993; Stone and Wharton, 1994; Wang and Feuerstein, 1995; WO 93/18176; and DE 43 17 414).
There are a number of disadvantages to the experimental design of DDRT. The differential banding patterns are often only poorly reproducible. Due to the design of the primers, even the use of longer random primers, e.g., of 20 bases in length, does not satisfactorily solve the problem of reproducibility (Ito et al., 1994). In order to evaluate a significant portion of differentially expressed genes, a large number of primer combinations must be used and multiple replicates of each study must be done. The method often results in a high proportion of false positive results, and rare transcripts cannot be detected in many DDRT studies (Bertioli et al., 1995).
Due to the non-stringent PCR™ conditions and the use of only one arbitrary primer, further analysis by sequencing is necessary to identify the gene. Sequencing of selected bands is problematic since the same primer often flanks DDRT products at both ends, so that direct sequencing is not possible and an additional cloning step is necessary. Due to the use of short primers, a further reamplification step with primer molecules extended on the 5′ side is necessary even if two different primers flank the product. Finally, due to the use of random primers, it is never possible to be sure that the primer combinations recognize all transcripts of a cell. This applies, even when using a high number of primers, to studies which are intended to detect the entirety of all transcripts as well as to studies which are directed towards the analysis of a subpopulation of transcripts such as a gene family (Bertioli et al., 1995).
A variant of DDRT, known as GeneCalling, has been described (Shimkets et al., 1999) which addresses some of these problems. In this method, multiple pairs of restriction endonucleases are used to prepare specific fragments of a cDNA population prior to amplification with pairs of universal primers. This improves the reproducibility of the measurements and reduces the false positive rate, but the patterns are very complex and identification of individual transcripts requires the synthesis of a unique oligonucleotide for each gene to be tested. In addition, the quantitative data obtained are apparently significant only for changes above 4-fold (Shimkets et al., 1999) and only a weak correlation with other techniques is obtained. The ability of the technique to distinguish the gene-specific band from the complex background for any arbitrarily chosen gene has not been documented (Shimkets et al., 1999).
AFLP (amplified fragment length polymorphism) based mRNA fingerprinting further addresses some of the deficiencies of DDRT. AFLP allows for the systematic comparison of the differential expression of genes between RNA samples (Habu, 1997). The technique involves the endonuclease digestion of immobilized cDNA by a single restriction enzyme. The digested fragments are then ligated with a linker specific for the restriction cut site. The tailed fragments are subsequently amplified by PCR™ employing primers complementary to the linkers added to the digest, with the addition of variable nucleotides at the 3′ end of the primers. The products of the amplification are visualized by PAGE and banding patterns are compared to reveal differences in RNA transcription patterns between samples. Although AFLP based RNA fingerprinting provides an indication of the RNA message present in a given sample, it fails to restrict the potential number of signals produced by each individual RNA strand. With this technique, each RNA strand may potentially produce multiple fragments and therefore multiple signals upon amplification. This failure to restrict the number of signals from each message complicates the results that must be evaluated.
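The point that a single message can yield several fragments under single-enzyme digestion can be illustrated with a minimal in-silico sketch. The recognition site GATC, the cut position immediately before the site, and the toy sequence are illustrative assumptions rather than details taken from the cited work, and the function name is hypothetical.

```python
from typing import List


def digest(seq: str, site: str = "GATC", cut_offset: int = 0) -> List[str]:
    """Split seq at every occurrence of a restriction site.

    cut_offset is the position within the site at which the strand is cut
    (0 means the cut falls immediately before the site). Site and offset
    are illustrative assumptions.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    # Drop empty pieces that arise when a site sits at the very start or end.
    return [f for f in fragments if f]


if __name__ == "__main__":
    toy_cdna = "AAAGATCCCCGATCTTT"  # made-up sequence containing two sites
    pieces = digest(toy_cdna)
    print(len(pieces), pieces)
```

In this toy case a single cDNA with two recognition sites is split into three fragments, so the number of potential signals grows with the number of sites in each message.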
Song and Osborn (1994) describe a method for examining the expression of homologous genes in plant polyploids in which the techniques of RT-PCR™ and RFLP (restriction fragment length polymorphism) analysis are combined with one another. In this method a cDNA is produced from RNA by reverse transcription, then amplified by using two gene-specific primers. The amplification products are transcript-specifically shortened by endonuclease cleavage, separated by electrophoresis according to their length, cloned, and then analyzed by sequencing. This method has the disadvantage of low sensitivity, as a cloning step is necessary to characterize the expression products. A further disadvantage of this method is that gene specific sequence information must be available on at least two regions within the analyzed genes in order to design suitable primers.
In principle, gene expression data for a particular biological sample could be obtained by large-scale sequencing of a cDNA library. The role of sequencing cDNA, generated by reverse transcription from mRNA, has been debated for its value in the human genome project. Proponents of genomic sequencing have argued the difficulty of finding every mRNA expressed in all tissues, cell types, and developmental stages. It is also believed that cDNA libraries do not provide all sequences corresponding to structural and regulatory polypeptides (Putney et al., 1983). In addition, libraries of cDNA may be dominated by repetitive elements, mitochondrial genes, ribosomal RNA genes, and other nuclear genes comprising common or housekeeping sequences. While some mRNAs are abundant, others are rare, resulting in cellular quantities of mRNA from various genes that can vary by several orders of magnitude. Therefore, sequencing of transcribed regions of the genome using cDNA libraries has been considered unsatisfactory.
Techniques based on cDNA subtraction or differential display can be used to compare gene expression patterns between two cell types (Hedrick et al., 1984; Liang and Pardee, 1992), but provide only a partial analysis, with no quantitative information regarding the abundance of messenger RNA. Expressed sequence tags (ESTs) have been valuable for gene discovery (Adams et al., 1993; Okubo et al., 1992), but like Northern blotting, RNase protection, and reverse transcriptase-polymerase chain reaction (RT-PCR™) analysis (Alwine et al., 1977; Zinn et al., 1983; Veres et al., 1987), the approach only evaluates a limited number of genes at a time.
In Chen et al. (2001), amplified differential gene expression (ADGE) is used to quadratically amplify the ratio of a gene in two samples before displaying them. This amplification does not alter the ratio of expression of genes that are expressed at the same level, but quadratically increases the ratio for those with different expression levels. It is used to reveal gene expression profiles between two samples and may be used to perform global analysis. The technique requires hybridization and addition of separate adaptors to the tester and driver cDNAs. Jiang et al. (2000) describe Rapid Subtraction Hybridization (RaSH) as involving enzymatically digesting cDNA into small fragments, ligating to adaptors, PCR amplifying and then incubating with tester and driver PCR fragments. The key component of this technique is subtractive hybridization, and multiple fragments are recovered from a single cDNA species. Reciprocal subtraction differential RNA display (RSDD) combines reciprocal subtraction of cDNA libraries with differential RNA display (Kang et al., 1998). The approach results in the enrichment of unique sequences and reduction of common sequences. All of these techniques require cloning and sequencing to identify differences in gene expression.
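The quadratic amplification described for ADGE can be illustrated with simple arithmetic. The sketch below assumes only what is stated above, namely that a tester-to-driver ratio r becomes r squared; the function name and the example abundances are hypothetical.

```python
def adge_ratio(tester: float, driver: float) -> float:
    """Return the tester:driver ratio after quadratic amplification.

    Assumes the idealized relation ratio_out = (tester / driver) ** 2.
    """
    if tester <= 0 or driver <= 0:
        raise ValueError("abundances must be positive")
    return (tester / driver) ** 2


if __name__ == "__main__":
    # Equal expression: the ratio is unchanged.
    print(adge_ratio(1.0, 1.0))
    # A 4-fold difference is displayed as a 16-fold difference.
    print(adge_ratio(4.0, 1.0))
    # A 2-fold decrease is displayed as a 4-fold decrease.
    print(adge_ratio(1.0, 2.0))
```

A ratio of 1 is the only positive value left unchanged by squaring, which is why equally expressed genes keep their ratio while differences in either direction are exaggerated.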
Serial analysis of gene expression (SAGE) (U.S. Pat. No. 5,866,330; Kinzler et al., 1995) was developed for global gene expression analysis. It is based on the use of short (i.e. 9–10 base pair) nucleotide sequence tags that identify a defined position in an mRNA and are used to ascertain the identity of the corresponding transcript and gene. The cDNA tags are generated from mRNA samples, randomly paired, concatenated, cloned, and sequenced. While this method allows the analysis of a large number of transcripts, the identification of individual genes requires sequencing of tens of thousands of tags for comparison of even a small number of samples. Although SAGE provides a comprehensive picture of gene expression, it cannot be specifically directed at a small subset of the transcriptome (Zhang et al., 1997; Velculescu et al., 1995). Data on the most abundant transcripts is the easiest and fastest to obtain, while about a megabase of sequencing data is needed for confident analysis of low abundance transcripts.
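The idea of a short tag at a defined position can be illustrated with a minimal sketch. It assumes, for illustration only, that the tag is the 10 bases immediately 3′ of the last occurrence of an anchoring site in the cDNA; the site CATG, the toy sequences, and the function names are assumptions and are not taken from the cited references.

```python
from collections import Counter
from typing import Iterable, Optional


def sage_tag(cdna: str, anchor: str = "CATG", tag_len: int = 10) -> Optional[str]:
    """Return the tag following the 3'-most anchoring site, or None.

    The anchoring site and tag length are illustrative assumptions.
    """
    site = cdna.rfind(anchor)
    if site == -1:
        return None  # no anchoring site, so no tag can be defined
    start = site + len(anchor)
    tag = cdna[start:start + tag_len]
    # Reject tags truncated by the end of the sequence.
    return tag if len(tag) == tag_len else None


def count_tags(cdnas: Iterable[str]) -> Counter:
    """Tally tags over a collection of cDNA sequences."""
    tags = (sage_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


# Made-up transcripts; transcript_a carries two anchoring sites.
transcript_a = "GG" + "CATG" + "A" * 10 + "CC" + "CATG" + "T" * 10 + "GG"
transcript_b = "TT" + "CATG" + "ACGTACGTAC" + "GG"
tag_counts = count_tags([transcript_a, transcript_a, transcript_b])

if __name__ == "__main__":
    print(tag_counts)
```

Each transcript contributes exactly one tag regardless of how many anchoring sites it contains, and the tag counts serve as the expression measure. Rare transcripts appear only after many tags have been tallied, which reflects the sequencing burden noted above.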
Microarray technology utilizes hybridization of cDNAs or mRNAs to microarrays containing hundreds or thousands of individual cDNA fragments or oligonucleotides specific for particular genes or ESTs. The matrix for hybridization is either a DNA chip, a slide or a membrane. This method can be used to direct a search towards specific subsets of genes, but cannot be used to identify novel genes. In addition, arrays are expensive to produce (DeRisi et al., 1996; Schena et al., 1995). For those methods using cDNA arrays, a library of individually cloned DNA fragments must be maintained with at least one clone for each gene to be analyzed. Because much of the expense of utilizing microarrays lies in maintaining the fragment libraries and programming equipment to construct the microarray, it is only cost-efficient to produce large numbers of identical arrays. Data interpretation between experiments and laboratories has been problematic, as data derived from arrayed elements are not directly comparable (Lakhani and Ashworth, 2001). Both SAGE and microarray technologies lack the flexibility to easily change the subset of the transcriptome being analyzed or to focus on smaller subsets of genes for more detailed analyses. Hybridization methods are also limited by their inability to detect genes not represented in ESTs.
Kornmann et al. (2001) describe amplification of double-stranded cDNA end restriction fragments (ADDER). cDNA is synthesized using an oligo dT containing two restriction sites and a biotin moiety. The 3′ most cDNA fragment from each gene is generated by digestion with a 4-base recognition restriction enzyme and recovered using SA-magnetic beads. An adaptor is ligated to the 5′ end of the restriction fragment. The fragment is released from the oligo dT by restriction with AscI. A master cDNA stock is generated using universal primers. Differential display touchdown PCR is then carried out with 16 upstream and 12 downstream primers. The process uses radioisotopes and sequencing gels. The number of PCR products generated per reaction is greater than in the methods described herein, which makes results more difficult to interpret. In addition, no software exists to predict or aid in identification of the PCR product. Therefore, each differentially expressed PCR product must be cloned and sequenced.
As described above, current techniques for analysis of gene expression either monitor one gene at a time, are designed for the simultaneous, and therefore more laborious, analysis of thousands of genes, or do not adequately restrict the signal to message ratio. There is a need for improved methods which encompass both rapid, detailed analysis of global expression patterns of genes and analysis of expression patterns of defined sets of genes for the investigation of a variety of biological applications. This is particularly true for establishing changes in the pattern of gene expression in the same cell type, for example, in different developmental stages, under different physiologic or pathologic conditions, or when treated with different pharmaceuticals, mutagens, carcinogens, etc. Identification of differential patterns of expression has several utilities, including the identification of appropriate therapeutic targets and candidate genes for gene therapy (including gene replacement), tissue typing, forensic identification, mapping locations of disease-associated genes, and the identification of diagnostic and prognostic indicator genes.
U.S. Pat. No. 6,221,600 and Wang et al. (2001) describe a combinatorial oligonucleotide PCR method for global gene expression, wherein a cDNA gives rise to no more than one fragment in a collection of products, which is subsequently amplified and therefore representative of each expressed gene. In these methods, artifactual amplification of multiple fragments from the same cDNA can occur during PCR by priming with a single primer.
The object of the present invention is to provide a method for gene expression analysis which exceeds the capabilities of the state of the art. Thus, the present invention described herein provides novel improvements to the art of gene expression analysis, particularly using combinatorial oligonucleotide polymerase chain reaction with labeled linkers and amplification of restriction fragments comprising nonidentical ends.