Developing methods to detect molecular alterations in biological samples is key to increasing our knowledge about the causes of diseases, the processes of cellular development and differentiation, and other physiological and cellular events, and to developing tools to detect, treat, alter, and monitor these conditions. Perhaps the most significant alteration that can occur in a cell is in its pattern of gene transcription, which exerts profound control on protein levels and activities. Thus, the detection of changes in mRNA levels in the thousands of genes expressed by a single cell is an important goal for many research programs.
With the extensive amount of cDNA sequence information available through the efforts of genome sequencing projects, as well as those of thousands of individual laboratories, it is becoming increasingly imperative to develop technologies that can utilize this information to study the patterns of gene expression in both development and disease. Most human cancers, for example, arise from genetic changes that alter the profile of expressed genes within a cell. Methods that can rapidly and accurately measure the expression levels of thousands of genes will play an essential role in furthering our understanding of the causes and nature of progression of human cancers, detecting and monitoring cancers and other diseases, and identifying and developing treatment methods for these diseases.
Several approaches have been developed in recent years in an attempt to achieve reliable, economical measurement of patterns and levels of gene expression. These include sequencing-based methods such as expressed sequence tag (EST) databases (See e.g., Adams et al., Nature Genetics 4, 373 [1993]) and SAGE (See e.g., Velculescu et al., Science 270, 484 [1995]), PCR-based methods such as differential display (See e.g., Liang et al., Cancer Res. 52, 6966 [1992]; and Liang and Pardee, Science 257, 967 [1992]), and methods based on hybridization to microarrays of EST clones or oligonucleotides (See e.g., Chee et al., Science 274, 610 [1996]; DeRisi et al., Nat. Genet. 14, 457 [1996]; Gress et al., Oncogene 13, 1819 [1996]; Maskos and Southern, Nucleic Acids Res. 21, 4663 [1993]; Pietu et al., Genome Res. 6, 492 [1996]; Schena et al., Science 270, 467 [1995]; and Schena et al., Proc. Natl. Acad. Sci. 93, 10614 [1996]) or on subtractive hybridization (See e.g., Diatchenko et al., Proc. Natl. Acad. Sci. 6025 [1996]). The strengths and weaknesses of each of these technologies are assessed below.
Partial sequencing of randomly selected cDNA clones directly from cDNA libraries (i.e., producing expressed sequence tags--ESTs) has been used as a means of identifying new genes and analyzing the expression pattern of tissues and cell lines (See e.g., Adams et al., Science 252, 1651 [1991]). In these methods, total mRNA is reverse transcribed to produce cDNA. The cDNAs are hybridized to random primers and sequenced (typically with automated sequencers), with ESTs longer than 150 bp providing the best data for comparison to sequence databases. The sequence information can be compared to available sequence databases to characterize each cDNA as being derived from a known or novel gene. However, sequencing ESTs is very labor intensive, time consuming, and expensive. As a means of monitoring gene expression, the value of the data depends on the extent to which sequence information is already available (i.e., the method may indicate that a previously identified gene is expressed in a given tissue but will not provide information about the expression of related genes that have yet to be identified and catalogued).
Serial analysis of gene expression (SAGE) provides another sequencing-based method to characterize expression patterns (Velculescu et al., supra). In the SAGE technique, RNA is reverse transcribed to produce cDNA copies of the transcripts. The cDNA is then cleaved with a restriction enzyme that cuts each transcript at least once. The 3' portions of the restriction products (containing the poly-A tail) are isolated using streptavidin beads. The samples are divided into two portions and the free restriction ends are ligated to one of two linkers containing a type IIS restriction site. Type IIS restriction enzymes cleave at a defined distance from their recognition sites (i.e., as opposed to cleaving directly at the recognition site). The linkers are designed to produce IIS cleavage products that contain only a short piece (i.e., the tag) of the original cDNA, ligated to the linker. Blunt ends are produced and the two pools are ligated together, creating a "ditag" with the two types of linkers on either end and the short cDNA tags in the center. The ditags are then PCR amplified using primers that are complementary to sequence within the two linkers. The PCR products are then cloned and manually sequenced before being compared to sequence databases or to SAGE experiments from other samples. Although SAGE provides a means to compare gene expression patterns, its dependence on cloning and sequencing makes it labor intensive. Furthermore, SAGE does not allow the study of specific genes or gene families, but instead screens all expressed transcripts.
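The tag-extraction and counting logic of SAGE can be illustrated with a minimal computational sketch. It assumes, purely for illustration, a four-base recognition site `CATG` for the enzyme that cuts each transcript and a 9-base tag; neither value is stated in the text above. The function names and the toy transcript sequences are hypothetical, and linker ligation, ditag formation, and strand orientation are ignored.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed recognition site of the cutting enzyme (illustrative)
TAG_LENGTH = 9        # assumed tag length in bases (illustrative)


def extract_tag(cdna, site=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag just 3' of the 3'-most site, or None if no tag is produced."""
    pos = cdna.rfind(site)  # 3'-most occurrence of the recognition site
    if pos == -1:
        return None  # transcript is not cut, so it yields no tag
    start = pos + len(site)
    tag = cdna[start:start + tag_length]
    # Discard fragments too short to give a full-length tag.
    return tag if len(tag) == tag_length else None


def count_tags(cdnas):
    """Count how often each tag occurs across a collection of cDNA sequences."""
    tags = (extract_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)


# Hypothetical toy transcripts (5' to 3'); the first two come from the same gene.
transcripts = [
    "AAGGCATGACGTACGTAGGTTAAAA",
    "AAGGCATGACGTACGTAGGTTAAAA",
    "CATGTTTTCATGGGCCCAATTGAAAA",
    "TTTTGGGGAAAA",  # contains no site and is therefore not counted
]
tag_counts = count_tags(transcripts)
```

In this sketch the relative tag counts stand in for relative transcript abundance, which is the quantity SAGE is intended to estimate.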
A PCR-based approach for identifying gene expression differences between samples is the differential display of mRNAs using arbitrarily primed polymerase chain reaction (DDRT-PCR). The polymerase chain reaction is described by Mullis et al., in U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, hereby incorporated by reference. Briefly, the PCR process consists of introducing a molar excess of two oligonucleotide primers to the cDNA mixture containing the desired target sequence (e.g., a poly-T primer that hybridizes to the poly-A tail of mRNAs and a random oligomer). The two primers are complementary to their respective strands of the double-stranded sequence. The mixture is denatured and then allowed to hybridize. Following hybridization, the primers are extended with a thermostable DNA polymerase so as to form complementary strands. The steps of denaturation, hybridization, and polymerase extension can be repeated as often as needed to obtain a relatively high concentration of a segment of the desired target sequence.
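The effect of repeating the denaturation, hybridization, and extension steps can be approximated with a simple idealized model, sketched below. The model assumes that each cycle multiplies the number of target copies by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling. The `efficiency` parameter and the function name are illustrative assumptions, not part of the process described above, and real reactions eventually plateau.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of amplification cycles.

    efficiency is the assumed fraction of templates copied per cycle
    (1.0 means every template is duplicated in every cycle).
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Under perfect doubling, one template yields 2**30 copies after 30 cycles.
ideal_yield = pcr_copies(1, 30)
```

Under this model, even a modest drop in per-cycle efficiency reduces the final yield substantially, since the per-cycle factor is compounded over every cycle.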
In the case of DDRT-PCR, the target is mRNA; the mRNA is, however, treated with reverse transcriptase in the presence of oligo(dT) primers to make cDNA prior to the PCR process. The PCR is carried out with random primers in combination with the oligo(dT) primer used for cDNA synthesis. In theory, since only cDNA (i.e., derived from mRNA) is amplified, only the expressed genes are amplified. Where two samples are to be compared, the amplified products are placed in side-by-side lanes of a gel; following electrophoresis, the products can be compared or "differentially displayed."
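The side-by-side lane comparison can be thought of as a set comparison on band positions. The following is a minimal sketch under the assumption that each lane has been reduced to a list of estimated fragment sizes in base pairs and that two bands within a small tolerance are the same product. The function name, the tolerance, and the example sizes are hypothetical; band intensity is not modeled.

```python
def differential_bands(lane_a, lane_b, tolerance_bp=2):
    """Return the band sizes found in only one of two lanes.

    Two bands are treated as matching when their estimated sizes differ
    by no more than tolerance_bp (an assumed gel resolution).
    """
    def unmatched(lane, other):
        return sorted(
            band for band in set(lane)
            if all(abs(band - o) > tolerance_bp for o in other)
        )

    return {
        "only_a": unmatched(lane_a, lane_b),
        "only_b": unmatched(lane_b, lane_a),
    }


# Hypothetical fragment sizes (bp) read from two adjacent lanes.
sample_a = [150, 220, 305, 410]
sample_b = [151, 305, 500]
display = differential_bands(sample_a, sample_b)
```

Bands reported under `only_a` or `only_b` correspond to candidate differentially expressed transcripts that would then be excised and characterized.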
DDRT-PCR has a number of drawbacks. The use of arbitrary random primers can cause faint banding at essentially every position of the gel, and there is usually a high level of false positives (See e.g., Bauer et al., PCR Methods and Applications, Cold Spring Harbor Lab. Press, Plainview, N.Y., Supplement, pp. S97-S108 [1994]). Also, the process is generally biased toward high-copy number genes (See e.g., Bertioli et al., Nucleic Acids Res. 23, 4520 [1995]) and is often inappropriate for experiments where only a few genes vary in expression (See e.g., Sompayrac et al., Nucleic Acids Res. 23, 4738 [1995]). Lastly, practitioners often complain about difficulties in reproducing banding patterns.
There have been some attempts to remedy these problems. For example, E. Haag et al. (Biotechniques 17, 226 [1994]) describe an improved DDRT-PCR method in which the standard oligo-dT primer is omitted from the PCR step to decrease the faint banding at essentially every position of the electrophoresis gel. Instead, a second arbitrary primer was utilized in PCR. Another example is O. C. Ikonomov et al. (Biotechniques 20, 1030 [1996]), describing the use of a modified DDRT-PCR protocol to increase bias towards moderate to low abundance transcripts. The authors utilized experimentally selected primer pairs directed at known coding sequences that avoid amplification of highly abundant ribosomal and mitochondrial transcripts. While such efforts have improved DDRT-PCR, the process remains unsatisfactory because of the continued amplification of material that is not of interest.
A significant disadvantage of current DDRT-PCR techniques is the laborious series of steps required to characterize differentially expressed samples. Interesting bands must be excised from the gel and cloned. Such cloning is non-trivial, as PCR products tend to have a single adenosine residue overhang, requiring processing before cloning into traditional vectors, or requiring cloning into T-vectors (i.e., vectors containing a single thymine overhang), which is inefficient. Following cloning, the insert must be sequenced and compared to nucleic acid databases to determine its identity or novelty. Such cloning and sequencing is time consuming, labor intensive, and expensive.
Other approaches to analyze gene expression patterns involve technologies utilizing high-density DNA arrays placed on computer chips. The technology is being applied to the study of gene expression, genetic linkage, and genetic variability. For example, Chee et al. (Chee et al., supra) describe the use of DNA arrays on computer chips to simultaneously analyze the entire human mitochondrial genome. Arrays containing 135,000 probes, representing the entire human mitochondrial genome, were generated on chips. Within minutes, experimental DNA was hybridized to the chip to detect sequence polymorphisms with single-base resolution. Although the method is accurate and efficient, testing is limited by the generation of the DNA arrays. Each time a new system is to be tested, an array must be generated, an extremely time consuming, technically complex, and expensive process.
Several subtractive hybridization methods have also been used to characterize cDNA levels to identify differences between biological samples. In one method, the so-called "subtractive cDNA library" method (See generally, Ausubel et al., Current Protocols in Molecular Biology, Section 5.8.9 [1990]), a subtracted cDNA library is generated or obtained (ATCC or Stratagene) containing cDNA clones corresponding to mRNAs present in one sample and not present in another (e.g., present in a particular species, tissue, or cell and not present in another species, tissue, or cell). In the protocol, cDNA containing the gene(s) of interest ("+cDNA") is prepared with restriction enzyme ends and the cDNA not containing the gene(s) of interest ("-cDNA") is prepared with blunt ends. The +cDNA is mixed with a 50-fold excess of -cDNA inserts and the mixture is heated to make the DNA single-stranded. Thereafter, the mixture is cooled to allow for hybridization. Annealed cDNA inserts are ligated to a vector and transfected. In theory, the only +cDNA likely to be double-stranded with the restriction enzyme sites at each end are those not hybridized to something in the -cDNA preparation (i.e., where a complementary sequence is in the -cDNA preparation, the sequence will not be transfected). Thus, only sequences unique to the +cDNA preparation will be cloned and amplified.
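The role of the 50-fold excess can be illustrated with a rough calculation. The sketch below assumes that hybridization goes to completion, that strands pair at random with any available complement, and that a shared sequence is equally abundant in both preparations. These are simplifying assumptions made for illustration only, and the function name is hypothetical.

```python
def plus_plus_fraction(minus_excess, shared=True):
    """Expected fraction of +cDNA strands that reanneal to a +cDNA partner.

    minus_excess is the fold excess of -cDNA over +cDNA (50 in the protocol).
    shared indicates whether the sequence is also present in the -cDNA.
    """
    if minus_excess < 0:
        raise ValueError("minus_excess must be non-negative")
    if not shared:
        return 1.0  # no -cDNA competitor exists for a +-specific sequence
    # With random pairing, 1 of every (1 + excess) complements is a + strand.
    return 1.0 / (1.0 + minus_excess)


shared_fraction = plus_plus_fraction(50)                # roughly 0.02
unique_fraction = plus_plus_fraction(50, shared=False)  # 1.0
```

Under these assumptions, only about one in fifty-one strands of a shared sequence regains restriction enzyme ends on both sides, whereas a sequence unique to the +cDNA can only reanneal to itself.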
There are several significant disadvantages to this technique. First, it can be very tedious. For example, if no "+" and "-" cDNA libraries are available for the samples to be studied, they must first be made or cDNA must be synthesized, requiring extra days or weeks. Even when cDNA libraries are available, the protocol still requires several days. Second, library production with small amounts of cDNA is technically very difficult. Also, since relatively few recombinants are obtained after subtraction, this protocol is only effective when using library vectors that allow high cloning efficiency. Third, clones containing reiterated sequences (e.g., an Alu repeat in the 3' untranslated region) are eliminated from the library, misrepresenting the presence of clones containing such sequences.
Some of the disadvantages of the subtractive cDNA library techniques have been overcome using a PCR-based "suppression subtractive hybridization" technique (Diatchenko et al., supra). The method is used to selectively amplify target cDNA fragments and simultaneously suppress nontarget DNA amplification, overcoming the problem of differences in mRNA abundance. The method eliminates the need to physically separate single- and double-stranded molecules. However, as this method still requires "+" and "-" cDNA samples, it remains a tedious procedure if samples are not available. A major drawback of all subtractive hybridization methods is the need to clone and sequence desired fragments in order to identify and characterize them.
What is needed is an inexpensive, easy-to-use, time-efficient, and reliable method for distinguishing between the expression of genes in two or more biological samples. Such a method should also facilitate follow-up analysis once a gene of interest is identified. Ideally, such analysis would avoid time-consuming steps such as cloning and sequencing.