For genomic studies, the quality and quantity of DNA samples are crucial. High-throughput genetic analysis requires large amounts of template for testing. However, the amount of DNA extracted from individual patient samples, for example, is limited. DNA sample size also limits forensic and paleobiology work. Thus, there has been a concerted effort to develop methods to amplify the entire genome. The goal of whole genome amplification (WGA) is to supply a sufficient amount of genomic sequence for a variety of procedures, as well as long-term storage for future work and archiving of patient samples. There is a clear need to amplify entire genomes in an automatable, robust, representative fashion. Whole genome amplification has historically been accomplished using one of three techniques: polymerase chain reaction (PCR), strand displacement, or cell immortalization.
PCR™
PCR™ is a powerful technique to amplify DNA (Saiki, 1985). This in vitro technique amplifies DNA by repeated thermal denaturation, primer annealing and polymerase extension, thereby amplifying a single target DNA molecule to detectable quantities. PCR™ is not amenable to the amplification of long DNA molecules such as entire chromosomes, which in humans are approximately 10⁸ bases in length. The commonly used polymerase in PCR™ reactions is Taq polymerase, which cannot amplify regions of DNA larger than about 5000 bases. Moreover, knowledge of the exact nucleotide sequences flanking the amplification target is necessary in order to design primers used in the PCR™ reaction.
Whole Genome PCR™
Whole genome PCR™ results in the amplification either of complete pools of DNA or of unknown intervening sequences between specific primer binding sites. The amplification of complete pools of DNA, termed known amplification (Lüdecke et al., 1989) or general amplification (Telenius et al., 1992), can be achieved by different means. Common to all approaches is the capability of the PCR™ system to uniformly amplify DNA fragments in the reaction mixture without preference for specific DNA sequences. The structure of primers used for whole genome PCR™ is described as totally degenerate (i.e., all nucleotides are termed N, where N = A, T, G, or C), partially degenerate (i.e., several nucleotides are termed N) or non-degenerate (i.e., all positions exhibit defined nucleotides).
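The three primer classes can be told apart by counting N positions. The following is a minimal sketch of that rule; the function name classify_primer is hypothetical, and the example sequences are primers quoted elsewhere in this section.

```python
def classify_primer(seq):
    """Classify a primer by how many of its positions are the wildcard N."""
    seq = seq.upper()
    n_count = seq.count("N")
    if n_count == 0:
        return "non-degenerate"
    if n_count == len(seq):
        return "totally degenerate"
    return "partially degenerate"


# A 15-base fully random primer, as used in PEP
print(classify_primer("N" * 15))
# The DOP-PCR primer of SEQ ID NO:6
print(classify_primer("CCGACTCGACNNNNNNATGTGG"))
# A catch-linker oligomer (SEQ ID NO:1)
print(classify_primer("GAGTAGAATTCTAATATCTA"))
```

The sketch treats N as the only degenerate symbol; other ambiguity codes (such as R or Y) are not handled.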
Whole genome PCR™ involves converting total genomic DNA to a form which can be amplified by PCR (Kinzler and Vogelstein, 1989). In this technique, total genomic DNA is fragmented via shearing or enzymatic digestion with, for instance, a restriction enzyme such as MboI, to an average size of 200-300 base pairs. The ends of the DNA are made blunt by incubation with the Klenow fragment of DNA polymerase. The DNA fragments are ligated to catch linkers consisting of a 20 base pair DNA fragment synthesized in vitro. The catch linkers consist of two phosphorylated oligomers: 5′-GAGTAGAATTCTAATATCTA-3′ (SEQ ID NO:1) and 5′-GAGATATTAGAATTCTACTC-3′ (SEQ ID NO:2). To select against the “catch” linkers that were self-ligated, the ligation product is cleaved with XhoI. Each catch linker has one half of an XhoI site at its termini; therefore, XhoI cleaves catch linkers ligated to themselves but will not cleave catch linkers ligated to most genomic DNA fragments. The linked DNA is in a form that can be amplified by PCR™ using the catch oligomers as primers. The DNA of interest can then be selected via binding to a specific protein or nucleic acid and recovered. The small amount of DNA fragments specifically bound can be amplified using PCR™. The steps of selection and amplification may be repeated as often as necessary to achieve the desired purity. Although 0.5 ng of starting DNA was amplified 5000-fold, Kinzler and Vogelstein (1989) did report a bias toward the amplification of smaller fragments.
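To put the reported figures in perspective, the sketch below converts a fold amplification into the minimum number of doubling cycles and the resulting mass. It assumes an idealized reaction in which every cycle doubles the product exactly, which real reactions do not achieve; the function names are hypothetical.

```python
import math


def min_doublings(fold):
    """Smallest whole number of perfect doublings that reaches `fold`."""
    return math.ceil(math.log2(fold))


def yield_ng(input_ng, fold):
    """Mass of product after amplifying `input_ng` nanograms `fold` times."""
    return input_ng * fold


# Figures quoted above: 0.5 ng of starting DNA amplified 5000-fold
print(min_doublings(5000))   # idealized lower bound on cycle number
print(yield_ng(0.5, 5000))   # nanograms of product
```

Because per-cycle efficiency is below 100% in practice, the cycle count given by min_doublings is only a lower bound.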
Whole Genome PCR™ with Non-Degenerate Primers
Lone Linker PCR™
Because of the inefficiency of the conventional catch linkers due to self-hybridization of two complementary primers, asymmetrical linkers for the primers were designed (Ko et al., 1990). The sequences of the catch linker oligonucleotides (Kinzler and Vogelstein, 1989) were used with the exception of a deleted 3 base pair sequence from the 3′-end of one strand. This “lone-linker” has both a non-palindromic protruding end and a blunt end, thus preventing multimerization of linkers. Moreover, as the orientation of the linker was defined, a single primer was sufficient for amplification. After digestion with a four-base cutting enzyme, the lone linkers were ligated. Lone-linker PCR™ (LL-PCR™) produces fragments ranging from 100 bases to ˜2 kb that were reported to be amplified with similar efficiency.
Interspersed Repetitive Sequence PCR™
As used for the general amplification of DNA, interspersed repetitive sequence PCR™ (IRS-PCR™) uses non-degenerate primers that are based on repetitive sequences within the genome. This allows for amplification of segments between suitable positioned repeats and has been used to create human chromosome- and region-specific libraries (Nelson et al., 1989). IRS-PCR™ is also termed Alu element mediated-PCR™ (ALU-PCR™), which uses primers based on the most conserved regions of the Alu repeat family and allows the amplification of fragments flanked by these sequences (Nelson et al., 1989). A major disadvantage of IRS-PCR™ is that abundant repetitive sequences like the Alu family are not uniformly distributed throughout the human genome, but preferentially found in certain areas (e.g., the light bands of human chromosomes) (Korenberg and Rykowski, 1988). Thus, IRS-PCR™ results in a bias toward these regions and a lack of amplification of other, less represented areas. Moreover, this technique is dependent on the knowledge of the presence of abundant repeat families in the genome of interest.
Linker Adapter PCR™
The limitations of IRS-PCR™ are abated to some extent using the linker adapter technique (LA-PCR™) (Lüdecke et al., 1989; Saunders et al., 1989; Kao and Yu, 1991). This technique amplifies unknown restricted DNA fragments with the assistance of ligated duplex oligonucleotides (linker adapters). DNA is commonly digested with a frequently cutting restriction enzyme such as RsaI, yielding fragments that are on average 500 bp in length. After ligation, PCR™ can be performed using primers complementary to the sequence of the adapters. Temperature conditions are selected to enhance annealing specifically to the complementary DNA sequences, which leads to the amplification of unknown sequences situated between the adapters. Post-amplification, the fragments are cloned. There should be little sequence selection bias with LA-PCR™ except on the basis of distance between restriction sites. Methods of LA-PCR™ overcome the hurdles of regional bias and species dependence common to IRS-PCR™. However, LA-PCR™ is technically more challenging than other whole genome amplification (WGA) methods.
A large number of band-specific microdissection libraries of human, mouse, and plant chromosomes have been established using LA-PCR™ (Chang et al., 1992; Wesley et al., 1990; Saunders et al., 1989; Vooijs et al., 1993; Hadano et al., 1991; Miyashita et al., 1994). PCR™ amplification of a microdissected region of a chromosome is conducted by digestion with a restriction enzyme (e.g., Sau3A, MboI) to generate a number of short fragments, which are ligated to linker-adapter oligonucleotides that provide priming sites for PCR™ amplification (Saunders et al., 1989). Two oligonucleotides, a 20-mer and a 24-mer creating a 5′ overhang that was phosphorylated with T4 polynucleotide kinase and complementary to the end generated by the restriction enzyme, were mixed in equimolar amounts and allowed to anneal. Following this amplification, as much as 1 μg of DNA can be amplified from as little as one band dissected from a polytene chromosome (Saunders et al., 1989; Johnson, 1990). Ligation of a linker-adapter to each end of the chromosomal restriction fragment provides the primer-binding site necessary for in vitro semiconservative DNA replication. Other applications of this technology include amplification of one flow-sorted mouse chromosome 11 and use of resulting DNA library as a probe in chromosome painting (Miyashita et al., 1994), and amplification of DNA of a single flow-sorted chromosome (VanDeanter et al., 1994).
A different adapter used in PCR™ is the Vectorette (Riley et al., 1990). This technique is largely used for the isolation of terminal sequences from yeast artificial chromosomes (YAC) (Kleyn et al., 1993; Naylor et al., 1993; Valdes et al., 1994). Vectorette is a synthetic oligonucleotide duplex containing an overhang complementary to the overhang generated by a restriction enzyme. The duplex contains a region of non-complementarity as a primer-binding site. After ligation of digested YACs and a Vectorette unit, amplification is performed between primers identical to Vectorette and primers derived from the yeast vector. Products will only be generated if, in the first PCR™ cycle, synthesis has taken place from the yeast vector primer, thus synthesizing products from the termini of YAC inserts.
Priming Authorizing Random Mismatches PCR™
Another whole genome PCR™ method using non-degenerate primers is Priming Authorizing Random Mismatches-PCR™ (PARM-PCR™), which uses specific primers and unspecific annealing conditions resulting in a random hybridization of primers leading to universal amplification (Milan et al., 1993). Annealing temperatures are reduced to 30° C. for the first two cycles and raised to 60° C. in subsequent cycles to specifically amplify the generated DNA fragments. This method has been used to universally amplify flow sorted porcine chromosomes for identification via fluorescent in situ hybridization (FISH) (Milan et al., 1993). A similar technique was also used to generate chromosome DNA clones from microdissected DNA (Hadano et al., 1991). In this method, a 22-mer primer unique in sequence, which randomly primes and amplifies any target DNA, was utilized. The primer contained recognition sites for three restriction enzymes. Thermocycling was done in three stages: stage one had an annealing temperature of 22° C. for 120 minutes, and stages two and three were conducted under stringent annealing conditions.
Single Cell Comparative Genomic Hybridization
A method termed single cell comparative genomic hybridization (SCOMP) has been developed that allows comprehensive analysis of the entire genome at the single-cell level (Klein et al., 1999; WO 00/17390). Genomic DNA from a single cell is fragmented with a four base cutter, such as MseI, giving an expected average length of 256 bp (4⁴) based on the premise that the four bases are evenly distributed. Ligation mediated PCR™ was utilized to amplify the digested restriction fragments. Briefly, two primers (5′-AGTGGGATTCCGCATGCTAGT-3′, SEQ ID NO:3; and 5′-TAACTAGCATGC-3′, SEQ ID NO:4) were annealed to each other to create an adapter with two 5′ overhangs. The 5′ overhang resulting from the shorter oligo is complementary to the ends of the DNA fragments produced by MseI cleavage. The adapter was ligated to the digested fragments using T4 DNA ligase. Only the longer primer was ligated to the DNA fragments as the shorter primer did not have the 5′ phosphate necessary for ligation. Following ligation, the second primer was removed via denaturation, and the first primer remained ligated to the digested DNA fragments. The resulting 5′ overhangs were filled in by the addition of DNA polymerase. The resulting mixture was then amplified by PCR™ using the longer primer.
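The geometry of the SCOMP adapter can be checked from the two oligonucleotide sequences alone. The sketch below anneals the oligos through their 3′ portions and reports the duplex length and the two 5′ overhangs; the function names are hypothetical, and the fact that MseI (recognition site TTAA) leaves a 5′-TA overhang is general enzyme knowledge rather than something stated in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def adapter_overhangs(long_oligo, short_oligo):
    """Anneal two oligos through their 3' portions.

    Returns (duplex_length, long 5' overhang, short 5' overhang),
    using the longest 3'-terminal stretch that is fully complementary.
    """
    for k in range(min(len(long_oligo), len(short_oligo)), 0, -1):
        if short_oligo[-k:] == reverse_complement(long_oligo[-k:]):
            return k, long_oligo[:-k], short_oligo[:-k]
    return 0, long_oligo, short_oligo


LONG = "AGTGGGATTCCGCATGCTAGT"   # SEQ ID NO:3
SHORT = "TAACTAGCATGC"           # SEQ ID NO:4

duplex, long_5p, short_5p = adapter_overhangs(LONG, SHORT)
print(duplex, long_5p, short_5p)
```

With these sequences the short oligo contributes a two-base 5′ overhang, TA, which is self-complementary and therefore pairs with the TA overhang on an MseI-cut fragment.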
As this method is reliant on restriction digests to fragment the genomic DNA, it is dependent on the distribution of restriction sites in the DNA. Very small and very long restriction fragments will not be effectively amplified, resulting in a biased amplification. The average fragment length of 256 bp generated by MseI cleavage will result in a large number of fragments that are too short to amplify.
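The proportion of short fragments can be estimated under the same premise of evenly distributed bases. In the sketch below, each position is assumed to start a four-base recognition site independently with probability (1/4)⁴, so fragment lengths are approximately geometrically distributed. The 100 bp cut-off is a hypothetical threshold chosen for illustration only, and real genomes deviate from this random model.

```python
def site_probability(site_len):
    """Chance that a given position starts a specific site of `site_len` bases."""
    return 0.25 ** site_len


def mean_fragment_length(site_len):
    """Expected spacing between sites under the random-sequence model."""
    return 1.0 / site_probability(site_len)


def fraction_at_most(length_bp, site_len=4):
    """Approximate fraction of fragments no longer than `length_bp`."""
    p = site_probability(site_len)
    return 1.0 - (1.0 - p) ** length_bp


print(mean_fragment_length(4))            # 256.0 for a four base cutter
print(round(fraction_at_most(100), 3))    # share of fragments of 100 bp or less
```

Under these assumptions roughly a third of the fragments fall at or below 100 bp, which illustrates why a 256 bp average implies many very short fragments.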
Whole Genome PCR™ with Degenerate Primers
In order to overcome difficulties associated with many techniques using non-degenerate primers for universal amplification, techniques using partially or totally degenerate primers were developed for universal amplification of minute amounts of DNA.
Degenerate Oligonucleotide Primed PCR™
Degenerate oligonucleotide-primed PCR™ (DOP-PCR™) was developed using partially degenerate primers, thus providing a more general amplification technique than IRS-PCR™ (Wesley et al., 1990; Telenius, 1992). A system was described using non-specific primers (5′-TTGCGGCCGCATTNNNNTTC-3′; SEQ ID NO:5) showing complete degeneration at positions 4, 5, 6, and 7 from the 3′ end (Wesley et al., 1990). The three specific bases at the 3′ end are statistically expected to hybridize every 64 (4³) bases, thus the last seven bases will match due to the partial degeneration of the primer. The first cycles of amplification are conducted at a low annealing temperature (30° C.), allowing sufficient priming to initiate DNA synthesis at frequent intervals along the template. The defined sequence at the 3′ end of the primer tends to separate initiation sites, thus increasing product size. As the PCR™ product molecules all contain a common specific 5′ sequence, the annealing temperature is raised to 56° C. after the first eight cycles. The system was developed to unspecifically amplify microdissected chromosomal DNA from Drosophila, replacing the microcloning system of Lüdecke et al. (1989) described above.
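The expected spacing of priming sites follows from the number of defined 3′ bases. The sketch below computes that spacing and locates candidate sites in a short sequence. It assumes equal base frequencies, searches only the strand that reads the same as the primer, and uses a made-up template string; the function names are hypothetical.

```python
def expected_spacing(defined_bases):
    """Average distance between exact matches to `defined_bases` fixed bases."""
    return 4 ** defined_bases


def find_sites(template, motif):
    """Return 0-based start positions of every (overlapping) motif match."""
    positions = []
    start = template.find(motif)
    while start != -1:
        positions.append(start)
        start = template.find(motif, start + 1)
    return positions


# Three defined 3' bases (TTC) of the primer in SEQ ID NO:5
print(expected_spacing(3))
# Illustrative template only, not taken from any genome
print(find_sites("AATTCGGTTCTTC", "TTC"))
```

The same calculation with six defined bases gives a spacing of 4096, which shows how the length of the defined 3′ segment trades priming frequency against product size.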
The term DOP-PCR™ was introduced by Telenius et al. (1992), who developed the method for genome mapping research using flow sorted chromosomes. A single primer is used in DOP-PCR™, as in the method of Wesley et al. (1990). The primer (5′-CCGACTCGACNNNNNNATGTGG-3′; SEQ ID NO:6) shows six specific bases on the 3′-end, a degenerate part with 6 bases in the middle and a specific region with a rare restriction site at the 5′-end. Amplification occurs in two stages. Stage one encompasses the low temperature cycles. In the first cycle, the 3′-end of the primer hybridizes to multiple sites of the target DNA, which is permitted by the low annealing temperature. In the second cycle, a complementary sequence is generated according to the sequence of the primer. In stage two, primer annealing is performed at a temperature restricting all non-specific hybridization. Up to 10 low temperature cycles are performed to generate sufficient primer binding sites. Up to 40 high temperature cycles are added to specifically amplify the prevailing target fragments.
DOP-PCR™ is based on the principle of priming from short sequences specified by the 3′-end of partially degenerate oligonucleotides used during initial low annealing temperature cycles of the PCR™ protocol. As these short sequences occur frequently, amplification of target DNA proceeds at multiple loci simultaneously. DOP-PCR™ is applicable to the generation of libraries containing high levels of single copy sequences, provided uncontaminated DNA in a substantial amount is obtainable (e.g., flow-sorted chromosomes). This method has been applied to less than one nanogram of starting genomic DNA (Cheung and Nelson, 1996).
Advantages of DOP-PCR™ in comparison to systems of totally degenerate primers are the higher efficiency of amplification, reduced chances for unspecific primer-primer binding and the availability of a restriction site at the 5′ end for further molecular manipulations. However, DOP-PCR™ does not claim to replicate the target DNA in its entirety (Cheung and Nelson, 1996). Moreover, the products generated are relatively short: fragments of up to approximately 500 bp in length are specifically amplified (Telenius et al., 1992; Cheung and Nelson, 1996; Wells et al., 1999; Sanchez-Cespedes et al., 1998; Cheung et al., 1998).
In light of these limitations, a method has been described that produces long DOP-PCR™ products ranging from 0.5 to 7 kb in size, allowing the amplification of long sequence targets in subsequent PCR (long DOP-PCR™) (Buchanan et al., 2000). However, long DOP-PCR utilizes 200 ng of genomic DNA, which is more DNA than most applications will have available. Subsequently, a method was described that generates long amplification products from picogram quantities of genomic DNA, termed long products from low DNA quantities DOP-PCR™ (LL-DOP-PCR™) (Kittler et al., 2002). This method achieves this by the 3′-5′ exonuclease proofreading activity of DNA polymerase Pwo and an increased annealing and extension time during DOP-PCR™, which are necessary steps to generate longer products. Although an improvement in success rate was demonstrated in comparison with other DOP-PCR™ methods, this method did have a 15.3% failure rate due to complete locus dropout for the majority of the failures and sporadic locus dropout and allele dropout for the remaining genotype failures. There was a significant deviation from random expectations for the occurrence of failures across loci, thus indicating a locus-dependent effect on whole genome coverage.
Sequence Independent PCR™
Another approach using degenerate primers, called sequence-independent DNA amplification (SIA), is described by Bohlander et al. (1992). In contrast to DOP-PCR™, SIA incorporates a nested DOP-primer system. The first primer, primer A (5′-TGGTAGCTCTTGATCANNNNN-3′; SEQ ID NO:7), consists of a five base random 3′-segment and a specific 16 base segment at the 5′ end containing a restriction enzyme site. Stage one of PCR™ starts with 97° C. for denaturation, followed by cooling down to 4° C., causing primers to anneal to multiple random sites, and then heating to 37° C. A T7 DNA polymerase is used. In the second low-temperature cycle, primers anneal to products of the first round. In the second stage of PCR™, a primer (5′-AGAGTTGGTAGCTCTTGATC-3′; SEQ ID NO:8) is used that contains, at its 3′ end, the 15 5′-terminal bases of primer A. Five cycles are performed with this primer at an intermediate annealing temperature of 42° C. An additional 33 cycles are performed at a specific annealing temperature of 56° C. Products of SIA range from 200 bp to 800 bp.
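The nested relationship between the two SIA primers can be confirmed directly from their sequences. The sketch below splits the first primer into its defined 5′ segment and random 3′ tail, then finds the longest 5′ stretch of that segment that forms the 3′ end of the second primer. The function names are hypothetical.

```python
PRIMER_A = "TGGTAGCTCTTGATCANNNNN"   # SEQ ID NO:7
PRIMER_B = "AGAGTTGGTAGCTCTTGATC"    # SEQ ID NO:8


def split_tagged_primer(primer):
    """Return (defined 5' segment, random 3' tail of N positions)."""
    tag = primer.rstrip("N")
    return tag, primer[len(tag):]


def shared_segment(primer_a, primer_b):
    """Longest 5' prefix of primer A's defined segment that ends primer B."""
    tag, _ = split_tagged_primer(primer_a)
    for k in range(len(tag), 0, -1):
        if primer_b.endswith(tag[:k]):
            return tag[:k]
    return ""


tag, tail = split_tagged_primer(PRIMER_A)
print(len(tag), len(tail))                           # defined and random lengths
print(len(shared_segment(PRIMER_A, PRIMER_B)))       # bases shared with primer B
```

For the sequences given, the defined segment is 16 bases, the random tail is 5 bases, and the shared stretch is 15 bases, in agreement with the description above.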
Primer-Extension Preamplification
Primer-extension preamplification (PEP) is a method that uses totally degenerate primers to achieve universal amplification of the genome (Zhang et al., 1992). PEP uses a random mixture of 15-base fully degenerate oligonucleotides as primers, thus any one of the four possible bases could be present at each position. Theoretically, the primer is composed of a mixture of 4¹⁵ (approximately 1×10⁹) different oligonucleotide sequences. This leads to amplification of DNA sequences from randomly distributed sites. In each of the 50 cycles, the template is first denatured at 92° C. Subsequently, primers are allowed to anneal at a low temperature (37° C.), which is then continuously increased to 55° C. and held for another four minutes for polymerase extension.
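The number of distinct sequences in a degenerate primer pool is four raised to the number of N positions. A minimal sketch of that count follows; the function name pool_size is hypothetical, and the DOP-PCR™ primers from earlier in this section are included for comparison.

```python
def pool_size(primer):
    """Number of distinct sequences represented by a primer containing N's."""
    return 4 ** primer.upper().count("N")


print(pool_size("N" * 15))                     # 15-base fully degenerate PEP primer
print(pool_size("CCGACTCGACNNNNNNATGTGG"))     # SEQ ID NO:6, six N positions
print(pool_size("TTGCGGCCGCATTNNNNTTC"))       # SEQ ID NO:5, four N positions
```

A 15-base fully degenerate primer therefore represents 1,073,741,824 sequences, several orders of magnitude more than either partially degenerate primer.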
A method of improved PEP (I-PEP) was developed to enhance the efficiency of PEP, primarily for the investigation of tumors from tissue sections used in routine pathology to reliably perform multiple microsatellite and sequencing studies with a single or few cells (Dietmaier et al., 1999). I-PEP differs from PEP (Zhang et al., 1992) in cell lysis approaches, improved thermal cycle conditions, and the addition of a higher fidelity polymerase. Specifically, cell lysis is performed in EL buffer, Taq polymerase is mixed with proofreading Pwo polymerase, and an additional elongation step at 68° C. for 30 seconds before the denaturation step at 94° C. was added. This method was more efficient than PEP and DOP-PCR™ in amplification of DNA from one cell and five cells.
Both DOP-PCR™ and PEP have been used successfully as precursors to a variety of genetic tests and assays. These techniques are integral to the fields of forensics and genetic disease diagnosis where DNA quantities are limited. However, neither technique claims to replicate DNA in its entirety (Cheung and Nelson, 1996) or provide complete coverage of particular loci (Paunio et al., 1996). These techniques produce an amplified source for genotyping or marker identification. The products produced by these methods are consistently short (<3 kb) and as such cannot be used in many applications (Telenius et al., 1992). Moreover, numerous tests are required to investigate a few markers or loci.
Tagged PCR™
Tagged PCR™ (T-PCR™) was developed to increase the amplification efficiency of PEP in order to amplify efficiently from small quantities of DNA samples with sizes ranging from 400 bp to 1.6 kb (Grothues et al., 1993). T-PCR™ is a two-step strategy, which uses, for the first few low-stringency cycles, a tagged random primer with a constant 17-base sequence at the 5′ end and 9 to 15 random bases at the 3′ end. In the first PCR™ step, the tagged random primer is used to generate products with tagged primer sequences at both ends, which is achieved by using a low annealing temperature. The unincorporated primers are then removed and amplification is carried out with a second primer containing only the constant 5′ sequence of the first primer under high-stringency conditions to allow exponential amplification. This method is more labor intensive than other methods due to the requirement for removal of unincorporated degenerate primers, which also can cause the loss of sample material. This is critical when working with subnanogram quantities of DNA template. The unavoidable loss of template during the purification steps could affect the coverage of T-PCR™. Moreover, tagged primers with 12 or more random bases could generate non-specific products resulting from primer-primer extensions or less efficient elimination of these longer primers during the filtration step.
Tagged Random Hexamer Amplification
Based on problems related to T-PCR™, tagged random hexamer amplification (TRHA) was developed on the premise that it would be advantageous to use a tagged random primer with shorter random bases (Wong et al., 1996). In TRHA, the first step is to produce a size distributed population of DNA molecules from a pNL1 plasmid. This was done via a random synthesis reaction using Klenow fragment and random hexamer tagged with T7 primer at the 5′-end (T7-dN6, 5′-GTAATACGACTCACTATAGGGCNNNNNN-3′; SEQ ID NO:9). Klenow-synthesized molecules (size range 28 bp-<23 kb) were then amplified with T7 primer (5′-GTAATACGACTCACTATAGGGC-3′; SEQ ID NO:10). Examination of bias indicated that only 76% of the original DNA template was preferentially amplified and represented in the TRHA products.
Strand Displacement
The isothermal technique of rolling circle amplification (RCA) has been developed for amplifying large circular DNA templates such as plasmid and bacteriophage DNA (Dean et al., 2001). Using φ29 DNA polymerase, which synthesizes DNA strands 70 kb in length using random exonuclease-resistant hexamer primers, DNA was amplified in a 30° C. isothermal reaction. Secondary priming events occur on the displaced product DNA strands, resulting in amplification via strand displacement.
In this technique, two sets of primers are used. The primers in the right set each have a portion complementary to nucleotide sequences flanking one side of a target nucleotide sequence, and the primers in the left set each have a portion complementary to nucleotide sequences flanking the other side of the target nucleotide sequence. The primers in the right set are complementary to one strand of the nucleic acid molecule containing the target nucleotide sequence, and the primers in the left set are complementary to the opposite strand. The 5′ end of primers in both sets is distal to the nucleic acid sequence of interest when the primers are hybridized to the flanking sequences in the nucleic acid molecule. Ideally, each member of each set has a portion complementary to a separate and non-overlapping nucleotide sequence flanking the target nucleotide sequence. Amplification proceeds by replication initiated at each primer and continuing through the target nucleic acid sequence. A key feature of this method is the displacement of intervening primers during replication. Once the nucleic acid strands elongated from the right set of primers reach the region of the nucleic acid molecule to which the left set of primers hybridizes, and vice versa, another round of priming and replication commences. This allows multiple copies of a nested set of the target nucleic acid sequence to be synthesized.
Multiple Displacement Amplification
The principles of RCA have been extended to WGA in a technique called multiple displacement amplification (MDA) (Dean et al., 2002; U.S. Pat. No. 6,280,949 B1). In this technique, a random set of primers is used to prime a sample of genomic DNA. By selecting a sufficiently large set of primers of random or partially random sequence, the primers in the set will be collectively, and randomly, complementary to nucleic acid sequences distributed throughout nucleic acids in the sample. Amplification proceeds by replication with a highly processive polymerase, φ29 DNA polymerase, initiating at each primer and continuing until spontaneous termination. Displacement of intervening primers during replication by the polymerase allows multiple overlapping copies of the entire genome to be synthesized.
The use of random primers to universally amplify genomic DNA is based on the assumption that random primers equally prime over the entire genome, thus allowing representative amplification. Although the primers themselves are random, the location of primer hybridization in the genome is not random, as different primers have unique sequences and thus different characteristics (such as different melting temperatures). As random primers do not equally prime everywhere over the entire genome, amplification is not completely representative of the starting material. Such protocols are useful in studying specific loci, but the result of random-primed amplification products is not representative of the starting material (e.g., the entire genome).
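The spread in primer characteristics within a random pool can be illustrated with melting temperature. The sketch below applies the Wallace rule (2° C. per A or T, 4° C. per G or C), a rough rule of thumb for short oligonucleotides that is not taken from the methods cited here, to every possible hexamer. Absolute values under real reaction conditions will differ; only the relative spread is of interest.

```python
from itertools import product


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_range(length):
    """Lowest and highest Wallace-rule Tm over all primers of a given length."""
    tms = [wallace_tm("".join(p)) for p in product("ACGT", repeat=length)]
    return min(tms), max(tms)


print(wallace_tm("AAAAAA"), wallace_tm("GGGGGG"))
print(tm_range(6))   # spread across all 4096 random hexamers
```

Even by this crude estimate, the most GC-rich hexamers have twice the predicted melting temperature of the most AT-rich ones, consistent with the point that random primers do not prime equally across the genome.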
Cell Immortalization
Normal human somatic cells have a limited life span and enter senescence after a limited number of cell divisions (Hayflick and Moorhead, 1961; Hayflick 1965; Martin et al., 1970). At senescence, cells are viable but no longer divide. This limit on cell proliferation represents an obstacle to the study of normal human cells, especially since many rounds of cell division are used, as cells are shared between laboratories or to produce large quantities of cells required for biochemical analysis, for genetic manipulations, or for genetic screens. This limitation is of particular concern for the study of rare hereditary human diseases, since the volume of the biological samples collected (biopsies or blood) is usually small and contains a limited number of cells.
The establishment of permanent cell lines is one way to circumvent this lack of critical material. Some tumor cells yield cultures with unlimited growth potential, and in vitro transformation with oncogenes or carcinogens has proven a successful means to establish permanent fibroblast and lymphoblast cell lines. Such cell lines have been valuable in the analysis of mammalian biochemistry and the identification of disease-related genes. However, such transformed cells typically exhibit significant alterations in physiological and biological properties. Most notably, these cells are associated with aneuploidy, spontaneous hypermutability, loss of contact inhibition and alterations in biochemical functions related to cell cycle checkpoints. These cellular properties that differ from their normal counterparts pose significant limitations to the analysis of many cellular functions, in particular those related to genomic integrity and the study of the human chromosome instability syndromes.
Recent advances have shown the onset of replicative senescence to be controlled by the shortening of the telomeres that occurs each time normal human cells divide (Allsopp et al., 1992; Allsopp et al., 1995; Bodnar et al., 1998; Vaziri and Benchimol, 1998). This loss of telomeric DNA is a consequence of the inability of DNA polymerase alpha to fully replicate the ends of linear DNA molecules (Watson, 1972; Olovnikov, 1973). It has been proposed that senescence is induced when the shortest one or two telomeres can no longer be protected by telomere-binding proteins, and thus is recognized as a double-stranded (ds) DNA break. In cells with functional checkpoints, the introduction of dsDNA breaks leads to the activation of p53 and of the p16/pRB checkpoint and to a growth arrest state that mimics senescence (Vaziri and Benchimol, 1996; Di Leonardo et al., 1994; Robles and Adami, 1998). Cell cycle progression in senescent cells is also blocked by the same two mechanisms (Bond et al., 1996; Hara et al., 1996; Shay et al., 1991). This block can be overcome by viral oncogenes, such as SV40 large T antigen, that can inactivate both p53 and pRB. Cells that express SV40 large T antigen escape senescence but continue to lose telomeric repeats during their extended life span. These cells are not yet immortal, and terminal telomere shortening eventually causes the cells to reach a second non-proliferative stage termed ‘crisis’ (Counter et al., 1992; Wright and Shay, 1992). Escape from crisis is a very rare event (1 in 10⁷) usually accompanied by the reactivation of telomerase (Shay et al., 1993).
Telomerase is a specialized cellular reverse transcriptase that can compensate for the erosion of telomeres by synthesizing new telomeric DNA. The activity of telomerase is present in certain germline cells but is repressed during development in most somatic tissues, with the exception of proliferative descendants of stem cells such as those in the skin, intestine and blood (Ulaner and Giudice, 1997; Wright et al., 1996; Yui et al., 1998; Ramirez et al., 1997; Hiyama et al., 1996). The enzyme telomerase is a ribonucleoprotein composed of at least two subunits: an integral RNA (hTR) that serves as a template for the synthesis of telomeric repeats, and a protein (hTERT) that has reverse transcriptase activity. The RNA component (hTR) is ubiquitous in human cells, but the presence of the mRNA encoding hTERT is restricted to the cells with telomerase activity. The forced expression of exogenous hTERT in normal human cells is sufficient to produce telomerase activity in these cells, prevent the erosion of telomeres, and circumvent the induction of both senescence and crisis (Bodnar et al., 1998; Vaziri and Benchimol, 1998). Recent studies have shown that telomerase can immortalize a variety of cell types. Cells immortalized with hTERT have normal cell cycle controls, functional p53 and pRB checkpoints, are contact inhibited, are anchorage dependent, require growth factors for proliferation, and possess a normal karyotype (Morales et al., 1999; Jiang et al., 1999).
Thus, the related art provides a variety of techniques for whole genome amplification, although there remains a need in the art for methods and compositions amenable to non-biased high-throughput library generation and/or preparation of DNA molecules. For example, Japan Patent No. JP8173164A2 describes a method of preparing DNA by sorting-out PCR™ amplification in the absence of cloning, fragmenting a double-stranded DNA, ligating a known-sequence oligomer to the cut end, and amplifying the resultant DNA fragment with a primer having the sorting-out sequence complementary to the oligomer. The sorting-out sequences consist of a fluorescent label and one to four bases at the 5′ and 3′ termini to amplify the number of copies of the DNA fragment.
U.S. Pat. No. 6,107,023 describes a method of isolating duplex DNA fragments which are unique to one of two fragment mixtures, i.e., fragments which are present in a mixture of duplex DNA fragments derived from a positive source, but absent from a fragment mixture derived from a negative source. In practicing the method, double-strand linkers are attached to each of the fragment mixtures, and the number of fragments in each mixture is amplified by successively repeating the steps of (i) denaturing the fragments to produce single fragment strands; (ii) hybridizing the single strands with a primer whose sequence is complementary to the linker region at one end of each strand, to form strand/primer complexes; and (iii) converting the strand/primer complexes to double-stranded fragments in the presence of polymerase and deoxynucleotides. After the desired fragment amplification is achieved, the two fragment mixtures are denatured, then hybridized under conditions in which the linker regions associated with the two mixtures do not hybridize. DNA species unique to the positive-source mixture, i.e., which are not hybridized with DNA fragment strands from the negative-source mixture, are then selectively isolated.
Patent WO/016545 A1 details a method for amplifying DNA or RNA using a single primer for use as a fingerprinting method. This protocol was designed for the analysis of microbial, bacterial and other complex genomes that are present within samples obtained from organisms containing even more complex genomes, such as animals and plants. The advantage of this procedure for amplifying targeted regions lies in the structure and sequence of the primer. Specifically, the primer is designed to have very high cytosine and very low guanine content, resulting in a high melting temperature. Furthermore, the primer is designed to have a negligible ability to form secondary structure. This results in limited production of primer-dimer artifacts and improves amplification of regions of interest, without a priori knowledge of these regions. In contrast to the present invention, this method is only able to prime a subset of regions within a genome, due to the utilization of a single priming sequence. Furthermore, the structure of the primer contains only a constant priming region, as opposed to a constant amplification region and a variable priming region in the present invention. Thus, a single primer consisting of non-degenerate sequence results in priming of a limited number of areas within the genome, preventing amplification of the whole genome.
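The limitation of a single non-degenerate priming sequence can be illustrated with a minimal sketch. It assumes an idealized random template, exact matching only, and a hypothetical cytosine-rich 6-base priming region (`FIXED_3PRIME`); none of these values come from the cited patent, and real priming tolerates some mismatches. For simplicity the template strand is searched for the primer sequence itself rather than its reverse complement, which does not change the expected count on a random sequence.

```python
import random

def count_sites(template, primer_3prime):
    """Count positions where the priming region matches the template exactly."""
    k = len(primer_3prime)
    return sum(
        1
        for i in range(len(template) - k + 1)
        if template[i:i + k] == primer_3prime
    )

# Idealized random template (assumption: equal base frequencies, no repeats).
rng = random.Random(0)
template = "".join(rng.choice("ACGT") for _ in range(100_000))

# Hypothetical C-rich, G-free priming region, chosen only for illustration.
FIXED_3PRIME = "CCTCCA"

positions = len(template) - len(FIXED_3PRIME) + 1
sites = count_sites(template, FIXED_3PRIME)
expected = positions / 4 ** len(FIXED_3PRIME)

print(f"candidate positions:            {positions}")
print(f"sites matched by fixed primer:  {sites}")
print(f"expected for a random template: {expected:.1f}")
```

Under these assumptions a fixed 6-base region is expected to match roughly one position in 4,096, whereas a fully degenerate region of the same length could in principle anneal at every position.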
U.S. Pat. No. 6,114,149 describes a method of amplifying a mixture of different-sequence DNA fragments that may be formed from RNA transcription, or derived from genomic single- or double-stranded DNA fragments. The fragments are treated with terminal deoxynucleotide transferase and a selected deoxynucleotide to form a homopolymer tail at the 3′ end of the anti-sense strands, and the sense strands are provided with a common 3′-end sequence. The fragments are mixed with a homopolymer primer that is homologous to the homopolymer tail of the anti-sense strands, and a defined-sequence primer which is homologous to the sense-strand common 3′-end sequence, with repeated cycles of fragment denaturation, annealing, and polymerization, to amplify the fragments. In one embodiment, the defined-sequence and homopolymer primers are the same, i.e., only one primer is used. The primers may contain selected restriction-site sequences to provide directional restriction sites at the ends of the amplified fragments.
U.S. Pat. Nos. 6,124,120 and 6,280,949 describe compositions and a method for amplification of nucleic acid sequences based on multiple strand displacement amplification (MSDA). Amplification takes place not in cycles, but in a continuous, isothermal replication. Two sets of primers are used, a right set and a left set complementary to nucleotide sequences flanking the target nucleotide sequence. Amplification proceeds by replication initiated at each primer and continuation through the target nucleic acid sequence through displacement of intervening primers during replication. This allows multiple copies of a nested set of the target nucleic acid sequence to be synthesized in a short period of time. In another form of the method, referred to as whole genome strand displacement amplification (WGSDA), a random set of primers is used to randomly prime a sample of genomic nucleic acid. In an alternative embodiment, referred to as multiple strand displacement amplification of concatenated DNA (MSDA-CD), fragments of DNA are first concatenated together with linkers. The concatenated DNA is then amplified by strand displacement synthesis with appropriate primers. A random set of primers can be used to randomly prime synthesis of the DNA concatemers in a manner similar to whole genome amplification. Primers complementary to linker sequences can be used to amplify the concatemers. Synthesis proceeds from the linkers through a section of the concatenated DNA to the next linker, and continues beyond. As the linker regions are replicated, new priming sites for DNA synthesis are created. In this way, multiple overlapping copies of the entire concatenated DNA sample can be synthesized in a short time.
U.S. Pat. No. 6,365,375 describes a method for primer extension pre-amplification of DNA with completely random primers in a pre-amplification reaction, and locus-specific primers in a second amplification reaction using two thermostable DNA polymerases, one of which possesses 3′-5′ exonuclease activity. Pre-amplification is performed by 20 to 60 thermal cycles. The method uses a slow transition between the annealing phase and the elongation phase. Two elongation steps are performed: one at a lower temperature and a second at a higher temperature. Using this approach, populations of especially long amplicons are claimed. The specific primers used in the second amplification reaction are identical to a sequence of the target nucleic acid or its complementary sequence. Specific primers used to carry out a nested PCR in a potential third amplification reaction are selected according to the same criteria as the primers used in the second amplification reaction. A claimed advantage of the method is its improved sensitivity to the level of a few cells and increased fidelity of the amplification due to the presence of proof-reading 3′-5′ exonuclease activity, as compared to methods using only one thermostable DNA polymerase, i.e. Taq polymerase.
Bohlander et al. (1992) have developed a method by which microdissected material can be amplified in two initial rounds of DNA synthesis with T7 DNA polymerase using a primer that contains a random five base sequence at its 3′ end and a defined sequence at its 5′ end. The pre-amplified material is then further amplified by PCR using a second primer equivalent to the constant 5′ sequence of the first primer.
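The primer design described above, a defined 5′ sequence followed by a random five-base 3′ end, can be sketched as follows. The constant sequence `CONSTANT_5PRIME` is a hypothetical placeholder and is not the sequence used by Bohlander et al.; the sketch only shows how many distinct primers such a pool contains and how the second, constant primer relates to the first.

```python
from itertools import product

# Hypothetical constant 5' sequence, for illustration only.
CONSTANT_5PRIME = "GACTCGAGTC"
RANDOM_LENGTH = 5  # random bases at the 3' end, as described in the text

def enumerate_primers(constant=CONSTANT_5PRIME, n_random=RANDOM_LENGTH):
    """Return every primer of the form constant + NNNNN, written 5' to 3'."""
    return [
        constant + "".join(tail)
        for tail in product("ACGT", repeat=n_random)
    ]

primers = enumerate_primers()

# The PCR primer for the second stage corresponds to the constant 5' part only.
pcr_primer = CONSTANT_5PRIME

print(len(primers))  # 4**5 = 1024 distinct primers in the pool
print(all(p.startswith(pcr_primer) for p in primers))  # True
```

Because every first-stage product carries the same 5′ sequence, a single constant primer suffices for the subsequent PCR, whatever random 3′ end initiated synthesis.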
Using a modification of Bohlander's procedure and DOP-PCR, Guan et al. (1993) were able to increase the sensitivity of amplification of microdissected chromosomes by using DOP-PCR primers in a cycling pre-amplification reaction with Sequenase version 2 (replenished with fresh enzyme after each denaturing step), followed by PCR amplification with Taq polymerase.
Another modification of Bohlander's original method has been published in a collection of protocols for DNA preparation in microarray analysis on the World Wide Web by the Department of Biochemistry and Biophysics at the University of California at San Francisco. This protocol has been used to amplify genomic representations of less than 1 ng of DNA. The protocol consists of three sets of enzymatic reactions. In Round A, Sequenase is used to extend primers containing a completely random sequence at their 3′ end and a defined sequence at their 5′ end to generate templates for subsequent PCR. During Round B, the specific primer B is used to amplify the templates previously generated. Finally, Round C consists of additional PCR cycles to incorporate either amino allyl dUTP or cyanine modified nucleotides.
Zheleznaya et al. (1999) developed a method to prepare random DNA fragments in which two cycles are performed with Klenow fragment of DNA polymerase I and primers with random 3′-sequences and a 5′-constant part containing a restriction site. After the first cycle, the DNA is denatured and new Klenow fragment is added. Routine PCR amplification is then performed utilizing the constant primer.
In contrast to other methods in the art, the present invention provides a variety of new ways of preparing DNA templates, particularly for whole genome amplification, and preferentially in a manner representative of a native genome.
RNA Expression Analysis
The expression of genes and regulatory transcripts encoded within DNA is the primary mechanism regulating cellular metabolism. Transcription and the post-transcriptional processing of RNA set the framework for all phases of cellular function. For proteins that control essential cellular functions, such as replication and differentiation, the levels of RNA expression and protein synthesis are tightly correlated. Changes within the environment of a cell or tissue often result in necessary alterations in cellular functions. For example, a cell may alter the pattern of gene expression in response to environmental factors, such as ligand and metabolite stimulated signaling. Furthermore, cellular expression of RNA and proteins may be altered intentionally, as with the use of some therapeutic drugs. These changes in gene expression may be due to both the beneficial and the toxic effects of these drugs. Alterations in gene expression in either the normal or the diseased state can be utilized for determining the efficacy and mechanisms of action of potential treatments. In the case of oncogenic transformation, cells may exhibit subtle changes in expression during cancer progression. Changes in gene expression of key proteins involved in cellular transformation have the potential to be used as predictive markers of oncogenesis. The sequencing and mapping of the human genome has resulted in a database of potentially expressed genes. Several tools, including high-density micro-arrays, have been developed to measure the expression of each of these genes, including potential splice variants.
Transcribed genes at any given moment in the life of a cell or tissue represent the regulatory and protein-coding responses involved in cellular function. In some embodiments, the present invention relates to the unbiased amplification of sequences representative of the RNA profile. High fidelity amplification of expressed genes from localized tissues, small groups of cells, or a single cell, will allow the analysis of subtle alterations in gene expression. The need to profile a wide range of potentially expressed RNA molecules from limited sample material requires an amplification method that maintains the representation of the starting material. The invention described herein provides a method to produce a large amount of cDNA from amounts of RNA typically recovered in clinical and diagnostic applications that are not sufficient for direct processing. Whole transcriptome amplification has a relatively brief history with methods based primarily on quasi-linear amplification and exponential amplification.
Both transcription based and PCR based methods for amplification of RNA sequences rely on the activity of RNA dependent DNA polymerases such as the various reverse transcriptases of viral origin. It can be argued that regardless of the priming and amplification strategy, sequence specific bias for reverse transcription is unavoidable. This source of bias is addressed in gene profiling experiments by drawing comparisons between similarly amplified control and test samples.
Linear transcription based and single primer amplification (SPA) based methods require an initial reverse transcription step using either random or poly-T priming. To facilitate amplification of the resulting cDNA, primers utilized for reverse transcription may contain a non-complementary tail introducing a specific universal sequence. In the case of in vitro transcription (IVT) based amplification methods, specific binding and initiation sites are introduced as 5′ oligo extensions corresponding to one of the phage RNA polymerase priming and recognition sites (Phillips and Eberwine, 1996; US005514545A). RNA/DNA duplexes resulting from reverse transcription or first strand cDNA synthesis serve as the template for second strand cDNA synthesis after degradation of the RNA strand by RNase H. Second strand cDNA products may be primed randomly or terminally to incorporate the RNA polymerase recognition sites in the tailed primers, thereby generating substrate for linear amplification. Various modifications to the protocol include second strand priming utilizing terminal transferase to extend first strand cDNA products to introduce short stretches of guanine (Wang and Chung; US005932451A), and utilizing the native terminal transferase activity of Moloney murine leukemia virus reverse transcriptase, which has the propensity to add three to five cytosine ribonucleotides to the 3′ terminus of extension products. This activity has been used for second strand priming by Ginsberg and Che (US20030186237A1), and in the “SMART” adaptation (Clontech), wherein a strand-switching adapter is employed, having a series of guanine residues at its 3′ end which can prime the extended poly-C tail (Schmidt et al., 1999).
An alternative to linear amplification by RNA polymerase is “Single Primer Amplification” (SPA), whereby the primer sequence incorporated during the initial reverse transcription designates the binding site for primer annealing in sequential rounds of primer extension with Taq polymerase (Smith et al., 2003). In a specialized version of SPA, the reaction is carried out under isothermal conditions with a primer that consists partially of DNA and partially of RNA. In the presence of strand-displacing polymerase activity and RNase H activity, each primer extension product generates substrate for RNase H within the 5′ RNA component of the primer. Cleavage of the extension products generates successive priming sites, and the reaction cycles in a linear, isothermal strand displacement mode (NuGEN Technologies Inc.; WO 02/72772; US2003/0017591 A1). Sequential rounds of transcription and reverse transcription are capable of producing as much as a million-fold amplification.
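The difference in scale between linear and exponential amplification can be made concrete with a short, idealized calculation. The sketch assumes perfect per-cycle efficiency unless stated otherwise, and the 1,000-fold yield per transcription round is a hypothetical figure chosen only so that two rounds reach the million-fold level mentioned above; it is not taken from the cited references.

```python
import math

def exponential_fold(cycles, efficiency=1.0):
    """Fold amplification after PCR-style cycles; efficiency=1.0 is ideal doubling."""
    return (1.0 + efficiency) ** cycles

def linear_fold(cycles):
    """Single-primer extension: one new copy per original template per cycle."""
    return cycles

def sequential_round_fold(rounds, fold_per_round):
    """Repeated rounds in which each round multiplies the previous product."""
    return fold_per_round ** rounds

def cycles_to_reach(target_fold, efficiency=1.0):
    """Smallest number of exponential cycles giving at least target_fold."""
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))

print(cycles_to_reach(1e6))            # 20, since 2**20 is just over 10**6
print(linear_fold(20))                 # 20 copies per template after 20 cycles
print(sequential_round_fold(2, 1000))  # 1000000 (hypothetical 1000-fold rounds)
```

Under these assumptions, the same twenty cycles that yield about a million-fold amplification exponentially yield only twenty copies per template in a strictly linear mode, which is why linear schemes rely on high-yield transcription rounds performed sequentially.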
PCR based amplification of RNA involves the same initial steps of reverse transcription and second strand synthesis. While those familiar with the art will appreciate the potential to introduce bias upon exponential amplification, several methods have demonstrated the amplified products to have minimal distortion and be highly representative of the original RNA transcripts. The standard method employs double stranded cDNA generated by classical first and second strand synthesis. Briefly, reverse transcriptase initiates from oligo dT and random primers to promote first strand synthesis, followed by a cocktail of DNA polymerase I, RNase H and DNA ligase for second strand synthesis and repair. Universal adaptors, containing a known sequence, can then be ligated to the double strand cDNA molecules for subsequent amplification. This process can be substantially improved by avoiding the requirement for adapter ligation through the use of the non-template-directed addition of cytosine residues by reverse transcriptase. A universal sequence is subsequently introduced as a primer for strand switching mediated second strand cDNA synthesis (Schmidt et al., 1999).
Further improvements aimed at neutralizing bias introduced between samples have been demonstrated using modified primers that contain both universal and unique priming sites. Makrigiorgos et al. (2002) demonstrated the utility of “balanced PCR” using a bipartite primer construction to co-amplify multiple samples that share a common distal primer sequence. The mixture of samples can be co-amplified, minimizing effects of any impurities or other factors affecting the amplification. The pooled samples are subsequently separated based on the individual sequence tags, from their respective proximal primer sequence, in either a secondary low cycle amplification or a primer extension labeling reaction.
Although exponential amplification has the reputation of degrading the relative abundance relationships between transcripts, much of the bias can be attributed to the various steps required in generating the amplimers. The specific sequence of any given transcript may affect the efficiency of reverse transcription, and these effects may be exaggerated as the length of the transcript increases. Methods employing combinations of IVT-based and PCR-based amplification provide both a sensitive and a specific approach, although they retain an intermediate stepwise synthesis of first and second strand cDNA (Rosetta Inpharmatics, Inc. US006271002B1; Roche Diagnostics Co. US20030113754A1).
The present invention minimizes the introduction of bias by capturing transcripts, in a single step, in the form of amplimers with a uniform size distribution. WTA products are synthesized independent of the integrity of the RNA molecule, the ability to complete reverse transcription of the entire RNA molecule, the requirement for template switching during second strand synthesis, and the ligation of adapters. Subsequent amplification of the products using a universal non-self-complementary primer results in unbiased representation suitable for all applications, such as downstream expression studies.