In many fields of research such as genetic diagnosis, cancer research or forensic medicine, the scarcity of genomic DNA can be a severely limiting factor on the type and quantity of genetic tests that can be performed on a sample. One approach designed to overcome this problem is whole genome amplification. The objective is to amplify a limited DNA sample in a non-specific manner in order to generate a new sample that is indistinguishable from the original but with a higher DNA concentration. The aim of a typical whole genome amplification technique would be to amplify a sample up to a microgram level while respecting the original sequence representation.
The first whole genome amplification methods were described in 1992 and were based on the principles of the polymerase chain reaction. Zhang and coworkers (Zhang, L., et al. Proc. Natl. Acad. Sci. USA, 1992, 89: 5847-5851) developed the primer extension preamplification technique (PEP), and Telenius and collaborators (Telenius et al., Genomics. 1992, 13(3):718-25) designed the degenerate oligonucleotide-primed PCR method (DOP-PCR).
PEP involves a high number of PCR cycles using Taq polymerase and 15-base random primers that anneal at a low-stringency temperature. Although the PEP protocol has been improved in different ways, it still results in incomplete genome coverage, failing to amplify certain sequences such as repeats. Because optimal representation of a genome depends on consistent primer coverage across its length, failure to prime and amplify regions containing repeats may lead to incomplete representation of the whole genome. This method also has limited efficiency on very small samples (such as single cells). Moreover, the use of Taq polymerase implies that the maximal product length is about 3 kb.
DOP-PCR is a method that uses Taq polymerase and semi-degenerate oligonucleotides (such as CGACTCGAGNNNNATGTGG (SEQ ID NO: 1), for example, where N=A, T, C or G) that bind at a low annealing temperature at approximately one million sites within the human genome. The first cycles are followed by a large number of cycles with a higher annealing temperature, allowing only for the amplification of the fragments that were tagged in the first step. This leads to incomplete representation of a whole genome. Like PEP, DOP-PCR generates fragments that are on average 400-500 bp long, with a maximum size of 3 kb, although fragments up to 10 kb have been reported. Also as noted for PEP, a low input of genomic DNA (less than 1 ng) decreases the fidelity and the genome coverage (Kittler et al., Anal. Biochem. 2002, 300(2), 237-44).
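To make the notion of a semi-degenerate oligonucleotide concrete, the following minimal Python sketch enumerates the individual sequences represented by the example primer above, treating each N as any of A, C, G or T. The function name `expand_degenerate` is hypothetical and introduced only for illustration; the sketch handles the N code alone and is not part of any published DOP-PCR protocol.

```python
from itertools import product

# Bases that the degenerate code N stands for (N = A, T, C or G).
N_BASES = "ACGT"


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a primer containing N positions.

    Hypothetical helper for illustration; only the N code is expanded,
    and all other characters are kept as fixed bases.
    """
    pools = [N_BASES if base == "N" else base for base in primer]
    return ["".join(combo) for combo in product(*pools)]


primer = "CGACTCGAGNNNNATGTGG"  # example primer from the text (SEQ ID NO: 1)
variants = expand_degenerate(primer)

print(len(variants))  # 4 degenerate positions -> 4**4 = 256 sequences
print(variants[0])    # CGACTCGAGAAAAATGTGG
```

With four degenerate positions the primer is in effect a pool of 256 distinct oligonucleotides sharing fixed 5' and 3' ends, which is what lets it anneal at many sites under low-stringency conditions.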
Multiple displacement amplification (MDA, also known as strand displacement amplification; SDA) is a non-PCR-based isothermal method based on the annealing of random hexamers to denatured DNA, followed by strand-displacement synthesis at constant temperature (Blanco et al., 1989, J. Biol. Chem. 264:8935-40). It has been applied to small genomic DNA samples, leading to the synthesis of high molecular weight DNA with limited sequence representation bias (Lizardi et al., Nature Genetics 1998, 19, 225-232; Dean et al., Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 5261-5266). As DNA is synthesized by strand displacement, a gradually increasing number of priming events occur, forming a network of hyper-branched DNA structures. The reaction can be catalyzed by the Phi29 DNA polymerase or by the large fragment of the Bst DNA polymerase. The Phi29 DNA polymerase possesses a proofreading activity, resulting in an error rate 100 times lower than that of Taq polymerase.
The methods described above generally produce amplification of whole genomes wherein all of the nucleic acid in a given sample is indiscriminately amplified. These methods cannot selectively amplify target genomes in the presence of background or contaminating genomes. Therefore, the results obtained from these methods have a problematically high amount of contaminating background nucleic acid. Purifying collected samples to isolate target genome(s) and remove background genome(s) will result in a further reduction in the amount of already scarce target genome.
There is a long felt need for a method of targeted amplification of a whole genome relative to background or contaminating genomes. In certain cases where only small quantities of a nucleic acid sample are available to be tested for the presence of a given target nucleic acid sequence, it would be advantageous to introduce specificity into amplification of whole genomes so that a particular target genome is selectively amplified relative to other genomes present within a given sample. For example, in cases of microbial forensics or clinical diagnostics, it would be useful to selectively amplify the genome of a pathogen, or of a class of pathogens, relative to the genomes of other organisms present in a sample that contains only a small quantity of total nucleic acid. This would provide the quantities of pathogen nucleic acid that are necessary to identify the pathogen. The methods disclosed herein satisfy this long felt need.