Polymerase chain reaction, PCR, has been used for more than two decades to create multiple copies of a segment of the original template DNA, and new applications and modifications emerge every day.
For example, U.S. Pat. No. 4,683,202 is one of the earliest patent documents disclosing the PCR method. It describes a process for amplifying at least one specific nucleic acid sequence contained in a nucleic acid or a mixture of nucleic acids wherein each nucleic acid consists of two separate complementary strands, of equal or unequal length, which process comprises: (a) treating the strands with two oligonucleotide primers, for each different specific sequence being amplified, under conditions such that for each different sequence being amplified an extension product of each primer is synthesized which is complementary to each nucleic acid strand, wherein said primers are selected so as to be sufficiently complementary to different strands of each specific sequence to hybridize therewith such that the extension product synthesized from one primer, when it is separated from its complement, can serve as a template for synthesis of the extension product of the other primer; (b) separating the primer extension products from the templates on which they were synthesized to produce single-stranded molecules; and (c) treating the single-stranded molecules generated from step (b) with the primers of step (a) under conditions that a primer extension product is synthesized using each of the single strands produced in step (b) as a template. Since then, the method has been further developed, for example as described in U.S. Pat. No. 4,800,159 and U.S. Pat. No. 4,965,188, but the basic principle is well known to one skilled in the art.
Thermal cyclers for carrying out PCR methods are also well known in the art. For example, U.S. Pat. No. 5,038,852 discloses a basic PCR device comprising a heat conducting container for holding a reaction mixture, means for heating, cooling, and maintaining said container to or at any of a plurality of predetermined (user-defined) temperatures and having an input for receiving a control signal controlling which of said predetermined temperatures at or to which said container is heated, cooled, or maintained; and a computer means, coupled to the input of said means for heating and cooling to generate the proper control signals to control the temperature levels, temperature rate-of-change ramps, and timing of the incubations at certain temperature levels.
Although it seems that PCR methods exist for every application, there still remain certain templates that PCR cannot handle. Amplification of long fragments was difficult until it was found that DNA polymerases with proof-reading activity could improve long-PCR amplifications (Barnes, W. M. 1994, “PCR amplification of up to 35-kb DNA with high fidelity and high yield from lambda bacteriophage templates”, Proceedings of the National Academy of Sciences of the United States of America, vol. 91, no. 6, pp. 2216-2220; Mukai, H. & Nakagawa, T. 1996, “Long and accurate PCR (LA PCR)”, Nippon rinsho. Japanese Journal of Clinical Medicine, vol. 54, no. 4, pp. 917-92). However, the amplification efficiency of these enzymes is relatively poor compared with standard non-proof-reading polymerases. For this reason, several companies brought to market mixtures of proof-reading polymerases and more processive traditional DNA polymerases designed for amplification of long and difficult fragments.
A problem still frequently encountered in PCR is how to amplify over CG-rich DNA containing repetitive sequences that form strong secondary structures. Examples of such sequences include CG, CTG and GCC repeats. When such structures form, extension is only partial, presumably because the DNA polymerase collides with the double-stranded secondary structures. This results in incomplete extension and poor overall amplification efficiency. Incomplete extension also relates to another commonly known phenomenon. If extension stops within the repetitive region, the partially extended new DNA strand carries a stretch of the repetitive sequence at its 3′ end. The strand is released in the next denaturation step, and its 3′ end can anneal to any part of the repeat, in the right or the wrong position, and is extended in the next extension step. Because of this misalignment, and for many other reasons related to the experimental conditions, such as DNA polymerase or template DNA concentrations, the fragments end up differing in length, which is seen as a typical smear on an agarose gel.
Disease-causing repeat instability is an important and unique form of mutation that is linked to more than 40 neurological, neurodegenerative and neuromuscular disorders. These repeats consist of multiple, often dozens or hundreds, copies of short, typically less than 10 nucleotides long, repeat units. DNA repeat expansion mutations are dynamic and ongoing within tissues and across generations. The patterns of inherited and tissue-specific instability are determined by both gene-specific cis-elements and trans-acting DNA metabolic proteins. Repeat instability probably involves the formation of unusual DNA structures during DNA replication, repair and recombination. Experimental advances towards explaining the mechanisms of repeat instability have broadened our understanding of this mutational process. They have revealed surprising ways in which metabolic pathways can drive or protect from repeat instability.
Numerous common inherited diseases are caused by expansion of CG-rich repeat sequences (Mirkin, S. M. 2007, “Expandable DNA repeats and human disease”, Nature, vol. 447, no. 21, pp. 932-940; Mirkin, S. M. 2006, “DNA structures, repeat expansions and human hereditary disorders”, Current Opinion in Structural Biology, vol. 16, no. 3, pp. 351-358). The secondary structures formed in these extended CG-rich repeats have been considered as a major disease mechanism.
The secondary structures, together with the difficulty of keeping the DNA in a single-stranded state due to the high melting temperature of a CG-rich fragment, are the major obstacles blocking DNA polymerase from extending during primer extension. This results in inefficient primer extension and poor amplification efficiency.
The secondary structures are often formed as a result of the self-complementary DNA strands searching for their minimum structural energy states. If a secondary structure is taken as an energy minimum, one would expect that under fixed conditions the energy minimum would be the same for every molecule of the amplicon, and that the molecules would finally end up in a similar secondary structure.
However, in long and repetitive fragments the process is more complicated and a one-and-only end structure is not very likely, but rather multiple structures with very similar minimum energy states are found.
Diagnostic analysis of the length of the repeat expansion can be done with many methods. If the expected repeat expansions are relatively short, extension over these repeats is possible. Amplification of these repeat sequences followed by fragment analysis is a routine procedure in diagnostic laboratories. However, in many of the diseases the repeat expansion is too long and/or too CG-rich for current PCR methods.
Reliable identification of one copy of an expanded CG-rich repeat, whether it causes a dominantly inherited disease or is found in unaffected carriers of a recessively inherited disease, is especially difficult because of the presence of one copy of the unexpanded, short repeat allele. Under suboptimal PCR conditions this short wild-type allele has a much higher amplification efficiency and often monopolizes the amplification reaction, resulting in poor amplification of the expanded allele and a diagnostic error. The inability to reliably amplify over long CG-rich segments has forced diagnostic laboratories to use other technologies, e.g. Southern blotting, to analyze these repeat expansions.
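The allele competition described above can be illustrated with the standard idealized model of PCR, in which each cycle multiplies the copy number by (1 + E), where E is the per-cycle efficiency between 0 and 1. The following is only a minimal sketch: the function names, the efficiency values and the cycle count are hypothetical choices made for illustration and are not measurements from this document.

```python
def copies_after(cycles, efficiency, start_copies=1.0):
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return start_copies * (1.0 + efficiency) ** cycles


def short_to_long_ratio(cycles, eff_short, eff_long):
    """Ratio of short-allele to expanded-allele product, starting from one copy of each."""
    return copies_after(cycles, eff_short) / copies_after(cycles, eff_long)


# Hypothetical efficiencies (assumptions, not data): 0.9 for the short
# wild-type allele and 0.5 for the expanded allele, over 30 cycles.
ratio = short_to_long_ratio(cycles=30, eff_short=0.9, eff_long=0.5)
print(f"short:expanded product ratio after 30 cycles: about {ratio:.0f} to 1")
```

Under these assumed values the short allele outnumbers the expanded allele by more than a thousand to one, which shows how even a moderate per-cycle difference can leave the expanded allele undetectable.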
Certain methods have been developed to overcome the problem of amplifying GC-rich regions. Various additives and co-solvents, including DMSO, glycerol and betaine, have been used to lower the high melting temperature of the CG-rich segments (Henke, W., Herdel, K., Jung, K., Schnorr, D. & Loening, S. A. 1997, “Betaine improves the PCR amplification of GC-rich DNA sequences”, Nucleic acids research, vol. 25, no. 19, pp. 3957-3958; Hubé, F., Reverdiau, P., Iochmann, S. & Gruel, Y. 2005, “Improved PCR method for amplification of GC-rich DNA sequences”, Molecular biotechnology, vol. 31, no. 1, pp. 81-84).
In some methods an analogue of dGTP is used. For example U.S. Pat. No. 5,091,310 discloses a method for structure-independent amplification of DNA by the polymerase chain reaction, said method comprising: (a) treating the DNA under hybridizing conditions with a pair of oligonucleotide primers, a DNA polymerase, dATP, dCTP, TTP, and c7dGTP such that an extension product of each oligonucleotide primer is formed that is complementary to the DNA, wherein the extension product of a first primer of said primer pair, when separated from its template, can serve as a template for synthesis of the extension product of a second primer of said pair; (b) separating the extension products from the templates on which the extension products were synthesized; and (c) repeating steps (a) and (b) on the extension products produced in step (b). In spite of these additives, long, repetitive and/or CG-rich fragments have often remained “un-PCRable”.
U.S. Pat. No. 6,355,422 B1 discloses a method wherein two different constant extension temperatures are used (Liu, Q. & Sommer, S. S. 1998, “Subcycling-PCR for multiplex long-distance amplification of regions with high and low GC content: application to the inversion hotspot in the factor VIII gene”, BioTechniques, vol. 25, no. 6, pp. 1022-1028). The authors describe a PCR method for amplification of a large duplication containing GC-rich and GC-poor segments. Because the segment had regions with low GC content, they used a lowered extension temperature of 60° C. together with the more conventional 65° C. A single extension step contained two 2-minute sessions at both temperatures.
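The subcycled extension step described above can be written down as a short list of temperature holds. This is a rough sketch under stated assumptions: the `Hold` type and `subcycled_extension` function are hypothetical names, the reading that each temperature is held twice is an interpretation of the sentence above, and the order of alternation (lower temperature first) is not given in this passage and is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    """One constant-temperature segment of a thermal program."""
    temp_c: float
    seconds: int


def subcycled_extension(low_c=60.0, high_c=65.0, hold_s=120, sessions=2):
    """Build one extension step that alternates between two constant temperatures.

    Assumption: the step starts at the lower temperature and each
    temperature is held `sessions` times for `hold_s` seconds.
    """
    steps = []
    for _ in range(sessions):
        steps.append(Hold(low_c, hold_s))
        steps.append(Hold(high_c, hold_s))
    return steps


extension = subcycled_extension()
total_seconds = sum(hold.seconds for hold in extension)
for hold in extension:
    print(f"{hold.temp_c:.0f} C for {hold.seconds} s")
print(f"total extension time: {total_seconds} s")
```

Note that both temperatures remain constant within each hold, which is the point of contrast with a continuously changing extension temperature.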
Although the human genome has been sequenced, large-scale sequencing projects frequently struggle with inefficient amplification over areas with CG-rich repetitive segments. This problem is even more pronounced when genomes of species with a higher CG content than the human genome are studied. There is a need for a PCR method which can overcome this problem. Such a PCR method would also be valuable, for example, for diagnosing diseases and disorders related to such sequences, such as the diseases described above. Efficient primer extension over CG-rich sequences would also allow reliable DNA sequencing over CG-rich sequences.
The present invention is based on the surprising discovery that turning the PCR reaction from classical PCR amplification, which uses constant denaturation, primer annealing and primer extension temperatures (the last two steps can be combined in 2-step PCR), into a more dynamic process considerably improved the amplification efficiency over CG-rich repetitive sequences.
Slow, progressive elevation to much higher primer extension temperatures than are conventionally used resulted in extension over very CG-rich repeats. This, however, was not sufficient to allow efficient amplification over long self-complementary CG-rich repeats forming secondary structures.
It appeared that extension over long CG-rich repeats could be accomplished if the DNA strand was not allowed to settle into a fixed secondary structure, but rather was kept in transition by continuously changing the extension temperature in a pulsating manner. The changing temperature forces the secondary structures to change continuously, so that even the strongest secondary structures eventually open, at least temporarily, allowing DNA polymerase to extend a step further. Under proper experimental conditions the newly synthesized, extending DNA strand maintains its position relative to the template strand although the secondary structures are forced to open. The pulsation also gives DNA polymerase significantly more time to perform the extension at an optimal temperature, and the probability of reaching complete extension increases.
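The idea of a pulsating extension temperature can be sketched as a generated list of short temperature holds. This is purely illustrative: this passage gives no temperatures, step sizes, hold times or pulse shape, so the sawtooth shape and every numeric value below are placeholder assumptions, and `pulsed_extension` is a hypothetical name rather than the protocol of the invention.

```python
def pulsed_extension(low_c, high_c, pulses, step_c=1.0, hold_s=5):
    """Return (temperature, seconds) holds forming a sawtooth extension profile.

    Each pulse climbs from low_c towards high_c in increments of step_c and
    then drops back to low_c for the next pulse. The climb uses the nearest
    whole number of steps, so high_c is reached exactly only when the range
    is a multiple of step_c.
    """
    if high_c <= low_c or step_c <= 0 or pulses < 1 or hold_s <= 0:
        raise ValueError("need high_c > low_c, step_c > 0, pulses >= 1, hold_s > 0")
    steps_per_pulse = int(round((high_c - low_c) / step_c))
    profile = []
    for _ in range(pulses):
        for i in range(steps_per_pulse + 1):
            profile.append((low_c + i * step_c, hold_s))
    return profile


# Placeholder values only (assumptions): three pulses between 70 and 78 C.
profile = pulsed_extension(low_c=70.0, high_c=78.0, pulses=3)
total_s = sum(seconds for _, seconds in profile)
print(f"{len(profile)} holds, {total_s} s of extension in total")
```

A list of holds was chosen because it maps directly onto the temperature levels, ramps and incubation timing that a programmable thermal cycler controls.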
The method of the invention, also called Heat Push PCR, was originally developed in response to the need for novel methods for the analysis of inherited diseases caused by expansions of CG-rich repeat sequences.
One of the most challenging applications for PCR amplification has been diagnostic testing for progressive myoclonus epilepsy 1, also known as EPM1 disorder. The major mutation found in Finnish patients is an expansion of a dodecamer repeat (CCCCGCCCCGCG [SEQ ID NO: 1]) in the 5′ untranslated region of the cystatin B (CSTB) gene. The normal alleles usually contain two or three copies of the dodecamer repeat, while the expanded mutant alleles have been reported to contain between 30 and 80 copies. This expansion impairs the promoter function and, in homozygous individuals, results in the lack of CSTB expression and a severe disease phenotype. EPM1 is recessively inherited; thus affected individuals have two expanded alleles, while unaffected mutation carriers have only one expanded allele. The dodecamer repeat of EPM1 is not symmetric or self-complementary, suggesting that it would not form secondary structures as strong as those formed by symmetrical repeats. Thus, difficulties in PCR amplification over the EPM1 expansion could be related more to its extremely high CG content (nearly 1 kb of only C or G nucleotides) than to strong secondary structures.
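The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the dodecamer sequence and the copy numbers given in the text; the helper names `gc_fraction` and `expansion_length` are hypothetical and introduced only for this illustration.

```python
DODECAMER = "CCCCGCCCCGCG"  # repeat unit quoted in the text (SEQ ID NO: 1)


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def expansion_length(unit, copies):
    """Total length in nucleotides of `copies` tandem copies of `unit`."""
    return len(unit) * copies


# Copy numbers from the text: 2 or 3 in normal alleles, 30 to 80 in expanded alleles.
for copies in (2, 3, 30, 80):
    length = expansion_length(DODECAMER, copies)
    gc = gc_fraction(DODECAMER * copies)
    print(f"{copies:>2} copies: {length:>3} nt, GC fraction {gc:.2f}")
```

With 80 copies the repeat tract is 960 nucleotides long and consists entirely of C and G, which matches the "nearly 1 kb" figure in the text.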
Dystrophia Myotonica, type 1 (DM1) is an inherited disease caused by an expansion of a CTG repeat in the promoter region of the Dystrophia myotonica protein kinase (DMPK) gene. As in dominantly inherited repeat expansion diseases in general, only one allele is expanded. Unaffected individuals carry 5-34 repeat units, while affected patients show more than 50, sometimes over 2000, repeat units. In the congenital form of DM1, a large repeat expansion of uniform size is usually found. Such expansions are easily detected in a Southern blotting assay. The diagnostics of the adult form of DM1 is, on the other hand, sometimes complicated by high cellular length variability of the expanded repeats. If the individual length variability is large, a smear and/or multiple bands are seen in Southern blotting instead of a single band, severely decreasing the signal-to-noise ratio of the assay.
Fragile X syndrome (FRAXA) is caused by the expansion of a CGG repeat in the 5′ untranslated region of the X chromosomal Fragile site mental retardation 1 (FMR1) gene. A repeat length between 50 and 200 is considered a pre-mutation; an expansion more than 200 repeat units long is considered a full mutation.
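The repeat-length categories stated above can be expressed as a small lookup function. This is a hedged sketch for illustration only and not a clinical tool: the function name is hypothetical, treating 50 and 200 as inclusive pre-mutation boundaries is an assumption, and the text gives no label for alleles shorter than 50 repeats, so a neutral description is used.

```python
def classify_fmr1_cgg(repeat_units):
    """Map a CGG repeat count to the categories named in the text.

    Assumptions: 50 and 200 are both counted as pre-mutation, and counts
    below 50 get a neutral label because the text does not name them.
    """
    if repeat_units < 0:
        raise ValueError("repeat count cannot be negative")
    if repeat_units > 200:
        return "full mutation"
    if repeat_units >= 50:
        return "pre-mutation"
    return "below pre-mutation range"


for count in (30, 50, 120, 200, 201, 800):
    print(f"{count:>3} CGG repeats: {classify_fmr1_cgg(count)}")
```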
Anticipation, the further expansion of slightly expanded alleles in the next generation, is a common problem in inherited diseases caused by repeat expansions. This makes it important also to identify the asymptomatic carriers of the slightly expanded alleles (Pearson C E, Nichol E K, Cleary J D: Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet. 2005 October; 6(10):729-42).
The tissue heterogeneity of repeat lengths, as well as cellular mosaicism, severely hampers the detection of expanded alleles. Sometimes only a fraction of the cells in the sample carry large expansions, and in those cases amplification of the short wild-type allele may completely prevent expanded fragments from amplifying when conventional methods are used.
Dystrophia Myotonica, type 1 (DM1) was chosen as the primary model system for the method of the present invention for the following reasons:
- Being a dominantly inherited disease, the affected patients have one unexpanded allele and one expanded allele.
- The CTG repeat expansion can be over 1000 repeat units long and in a sample one can have multiple variable expanded fragments.
- No PCR protocols exist for efficient amplification over long repeat expansions.
- Southern blotting analysis allowed direct estimation of the repeat expansion length and its variability in the original samples. Thus, amplification efficiency could be estimated not only between an expanded and an unexpanded allele, but also between multiple expanded fragments differing slightly in size.
To demonstrate the robustness of the method of the present invention, another, clinically very important inherited disease, the Fragile X syndrome, was chosen. The CGG repeat forms strong secondary structures and is considered one of the most difficult fragments to amplify.