As a result of advances in automated sequencing technology, much work has been carried out on determining the coding regions of DNA, resulting in the full sequencing of many animal genomes, including the human genome. It has long been recognised, however, that the majority of genomic DNA is non-coding; this material was once considered “junk” DNA. Analysis of the non-coding regions of DNA is now considered important in the study of gene expression and function. Methylation states or patterns in nucleic acid, particularly genomic DNA, are thought to have a functional or regulatory role in gene expression and control in animals.
It has been demonstrated that, in single-stranded DNA, sodium bisulphite preferentially deaminates cytosine to uracil, whereas 5-methylcytosine is deaminated to thymine only at a very slow rate (Shapiro, R., DiFate, V., and Welcher, M. (1974) J. Am. Chem. Soc. 96: 906-912). This observation served as the basis for the bisulphite genomic sequencing protocol of Frommer et al. (1992) (Frommer M, McDonald L E, Millar D S, Collis C M, Watt F, Grigg G W, Molloy P L and Paul C L. PNAS 89: 1827-1831 (1992)). In summary, the method as presently practiced involves the following general steps: alkaline denaturation of DNA; deamination using sodium bisulphite; desulphonation by desalting followed by sodium hydroxide treatment; neutralization and desalting.
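The core chemistry can be illustrated with a small in-silico model. The sketch below assumes idealised, complete conversion of a single strand, treats 5-methylcytosine as fully protected (ignoring its very slow deamination), and records methylated positions as a set of 0-based indices. The function names `bisulphite_convert` and `after_pcr` are hypothetical and are not part of any published protocol or library.

```python
from __future__ import annotations


def bisulphite_convert(seq: str, methylated: set[int]) -> str:
    """Idealised bisulphite treatment of one strand (hypothetical helper).

    Unmethylated C is deaminated to U; a C at an index listed in
    `methylated` (5-methylcytosine) is assumed to be fully protected.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("U")  # cytosine deaminated to uracil
        else:
            converted.append(base)  # 5mC, A, G and T left unchanged
    return "".join(converted)


def after_pcr(converted: str) -> str:
    """Polymerase copies U as T, so the amplified strand carries T (hypothetical helper)."""
    return converted.replace("U", "T")


if __name__ == "__main__":
    reference = "ACGTCCGA"
    treated = bisulphite_convert(reference, methylated={1})
    print(treated, after_pcr(treated))
```

For example, in `ACGTCCGA` with only the first cytosine (index 1) methylated, conversion gives `ACGTUUGA` and the PCR-amplified strand reads `ACGTTTGA`; comparing the amplified sequence with the reference reveals which cytosines were methylated.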
One of the major disadvantages of the bisulphite modification procedure and its established variations is that it has been shown to degrade between 84% and 96% of the original input DNA (Grunau et al. Nucleic Acids Research 29 (13) e65, (2001)). This high loss means that, in practice, it is very difficult to analyse the genomic methylation status of small numbers of cells, or of ancient archival specimens in which the DNA is already partially degraded. In addition, because of the nucleic acid degradation inherent in current methods, it is not possible to sequence and assemble the complete genome of an organism to determine its genome-wide methylation profile in the manner successfully applied by the public Human Genome Project (International Human Genome Sequencing Consortium, 2001, Nature, 409, 860-921) or the private CELERA sequencing project (J Craig Venter et al., 2001, Science, 291, 1304-1351), as the resulting sequence would contain a huge number of “gaps”.
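To see why an 84-96% loss is so limiting for small samples, consider a simple expected-value calculation. This rough sketch assumes a diploid, single-copy locus and treats the reported loss fraction as the independent probability that any one copy of the locus is destroyed; both assumptions are illustrative simplifications, and the helper names are hypothetical.

```python
def surviving_copies(n_cells: int, loss_fraction: float, ploidy: int = 2) -> float:
    """Expected intact copies of a single-copy locus after treatment."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must lie in [0, 1]")
    return n_cells * ploidy * (1.0 - loss_fraction)


def p_at_least_one(n_cells: int, loss_fraction: float, ploidy: int = 2) -> float:
    """Probability that at least one copy survives, assuming independent loss per copy."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must lie in [0, 1]")
    return 1.0 - loss_fraction ** (n_cells * ploidy)


if __name__ == "__main__":
    # 84% and 96% are the bounds of the reported loss range.
    for loss in (0.84, 0.96):
        print(f"loss {loss:.0%}: expected copies {surviving_copies(10, loss):.1f}, "
              f"P(at least one) {p_at_least_one(10, loss):.2f}")
```

With 10 cells (20 copies of the locus), about 3.2 copies are expected to survive at 84% loss and only 0.8 at 96% loss; under the independence assumption there is then roughly a 44% chance that no copy of the locus survives at all.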
A further disadvantage of the bisulphite method as presently practiced is that, in general, only small fragments of DNA can be amplified. Experience shows that, generally, only fragments of less than about 500 base pairs (bp) can be successfully treated and amplified. The present technique is therefore not applicable to newer molecular biological methods such as Long Distance polymerase chain reaction (PCR), which has made it possible to amplify large regions of untreated genomic DNA, generally up to about 50 kb. At present, it is not even possible to analyse the methylation status of intact genes, and a large number of genes in mammalian genomes exceed 50 kb in length, beyond the reach of even Long Distance PCR.
Thermostable polymerases in widespread use are unable to bypass the abasic sites generated during bisulphite conversion, and this generally causes the amplification reaction to stall (Sikorsky, J. A., Primerano, D. A., Fenger, T. W. and Denvir, J. (2004) Biochem. Biophys. Res. Commun. 323, 823-830). In addition, these polymerases cannot successfully and efficiently amplify DNA that contains bulky adducts such as sulphonate groups. This necessitates desulphonation of the bisulphite-converted nucleic acid at high temperature in an alkaline medium prior to PCR amplification, a step responsible for the majority of the nucleic acid damage and loss seen during this procedure (Munson, K., Clark, J., Lamparska-Kupsik, K. and Smith, S. S. (2007) Nucl. Acids Res. 35(9), 2893-2903). Furthermore, conversion effectively generates a T-rich, three-base genome: non-methylated Cs are converted to Us, which are then copied as Ts during PCR amplification, giving rise to a genome composed predominantly of the bases A, T and G. This causes significant difficulties for currently available polymerases, including frequent slippage during extension. An additional problem encountered during PCR amplification of bisulphite-converted DNA is that the single-stranded template contains uracil, which some polymerases, such as the archaebacterial DNA polymerases Pfu, Pwo and Vent, are unable to process (Lasken, R. S., Schuster, D. M. and Rashtchian, A. (1996) J. Biol. Chem. 271 (30), 17692-17696). Consequently, to investigate the methylation status of even relatively small genes (<4 kb), PCR reactions have had to be staggered across the gene region of interest (D. S. Millar, K. K. Ow, C. L. Paul, P. J. Russell, P. L. Molloy, S. J. Clark, 1999, Oncogene, 18(6):1313-24; Millar D S, Paul C L, Molloy P L, Clark S J. (2000). J Biol Chem; 275(32):24893-9).
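The practical cost of staggering can be estimated by tiling a region with overlapping amplicons. The sketch below assumes a maximum amplicon length of 500 bp (the approximate limit for bisulphite-treated DNA noted earlier) and a hypothetical 50 bp overlap between neighbouring amplicons. It uses half-open coordinates and ignores primer-design constraints, which are considerably harder in T-rich converted sequence; `tile_amplicons` is a hypothetical helper.

```python
def tile_amplicons(region_len: int, max_amplicon: int, overlap: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) windows covering [0, region_len).

    Each window is at most `max_amplicon` bp long and overlaps the next
    by `overlap` bp; the final window is clipped to the region end.
    """
    if region_len <= 0:
        raise ValueError("region_len must be positive")
    if not 0 <= overlap < max_amplicon:
        raise ValueError("overlap must satisfy 0 <= overlap < max_amplicon")
    step = max_amplicon - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_amplicon, region_len)
        windows.append((start, end))
        if end == region_len:  # region fully covered
            break
        start += step
    return windows


if __name__ == "__main__":
    for gene_len in (4_000, 50_000):
        n = len(tile_amplicons(gene_len, max_amplicon=500, overlap=50))
        print(f"{gene_len} bp region: {n} staggered amplicons")
```

Under these assumptions a 4 kb gene needs 9 separate reactions and a 50 kb gene needs 111, whereas untreated DNA of that length falls within the range of Long Distance PCR.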
In some instances it is desirable to bisulphite-modify RNA prior to reverse transcription into cDNA and subsequent PCR. However, RNA is even more susceptible than DNA to degradation at the high temperature and pH required for desulphonation, which further reduces sensitivity. There is therefore a need for enzymes capable of efficiently processing bisulphite-modified, treated or converted nucleic acids.