Besides the four major nucleobases (C, A, G and T), the DNA of most living organisms contains minor amounts of their methylated variants: 5-methylcytosine (5mC), N4-methylcytosine and N6-methyladenine. These methylated species are formed by DNA methyltransferase enzymes (MTases), which catalyze the transfer of a methyl group from the cofactor S-adenosyl-L-methionine (AdoMet) to specific positions within their target sequences (Cheng (1995) Annu. Rev. Biophys. Biomol. Struct. 24, 293-318). DNA methylation is a well-established biological mechanism that regulates gene expression in vertebrate animals, including humans (Bird, A. (2002) Genes Dev. 16, 6-21; Goll, M. G. & Bestor, T. H. (2005) Annu. Rev. Biochem. 74, 481-514). In bacteria, it serves as a self-identification mark that distinguishes the cell's own DNA from foreign DNA. Genomic DNA sometimes also contains the 5-hydroxymethylated pyrimidine nucleobases 5-hydroxymethylcytosine and 5-hydroxymethyluracil (hmC and hmU) (Gommers-Ampt, J. H. & Borst, P. (1995) FASEB J. 9, 1034-1042).
Glucosylation of the 5-hydroxymethyl groups in certain bacteriophages and in an African trypanosome serves to protect the invading genome against host defense systems. The presence of hmC was reported previously in DNA from animal brains (Penn et al. (1972) Biochem. J. 126, 781-790). Recent studies of genomic DNA from neurons and brain (Kriaucionis, S. & Heintz, N. (2009) Science 324, 929-930) and from mouse embryonic stem cells (Tahiliani, M. et al. (2009) Science 324, 930-935) found that hmC residues occur at CG sequences and are likely produced by oxidation of 5mC residues. The 5-hydroxymethyl groups in DNA may alter interactions with cellular proteins involved in the epigenetic control of gene activity (Valinluck, V. et al. (2004) Nucleic Acids Res. 32, 4100-4108), whereas elevated levels of hmU in DNA were reported to correlate with the incidence of breast cancer (Djuric, Z. et al. (1996) Cancer 77, 691-696). Altogether, this evidence suggests that 5-hydroxymethylated nucleobases, and hmC in particular, may play important roles in embryonic development, brain function and cancer progression. However, neither the chromosomal localization of hmC residues nor the underlying biological mechanisms are currently known, and further studies are required to address these fundamental questions. Above all, such studies are hampered by the lack of adequate analytical techniques for the facile analysis of hmC residues in DNA.
Current analytical techniques for studying cytosine modifications in mammalian DNA assume only two epigenetic states of cytosine at CG sites: unmodified cytosine (C) and 5-methylcytosine (5mC). Accordingly, numerous techniques have been developed for the identification and localization of 5mC in DNA (Schumacher et al. (2006) Nucleic Acids Res. 34, 528-542). The gold-standard method for mapping individual 5mC residues in the genome is bisulfite sequencing (Frommer et al. (1992) Proc Natl Acad Sci USA 89, 1827-1831) and its numerous variants. This method relies on bisulfite-mediated deamination of C to U. Because 5mC residues are inert to this reaction, standard sequencing of bisulfite-converted DNA shows 5mC residues in the C-track, whereas unmodified C residues (now read as T) and genuine T residues both appear in the T-track. Upon bisulfite treatment, hmC is converted to cytosine 5-methylsulfonate, which is deaminated even more slowly than 5mC (Hayatsu, H. & Shiragami, M. (1979) Biochemistry 18, 632), and should therefore also appear in the C-track. Consequently, hmC residues cannot be distinguished from 5mC residues using conventional bisulfite sequencing protocols. Other high-throughput genome-wide techniques are likewise unsuitable for detecting hmC residues: MeDIP (methylated DNA immunoprecipitation) (Weber et al. (2005) Hum Mol Genet 14, R11-R18), which relies on binding 5mC-containing DNA fragments to 5mC-specific antibodies, and methods based on methylation-sensitive restriction endonucleases. Because all the existing techniques were designed to distinguish only the two alternative states of cytosine (methylated versus unmodified) (Schumacher et al. (2006) Nucleic Acids Res. 34, 528-542), they have little or no ability to detect hmC residues in genomic DNA.
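The bisulfite readout logic described above can be illustrated with a toy model. This is a simplified sketch, not a laboratory protocol: it assumes complete conversion of unmodified C and no conversion of 5mC or hmC, considers only one strand, and uses a hypothetical single-letter encoding ('M' for 5mC, 'H' for hmC). The function names `bisulfite_readout` and `interpret` are illustrative only.

```python
# Hypothetical encoding: 'M' = 5mC, 'H' = hmC; other letters are unmodified bases.
# Toy model of bisulfite conversion followed by sequencing (one strand only,
# complete conversion of unmodified C, no conversion of 5mC or hmC).


def bisulfite_readout(seq: str) -> str:
    """Return the base calls expected after bisulfite treatment and sequencing."""
    calls = []
    for base in seq:
        if base == "C":
            # Unmodified C is deaminated to U, which is read as T.
            calls.append("T")
        elif base in ("M", "H"):
            # 5mC is inert; hmC forms cytosine 5-methylsulfonate, which also
            # resists deamination. Both are therefore read as C.
            calls.append("C")
        else:
            calls.append(base)
    return "".join(calls)


def interpret(reference: str, readout: str) -> list[str]:
    """For each cytosine in the reference, report what the readout can reveal."""
    result = []
    for ref_base, call in zip(reference, readout):
        if ref_base != "C":
            continue
        # A T call means the C was unmodified; a C call cannot tell 5mC from hmC.
        result.append("C" if call == "T" else "5mC or hmC")
    return result
```

Because 'M' and 'H' both map to C, a 5mC-containing sequence and the corresponding hmC-containing sequence give identical readouts, so `interpret` can only report "5mC or hmC" at any unconverted cytosine.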
Recently, a method for DNA derivatization using non-cofactor reactions of DNA methyltransferases has been disclosed (patent application LT2009023, filed 2 Apr. 2009). This technique permits MTase-directed, sequence-specific covalent coupling of formaldehyde (or other aliphatic aldehydes) to the C5 position of the MTase's target cytosine residues in DNA, thereby producing 5-hydroxymethylated (or 5-hydroxyalkylated) cytosines. The application also describes methods for subsequent sequence-specific covalent derivatization of hmC residues in various types of DNA molecules by MTase-directed coupling of nucleophilic compounds, including thiols. In principle, the latter reaction permits derivatization of hmC residues in DNA with various functional and reporter groups, provided that the residues occur at a target position of the directing MTase. Since hmC residues are known to occur at CG sequences in the genomic DNA of vertebrate animals, including humans, some of these derivatization reactions may be useful for developing the required techniques for hmC analysis. However, these derivatization reactions have not been assessed for their suitability for the chemical manipulation and analysis of hmC residues in various types of DNA, including mammalian genomic DNA.
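The requirement that an hmC residue lie at a target position of the directing MTase can be sketched as a simple sequence scan. This is a hypothetical illustration that models only site recognition, not the chemistry: 'H' marks hmC, and the default recognition sequence `CG` with the target cytosine at offset 0 is an assumption, not a property of any particular enzyme described in the application. The function name `addressable_hmc` is illustrative.

```python
# Hypothetical sketch: which hmC residues ('H') sit at the target cytosine of an
# assumed MTase recognition sequence? Only sequence recognition is modeled.


def addressable_hmc(seq: str, motif: str = "CG", target_offset: int = 0) -> list[int]:
    """Return 0-based positions of hmC residues located at the MTase target position.

    For motif matching, hmC is read as C, since the enzyme is assumed to
    recognize the modified cytosine as part of its target sequence.
    """
    plain = seq.replace("H", "C")
    hits = []
    for start in range(len(plain) - len(motif) + 1):
        if plain[start:start + len(motif)] == motif:
            pos = start + target_offset
            # Keep the site only if the targeted cytosine is actually hmC.
            if seq[pos] == "H":
                hits.append(pos)
    return hits
```

For example, in `AHGTTHAGCHG` the hmC at index 5 is followed by A and therefore lies outside any CG site; under the assumed motif only the residues at indices 1 and 9 would be addressable. A different (equally hypothetical) recognition sequence can be supplied through `motif` and `target_offset`.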
In conclusion, new, reliable and validated methods are clearly required for the analysis of hmC residues in genomic DNA.