Post-replicative covalent modifications of DNA are important epigenetic factors in mammalian development and disease (Goll & Bestor, Annu. Rev. Biochem. 2005, 74, 481-514). The best known DNA modification is methylation of cytosine residues at the C5 position (5mC), which occurs predominantly in the context of CG dinucleotides in all vertebrates including humans (Rottach et al., J. Cell. Biochem. 2009, 108: 43-51). Recent studies of genomic DNA from the human brain, neurons and from mouse embryonic stem cells provided evidence that CG sequences also contain 5-hydroxymethylcytosine (hmC) (Tahiliani et al., 2009, Science, 324: 930-935; Kriaucionis & Heintz, 2009, Science, 324: 929-930). Increasing evidence (Ito et al., Nature 2010, 466: 1129-1133) suggests that hmC may also play important epigenetic roles in embryonic development, brain function and cancer progression. In addition, elevated levels of 5-hydroxymethyluracil (hmU), a deaminated version of hmC in DNA, were reported to correlate with the incidence of breast cancer (Djuric et al., Cancer, 1996, 77, 691-696). Glucosylated forms of 5-hydroxymethylated bases in certain bacteriophages and an African trypanosome serve to protect the invading genome against host defense systems (Gommers-Ampt and Borst, FASEB J, 1995, 9, 1034-1042). Bacterial and archaeal organisms contain genomic N6-methyladenine and N4-methylcytosine along with 5mC. These methylated bases are involved in species-specific control of genetic exchange as well as in the regulation of important genes related to pathogenicity.
Numerous techniques have been developed for the identification and localization of 5mC in DNA (Schumacher et al., Nucleic Acids Res, 2006, 34: 528-542). Most of these analytical approaches can be divided into two major types: bisulfite conversion-based techniques, and non-covalent affinity binding-based techniques (e.g. immunoprecipitation). The gold standard method to study the genomic localization of individual 5mC residues is bisulfite sequencing and its numerous modifications. This method is based on bisulfite-mediated deamination of C to U; 5mC and hmC residues are inert to this reaction, and therefore standard sequencing of bisulfite-converted DNA shows the modified residues in the C-track, whereas T and unmodified C residues appear in the T-track (Frommer et al., Proc Natl Acad Sci USA, 1992, 89: 1827-1831; Hayatsu & Shiragami, Biochemistry, 1979, 18: 632-637; Huang et al., PLoS One, 2010, 5: e8888). The method provides the highest mapping resolution (single nucleotide), but suffers from the following shortcomings:
1) conversion of the four-letter (A, C, G, T) DNA sequence into an essentially three-letter (A, G, T) sequence often precludes unequivocal assignment of sequence reads to genomic loci; and
2) the procedure is tedious, labor-intensive and prone to experimental artefacts.
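The bisulfite read-out logic and the first shortcoming listed above can be illustrated with a minimal in silico sketch. It is not a laboratory protocol or a published tool: the function names `bisulfite_convert` and `call_modified`, the example sequences and the chosen modified positions are all hypothetical, and the sketch assumes complete conversion of every unmodified C on a single strand.

```python
def bisulfite_convert(seq, protected=()):
    """Simulate the sequenced read-out of one strand after bisulfite treatment.

    seq       -- DNA sequence of the strand (hypothetical example data)
    protected -- 0-based positions of 5mC or hmC, which resist deamination
    Assumes 100% conversion: every unprotected C is read as T.
    """
    protected = set(protected)
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            out.append("T")  # C -> U by deamination, read as T after PCR
        else:
            out.append(base)  # A, G, T and modified C are unchanged
    return "".join(out)


def call_modified(reference, converted):
    """Positions where the reference has C and the converted read still has C."""
    return [
        i
        for i, (ref, conv) in enumerate(zip(reference.upper(), converted))
        if ref == "C" and conv == "C"
    ]


# A fragment with modified cytosines at positions 1 and 9 (both in CG context).
reference = "ACGTCCGATCGA"
read = bisulfite_convert(reference, protected=[1, 9])
modified_positions = call_modified(reference, read)

# Two different unmodified loci become identical after conversion,
# so a read of this sequence cannot be assigned to one locus unequivocally.
locus_a = bisulfite_convert("ACCTGA")
locus_b = bisulfite_convert("ATCTGA")
```

Here `read` keeps C only at the two protected positions, which is how the modified residues show up in the C-track, while `locus_a` and `locus_b` collapse to the same string, which is the mapping ambiguity caused by the reduced sequence complexity.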
Among the affinity-based techniques, MeDIP and MethylCap are the most widely used. MeDIP uses an antibody that is specific for 5-methylcytosine to retrieve methylated fragments from sonicated genomic DNA (Weber et al., Nat Genet, 2005, 37: 853-862). MethylCap employs a methyl-binding domain protein to obtain methylated DNA fractions (Rauch & Pfeifer, Lab. Invest. 2005, 85: 1172-1180). Antibodies that non-covalently bind hmC-containing DNA fragments have also been produced (Ito et al., Nature, 2010, 466: 1129-1133; Meissner et al., Nat. Biotech. 2010, 28: 1079-1088). All these techniques permit enrichment of modified cytosine-containing fragments from pools of genomic DNA fragments for further analysis using DNA sequencing or hybridization to DNA microarrays. A major limitation of these approaches is their low resolution, which is defined by the minimal size of a DNA fragment that can be amplified using PCR (typically 200-500 base pairs).
Another group of methods to study DNA modification uses covalent tagging of target sites. Genomic fragments containing unmodified methyltransferase sites can be selectively labelled and separated from modified fragments using mTAG (Lukinavicius et al., J. Am. Chem. Soc. 2007, 129, 2758-2759; EP1874790) or similar approaches (EP1102781, EP1756313, U.S. Pat. No. 7,465,544). Analysis of hmC residues can be similarly accomplished using methyltransferase-directed derivatization and labelling (WO2010115846; WO2010115847; Liutkeviciute et al., Angew. Chem. Int. Ed., 2011, 50, 2090-2093) or using glucosyltransferases for transfer of derivatized sugars (Song et al., Nat. Biotechnol. 2010, 29, 68-72; Pastor et al., Nature 2011, 473, 394-397), followed by covalent labelling with reporters such as biotin. These techniques permit enrichment of labelled fragments from pools of genomic DNA fragments for further analysis using DNA sequencing or hybridization to DNA microarrays. As with the affinity-based techniques, the resolution is defined by the minimal size of a DNA fragment that can be amplified using PCR (typically 200-500 base pairs).
It is an aim of the present invention to solve one or more of the problems with the prior art described above and to provide further methods of sequence analysis at modification sites of nucleic acids.