5-methylcytosine (5mC) is a vital epigenetic mark that affects a broad range of biological functions in mammals, including gene expression, maintenance of genome integrity, parental imprinting, X-chromosome inactivation, and regulation of development, aging, and cancer (Deaton and Bird, 2011; De Carvalho et al., 2010; Bird, 2002; Jaenisch and Bird, 2003; Goll and Bestor, 2005). Moreover, aberrant methylation of specific gene promoter regions can lead to diseases, including various cancers (Berman et al., 2012; Jones and Baylin, 2002; Esteller, 2007; Feinberg and Tycko, 2004). In eukaryotes, 5mC is established and maintained by a family of DNA methyltransferases (DNMTs) (Law and Jacobsen, 2010), and it accounts for approximately 3-6% of all cytosines in human genomic DNA (Esteller, 2005).
To date, numerous methods have been developed to profile and analyze genome-wide DNA methylation (the methylome) in eukaryotic cells (Bock, 2009; Feinberg and Vogelstein, 1983; Beck and Rakyan, 2008). Current technologies for detecting DNA methylation generally fall into two types. In the first, DNA fragments containing 5mC are enriched by affinity capture, either with methyl-CpG-binding domain proteins (MBD-seq) or with antibodies (e.g. methylated DNA immunoprecipitation, MeDIP-seq). In the second, denatured DNA is treated with sodium bisulfite, which deaminates unmodified cytosine to uracil (subsequently read as thymine) while leaving methylated cytosine intact, allowing cytosine methylation to be detected at base resolution. In recent years, the study of 5mC has been advanced by whole-genome bisulfite sequencing methods that resolve the genomic location of methylcytosine at single-base resolution (Cokus et al., 2008; Lister et al., 2008; Lister et al., 2009).
However, the recent discovery that 5mC can be iteratively oxidized to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) (He et al., 2011; Ito et al., 2011) calls for re-evaluating how specific each approach is for each type of modified cytosine. Indeed, sodium bisulfite treatment, previously held as the “gold standard” for DNA methylation analysis, cannot distinguish 5mC from 5hmC (Huang et al., 2010; Jin et al., 2010), although it does deaminate 5caC, which is therefore read like unmodified cytosine. Consequently, methods that rely on sodium bisulfite treatment, such as whole-genome bisulfite sequencing (MethylC-Seq), reduced representation bisulfite sequencing (RRBS), and array-based approaches, generate combined maps of 5mC and 5hmC rather than maps of 5mC alone. Further technology development is therefore needed to interpret the signals produced by such methods correctly.
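To make the readout problem concrete, the following Python sketch models an idealized bisulfite reaction in which conversion is complete and no template is lost. The base symbols (`"5mC"`, `"5hmC"`, `"5caC"`), the `BISULFITE_READOUT` table, and the `bisulfite_read` function are hypothetical illustrations, not part of any published pipeline; the sketch only encodes the behaviour described above, where unmodified C and 5caC are read as T while 5mC and 5hmC are both read as C.

```python
from typing import Sequence

# Hypothetical symbols for modified cytosines; unmodified bases use "A", "C", "G", "T".
# Idealized bisulfite behaviour assumed here (complete conversion, no degradation):
#   C    -> deaminated to U, read as T after PCR
#   5caC -> deaminated, read as T
#   5mC  -> protected, read as C
#   5hmC -> protected, read as C
BISULFITE_READOUT = {
    "A": "A",
    "G": "G",
    "T": "T",
    "C": "T",
    "5caC": "T",
    "5mC": "C",
    "5hmC": "C",
}


def bisulfite_read(template: Sequence[str]) -> str:
    """Return the sequence read from a bisulfite-treated strand (idealized model)."""
    return "".join(BISULFITE_READOUT[base] for base in template)


if __name__ == "__main__":
    with_5mc = ["A", "5mC", "G", "C", "G"]
    with_5hmc = ["A", "5hmC", "G", "C", "G"]
    print(bisulfite_read(with_5mc))   # ACGTG
    print(bisulfite_read(with_5hmc))  # ACGTG
    print(bisulfite_read(with_5mc) == bisulfite_read(with_5hmc))  # True
```

Because two templates that differ only in 5mC versus 5hmC yield identical reads under this model, no analysis of such reads alone can separate the two modifications.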
All of these approaches have additional limitations. Bisulfite conversion-based methods (e.g. RRBS) are typically costly and cannot distinguish 5mC from the recently discovered 5-hydroxymethylcytosine (5hmC) (Meissner et al., 2008; Harris et al., 2010), and array-based approaches (e.g. Illumina's Infinium assay) provide low genome coverage (~0.1%) (Weisenberger et al., 2008; Beck, 2010). Moreover, affinity-based methods such as MBD-seq and MeDIP-seq can be specific for 5mC but provide no information on hypomethylated CpG regions or on non-CpG methylation (Jacinto et al., 2008; Bock et al., 2010).
Therefore, alternative methods and compositions for detecting and evaluating 5mC in the genome of eukaryotic organisms are desirable.
In 2009, 5-hydroxymethylcytosine (5hmC), an oxidized form of 5mC, was identified as a relatively abundant cytosine modification in embryonic stem cells (ESCs) and Purkinje neurons (Kriaucionis and Heintz, 2009; Tahiliani et al., 2009). 5hmC is also present in neuronal stem cells, certain adult brain cells, and some cancer cells. It is now widely accepted that 5hmC is an additional player in epigenetic regulation and a potential disease marker.
The TET proteins, which catalyze the conversion of 5mC to 5hmC, have been shown to function in ESC regulation, myelopoiesis, and zygote development (Dawlaty et al., 2011; Gu et al., 2011; Iqbal et al., 2011; Ito et al., 2010; Ko et al., 2010; Koh et al., 2011; Wossidlo et al., 2011). 5hmC has been found to be widespread across many tissues and cell types, although its abundance varies considerably (Globisch et al., 2010; Munzel et al., 2010; Song et al., 2011; Szwagierczak et al., 2010). Proteins that recognize 5hmC-containing DNA have also been investigated (Frauer et al., 2011; Yildirim et al., 2011). In addition, TET proteins can further oxidize 5hmC to 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (He et al., 2011; Ito et al., 2011; Pfaffeneder et al., 2011), and demethylation pathways proceeding through these modified cytosines have been described (Cortellino et al., 2011; Guo et al., 2011; He et al., 2011; Maiti and Drohat, 2011; Zhang et al., 2012). Together, these studies support an emerging paradigm in which 5mC oxidation plays important roles in sculpting a cell's epigenetic landscape and developmental potential by regulating dynamic DNA methylation states.
Strategies to selectively label and/or enrich 5hmC in genomic DNA have been developed to investigate its distribution and function in the genome (Pastor et al., 2011; Robertson et al., 2012; Robertson et al., 2011; Song et al., 2011), including antibody-based 5hmC immunoprecipitation (hMeDIP) (Ficz et al., 2011; Stroud et al., 2011; Williams et al., 2011; Wu et al., 2011; Xu et al., 2011). Although 5hmC is more highly enriched in gene bodies than at transcription start sites in mouse cerebellum (Song et al., 2011; Szulwach et al., 2011b), genome-wide maps of 5hmC in human and mouse embryonic stem cells consistently indicate that 5hmC tends to be located in gene bodies, promoters, and enhancers (Ficz et al., 2011; Pastor et al., 2011; Stroud et al., 2011; Szulwach et al., 2011a; Williams et al., 2011; Wu et al., 2011; Xu et al., 2011). In all cases, however, the resolution of these maps was limited by the size of the immunoprecipitated or chemically captured DNA fragments, which ranged from several hundred to more than a thousand bases.
Because current bisulfite sequencing methods cannot distinguish between 5mC and 5hmC (Huang et al., 2010; Jin et al., 2010), the signal at each cytosine reflects the combined abundance of both modifications, and the genome-wide bisulfite sequencing maps generated in recent years may therefore not accurately capture the true abundance of 5mC at each base. A more detailed understanding of the functions of both 5hmC and 5mC has consequently been hampered by the lack of a single-base resolution sequencing technology capable of quantifying the relative abundance of 5hmC at each cytosine.
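The consequence for per-site quantification can be illustrated with a small, made-up population of molecules at a single cytosine. The sketch below assumes idealized bisulfite behaviour (complete conversion; 5mC and 5hmC read as C; unmodified C and 5caC read as T) and computes the conventional methylation level, i.e. the fraction of reads showing C. The function names and the 60/20/20 composition are hypothetical and chosen only for arithmetic clarity; they are not experimental data.

```python
from collections import Counter
from typing import Sequence

# Idealized bisulfite readout at one cytosine position (assumption: complete conversion).
READS_AS_C = {"5mC", "5hmC"}  # protected from deamination
READS_AS_T = {"C", "5caC"}    # deaminated, sequenced as T


def read_call(state: str) -> str:
    """Base call observed for one molecule with the given cytosine state."""
    if state in READS_AS_C:
        return "C"
    if state in READS_AS_T:
        return "T"
    raise ValueError(f"unknown cytosine state: {state!r}")


def apparent_methylation(molecules: Sequence[str]) -> float:
    """Fraction of reads calling C, i.e. the usual bisulfite 'methylation level'."""
    calls = Counter(read_call(m) for m in molecules)
    return calls["C"] / (calls["C"] + calls["T"])


def true_fraction(molecules: Sequence[str], state: str) -> float:
    """Actual fraction of molecules carrying the given cytosine state."""
    return sum(1 for m in molecules if m == state) / len(molecules)


if __name__ == "__main__":
    # Hypothetical site: 60 molecules with 5mC, 20 with 5hmC, 20 unmodified.
    site = ["5mC"] * 60 + ["5hmC"] * 20 + ["C"] * 20
    print(apparent_methylation(site))   # 0.8
    print(true_fraction(site, "5mC"))   # 0.6
    print(true_fraction(site, "5hmC"))  # 0.2
```

In this example the bisulfite level of 0.8 equals the 5mC fraction plus the 5hmC fraction (0.6 + 0.2), so interpreting it as pure 5mC would overstate 5mC by 0.2 at this site.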
Therefore, there is also a need for methods and compositions for detecting and evaluating 5hmC in a nucleic acid molecule as well as in the genome of eukaryotic organisms.