In the field of epigenetics, DNA methylation and demethylation are among the most important subjects of study. Hypermethylation of a gene's regulatory region usually silences the downstream genes, whereas demethylation is usually accompanied by activation of downstream gene expression, and thereby participates in the corresponding biological processes. In mammals, DNA demethylation is achieved through oxidation of 5-methylcytosine (5mC) mediated by TET (Ten-Eleven Translocation) family proteins, which successively produces 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC), followed by the base excision repair pathway (Mamta Tahiliani, et al., Science, 2009, 324:931-935; Skirmantas Kriaucionis and Nathaniel Heintz, Science, 2009, 324:929-930; Toni Pfaffeneder, et al., Angewandte Chemie International Edition, 2011, 123: 7146-7150; Shinsuke Ito, et al., Science, 2011, 333:1300-1303; Yufei He, et al., Science, 2011, 333:1303-1307).
An important prerequisite for studying the biological function of such epigenetic bases is knowing their distribution in the genome and their specific sequence context. Bisulfite sequencing is a well-known method for DNA methylation analysis and can identify 5mC at single-base resolution. Unmodified cytosine (C) is converted to uracil by sodium bisulfite treatment and is read as T after Polymerase Chain Reaction (PCR) amplification and sequencing. 5mC, however, is still read as C after PCR amplification and sequencing, because the electron-donating 5-methyl group makes the bisulfite-mediated conversion difficult.
5hmC, 5fC and 5caC, as modified bases that can be stably present in the genome, may also have particular biological functions. The genomic distributions of these three cytosine derivatives are therefore very important information for exploring their functions. However, the presence of 5hmC, 5fC and 5caC makes bisulfite sequencing more complicated. In standard bisulfite sequencing, 5hmC is read as C, and both 5fC and 5caC are read as T (Michael J. Booth, et al., Science, 2012, 336: 934-937.). Standard bisulfite sequencing therefore cannot distinguish 5hmC from 5mC, or 5fC and 5caC from unmodified C, and there is a need to develop a new sequencing technique at single-base resolution to identify the positions of these new modified bases. With the development of detection techniques and sequencing methods for 5hmC (Chunxiao Song, et al., Cell, 2011, 153:678-691; Adam B. Robertson, et al., Nucleic Acids Research, 2011, 39:e55; William A. Pastor, et al., Nature, 2011, 473:394-397; Chunxiao Song, et al., Nature Methods, 2012, 9:75-77; Michael J. Booth, et al., Science, 2012, 336:934-937; Miao Yu, et al., Cell, 2012, 149:1368-1380.), the biological function of 5hmC is now understood to some extent. Although corresponding detection methods for 5fC and 5caC have been explored (Eun-Ang Raiber, et al., Genome Biology, 2012, 13:R69; Li Shen, et al., Cell, 2013, 153:692-706; Chunxiao Song, et al., Cell, 2013, 153:678-691; Michael J. Booth, et al., Nature Chemistry, 2014, 6:435-440.), they remain immature for mapping sequence distribution at low cost with high throughput and single-base resolution. Studies on 5fC and 5caC have therefore lagged behind.
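The read-out behaviour described above can be summarized as a small lookup table. The following is only an illustrative sketch in Python: the name `BISULFITE_READOUT` and the helper `group_by_readout` are hypothetical, and the table assumes idealized, complete conversion with no sequencing errors.

```python
# Read-out of each cytosine form after standard bisulfite treatment,
# PCR amplification and sequencing, as stated in the text above.
# Idealized assumption: conversion is complete and error-free.
BISULFITE_READOUT = {
    "C": "T",     # unmodified cytosine is converted to uracil, read as T
    "5mC": "C",   # resists conversion, read as C
    "5hmC": "C",  # read as C
    "5fC": "T",   # read as T
    "5caC": "T",  # read as T
}


def group_by_readout(readout):
    """Group cytosine forms by the base they are read as."""
    groups = {}
    for base, read in readout.items():
        groups.setdefault(read, []).append(base)
    return groups


groups = group_by_readout(BISULFITE_READOUT)
for read, bases in groups.items():
    print(f"read as {read}: {', '.join(bases)}")
```

Grouping by read-out makes the ambiguity explicit: forms that fall into the same group cannot be told apart by standard bisulfite sequencing alone.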
Currently, studies of chemical reactions involving 5-formylcytosine mainly focus on the 5-formyl group on the cytosine ring. Researchers have designed a reaction targeting the formyl group of 5fC, based on the fact that a formyl group can react with the amino group of a hydroxylamine compound to form an oxime (Shinsuke Ito, et al., Science, 2011, 333:1300-1303; Eun-Ang Raiber, et al., Genome Biology, 2012, 13:R69; Chunxiao Song, et al., Cell, 2013, 153:678-691.), and this reaction has been used to detect the position of 5fC in the genome. A method for labeling 5fC with a fluorescent group has also been developed using the reaction between formyl and amino groups (Jianlin Hu, et al., Chemistry-A European Journal, 2013, 19:5836-5840.). In another approach, the formyl group is reduced to hydroxymethyl with NaBH4, so that 5fC is reduced to 5hmC and the 5fC site is read as C in bisulfite sequencing. The position of 5fC in a given region can thus also be identified (Chunxiao Song, et al., Cell, 2013, 153:678-691; Michael J. Booth, et al., Nature Chemistry, 2014, 6:435-440.). These are early detection methods for 5fC and have promoted the study of this base. However, they suffer from many drawbacks, such as high background noise, high cost, complex operation, and difficulty in sequencing at single-base resolution. Therefore, there is a need to develop a novel 5fC labeling and detection method with high selectivity and high efficiency, which would help further promote the study of epigenetic demethylation.
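The reduction-based approach mentioned above (NaBH4 treatment followed by bisulfite sequencing) can be illustrated by comparing two idealized read-outs of the same template. This is a minimal sketch, not the procedure of any cited work: the function names are hypothetical, conversion is assumed to be complete and error-free, and NaBH4 is assumed to change only 5fC.

```python
# Standard bisulfite read-out of each cytosine form (idealized).
STANDARD = {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"}


def reduce_with_nabh4(base):
    # Per the text, NaBH4 reduces the formyl group to hydroxymethyl,
    # i.e. 5fC -> 5hmC. Simplifying assumption: all other forms are unchanged.
    return "5hmC" if base == "5fC" else base


def read_standard(bases):
    """Read-out of a template after bisulfite treatment alone."""
    return "".join(STANDARD[b] for b in bases)


def read_reduced(bases):
    """Read-out after NaBH4 reduction followed by bisulfite treatment."""
    return "".join(STANDARD[reduce_with_nabh4(b)] for b in bases)


def call_5fc(standard_read, reduced_read):
    """Positions read as T without reduction but as C with reduction."""
    return [
        i
        for i, (s, r) in enumerate(zip(standard_read, reduced_read))
        if s == "T" and r == "C"
    ]


# Hypothetical template listing the cytosine form at each cytosine position.
template = ["C", "5mC", "5fC", "5hmC", "5caC", "5fC"]
standard = read_standard(template)
reduced = read_reduced(template)
print(standard, reduced, call_5fc(standard, reduced))
```

Because the call depends on the difference between two separate sequencing runs, any incomplete conversion or sequencing error in either run appears as noise in the result, which is consistent with the drawbacks listed above.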