The 5-hydroxymethylcytosine (5hmC) modification in mammalian DNA was discovered over 30 years ago1. At that time 5hmC was suggested to be a rare and non-mutagenic DNA damage lesion2 and therefore received little attention. In early 2009 5hmC was identified again, and this time its importance in epigenetics was recognized as two independent groups began the initial characterization of the modification. One group identified an enzyme, Tet1, capable of catalyzing the formation of 5hmC from 5-methylcytosine (5meC)3. The other group demonstrated that 5hmC was a stable modification present in specialized Purkinje neurons4. Further research has shown that Tet1, Tet2, and Tet3 are all capable of catalyzing the oxidation of 5meC to 5hmC5-7.
The molecular function of 5hmC remains poorly understood; however, 5hmC is involved in a variety of DNA transactions: it has been shown to be an intermediate in DNA demethylation3, 8, to have a dual function in transcription9-11 and, in the case of aberrant 5hmC patterns, to be involved in tumorigenesis7. While the function of the 5hmC modification remains unclear, identifying the genomic regions that contain 5hmC will help to elucidate the function of this base. This need has led to the development of several detection methods, each of which has certain limitations that are discussed below. The method described here allows for base-specific resolution of (i) 5hmC and (ii) 5meC in DNA.
Existing methods for identifying 5hmC include antibodies raised against 5hmC9, 21, 22; antibodies raised against cytosine 5-methylenesulfonate (CMS), the product of bisulfite treatment of 5hmC7, 23; single molecule, real time sequencing, which relies on DNA polymerase kinetics24; restriction enzymes that are resistant or sensitive to 5hmC or β-glu-5hmC25-27; and three methods that take advantage of the β-glucosyltransferase (β-gt): (i) incorporating a chemical tag into the substrate for the β-gt28, (ii) the glucosylation, periodate oxidation, and biotinylation (GLIB) method23, and (iii) the JBP1 pull-down assay targeting β-glu-5hmC12.
The use of antibodies appears to be a reasonable choice to identify DNA modifications; however, we and others5 have seen that some of the currently available antibodies directed against 5hmC are unable to sufficiently enrich for DNA that contains 5hmC. Indeed, one report demonstrates that one particular antiserum raised against 5hmC is unable to differentiate 5hmC from 5meC5. It has also been reported that antisera developed against 5hmC tend to prefer genomic regions dense in 5hmC content22. Moreover, polyclonal antisera directed against 5hmC pose an inherent problem: animal-to-animal variation in antigenic specificity to 5hmC may limit the long-term usefulness of such antisera.
Upon treatment with sodium bisulfite, 5hmC is converted to CMS, which after sequencing appears identical to bisulfite-treated 5meC; bisulfite sequencing therefore cannot distinguish between 5meC and 5hmC30. Interestingly, one group has raised an antiserum directed against CMS7, 23.
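The ambiguity described above can be illustrated with a minimal sketch. The single-letter codes M (5meC) and H (5hmC) and the function name are hypothetical, chosen only for this illustration. The readout rule for unmodified cytosine (deamination to uracil, read as T after amplification) is general background on bisulfite sequencing and is not stated in the text above.

```python
# Hypothetical single-letter codes, used only for this illustration:
# C = unmodified cytosine, M = 5meC, H = 5hmC; A, G, T are left unchanged.
BISULFITE_READOUT = {
    "C": "T",  # unmodified C is deaminated to U and read as T after PCR
    "M": "C",  # 5meC resists deamination and is read as C
    "H": "C",  # 5hmC is converted to CMS, which is also read as C
}


def bisulfite_read(template: str) -> str:
    """Return the sequence a bisulfite sequencing read would report."""
    return "".join(BISULFITE_READOUT.get(base, base) for base in template)


methylated = "ACGTMGAC"         # 5meC at position 5
hydroxymethylated = "ACGTHGAC"  # 5hmC at position 5

# Both templates give the same read, so the two marks cannot be told apart.
print(bisulfite_read(methylated))
print(bisulfite_read(hydroxymethylated))
print(bisulfite_read(methylated) == bisulfite_read(hydroxymethylated))
```

Because both modified bases map to the same readout, any approach built on bisulfite sequencing alone needs an additional step to separate 5meC from 5hmC.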
Single Molecule, Real Time (SMRT) sequencing takes advantage of the original Sanger sequencing technique; however, this method is able to distinguish between cytosine, 5meC, and 5hmC using the kinetic signature, that is, the speed at which the polymerase passes over each base24. Aside from being prohibitively expensive, this method requires a significant amount of DNA that is already enriched for 5hmC, which makes it dependent on a 5hmC enrichment assay. Because this method uses high-throughput sequencing, it is cumbersome for the analysis of a single locus or a few loci.
Several research groups and companies have identified restriction enzymes that are sensitive or resistant to 5hmC or β-glu-5hmC25-27. The principle behind these systems is that upon treatment with the restriction enzymes unmodified DNA is cleaved, resulting in reduced signal in a qPCR reaction. This signal is then compared to that of an undigested sample, and the difference in qPCR signals is proportional to the amount of 5hmC present in the initial sample. These methods work quite well for genomic regions that contain significant amounts of 5hmC; however, because the restriction sites recognized by these enzymes are 4-6 bp in length, these restriction endonuclease-based methods can, at best, recognize only 1/16 of all 5hmC modifications.
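The comparison of digested and undigested qPCR signals can be sketched as follows. This is a minimal illustration, not the calculation used by any particular kit: it assumes that modified sites are protected from cleavage, that only uncut template amplifies, and that amplification doubles the template each cycle (efficiency of 2.0). The function name, its parameters, and the Ct values are hypothetical.

```python
def fraction_protected(ct_digested: float, ct_undigested: float,
                       efficiency: float = 2.0) -> float:
    """Estimate the fraction of template that survived digestion.

    Assumes only uncut template amplifies and that each cycle multiplies
    the template by `efficiency` (2.0 means perfect doubling).
    """
    delta_ct = ct_digested - ct_undigested
    fraction = efficiency ** (-delta_ct)
    # Measurement noise can push the estimate above 1; cap it at 1.
    return min(fraction, 1.0)


# Hypothetical Ct values: the digested sample crosses threshold 2 cycles later,
# so about one quarter of the template was protected from cleavage.
print(fraction_protected(ct_digested=26.0, ct_undigested=24.0))
```

Under these assumptions a larger Ct shift after digestion means less protected template at that site, so the estimate only reports on cytosines that fall within the enzyme's recognition sequence.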
Three groups have developed methods that take advantage of the specificity that the β-gt has for 5hmC. The first group28 incorporated an azide group into UDP-glucose, the substrate for the β-gt, creating UDP-6-N3-glucose. After the azide-modified glucose was incorporated into 5hmC-containing DNA by the β-gt, a second group could be added to the 6-N3-glu-5hmC using “click” chemistry. This second chemical group could contain a biotin for pull down, a fluorescent probe for quantification, or theoretically any group that can be coupled to the modified glucose using “click” chemistry. The primary drawback to this method is that UDP-6-N3-glucose is not commercially produced and requires significant expertise in organic chemistry to synthesize. Additionally, this strategy for targeting 5hmC has been combined with a primer extension assay and shown to allow for base-specific resolution, because a chemical group that blocks a DNA polymerase can be linked to 6-N3-glu-5hmC. When the polymerase is blocked, the terminal base is assumed to have originally carried a 5hmC modification. Using this method for base-specific resolution has a substantial problem: every extension product that terminates in a C must be assumed to mark a 5hmC. While this effect can potentially be averaged out over several high-throughput sequencing reads, assuming highly optimized enzyme-to-DNA ratios, it remains problematic for single gene analysis.
A second approach using the β-gt to identify genomic regions containing 5hmC is the glucosylation, periodate oxidation, biotinylation (GLIB) method23. In this method, after the transfer of glucose to 5hmC, the resulting β-glu-5hmC is oxidized using NaIO4, which creates reactive aldehydes on the glucose moiety attached to 5hmC. These oxidized glucose molecules can then be reacted with commercially available aldehyde-reactive probes containing a biotin modification. This biotinylation allows for the efficient pull down of 5hmC-containing DNA.
Finally, the third approach utilizing the β-gt for the identification of 5hmC involves the specific recognition of the glucosylated base by a second protein, J-base binding protein (JBP1). Because the only difference between β-glucosyl-5hmC and the J-base is an amino group, it was reasoned that JBP1 may be able to specifically interact with β-glu-5hmC. JBP1 was indeed able to specifically interact with β-glu-5hmC12. Therefore, when JBP1 was covalently linked to epoxy-modified magnetic beads, it allowed for the pull down of β-glu-5hmC-containing DNA. After removing protein from the pulled-down DNA, it was demonstrated by gene-specific qPCR that it was possible to enrich for DNA containing 5hmC12. Mechanistically, this method provides two degrees of specificity for the identification of 5hmC in genomic DNA: first, the β-gt can only modify cytosines in DNA that are hydroxymethylated and second, JBP1 interacts specifically with β-glu-5hmC. Like all DNA pull-down methods, this method can at best localize a 5hmC base to within about 50-100 base pairs; this limitation is due to the inability to reliably identify shorter DNA fragments using currently available molecular biology methods. Another consideration is that this method may over-represent DNA regions that contain high levels of 5hmC. This over-representation could occur because in 5hmC-dense regions more JBP1 can interact with the DNA and pull down these regions more efficiently.
Improved methods for detecting 5-hydroxymethylcytosine residues in DNA are needed. In particular, methods that can discriminate between 5meC and 5hmC are needed, as well as methods that can identify 5meC and 5hmC at single-base resolution.