The task of epigenomic mapping is inherently more complex than genome sequencing because the epigenome is far more variable than the genome. While an individual has only one genome, the epigenome varies in time and space: it changes with age, tissue type, and exposure to environmental factors, and it shows aberrations in disease, especially in cancer. Because methylated CpGs account for only ~2-6% of the genome (18), large-scale shotgun sequencing efforts will require some form of purification of short CpG-methylated sequences. Many current enrichment technologies lack the dynamic range necessary to capture minute changes in CpG methylation that can have large repercussions for gene expression.
In the mammalian genome, CpG dinucleotides are relatively infrequent (1 per 100 bp on average), and 60-80% of them are methylated at the carbon 5 position of cytosine (1). In contrast, dense clusters of unmethylated CpG sequences (~1 per 10 bp), known as CpG islands, are found at the transcription start sites of genes (2). In certain circumstances, these CpG islands are heavily methylated, with concomitant silencing of the promoter and of gene activity (3). These modifications are considered important for development (4), genomic imprinting (5), and X chromosome inactivation through gene silencing (6, 7). Aberrant DNA methylation of CpG islands has frequently been observed in cancer cells (8).
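The density figures quoted above (roughly 1 CpG per 100 bp in bulk DNA versus roughly 1 per 10 bp in CpG islands) can be illustrated with a minimal sketch. The function name `cpg_density` and both toy sequences are hypothetical and chosen only for illustration; they are not real genomic data, and short sequences like these exaggerate the contrast.

```python
def cpg_density(seq: str) -> float:
    """Return the number of CpG dinucleotides per base for one DNA strand."""
    seq = seq.upper()
    if len(seq) < 2:
        return 0.0
    # "CG" cannot overlap with itself, so str.count finds every CpG.
    return seq.count("CG") / len(seq)


# Hypothetical 20-bp toy sequences (assumptions, not real loci).
bulk_like = "ATTGCATTAACGTTAGCTTA"    # 1 CpG in 20 bp
island_like = "CGCGTACGGCGCATCGCGAA"  # 6 CpGs in 20 bp

print(cpg_density(bulk_like))    # 0.05
print(cpg_density(island_like))  # 0.3
```

In practice, island detection is usually applied over sliding windows of a few hundred base pairs rather than to a single short string.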
Many techniques exist for the enrichment of heavily methylated CpG islands from genomic DNA. One protocol relies on methylation-sensitive restriction endonucleases such as HpaII (CCGG) and HhaI (GCGC), followed by PCR identification, Southern blot analysis, or microarray profiling (9). Another approach utilizes the ability of an immobilized methyl-CpG-binding domain (MBD) of the MeCP2 protein to bind selectively to methylated double-stranded DNA sequences. Restriction endonuclease-digested genomic DNA is loaded onto the affinity column, and fractions enriched in methylated CpG islands are eluted with a linear gradient of sodium chloride. PCR, microarray, DNA sequencing, and Southern hybridization techniques are then used to detect specific sequences in these fractions (10). These techniques are limited by the recognition-sequence specificity of the restriction enzyme and therefore cannot reflect all combinations of bases flanking the methylated CpG dinucleotide.
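The limitation of enzyme-based approaches can be made concrete with a small sketch that lists every CpG in a sequence and then marks which of them lie inside a HpaII (CCGG) or HhaI (GCGC) recognition site. The function names and the toy sequence are hypothetical; the sketch only scans one strand for recognition sites and does not model digestion or methylation sensitivity.

```python
import re

# Recognition sequences named in the text; each contains exactly one CpG.
SITES = {"HpaII": "CCGG", "HhaI": "GCGC"}


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in every CpG dinucleotide."""
    return [m.start() for m in re.finditer("CG", seq)]


def assayable_cpgs(seq: str) -> list[int]:
    """CpG positions that fall inside a HpaII or HhaI recognition site."""
    covered = set()
    for site in SITES.values():
        offset = site.index("CG")  # where the CpG sits within the site
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={site})", seq):
            covered.add(m.start() + offset)
    return sorted(covered)


# Hypothetical toy sequence (an assumption, not a real locus).
seq = "ACGTCCGGTTGCGCAACGA"
all_cpgs = cpg_positions(seq)
seen = assayable_cpgs(seq)
missed = [p for p in all_cpgs if p not in seen]

print(all_cpgs)  # [1, 5, 11, 16]
print(seen)      # [5, 11]
print(missed)    # [1, 16]
```

In this toy case, the CpGs flanked by A/T or A/A are invisible to both enzymes, which is the sequence-context bias described above.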
There are several additional methods for the analysis of methylation patterns. In the bisulfite method, single-stranded DNA (ssDNA) is exposed to a deamination reagent (bisulfite) that converts unmethylated cytosines to uracils while methylated cytosines remain largely intact (11). After cleanup, the treated DNA of interest must be PCR amplified (converting the uracils to thymines) and analyzed by one of the many techniques that can distinguish between methylated and unmethylated DNA. If the PCR products are cloned and sequenced, alignment of the untreated and treated nucleotide sequences can reveal the in vivo methylation status of the amplified region. The PCR products can also be analyzed by combined bisulfite-restriction analysis (COBRA assay) or methylation-specific PCR (MSP) (12, 13).
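The logic of bisulfite conversion followed by alignment can be sketched in a few lines. This is an idealized model under stated assumptions: conversion is complete, methylated cytosines are fully protected, there are no sequencing errors, and the treated read aligns to the reference without gaps. The function names, the toy reference, and the chosen methylated positions are all hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on one strand.

    Unmethylated C reads as T (C -> U -> T); a methylated C, given by its
    0-based position in `methylated`, remains C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, converted: str) -> dict[int, bool]:
    """Compare untreated and treated sequences of equal length.

    Returns {position: True if methylated, False if unmethylated}
    for every cytosine in the reference.
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted)):
        if ref == "C":
            if obs == "C":
                calls[i] = True   # protected from deamination
            elif obs == "T":
                calls[i] = False  # converted, so unmethylated
    return calls


# Hypothetical toy reference with methylated cytosines at positions 1 and 9.
reference = "ACGTTCGACCGA"
converted = bisulfite_convert(reference, methylated={1, 9})

print(converted)                               # ACGTTTGATCGA
print(call_methylation(reference, converted))  # {1: True, 5: False, 8: False, 9: True}
```

Real data additionally require handling of incomplete conversion and of the two strands, which are no longer complementary after treatment.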
Recently, direct shotgun ultra-high-throughput sequencing of bisulfite-converted DNA using the Illumina 1G Genome Analyzer and Solexa sequencing technology has yielded insights into the methylation state of the small (~120 Mbp) genome of the mustard plant Arabidopsis (14). This technology allowed the exact identification and quantification of 5-methylcytosines at single-nucleotide resolution in genes. Although highly specific and reasonably sensitive, the approach required at least 20-fold coverage to theoretically cover all potentially methylated cytosines. Currently, no method exists to enrich bisulfite-converted, CpG-methylated DNA from total genomic DNA; such DNA is single-stranded by the nature of the deamination reaction.
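The sequencing burden implied by 20-fold coverage of a ~120 Mbp genome can be estimated with simple arithmetic. The sketch below assumes uniform coverage and ignores strand-specific effects; the 36-base read length is a hypothetical value chosen only for illustration and is not taken from the cited study.

```python
def total_bases_required(genome_size_bp: int, fold_coverage: int) -> int:
    """Raw bases needed for a given average coverage (uniform-coverage assumption)."""
    return genome_size_bp * fold_coverage


def reads_required(total_bases: int, read_length_bp: int) -> int:
    """Number of reads needed, rounded up to a whole read."""
    return (total_bases + read_length_bp - 1) // read_length_bp


genome_size_bp = 120_000_000  # ~120 Mbp, from the text
fold_coverage = 20            # minimum coverage, from the text
read_length_bp = 36           # hypothetical read length (assumption)

bases = total_bases_required(genome_size_bp, fold_coverage)
print(bases)                                  # 2400000000
print(reads_required(bases, read_length_bp))  # 66666667
```

Because methylated CpGs make up only a small fraction of the genome, most of these roughly 2.4 billion bases carry no methylation information, which is the motivation for an enrichment step.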