DNA methylation, the only covalent modification of DNA, involves the addition of a methyl group to the 5 position of the cytosine pyrimidine ring or to the N6 nitrogen of the adenine purine ring. Methylated DNA has been found in bacterial, fungal, plant and mammalian genomes. In vertebrates, including mammals, DNA methylation occurs primarily at cytosines in CpG dinucleotides, and approximately 60–90% of CpG dinucleotides are methylated in most mammalian cell types. CpG dinucleotides are not uniformly distributed in mammalian genomes. Short regions of DNA with a high frequency of 5′-CG-3′ (CpG) dinucleotides are called CpG islands. For example, sequence analysis of the human genome has identified nearly 30,000 CpG islands, which account for about 0.7% of the genome; CpG dinucleotides in the remaining 99.3% of the genome are sparsely distributed. Because CpG islands are defined by their high cytosine-guanine frequency, they can be identified from DNA sequence alone, without knowledge of its methylation pattern.
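Because CpG islands are defined by sequence composition, they can be located with a simple scan. The sketch below slides a fixed window along a sequence and keeps windows whose GC fraction and observed/expected CpG ratio both exceed thresholds, then merges overlapping hits. The 200 bp window, GC fraction above 0.5 and observed/expected ratio above 0.6 are assumptions modeled on the commonly cited Gardiner-Garden and Frommer criteria, and the function names are hypothetical; published annotations use varying and often stricter rules.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    cpg = seq.count("CG")
    gc_frac = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def find_cpg_islands(seq: str, window: int = 200, min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Slide a fixed window one base at a time and merge passing windows.

    Returns half-open (start, end) intervals. The thresholds are assumptions
    modeled on the Gardiner-Garden and Frommer criteria.
    """
    seq = seq.upper()
    islands: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlapping or adjacent window: extend the current island.
                islands[-1] = (islands[-1][0], end)
            else:
                islands.append((start, end))
    return islands


# Toy example: a 300 bp CG-rich block flanked by AT-only sequence.
seq = "AT" * 300 + "CG" * 150 + "AT" * 300
print(find_cpg_islands(seq))  # [(501, 999)]
```

No methylation information enters the calculation; only base and dinucleotide counts are used, which is why islands can be annotated before any methylation data are collected.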
CpG islands often harbor gene promoters and play a pivotal role in controlling gene expression. In normal tissue, CpG islands are usually unmethylated, but a subset becomes methylated during oncogenesis, cellular development, and various disease states. Accordingly, there is great interest in determining the methylation status or profiles of promoters and CpG islands (CGIs) in various tissues, especially methylation differences that account for altered patterns of expression in normal development and in disease. Such profiles would greatly improve our understanding of these processes and could provide diagnostic markers and therapeutic targets for disease (Berman et al., Nat. Biotech., 27:341-342, 2009).
Bisulfite sequencing remains the “gold standard” for generating methylation data at single-base resolution. One way to obtain such data for CGIs is to sequence the entire epigenome directly. However, because bisulfite-converted sequence reads are difficult to map and methylation is heterogeneous within a cell population, approximately 100 gigabases (Gb) of sequence data would be needed to generate a high-resolution human DNA methylation map (Lister et al., Nature, 462:315-322, 2009). Other methylation profiling approaches include array capture (Hodges et al., Genome Res., 19:1593-1605, 2009), padlock probe capture (Deng et al., Nat. Biotech., 27:353-360, 2009; Ball et al., Nat. Biotech., 27:361-368, 2009) and reduced representation bisulfite sequencing (Gu et al., Nat. Methods, 7:133-136, 2010), which have been used to target over 300, 2,000 and 15,000 CGIs, respectively.
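Bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine in sequencing, while 5-methylcytosine is protected and still reads as cytosine. The sketch below illustrates both consequences mentioned above: each CpG yields a per-base methylation level, but converted reads no longer match the reference at unmethylated cytosines, which complicates mapping. It assumes a forward-strand-only model, reads already aligned without indels, and hypothetical helper names `bisulfite_convert` and `cpg_methylation_levels`; real pipelines must also handle the reverse strand and perform the bisulfite-aware alignment themselves.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate forward-strand conversion: every unmethylated C reads as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def cpg_methylation_levels(reference: str,
                           reads: list[tuple[int, str]]) -> dict[int, float]:
    """Per-CpG methylation level C / (C + T) from pre-aligned reads.

    Each read is (0-based start on the reference, read sequence); no indels.
    """
    ref = reference.upper()
    # Track only the C of each CpG dinucleotide: [methylated C, unmethylated T].
    counts = {i: [0, 0] for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"}
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1
                elif base == "T":
                    counts[pos][1] += 1
                # Any other base (e.g. N or a sequencing error) is ignored.
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}


ref = "ACGTTCGA"  # CpG cytosines at positions 1 and 5
reads = [
    (0, bisulfite_convert(ref, {1})),     # "ACGTTTGA"
    (0, bisulfite_convert(ref, {1, 5})),  # "ACGTTCGA"
    (0, bisulfite_convert(ref, set())),   # "ATGTTTGA"
]
levels = cpg_methylation_levels(ref, reads)
print(levels)  # position 1 -> 2/3 methylated, position 5 -> 1/3 methylated
```

Fractional levels such as 2/3 arise because different molecules in a cell population carry different methylation states at the same site. This heterogeneity is why deep coverage is needed for accurate per-site estimates.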
There is a need for simple and efficient means of selectively enriching CGIs and other epigenetically informative CG-rich polynucleotides. When applied to epigenetic studies of methylation, the present invention has several advantages. Compared with the methylation profiling approaches described above, the present invention provides a fast, cost-effective, PCR-free means of generating epigenome maps at single-nucleotide resolution. Because it uses sonicated DNA rather than enzyme digestion, it can evaluate CpG dinucleotides in the epigenome that might otherwise go undetected. Further, because enrichment is performed before bisulfite conversion, no bias with respect to methylation status is introduced. Finally, the present method is designed to provide consistent yields and broad coverage of CGIs and individual CpG sites.
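The advantage of sonicated DNA over enzyme digestion can be illustrated with a toy simulation. Reduced representation bisulfite sequencing relies on MspI, which cuts at C^CGG, so after size selection it retains only CpGs lying in fragments of suitable length; a CpG far from any CCGG site sits in a long fragment and is lost. Random fragmentation instead places breakpoints independently in each genome copy, so any CpG can fall in a fragment of selectable size. The sketch below compares the two under stated assumptions: the 40–220 bp size window, the exponential break model, the 150 bp mean fragment size and all function names are hypothetical choices for illustration, not parameters of the present method or of any published protocol.

```python
import random


def cpg_positions(seq: str) -> list[int]:
    """0-based index of the C in every CG dinucleotide."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def covered(cpgs, fragments, min_len, max_len):
    """CpGs fully contained in at least one fragment that passes size selection."""
    kept = [(a, b) for a, b in fragments if min_len <= b - a <= max_len]
    return {p for p in cpgs if any(a <= p and p + 2 <= b for a, b in kept)}


def mspi_fragments(seq: str) -> list[tuple[int, int]]:
    """Digest at every MspI site (C^CGG); identical copies give identical fragments."""
    cuts = [i + 1 for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def sonicated_fragments(length, n_copies, mean_size, rng):
    """Break each genome copy at random points (exponential gaps, a crude model)."""
    frags = []
    for _ in range(n_copies):
        start = 0
        while start < length:
            size = max(1, int(rng.expovariate(1 / mean_size)))
            end = min(length, start + size)
            frags.append((start, end))
            start = end
    return frags


# Toy genome: two nearby MspI sites, one isolated CpG, and a distant MspI site.
seq = "CCGG" + "A" * 50 + "CCGG" + "A" * 1000 + "CG" + "A" * 1000 + "CCGG"
cpgs = cpg_positions(seq)  # [1, 55, 1058, 2061]
size_window = (40, 220)    # hypothetical size-selection window
msp = covered(cpgs, mspi_fragments(seq), *size_window)
son = covered(cpgs, sonicated_fragments(len(seq), 100, 150, random.Random(0)),
              *size_window)
print("MspI:", sorted(msp))
print("sonication:", sorted(son))
```

With MspI, only the CpG inside the short fragment between the first two sites survives size selection, while the isolated CpG at position 1058 is never captured. Across many randomly broken copies, that CpG lands in a fragment of selectable size in a substantial fraction of copies, so it is very likely to be covered.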