Field of the Invention
The present invention relates to detection of methylated CpG islands in a genome using high-throughput sequencing.
Description of Related Art
DNA methylation is the conversion of cytosine (C) to 5-methylcytosine (5mC) by the addition of a methyl group at the C5 position of the cytosine ring. It plays important roles in many biological processes, including transcriptional regulation, transposon repression, genomic imprinting and X-chromosome inactivation, and is an active area of research in molecular biology. In vertebrates, including humans, DNA methylation occurs mainly at CpG sites (a CpG site is a dinucleotide in which a guanine (G) immediately follows a cytosine (C) in the 5′→3′ direction along the same DNA strand). The overall frequency of CpG dinucleotides in vertebrate genomes is lower than expected from base composition. In some regions of the genome, however, CpG dinucleotides occur at the expected or an even higher frequency; these regions are referred to as CpG islands. CpG islands are found mainly in gene promoters. The human genome contains about 30,000 CpG islands; more than 50% of them are located in promoters, and more than 60% of promoters contain a CpG island. Methylation of promoter CpG islands silences gene expression, and this mechanism participates in many biological processes, including X-chromosome inactivation, genomic imprinting, differentiation of embryonic stem cells, germ cell development, and the initiation and progression of cancers. Intragenic and intergenic CpG islands may be unidentified promoters. A comprehensive understanding of the biological functions of CpG island methylation therefore requires systematic and highly efficient techniques.
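The CpG depletion and the CpG island concept described above are commonly quantified with the observed/expected CpG ratio, (number of CpG × sequence length) / (number of C × number of G). The following is a minimal sketch, assuming the widely used Gardiner-Garden and Frommer criteria (length of at least 200 bp, GC content above 50% and observed/expected CpG ratio above 0.6) and a simple fixed-window scan; these thresholds, the window settings and the function names are illustrative assumptions, not part of the present description.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide.
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Assumed Gardiner-Garden & Frommer-style thresholds (illustrative)."""
    return (len(seq) >= min_len
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) > min_oe)


def scan_for_islands(seq: str, window: int = 200, step: int = 50) -> list[int]:
    """Return start positions of fixed windows that pass the island criteria."""
    return [start for start in range(0, len(seq) - window + 1, step)
            if looks_like_cpg_island(seq[start:start + window])]


if __name__ == "__main__":
    cpg_rich = "CG" * 100              # CpG far above the expected frequency
    gc_rich_cpg_poor = "CCAGGA" * 34   # GC-rich but contains no CpG at all
    print(cpg_obs_exp(cpg_rich))                     # 2.0
    print(looks_like_cpg_island(gc_rich_cpg_poor))   # False
    genome = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(scan_for_islands(genome))                  # [250, 300, 350, 400, 450]
```

The second example shows why GC content alone is not enough: a GC-rich stretch that lacks CpG dinucleotides fails the observed/expected test.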
Traditional techniques for detecting DNA methylation, such as restriction enzyme digestion, restriction enzyme digestion combined with PCR, and methylation-specific PCR, can detect only a single site or a small number of sites. With recent progress in high-throughput sequencing technologies, researchers have begun to profile DNA methylation systematically at the whole-genome level. Current techniques for detecting DNA methylation by high-throughput sequencing include: 1) methylated DNA immunoprecipitation; 2) methylated CpG immunoprecipitation; and 3) bisulfite sequencing. The first two techniques capture methylated DNA with an antibody or a recombinant methylated-CpG-binding protein, respectively, followed by high-throughput sequencing; they measure DNA methylation status only semi-quantitatively and have a resolution of about 100 base pairs (bp). Bisulfite sequencing is based on the fact that sodium bisulfite treatment converts unmethylated cytosines (C) to uracils (U), which are read as thymines (T) after PCR amplification, whereas methylated cytosines (5mC) are not affected; the DNA methylation status can therefore be read out by subsequent high-throughput sequencing. This technique is the gold standard for DNA methylation analysis and provides single-base-pair resolution. In 2009, the first whole-genome DNA methylation map of the human genome at single-base-pair resolution was reported using bisulfite sequencing. However, because this technique sequences the whole genome, its cost is very high, which hampers its application to large numbers of samples. The reduced representation bisulfite sequencing (RRBS) technique has been developed (Gu H, et al. Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat Protoc. 2011;6(4):468-81.).
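The bisulfite readout can be illustrated with a minimal simulation. The sketch below assumes top-strand reads that are already aligned to the reference without gaps; it turns each unmethylated C into T (the U produced by bisulfite is read as T after PCR), leaves 5mC as C, and then calls each reference CpG as methylated (C in the read) or unmethylated (T in the read). All function names and inputs are hypothetical; real pipelines must also handle both strands, alignment of converted reads and sequencing errors.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on the top strand.

    Unmethylated C -> U -> read as T; methylated C (5mC) stays C.
    `methylated_positions` holds 0-based indices of 5mC (hypothetical input).
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, read: str) -> dict[int, bool]:
    """Call each reference CpG as methylated (True) or unmethylated (False).

    Assumes `read` is aligned to `reference` without gaps (same length).
    Positions where the read shows neither C nor T are skipped.
    """
    reference, read = reference.upper(), read.upper()
    calls: dict[int, bool] = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
    return calls


def methylation_levels(reference: str, reads: list[str]) -> dict[int, float]:
    """Per-CpG methylation level = methylated calls / all calls at that CpG."""
    totals: dict[int, list[int]] = {}
    for read in reads:
        for pos, is_meth in call_cpg_methylation(reference, read).items():
            meth, total = totals.get(pos, [0, 0])
            totals[pos] = [meth + int(is_meth), total + 1]
    return {pos: m / t for pos, (m, t) in sorted(totals.items())}


if __name__ == "__main__":
    ref = "ACGTTCGACCGA"  # CpGs at 1, 5 and 9; the C at 8 is not in a CpG
    read1 = bisulfite_convert(ref, {1})
    read2 = bisulfite_convert(ref, {1, 5})
    print(read1)                                   # ACGTTTGATTGA
    print(methylation_levels(ref, [read1, read2])) # {1: 1.0, 5: 0.5, 9: 0.0}
```

Averaging over many reads, as in `methylation_levels`, is what allows bisulfite sequencing to report a quantitative level at each individual CpG rather than a region-level enrichment.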
This method enriches promoter and CpG island regions of the genome by MspI digestion and gel purification, followed by end repair, A-tailing, adapter ligation, size selection, bisulfite conversion and PCR amplification to obtain the library. Although it is more cost-effective than whole-genome bisulfite sequencing, library construction is complex and takes five to six days. In addition, the enrichment step cannot distinguish methylated from unmethylated CpG islands, which increases sequencing cost. Another method (patent: methods and application for establishing library of high-throughput sequencing, CN103103624A) captures genomic regions that include CpG islands with specific probes and then performs bisulfite sequencing. However, the capture and library construction processes are time-consuming.
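The enrichment step of reduced representation bisulfite sequencing can be sketched as an in silico MspI digestion followed by size selection. MspI recognizes CCGG and cuts between the two cytosines (C^CGG), and its cleavage is not blocked by methylation of the internal CpG; this is why the enriched fragments contain both methylated and unmethylated CpG islands. The sketch below cuts the top strand only; the 40 to 220 bp window and the function names are illustrative assumptions, not values taken from the cited protocol.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognition site; cleavage is C^CGG


def mspi_digest(seq: str) -> list[str]:
    """Cut the top strand after the first C of every CCGG site.

    Methylation status is deliberately not an input: MspI cleavage is not
    blocked by methylation of the internal CpG, so methylated and
    unmethylated sites yield the same fragments.
    """
    seq = seq.upper()
    # CCGG cannot overlap itself, so finditer finds every site.
    cuts = [m.start() + 1 for m in re.finditer(MSPI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: list[str], min_len: int = 40,
                max_len: int = 220) -> list[str]:
    """Keep fragments inside an assumed size-selection window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    genome = ("A" * 30 + "CCGG" + "T" * 50 + "CCGG"
              + "A" * 300 + "CCGG" + "T" * 10)
    fragments = mspi_digest(genome)
    print([len(f) for f in fragments])   # [31, 54, 304, 13]
    kept = size_select(fragments)
    print([len(f) for f in kept])        # [54]
    print(kept[0][:3], kept[0][-1])      # CGG C
```

Because MspI sites are dense in CpG-rich regions, those regions break into short fragments that survive size selection, while CpG-poor regions yield long fragments that are discarded; this is the basis of the "reduced representation".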
Therefore, techniques for highly efficient detection of methylated CpG islands remain limited.