Relationship Between DNA Methylation and Gene Regulation or Disease
In the genomes of higher eukaryotes, DNA methylation changes the spatial structure of the modified DNA. This can silence a gene or cause it to be overexpressed without altering the type or number of DNA bases, and it thereby gives rise to a variety of phenotypes in organisms.
For instance, in normal cells methylation usually occurs at CpG sites, whereas the CpG islands in promoters remain unmethylated. In tumor cells, the global level of DNA methylation is significantly reduced, and extensive demethylation occurs in gene-poor regions. This hypomethylation results in chromosomal instability and carcinogenesis. For example, testis-specific genes, melanoma-associated genes and proliferation-related genes are silenced in somatic cells, where the CpG islands of their promoters are methylated. In the corresponding cancer cells, these promoters are demethylated, so the genes can be expressed. In addition, reduced methylation promotes the expression of some genes (e.g., proliferation-related transcription factors). During tumor development, the further reduction in DNA methylation aggravates this damage and drives the transition from benign to malignant proliferation.
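The CpG islands mentioned above are usually defined by sequence composition. The sketch below checks whether a single sequence window meets those criteria. It assumes the widely used Gardiner-Garden and Frommer thresholds: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. The function names are hypothetical, and the code evaluates one window only; it does not scan a genome.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # CpG dinucleotides cannot overlap, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Observed/expected CpG = (#CpG * length) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the (assumed) Gardiner-Garden and Frommer criteria to one window."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(looks_like_cpg_island("CG" * 150))  # True: 300 bp, all GC, many CpGs
print(looks_like_cpg_island("AT" * 150))  # False: no C or G at all
```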
DNA methylation plays a very important role in gene expression patterns and genome stability. Researchers worldwide recognize that DNA methylation is involved in the onset and development of human diseases, and it has become one of the main current research focuses. Because DNA methylation acts across the whole genome, the techniques available for detecting it strongly shape how methylation is studied and understood. They therefore affect research on human diseases, in particular cancers, to a great extent.
Current Sequencing Methods for DNA Methylation
At present, depending on how the sequencing libraries are prepared, existing sequencing techniques for detecting DNA methylation can be divided into shotgun bisulfite sequencing, MeDIP sequencing, MBD sequencing, enzymatic digestion-bisulfite sequencing and so on.
Shotgun Bisulfite Sequencing Method
The shotgun bisulfite sequencing method mainly comprises the following steps: DNA fragmentation, end repair of the DNA fragments, ligation of methylated sequencing adapters, bisulfite treatment, PCR amplification, sequencing, and alignment of the sequencing data. In detail, the fragmented DNA undergoes end repair and addition of an “A” base to the 3′ end. It is then ligated directly to a methylated sequencing adapter, in which all cytosines are methylated. Under appropriate reaction conditions, bisulfite deaminates the unmethylated cytosines in single-stranded DNA to uracils while leaving methylated cytosines unchanged. PCR amplification then converts all the uracils to thymines. Finally, the PCR products are sequenced and compared with the untreated reference sequence to determine whether each CpG site is methylated. A cytosine that is still read as C indicates methylation, and a cytosine read as T indicates that the site was unmethylated.
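The conversion and comparison steps can be illustrated in silico. This is a simplified sketch, not the document's pipeline. It makes several assumptions: bisulfite conversion is complete, the read is already aligned base-for-base to the top strand of the reference with no indels, and there are no C/T polymorphisms. The function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Model bisulfite + PCR on the top strand: unmethylated C reads as T,
    methylated C (positions listed in `methylated`) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def call_cpg_methylation(reference: str, read: str) -> dict[int, bool]:
    """Compare an aligned converted read with the reference at every CpG cytosine.
    True = C retained (methylated), False = C read as T (unmethylated)."""
    ref = reference.upper()
    calls: dict[int, bool] = {}
    for i in range(len(ref) - 1):
        if ref[i] == "C" and ref[i + 1] == "G":
            if read[i] == "C":
                calls[i] = True
            elif read[i] == "T":
                calls[i] = False
            # any other base: no call (e.g. a sequencing error)
    return calls


reference = "ACGTTCGACCGA"  # CpG cytosines at positions 1, 5 and 9
read = bisulfite_convert(reference, methylated={1, 9})
print(read)                                   # ACGTTTGATCGA
print(call_cpg_methylation(reference, read))  # {1: True, 5: False, 9: True}
```

The non-CpG cytosine at position 8 is also converted to T. Because it is not part of a CpG, it is not reported. Methylated adapters matter for the same reason: their cytosines survive conversion, so the adapter sequence is preserved for PCR and sequencing.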
This sequencing method has been used to sequence the methylomes of Arabidopsis and of a human cell line. It yielded billions of bases of sequencing data, at average whole-genome sequencing depths of 20× and 14×, respectively.
Although this method solves the problem of high-throughput scanning of DNA methylation patterns at the whole-genome level, it produces an enormous number of nucleotide sequences, which creates new problems. The first is data analysis, in particular for the large genomes of higher mammals, where about 60 billion base pairs of sequence are needed for 20× coverage. After sequencing, assembling and aligning such volumes of data requires extensive and complex work. The second problem is cost: even with the newest 3G sequencing chip, sequencing remains very expensive. This method therefore cannot serve as a routine technique in most molecular biology laboratories.
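The 60-billion-base figure follows from the relation mean depth = total sequenced bases / genome size. The sketch below works through that arithmetic. It assumes a haploid human genome of roughly 3 billion bp and a hypothetical 100 bp read length; neither value is specified in the text. It also ignores unmappable and duplicate reads, which raise the real requirement.

```python
def bases_needed(genome_size_bp: int, depth: int) -> int:
    """Total sequenced bases required for a given mean depth."""
    return genome_size_bp * depth


def reads_needed(genome_size_bp: int, depth: int, read_length_bp: int) -> int:
    """Number of reads of a given length needed, rounded up."""
    total = bases_needed(genome_size_bp, depth)
    return -(-total // read_length_bp)  # integer ceiling division


human_genome_bp = 3_000_000_000  # approximate haploid size (assumption)
print(bases_needed(human_genome_bp, 20))       # 60000000000, i.e. ~60 billion bp
print(reads_needed(human_genome_bp, 20, 100))  # 600000000 reads of 100 bp
```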
MeDIP Sequencing and MBD Sequencing
Since methylation in mammals generally occurs at the 5-carbon of cytosine in CpG dinucleotides, highly methylated DNA fragments can be enriched in two ways. One uses proteins that specifically bind methylated DNA (methyl-CpG-binding domain proteins, MBD). The other uses an antibody against 5-methylcytosine (MeDIP). The enriched DNA fragments are then sequenced by high-throughput sequencing. The MBD-based method for isolating methylated DNA fragments is called methyl-CpG immunoprecipitation (MCIp). In MeDIP, a 5-methylcytosine antibody immunoprecipitates methylated DNA fragments with high specificity. Because this antibody can also bind a single methylated cytosine at a non-CpG site, MeDIP has higher specificity than MBD. This technique, called methylated DNA immunoprecipitation, can be combined with next-generation sequencing for high-throughput screening of aberrantly methylated genes. It also avoids the constraints that restriction enzyme cleavage sites impose on restriction-enzyme-based methods.
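The distinction between CpG and non-CpG methylated cytosines can be made concrete by classifying each cytosine by its sequence context. This sketch uses the conventional CG / CHG / CHH notation, where H = A, C or T. The function name is hypothetical, only the top strand is scanned, and the input is assumed to contain only A, C, G and T. MBD proteins bind methylated cytosines in the CG context. Per the text, the 5-methylcytosine antibody can also bind methylated cytosines in the other contexts.

```python
def cytosine_contexts(seq: str) -> dict[int, str]:
    """Map each top-strand cytosine position to its context: CG, CHG or CHH."""
    s = seq.upper()
    contexts: dict[int, str] = {}
    for i, base in enumerate(s):
        if base != "C":
            continue
        nxt = s[i + 1:i + 3]  # the next two bases (may be shorter at the end)
        if nxt[:1] == "G":
            contexts[i] = "CG"
        elif len(nxt) == 2:
            contexts[i] = "CHG" if nxt[1] == "G" else "CHH"
        # cytosines too close to the end to classify are skipped
    return contexts


print(cytosine_contexts("CGACAGCTTA"))  # {0: 'CG', 3: 'CHG', 6: 'CHH'}
```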
To carry out MeDIP sequencing or MBD sequencing, a sequencing library must first be prepared. Genomic DNA is fragmented and then ligated to sequencing adapters that are not chemically modified. DNA fragments containing methylated cytosine are then separated from unmethylated DNA fragments with MBD or a 5-methylcytosine antibody. The methylated DNA fragments are purified and subjected directly to PCR and sequencing, without bisulfite treatment.
For instance, David Serre sequenced DNA from the HCT116 colon cancer cell line using the MBD method. The results showed that about 19 million sequencing reads, occupying two lanes of the chip, were enough to detect all known methylated regions and some previously unknown ones, which greatly lowered sequencing cost. However, no bisulfite treatment is performed before sequencing, so the enrichment only locates methylated fragments. The individual methylated CpG sites within them must still be identified separately, which markedly increases the subsequent work.
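MeDIP and MBD data report enrichment over fragments rather than the state of each CpG, so they are typically summarized as read counts over genomic windows. The sketch below illustrates that idea. It assumes mapped read start positions on a single chromosome. `window_bp` and `min_reads` are hypothetical parameters, and real analyses also normalize for CpG density and sequencing depth.

```python
from collections import Counter


def enriched_windows(read_starts: list[int], window_bp: int = 500,
                     min_reads: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, read_count) for windows with at least min_reads reads."""
    counts = Counter(pos // window_bp for pos in read_starts)
    return sorted(
        (w * window_bp, (w + 1) * window_bp, n)
        for w, n in counts.items()
        if n >= min_reads
    )


starts = [10, 120, 480, 510, 1600, 1650, 1700, 1800]
print(enriched_windows(starts))  # [(0, 500, 3), (1500, 2000, 4)]
```

The output only says that a 500 bp window is enriched. Which of its CpGs are actually methylated remains unknown, and that is the extra identification work the method requires.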
Enzymatic Digestion-Bisulfite Sequencing
Enzymatic digestion-bisulfite sequencing aims to enrich the DNA fragments to be detected, reduce the size of the sequencing library and lower sequencing cost. The technique can successfully enrich some CpG islands (different CpG islands are obtained by alignment of 8% of the data), and it reduces the size of the sequencing library to some extent. Because bisulfite treatment is part of the workflow, methylation sites do not need to be identified separately afterwards.
For instance, Michael Zeschnigk used bisulfite sequencing based on enzymatic digestion, enriching DNA fragments from CpG-rich regions with 4 endonucleases [Smiraglia D J, Plass C. The study of aberrant methylation in cancer via restriction landmark genomic scanning. Oncogene 2002; 21: 5414-5426]. In this method, DNA is fragmented not by sonication but by combined digestion with multiple endonucleases (MseI, Tsp509I, NlaIII and HpyCH4V), whose recognition sites are TTAA, AATT, CATG and TGCA, respectively. According to the authors' computer predictions, combined digestion with these four enzymes outperforms other enzyme combinations in terms of DNA fragment size and the number of CpG islands that can be cleaved out. After digestion, fragments of 300 bp-800 bp were purified, ligated to methylated sequencing adapters, treated with bisulfite, amplified by PCR, and sequenced.
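The combined digestion and 300-800 bp size selection can be simulated on a sequence. The sketch assumes the following top-strand cut positions: MseI T^TAA, Tsp509I ^AATT, NlaIII CATG^ and HpyCH4V TG^CA. These are taken from common supplier listings and should be verified before relying on them. All four sites are palindromic, so scanning one strand finds every site. The sketch ignores sticky-end overhangs and partial digestion, and the function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset (the cut lies before this index in the site).
ENZYMES = {
    "MseI": ("TTAA", 1),     # T^TAA
    "Tsp509I": ("AATT", 0),  # ^AATT
    "NlaIII": ("CATG", 4),   # CATG^
    "HpyCH4V": ("TGCA", 2),  # TG^CA
}


def digest(seq: str, enzymes: dict[str, tuple[str, int]] = ENZYMES) -> list[str]:
    """Cut `seq` at every site of every enzyme and return the fragments in order."""
    s = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # Zero-width lookahead so that overlapping sites (e.g. TTAATTAA) are all found.
        for m in re.finditer(f"(?={site})", s):
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(s)) + [len(s)]
    return [s[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], low: int = 300, high: int = 800) -> list[str]:
    """Keep only fragments whose length lies within [low, high] bp."""
    return [f for f in fragments if low <= len(f) <= high]


seq = "G" * 350 + "CATG" + "G" * 100  # one NlaIII site
fragments = digest(seq)
print([len(f) for f in fragments])               # [354, 100]
print([len(f) for f in size_select(fragments)])  # [354]
```

The 100 bp fragment is dropped by size selection. This is how part of the genome is lost in this approach.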
However, because the genomic DNA is digested with restriction enzymes whose cleavage sites are fixed, fragment sizes vary widely. Fragments shorter than 300 bp or longer than 800 bp are discarded, so part of the genomic DNA cannot be sequenced. In addition, since the sequencer read length is only 130 bp, fragments of 300 bp-800 bp cannot be read through completely. Hence, some methylated DNA fragments of biological significance and function cannot be detected by this technique.
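The read-length limitation is simple arithmetic. With single reads of 130 bp, the unread part of a fragment is its length minus 130 bp. The short sketch below assumes one read per fragment, as the text implies; the function name is hypothetical.

```python
def read_coverage(fragment_bp: int, read_bp: int = 130) -> tuple[int, float]:
    """Return (unread bases, fraction of the fragment that is read) for one read."""
    read = min(fragment_bp, read_bp)
    return fragment_bp - read, read / fragment_bp


for length in (300, 800):
    unread, fraction = read_coverage(length)
    print(length, unread, round(fraction, 2))
# 300 170 0.43
# 800 670 0.16
```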
Whichever of the above techniques is used, a notable problem remains: all of them produce a huge amount of sequencing data with no biological function. In existing library construction and sequencing methods, data from heterochromatin regions, which consist largely of repetitive sequences, account for a high proportion of the sequencing data. This happens because the genome to be examined contains repetitive sequences with highly methylated CpGs. For example, centromeres and telomeres contain repetitive sequences, in particular highly repetitive sequences. These are believed to be involved in the structure and composition of chromosomes, but they have not been found to take part directly in gene expression and regulation. However, the relationship between the DNA methylation of repetitive sequences, especially highly repetitive sequences, and the expression of target genes is weak [Herman J G et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci USA 1996; 93: 9821-9826]. Therefore, removing the repetitive sequences and sequencing only the methylated DNA fragments in functional regions would significantly reduce cost.
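The share of a library that comes from repeats can be estimated in silico. One simple approach, which is not the method of this document, is to flag reads whose k-mers are mostly high-frequency within the library. In the sketch below, `k`, `min_count` and `min_share` are hypothetical parameters that would need tuning on real data, and the toy reads use the telomeric TTAGGG repeat.

```python
from collections import Counter


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def repetitive_fraction(reads: list[str], k: int = 8,
                        min_count: int = 3, min_share: float = 0.5) -> float:
    """Fraction of reads in which at least min_share of the k-mers occur
    at least min_count times across the whole library."""
    counts = Counter(km for r in reads for km in kmers(r.upper(), k))

    def is_repetitive(read: str) -> bool:
        kms = kmers(read.upper(), k)
        if not kms:
            return False
        high = sum(counts[km] >= min_count for km in kms)
        return high / len(kms) >= min_share

    return sum(is_repetitive(r) for r in reads) / len(reads)


reads = ["TTAGGG" * 3] * 3 + ["ACGTACCTGAAGTCCAGT"]  # 3 repeat reads, 1 unique read
print(repetitive_fraction(reads))  # 0.75
```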
Techniques for Removing Repetitive Sequences
Researchers have studied how to remove repetitive sequences. For example, bisulfite sequencing has been combined with high-density chips, on which the target methylated DNA is selected by hybridization. The chip probes designed by Agilent and NimbleGen are concentrated on promoters and first exons, which removes redundant material such as heterochromatic DNA fragments. Similarly, single-nucleotide polymorphism detection based on high-throughput sequencing uses exon capture chips to capture exonic DNA fragments. This reduces the size of the sequencing library and thus the sequencing cost per sample. However, the amount of DNA captured by exon capture chips is limited, which affects subsequent experiments. Their ability to remove redundant methylated DNA is also insufficient for sequencing analysis of methylated DNA at the genome level.
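The selection step of such capture chips can be sketched as an interval-overlap filter: only fragments that overlap a probe-covered target region, such as a promoter or first exon, are kept. This illustrates the selection logic only. It does not model hybridization efficiency, and the coordinates below are hypothetical. Intervals are half-open, [start, end).

```python
def overlaps(start: int, end: int, targets: list[tuple[int, int]]) -> bool:
    """True if the half-open interval [start, end) overlaps any target interval."""
    return any(start < t_end and t_start < end for t_start, t_end in targets)


def capture(fragments: list[tuple[int, int]],
            targets: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep the fragments that overlap at least one target region."""
    return [f for f in fragments if overlaps(f[0], f[1], targets)]


targets = [(1000, 2000), (5000, 5500)]  # hypothetical promoter/first-exon regions
fragments = [(900, 1100), (2000, 2300), (5400, 5600), (7000, 7200)]
print(capture(fragments, targets))  # [(900, 1100), (5400, 5600)]
```

The fragment (2000, 2300) is not kept because, with half-open intervals, it only touches the end of the first target.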
As is known, C0T-1 DNA is used to block repetitive sequences in hybridization assays such as fluorescence in situ hybridization and comparative genomic hybridization. The inventors believe that it may serve as an important tool for removing repetitive sequences. C0T-1 DNA is rich in highly and moderately repetitive sequences. It is prepared on the principle that denatured highly and moderately repetitive sequences renature readily, whereas single-copy or low-copy DNA sequences renature only with difficulty.
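The renaturation principle behind C0T-1 DNA is usually described by ideal second-order reassociation kinetics, C/C0 = 1/(1 + k·C0t). Here C is the single-stranded concentration remaining at time t, C0 is the initial concentration, and k is a rate constant that increases with a sequence's copy number. The sketch below evaluates the fraction reassociated at C0t = 1 mol·s/L. The two k values are hypothetical, chosen only to contrast a highly repetitive family with single-copy DNA.

```python
def fraction_reassociated(c0t: float, k: float) -> float:
    """Fraction double-stranded under ideal second-order kinetics:
    1 - C/C0 = k*C0t / (1 + k*C0t)."""
    return k * c0t / (1.0 + k * c0t)


# Hypothetical rate constants in L/(mol*s); only their ratio matters here.
for label, k in [("highly repetitive", 100.0), ("single copy", 0.001)]:
    print(label, round(fraction_reassociated(1.0, k), 3))
# highly repetitive 0.99
# single copy 0.001
```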
A conventional method for removing repetitive sequences with C0T-1 DNA is as follows. C0T-1 DNA is labeled with biotin, and magnetic beads are coated with avidin. Because avidin binds biotin, the biotin-labeled C0T-1 DNA binds to the avidin-coated magnetic beads, forming a complex of biotin-labeled C0T-1 DNA and avidin-coated beads. This complex is hybridized with target DNA fragments that may contain repetitive sequences. Denatured highly and moderately repetitive sequences renature readily, while single-copy or low-copy sequences do not. As a result, the repetitive sequences hybridize with the biotin-labeled C0T-1 DNA, forming a complex of repetitive sequence, biotin-labeled C0T-1 DNA and avidin-coated magnetic bead. The bead complexes are separated and discarded, and the target DNA remaining after bead treatment is recovered. The recovered DNA consists of fragments from which the repetitive sequences have been removed.
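The capture-and-discard logic can be mimicked computationally by treating C0T-1 DNA as a set of bait k-mers. Hybridization does not depend on strand, so bait k-mers are taken from both strands. Fragments that share enough k-mers with the bait count as captured on the beads and are discarded; the rest are recovered. This is an in silico analogy under stated assumptions, not a model of hybridization chemistry. `k` and `min_shared` are hypothetical parameters.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmer_set(seq: str, k: int) -> set[str]:
    """Distinct overlapping k-mers of a sequence."""
    s = seq.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def deplete_repeats(fragments: list[str], cot1_seqs: list[str],
                    k: int = 12, min_shared: float = 0.3) -> list[str]:
    """Return the fragments that would be recovered (not captured by the bait)."""
    bait: set[str] = set()
    for s in cot1_seqs:
        bait |= kmer_set(s, k) | kmer_set(revcomp(s), k)  # both strands
    recovered = []
    for frag in fragments:
        kms = kmer_set(frag, k)
        shared = len(kms & bait) / len(kms) if kms else 0.0
        if shared < min_shared:
            recovered.append(frag)  # stays in solution, kept for sequencing
    return recovered


cot1 = ["TTAGGG" * 5]                   # toy repetitive "C0T-1" sequence
fragments = ["CCCTAA" * 4,              # opposite strand of the same repeat
             "ACGTACCTGAAGTCCAGTTGCA"]  # unique sequence
print(deplete_repeats(fragments, cot1))  # ['ACGTACCTGAAGTCCAGTTGCA']
```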
Removal of repetitive sequences by C0T-1 DNA is characterized in that methylated DNA fragments in the functional regions (promoters, exons, and a part of introns) will not be captured and removed, while highly and moderately methylated repetitive sequences in the heterochromatin regions are removed. This method for removing repetitive sequences can satisfy the requirements in sequencing methylated DNA at genome level.