Restriction-modification (RM) systems are widespread among prokaryotic organisms (Roberts & Halford, 1993; Raleigh & Brooks, 1998). They are composed of two enzymatic activities. One of them, DNA methylation activity, ensures modification of an A or C base within a specific DNA sequence. This site-specific modification protects the host DNA from the action of the other, endonucleolytic activity of the same specificity (Wilson, 1991). The biological function of complete RM systems is generally thought to be the protection of the host genome against foreign DNA, in particular bacteriophage DNA. However, at least two other hypotheses of the biological function of RM systems have been proposed in the last few years. According to the hypothesis of Arber, RM enzymes are regarded as modulators of the frequency of genetic variation (Arber, 2000). An alternative hypothesis considers RM genes to be selfish mobile genetic elements, like viruses or transposons, that invade genomes without necessarily providing selective advantages (Kobayashi, 2001; Naito et al., 1995). In addition, some prokaryotic DNA methyltransferases (MTases) and restriction endonucleases may execute other functions. For instance, modification of specific DNA sequences may regulate chromosomal DNA replication (Messer & Noyer-Weidner, 1988) and expression of genes (Barras & Marinus, 1989; Christensen & Josephsen, 2004; Beletskaya et al., 2000; Reisenauer & Shapiro, 2002; Srikhanta et al., 2005; Roberts et al., 1985), or may be involved in DNA mismatch repair (Modrich, 1989).
The latest classification attributes all known restriction endonucleases to four types (Roberts et al., 2003). Of these, Type II enzymes are the most important because of their unique ability to recognize short specific DNA targets and cleave DNA at a fixed position either within the DNA target or very close to it. This property has made them indispensable in recombinant DNA technologies. Type II enzymes are very heterogeneous and are further classified into several subdivisions. One of them, Type IIM, encompasses enzymes that recognize specific methylated sequences in DNA and cleave at a fixed site. Several enzymes belong to this group (DpnI, GlaI, GluI, BisI, BlsI, PcsI). Of these, DpnI and its isoschizomers (i.e. restriction enzymes which recognize the same DNA target and cleave at the same position) recognize DNA targets containing the modified adenine (5′-Gm6ATC-3′), while all other listed enzymes recognize DNA targets which contain 5-methylcytosine. The key characteristics of known Type IIM enzymes are that they recognize symmetric DNA targets containing modified bases on both DNA strands, and cleave both DNA strands within the target.
Type IV restriction enzymes recognize and cleave modified DNA as well. However, in contrast to Type IIM enzymes, the Type IV representatives cleave DNA at an undefined position. In addition, the exact recognition target has been determined for only one of them, McrBC from Escherichia coli K-12. McrBC recognizes two RmC dinucleotides (R stands for purine, mC for methylated cytosine, either m4C or m5C) which are separated by anywhere from 40 to 3000 base pairs. Cleavage occurs between these two sites, but closer to one of them, approximately 30 base pairs from the methylated base (Raleigh & Wilson, 1986; Stewart & Raleigh, 1998).
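The McrBC recognition rule described above can be illustrated with a minimal sketch. It assumes a single strand given as a string, with methylated cytosines supplied as a set of 0-based positions, and it measures the separation as the distance between the two methylated bases. The function names and these conventions are hypothetical, chosen only for illustration, and the sketch ignores sites on the complementary strand.

```python
from typing import List, Set, Tuple

PURINES = {"A", "G"}  # R in the RmC notation


def find_rmc_sites(seq: str, methylated: Set[int]) -> List[int]:
    """Return 0-based positions of methylated C that is preceded by a purine."""
    seq = seq.upper()
    return [
        i
        for i in range(1, len(seq))
        if seq[i] == "C" and i in methylated and seq[i - 1] in PURINES
    ]


def candidate_mcrbc_pairs(
    sites: List[int], min_gap: int = 40, max_gap: int = 3000
) -> List[Tuple[int, int]]:
    """Pair RmC sites whose separation lies within [min_gap, max_gap].

    The separation is taken as the distance between the two methylated
    cytosines; this is an assumption made for the illustration.
    """
    ordered = sorted(sites)
    pairs = []
    for idx, left in enumerate(ordered):
        for right in ordered[idx + 1:]:
            gap = right - left
            if gap > max_gap:
                break  # later sites are even farther away
            if gap >= min_gap:
                pairs.append((left, right))
    return pairs


# Toy strand: GmC at position 1, AmC at position 51, TmC at position 53.
toy_seq = "GC" + "T" * 48 + "AC" + "TC"
toy_methylated = {1, 51, 53}
toy_sites = find_rmc_sites(toy_seq, toy_methylated)
toy_pairs = candidate_mcrbc_pairs(toy_sites)
```

In this toy strand the methylated cytosine at position 53 follows a pyrimidine and is therefore not an RmC site, so only the pair at positions 1 and 51 qualifies. The sketch reports candidate site pairs only; it does not predict where cleavage occurs, since the cut position is not fixed.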
The ability of methyl-dependent enzymes to differentiate between modified and non-modified DNA molecules or their regions has found many practical applications. Of note, applications differ significantly depending on both the type of restriction enzyme and the type of modified base which is recognized by the particular restriction enzyme.
DpnI cleaves DNA targets which comprise a 4 nt recognition sequence containing m6A, such as those modified by the Escherichia coli Dam methyltransferase (Geier & Modrich, 1979). The Dam targets of plasmids isolated from E. coli dam+ strains are modified and thus susceptible to DpnI cleavage. Based on this feature, a simple and efficient site-directed mutagenesis method was developed, in which a pair of mutagenic primers is annealed to opposite strands of the Dam-methylated plasmid DNA to be mutagenised, several rounds of linear amplification are carried out, and then parental DNA molecules are selectively cleaved by DpnI at modified Dam sites, leaving newly synthesized circular non-methylated double-stranded DNA molecules intact. The closed double-stranded DNA corresponding to the parental template molecules, but containing the desired mutation or mutations of interest, may be recovered from the transformed cells (U.S. Pat. No. 5,789,166). Later, DpnI was employed in a plethora of similar site-directed mutagenesis approaches, in all cases serving to cleave parental molecules before transformation (US Patent Application 20060228786; Edelheit et al., 2009; Liu & Naismith, 2008; Li et al., 2008; Wei et al., 2004; Bichet et al., 2004; Li & Wilkinson, 1997). In addition, the ability of DpnI to cleave methylated DNA molecules was used to select for recombinant molecules (Shareef et al., 2008) and to investigate Dam methylation kinetics (Wood et al., 2007; Li et al., 2007).
For efficient enrichment of mutagenised double-stranded DNA molecules after site-directed mutagenesis, methylation-specific restriction endonucleases like DpnI need to cleave both the fully methylated parental double-stranded DNA molecules and the hemi-methylated DNA molecules, in which newly synthesized strands are paired with parental strands. If not cleaved, hemi-methylated DNA molecules may be repaired back to the initial genotype after transformation, resulting in reduced efficiency of mutagenesis. However, literature reports relating to the ability of DpnI to cleave hemi-methylated GATC targets are contradictory. For instance, some authors claim that DpnI does not cleave hemi-methylated targets (Vovis & Lacks, 1977); others observed that site-specific cleavage of hemi-methylated substrates is very slow (Wood et al., 2007; http://www.neb.com) and depends on the concentration of sodium chloride, where an increase in salt concentration results in increased specificity of DpnI for the doubly-methylated substrate (Wobbe et al., 1985; Sanchez et al., 1992). DpnI therefore has its limitations: hemi-methylated DNA substrates are cleaved very slowly by DpnI, and high enzyme and low salt concentrations are required to induce cleavage of such substrates. Most importantly, there remains a level of uncertainty regarding the performance of DpnI on hemi-methylated DNA substrates because it is impossible to distinguish between cleavage of fully methylated and hemi-methylated DNA substrates in reaction mixtures where both types of DNA molecules are present. Thus, a need exists for restriction enzymes which recognize hemi-methylated double-stranded DNA targets and cleave them efficiently at a fixed position, yielding reaction products which can be easily visualized by gel electrophoresis and staining.
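The distinction between fully methylated and hemi-methylated GATC targets can be made concrete with a short sketch. It assumes the duplex is described by its top strand plus two sets of m6A positions, both expressed in 0-based top-strand coordinates. Because GATC is palindromic, the top-strand adenine of a site starting at position i lies at i + 1, and the bottom-strand adenine pairs with the top-strand T at i + 2. The function name and data layout are hypothetical.

```python
from typing import List, Set, Tuple


def classify_gatc_sites(
    top: str, top_m6a: Set[int], bottom_m6a: Set[int]
) -> List[Tuple[int, str]]:
    """Label each GATC site as 'full', 'hemi' or 'unmethylated'.

    top_m6a and bottom_m6a hold methylated adenine positions in
    top-strand coordinates (an assumed convention for this example).
    """
    top = top.upper()
    labels = []
    start = top.find("GATC")
    while start != -1:
        on_top = (start + 1) in top_m6a        # A of the top-strand GATC
        on_bottom = (start + 2) in bottom_m6a  # A of the bottom-strand GATC
        if on_top and on_bottom:
            state = "full"
        elif on_top or on_bottom:
            state = "hemi"
        else:
            state = "unmethylated"
        labels.append((start, state))
        start = top.find("GATC", start + 1)
    return labels


# Two GATC sites: the first methylated on both strands, the second on one.
duplex_top = "AAGATCTTGATCAA"
site_states = classify_gatc_sites(duplex_top, top_m6a={3, 9}, bottom_m6a={4})
```

Under these assumptions, a parental plasmid from a dam+ strain would carry "full" sites, a parental strand paired with a newly synthesized strand would carry "hemi" sites, and a duplex of two newly synthesized strands would carry "unmethylated" sites. The sketch only labels the sites; it makes no claim about how efficiently DpnI cleaves each class.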
Epigenetic analysis is an application for which both Type IIM and Type IV enzymes are known, and for which m5C specificity is most important. Type IIM representatives (GlaI, GluI, BisI, BlsI, PcsI) cleave both DNA strands within their recognition site, which is from 4 to 6 nucleotides in length with at least one 5-methylcytosine in each DNA strand (Russian patent application RU 2270859; http://www.sibenzyme.com/products/m2_type). In contrast, the best-characterized Type IV restriction endonuclease McrBC recognizes two remote RmC dinucleotides and cleaves both DNA strands between these two sites, but closer to one of them, approximately 30 base pairs from the methylated base.
The enzymatic conversion of cytosine to 5-methylcytosine is one of the most important epigenetic changes in vertebrate and plant genomes (Bird, 1992; Finnegan, 1996). It occurs mainly within the dinucleotide CG, and this epigenetic change plays important roles in transcriptional gene silencing, development, aging, cancer and other diseases (reviewed in: Jörg Tost, 2009, pp. 3-23). Various methods are available for studying DNA methylation. Some of them provide information about the degree of global genomic DNA methylation (reviewed in: Jörg Tost, 2009, pp. 23-45), while others are directed towards analysis of the DNA methylation status of specific sequences and the discovery of new methylation hot spots. In general, three major approaches are used to distinguish between modified and non-modified DNA regions, although many techniques combine two of the three approaches listed below.
The first approach takes advantage of a chemical reaction using sodium bisulfite, which selectively deaminates cytosine to uracil, while m5C is resistant to this conversion (Clark et al., 1994). This chemical reaction results in a change in the primary sequence of the DNA. The modified DNA strands can be amplified by polymerase chain reaction and analyzed using different techniques (reviewed in: Jörg Tost, 2009). Of these, genome-wide deep sequencing provides the most comprehensive information, revealing not only modified cytosines and their contexts, but also the level of methylation of a particular cytosine within the genome in the population of analyzed cells. Very recently, shotgun bisulfite sequencing of the Arabidopsis genome revealed that only 55% of modified cytosines are located within the dinucleotide CG, while 23% are found within CHG (H stands for A, C or T) and 22% within CHH (Lister et al., 2008), and it might be that eukaryotic DNA methyltransferases possess sequence preferences beyond the CG, CHG and CHH contexts (Cokus et al., 2008). Surprisingly, nearly one-quarter of all modified cytosines identified in human embryonic stem cells IMR90 were in the context of CHG or CHH as well, but non-CG methylation disappeared after induction of differentiation (Lister et al., 2009). The bisulfite-based approach is the “gold standard” of epigenetic studies. However, after sodium bisulfite conversion of cytosines the genome consists largely of only three DNA bases (U or T, A, G), so bioinformatics challenges need to be overcome in order to predict the genomic location of the obtained DNA sequences precisely. Furthermore, bisulfite sequencing remains time consuming and costly, especially when the methylation state of a large number of loci has to be investigated. Finally, the most critical step of the bisulfite approach is the completeness of sodium bisulfite-catalyzed conversion of cytosines.
However, sodium bisulfite treatment causes significant sample loss due to DNA degradation (Grunau et al., 2001). Therefore, the right balance must be found between completeness of the modification and an acceptable loss of DNA sample. As a result, some fraction of unmethylated cytosines remains unaltered, resulting in false-positive signals.
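The logic of bisulfite conversion and of the CG, CHG and CHH sequence contexts can be sketched as follows. The example assumes complete conversion, a single strand containing only A, C, G and T, and methylated cytosines given as a set of 0-based positions. Converted cytosines are written as T, which is how uracil is read after amplification. The function names are hypothetical.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Idealized, complete conversion: unmethylated C becomes T, m5C stays C."""
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def cytosine_context(seq: str, pos: int) -> str:
    """Classify the cytosine at pos as CG, CHG or CHH (H is A, C or T)."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return "unknown"  # too close to the 3' end to decide
    return "CHG" if downstream[1] == "G" else "CHH"


toy_strand = "ACGTCAGCTT"
converted = bisulfite_convert(toy_strand, methylated={1})
contexts = [cytosine_context(toy_strand, p) for p in (1, 4, 7)]
```

Comparing the converted read with the reference identifies methylated positions as cytosines that remain C. The false-positive problem described above corresponds to an unmethylated C that escapes conversion and is therefore indistinguishable from m5C in the read; this idealized sketch does not model incomplete conversion or DNA degradation.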
The second approach involves the use of m5C-binding proteins, allowing selective isolation of modified DNA regions. Comparison of methylation levels of individual DNA regions can be carried out using several different approaches (reviewed in: Jörg Tost, 2009). However, this type of analysis suffers from low resolution and an inability to identify the precise sequence context of methylation site(s).
The third approach is based on the use of either methylation-sensitive restriction enzymes like HpaII or NotI (recognition targets CCGG and GCGGCCGC, respectively), or methylation-specific (methylation-dependent) restriction enzymes like the Type IV enzyme McrBC or any of the Type IIM representatives GlaI, GluI, BisI, BlsI, PcsI. Methylation-sensitive enzymes do not cleave DNA if their recognition targets contain m5C within the CG dinucleotide. In contrast, methylation-specific enzymes cleave modified DNA targets, leaving non-modified ones intact. Detection of individual DNA fragments and evaluation of their methylation levels at particular CG targets (which are recognized and cleaved either by a methylation-sensitive restriction enzyme or by a methylation-specific Type IIM restriction enzyme) can be carried out directly by Southern hybridization. Also, there are several approaches which involve amplification of DNA (pre-cleaved with a methylation-sensitive enzyme, with a methylation-specific enzyme, or with both) followed by detection of the amplified fragments by various means (US Patent Application 20060275806; US Patent Application 20090004646; US Patent Application 20050272065; US Patent Application 20050158739; US Patent Application 20050153316; methods reviewed in: Jörg Tost, 2009).
Unfortunately, only a tiny fraction of methylated cytosines can be targeted using these assays. For example, only 3.9% of all nonrepeat CGs in the human genome reside within recognition sites of the HpaII enzyme (Fazzari & Greally, 2004). Furthermore, HpaII and other methylation-sensitive enzymes are not suitable for analysis of methylated bases within contexts other than CG (for instance, CHG or CHH). The same is true for the methylation-specific Type IIM enzymes GlaI, GluI, BisI, BlsI and PcsI, which recognize symmetric targets of 4-6 nucleotides in length. In contrast, the Type IV enzyme McrBC, whose DNA recognition target is RmC, recognizes ~50% of all CG, CHH and CHG targets containing m5C. However, McrBC recognizes two remote RmC dinucleotides and cleaves both DNA strands between these two sites at a non-specified position. Therefore, the cleavage position does not provide information which could be used to predict the modified cytosine, and McrBC cannot be used for this type of analysis.
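The coverage limitation of a site-specific enzyme can be estimated for any sequence by counting how many CG dinucleotides fall inside its recognition sites. The following is a minimal sketch for a single strand, using the HpaII target CCGG as the default. The function names are hypothetical, and the toy sequence is not meant to reproduce the genome-wide figure cited above.

```python
from typing import List


def find_all(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def fraction_cg_in_site(seq: str, site: str = "CCGG") -> float:
    """Fraction of CG dinucleotides in seq that lie within an occurrence of site."""
    seq = seq.upper()
    site = site.upper()
    all_cg = set(find_all(seq, "CG"))
    if not all_cg:
        return 0.0
    covered = set()
    for site_start in find_all(seq, site):
        for offset in find_all(site, "CG"):
            covered.add(site_start + offset)
    return len(covered) / len(all_cg)


# Three CG dinucleotides, only one of which sits inside a CCGG site.
toy_fragment = "ACGTCCGGTACGA"
hpaii_fraction = fraction_cg_in_site(toy_fragment)
```

Passing a longer recognition sequence such as the NotI target GCGGCCGC as `site` shows the same effect more strongly on typical sequences: the longer the required target, the smaller the share of CG dinucleotides that the enzyme can interrogate.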
In summary, all major approaches used today for investigation of DNA methylation status suffer from various drawbacks. In the case of methylation-dependent restriction enzymes, the major drawback of m5C-specific Type IIM enzymes is their relatively long specific recognition sequence (4-6 nt in length) and the need for two or more modified cytosines within the target, which limits their use to a small fraction of m5C-containing regions. The Type IV enzyme McrBC has the potential to recognize up to 50% of all modified cytosines, but it cleaves at a non-specified position, making it impossible to identify modified cytosines from analysis of the cleavage reaction products. Thus, a need exists for methylation-dependent restriction enzymes which do not suffer from these drawbacks.