The present invention relates to a DNA endonuclease, McrBC, obtainable from Escherichia coli, two components of which have been purified in active form. The present invention also relates to the process for detecting and cleaving methylated DNA with said endonuclease. Other related processes are disclosed, including a process for the determination of the modification state of DNA, a process for the determination of epigenetic alterations as well as a process for identifying and isolating additional enzymes that cleave modified DNA.
Restriction endonucleases are a class of enzymes that occur naturally in bacteria. When they are purified away from other contaminating bacterial components, restriction endonucleases can be used in the laboratory to break DNA molecules into precise fragments. This property enables DNA molecules to be uniquely identified and to be fractionated into their constituent genes. Restriction endonucleases have proved to be indispensable tools in modern genetic research. They are the biochemical `scissors` by means of which genetic engineering and analysis are performed.
Typically, cells expressing a restriction endonuclease also elaborate a cognate modification methylase, which recognizes the same sequence as does the endonuclease and methylates a specific base within that sequence. This modification renders the DNA resistant to cleavage by the cognate endonuclease (Bickle, T. A. Amer. Society for Molecular Biology, 692-696 (1987); Modrich & Roberts, Nucleases, 109-154 (1982)). Consequently, the DNA of most bacteria is modified at specific sequences, but the sequences modified differ among strains and species because they carry different restriction enzymes. DNA of many eukaryotic organisms, including all mammals and higher plants, is also modified to contain 5-methylcytosine (m5C) in 5'CG3' or 5'CNG3' sequence contexts. In these cases, no restriction enzymes are found, and the role of modification is thought to be regulation of gene expression (Ehrlich & Wang, Science, 212:163-170 (1981)).
One example is known of a restriction endonuclease that recognizes a modified sequence, rather than an unmodified one; this enzyme (DpnI and isoschizomers) cleaves only DNA containing the sequence GmATC, where mA is N.sup.6 -methyladenine (Lacks & Greenberg, J. Biol. Chem. 250:4060-4066 (1975); Lacks & Greenberg, J. Mol. Biol., 114:153-168 (1977)). This enzyme has been used extensively to detect GmATC modification in DNA from enteric bacteria, wherein methylation of this sequence regulates DNA repair and replication (Barbeyron et al., J. Bacteriol., 160:586-590 (1984); Russell & Zinder, Cell, 50:1071-1079 (1987); Messer & Noyer-Weidner, Cell, 54:735-737 (1988)), as well as in DNA from other sources. No other methylation pattern can be analyzed in this way since no other modification-specific restriction enzyme has been described.
Many other nucleases exist besides the restriction endonucleases (Linn & Roberts, Nucleases, Cold Spring Harbor Laboratory Press (1982)), and they possess widely varying properties. However, only a small number of different co-factor requirements have been described. Nucleases often, but not always, require a divalent cation, which may be Mg.sup.++ or Ca.sup.++, or in some cases Mn.sup.++. Nucleases frequently require ATP, which in a few cases can be replaced by a non-hydrolyzable analogue of ATP, for example ATP-gamma-S, AMP-PNP or AMP-PCP. One example has been described in which GTP will satisfy the nucleotide requirement of an ATP-dependent nuclease, but the nuclease acts much less efficiently with GTP than with ATP (Goldmark & Linn, J. Biol. Chem., 247:1849-1860 (1972)). No nuclease which requires GTP rather than ATP has, to date, been isolated in an active, commercially useful form. In addition, as discussed below, in accordance with the present invention, McrBC is the only nuclease that will not use ATP, since ATP acts as an inhibitor of its activity.
The McrBC restriction system, formerly known as RglB, (Raleigh et al., J. Cell. Biochem., suppl. 16B, 21 (1992)), was first discovered in 1952 (Luria & Human, J. Bacteriol., 64:557-559 (1952)) and was investigated by genetic methods in the 1960's (Georgopoulos & Revel, Virology, 44:271-285 (1971); Georgopoulos, Biochem. Biophys. Res. Commun., 28:179-184 (1967); Revel, Virology, 31:688-701 (1967); Revel et al., Biochem. Biophys. Res. Commun., 18:545-550 (1965); Revel & Hattman, Virology, 45:484-495 (1971)). It was found that the restriction of T-even phage seen in vivo depended both upon incorporation into DNA of the unusual base 5-hydroxymethylcytosine (hmC) and upon failure to further modify the hmC by glucosylation due to phage or host mutations. This system was designated Rgl reflecting the fact that it restricts glucoseless phage.
In the mid-80's, several groups reported restriction of DNA methylated by sequence-specific cytosine methylases (Noyer-Weidner et al., Mol. Gen. Genet., 205:469-475 (1986); Raleigh & Wilson, Proc. Natl. Acad. Sci. USA, 83:9070-9074 (1986)). This was designated Mcr restriction, for modified cytosine restriction. The mcrB-dependent restriction was shown to be genetically identical with the above-described rglB-dependent restriction (Raleigh et al., Genetics 122:279-296 (1989)). Demonstration of this restriction effect explained previous observations that the genes for many DNA modification methylases could not be cloned in some strains (Blumenthal et al., J. Bacteriol., 164:501-509 (1985); Kiss & Baldauf, Gene, 21:111-119 (1983)), and that DNA from diverse organisms could be cloned only with low efficiency and in biased fashion (Whittaker et al., Nucleic Acids Res., 16:6725-6736 (1988); Woodcock et al., Nucleic Acids Res., 17:3469-3478 (1989)).
The genes involved (mcrBC) were cloned (Kruger et al., Gene, (1991); Raleigh et al., Genetics, 122:279-296 (1989); Ross et al., Gene, 61:277-289 (1987); Sozhamannan & Dharmalingam, Curr. Microbiol., 17:269-273 (1988); Sozhamannan & Dharmalingam, Gene, 74:51-52 (1988)) and sequenced (Dila et al., J. Bacteriol., 172:4888-4900 (1990); Ross et al., J. Bacteriol., 171:1974-1981 (1989b)). It was shown that two genes were required for restriction (Dila & Raleigh, Gene, 74:23-24 (1988); Dila et al., J. Bacteriol., 172:4888-4900 (1990); Ross et al., Mol. Gen. Genet., 216:402-407 (1989a)) and that the two genes directed expression of three proteins (Dila et al., J. Bacteriol., 172:4888-4900 (1990); Kruger et al., Gene, in press (1991); Ross et al., Mol. Gen. Genet., 216:402-407 (1989a)). A possible GTP-binding motif was identified in the amino acid sequence of McrB (Dila et al., J. Bacteriol., 172:4888-4900 (1990)). However, there existed uncertainty concerning the precise position of translation initiation (Dila et al., J. Bacteriol., 172:4888-4900 (1990); Kruger et al., Gene, in press (1991); Ross et al., Gene, 61:277-289 (1987); Ross et al., Mol. Gen. Genet., 216:402-407 (1989a)). Two start sites have been proposed for McrB.sub.L (one of two protein products produced by the mcrB gene) and McrC: Ross et al. (Ross et al., J. Bacteriol. 171:1974-1981 (1989b)) chose starts based on potential Shine/Dalgarno sequences, while Dila et al. (Dila et al., J. Bacteriol. 172:4888-4900 (1990)) arbitrarily chose the first methionine of the frame. The potential translation starts are shown in FIG. 10 (SEQ ID NO:1 and SEQ ID NO:2). As disclosed in more detail below, the construct of the present invention used to overproduce the McrB component elaborates only one of two products of the mcrB gene, namely McrB.sub.L. As noted above, the mcrB gene expresses two protein products, McrB.sub.L and McrB.sub.S, in most expression constructs.
Translation of McrB.sub.L begins early in the open reading frame (Dila, et al., J. Bacteriol. 172:4888-4900 (1990); Ross, et al., Gene, 61:277-289 (1987)), while translation of McrB.sub.S is in the same reading frame but begins at an internal initiation site (Kruger, et al. Gene, in press (1991); Ross, et al., Gene, 61:277-289 (1987)). The precise N-terminus of McrB.sub.S is not known, but a candidate initiation site has been proposed (Ross, et al., Gene, 61:277-289 (1987)).
McrBC-like enzymes are likely to be found in other organisms and may show a degree of conservation. Both E. coli K12 and E. coli B display biological properties expected of strains with an McrBC activity, but these properties show somewhat different specificities for different phages (Raleigh et al., Genetics, 122:279-296 (1989); Revel, Virology, 31:688-701 (1967)). This suggests that the two strains express related enzymes, which may recognize different sequences. The genes for these enzymes cross-hybridize (Daniel et al., J. Bacteriol., 170:1775-1782 (1988)). This situation resembles the case of the EcoK and EcoB enzymes; not only do these closely related genes cross-hybridize, but the enzymes can exchange subunits with retention of function and the polypeptides cross-react antigenically, even though they recognize different specific sequences (Bickle, T. A. Amer. Society for Molecular Biology, 692-696 (1987)).
It is also likely that some members of the McrBC family have diverged in DNA sequence to such a degree that cross-hybridization does not occur. This kind of divergence has occurred with other families of restriction enzymes. For example, restriction enzymes EcoK and EcoA have the same subunit structure, nucleotide dependence and cleavage properties (but different sequence recognition properties) and the chromosomal location of the hsd.sub.K and hsd.sub.A genes is the same, but the genes do not cross-hybridize and the subunits are not exchangeable (Bickle, T. A. Amer. Society for Molecular Biology, 692-696 (1987)).
Little is known of the molecular nature of mcrBC-dependent restriction. Physiological and genetic experiments investigated the fate of RglB-restricted hmC-T4 DNA inside of cells and suggested that this DNA was cleaved by RglB a small number of times (Dharmalingam & Goldberg, Nature, 260:406-410 (1976a)). The small number of cleavages suggested that cleavage might be sequence-specific, but this interpretation was contested, based on physiological considerations (Kruger & Bickle, Microbiol. Rev, 47:345-360 (1983)). The issue was complicated by the demonstration that T4 can express a protein capable of inhibiting the action of RglB in vivo (Dharmalingam & Goldberg, Nature, 260:454-455 (1976b)).
Attempts to characterize the in vitro activity of the enzyme were made in the early 1970's (Eigner & Block, J. Virology, 2:320-326 (1968); Fleishman et al., J. Biol. Chem., 1561-1570 (1976); Fleishman et al., Proc. Natl. Acad. Sci. USA, 68:2527-2531 (1971)). The latter authors demonstrated rglB.sup.+ -dependent solubilization of circular hmC-containing DNA in crude extracts containing Exonuclease V and concluded from this and other evidence that double-strand breaks were occurring. Purification efforts led to 240-fold purification of a required component, but an essential heat-labile, non-dialysable component of the reaction had been lost (Fleishman, et al., J. Biol. Chem., 1561-1570 (1976)). No reports of in vitro activities of the McrBC proteins have appeared since the initial report by Fleishman et al. (J. Biol. Chem., 1561-1570 (1976)).
The ability to cleave methylated DNA specifically has been of considerable interest because methylated DNA is widely distributed in nature (Barbeyron et al., J. Bacteriol., 160:586-590 (1984); Ehrlich & Wang, Science, 212:163-170 (1981); Ehrlich et al., J. Bacteriol., 169:939-943 (1987)). It is found in bacteriophage (Trautner et al., Mol. Gen. Genet., 180:361-367 (1980); Warren, Ann. Rev. Microbiol., 34:137-158 (1980)), viruses (van Etten et al., Nucleic Acids Res., 13:3471-3478 (1986)), eubacteria (McClelland & Nelson, Gene, 74:291-304 (1988)), archaebacteria (Lunnen et al., Gene, 77:11-19 (1989)), fungi (Mooibroek et al., Mol. Gen. Genet., 222:41-48 (1991); Selker et al., Science, 238:48-53 (1987)), protozoa (Capowski et al., Gene, 74:103-104 (1988)), parasites (Pollack et al., Exp. Parasitol., 72:339-344 (1991)), higher plants (Chandler & Walbot, Proc. Natl. Acad. Sci. USA, 83:1767-1771 (1986)), animals (Bestor et al., J. Mol. Biol., 203:971-983 (1988)) and cellular organelles (Burton et al., Proc. Natl. Acad. Sci. USA, 76:1390-1394 (1979); Ngernprasirtsiri et al., Cell Struct. Funct., 15:285-293 (1990)).
In bacteria, DNA modification plays an important role in DNA repair and in the timing of replication (Marinus, Ann. Rev. Genet., 21:113-131 (1987); Messer & Noyer-Weidner, Cell, 54:735-737 (1988); Russell & Zinder, Cell, 50:1071-1079 (1987)) as well as in regulation of genetic exchange via restriction-modification systems (Modrich & Roberts, Nucleases, 109-154 (1982); Price & Bickle, Microbiol. Sci., 3:296-299 (1986); Raleigh et al., Genetics 122:279-296 (1989)). In eukaryotic organisms, DNA methylation is thought to regulate gene expression (Cedar, Cell, 53:3-4 (1988)), and abnormal methylation patterns in humans are thought to be associated with the origin of cancer (Nelkin et al., Blood, 77:2431-2434 (1991)), aberrations in development (Holliday, R., Science, 238:163-170 (1987); Oberle et al., Science, 252:1097-1102 (1991); Silva & White, Cell, 54:145-152 (1988)), and genetic disease (Holliday, R. Science, 238:163-170 (1987); Oberle et al., Science, 252:1097-1102 (1991); Silva & White, Cell, 54:145-152 (1988)).
Some genetic diseases are thought to result from the establishment ("imprinting") of aberrant methylation patterns during gametogenesis (egg and sperm development). The term genomic imprinting (Chaillet et al., Cell, 66:77-83 (1991); Solter, Annu. Rev. Genet., 22:127-146 (1988); Surani et al., Philos. Trans. Roy. Soc., (Lond) B 326, 313-327 (1990)) refers to the reversible inactivation of a gene, depending on whether the gene is transmitted through the male or the female parent. That is, a gene may be expressed when it has been inherited from the mother but not when it has been inherited from the father. An inactive gene inherited by a daughter from her father will be reactivated when she passes it to her children (since she is the mother). In other cases, imprinting may occur in the mother, not in the father. Only some genes are subject to imprinting.
In consequence of imprinting, two "defective" genes may be inherited, even when one is wild-type (normal) in sequence. This happens if the wild-type copy is imprinted and thus inactivated, while the non-imprinted copy is mutated at the sequence level; a genetic disease may then result. Diagnosis of genetic disease in such cases will not be possible by the usual sequence-based methods, because non-diseased and diseased individuals cannot be distinguished on the basis of sequence. A non-diseased heterozygote with a mutated, imprinted copy and a wild-type, non-imprinted copy will be indistinguishable from a diseased heterozygote with a wild-type, imprinted copy and a mutated, non-imprinted copy. DNA methylation patterns are closely related to imprinted state, and imprinting may in fact be the same as methylation. Resetting of the imprinted state occurs during gametogenesis (Chaillet, et al., Cell 66:77-83 (1991)) and is associated with changes in DNA methylation within and near the sequence of the gene (Holliday, R., Science, 238:163-170 (1987); Reik et al., Nature, 328:248-251 (1987); Silva & White, Cell, 54:145-152 (1988)).
The ability to detect DNA methylation readily and accurately is thus desired. Existing methods for detection of methylation are cumbersome or inaccurate or both. One method used for detection of methylation in mammalian DNA relies on methylation-sensitive enzymes. Methylation of mammalian DNA occurs at some but not all CG dinucleotides, and the presence or absence of methylation may vary depending on the tissue and developmental stage (Cedar, Cell 53:3-4 (1988)). When methylation is present in the vicinity of a gene, many or most CG dinucleotides in the region are modified. Such methylation is usually detected by Southern blot analysis of fragments generated by methylation-sensitive restriction enzymes (Bird & Southern, J. Mol. Biol., 118:27-47 (1978)). The enzymes usually used for this are MspI, which cleaves CCGG whether or not the internal cytosine is methylated, and HpaII, which cleaves the same sequence, but only when the internal C residue is not modified. Cleavage by MspI verifies that a site exists, and failure of HpaII to cleave demonstrates that methylation is present at the particular CG dinucleotide in question. Fragments are visualized by probing with cloned or synthetic probes complementary to the sequence surrounding the site.
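The logic of the paired MspI/HpaII digest can be illustrated with a short sketch. The function name and the returned wording are hypothetical and serve only to restate the reasoning of the preceding paragraph; they are not part of any published protocol.

```python
def interpret_ccgg_digest(mspi_cleaves: bool, hpaii_cleaves: bool) -> str:
    """Interpret paired MspI/HpaII digests at one candidate CCGG site.

    Assumption (from the text): MspI cleaves CCGG regardless of methylation
    of the internal C; HpaII cleaves only when the internal C is unmodified.
    """
    if mspi_cleaves and hpaii_cleaves:
        return "site present, internal C unmethylated"
    if mspi_cleaves and not hpaii_cleaves:
        return "site present, internal C methylated"
    if not mspi_cleaves and not hpaii_cleaves:
        # No site means the method says nothing about nearby CG methylation.
        return "no cleavable CCGG site; methylation state undetermined"
    # HpaII cleavage without MspI cleavage contradicts the stated specificities.
    return "inconsistent result; repeat digests"


if __name__ == "__main__":
    print(interpret_ccgg_digest(True, False))
```

The third branch reflects the limitation of the method: where no CCGG site exists, methylation of neighboring CG dinucleotides cannot be assessed at all.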
A considerable drawback to the above-described method is that only about 1/16 of potentially modified sites can be monitored (1/4 of residues 5' to CG will be C and 1/4 of residues 3' to CG will be G). Failure to detect methylation in a sequence of interest may be due to absence of a suitably located MspI/HpaII site, not to the absence of methylation. Some other pairs of isoschizomers exist (Nelson & McClelland, 1991), but are of limited utility. The pairs XmaI/SmaI (CCCGGG) and AccIII/BspMII (TCCGGA) only detect a subset of the methylation sites detected by MspI/HpaII. The pair AsuII/Csp45I (TTCGAA) detects additional sites, but an even smaller fraction (1/256) of possible sites.
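The fractions quoted above follow from simple counting, which may be sketched as follows. The sketch assumes, as the estimate in the text implicitly does, that each base flanking the central CG is independent and equally likely to be any of the four bases; real genomes deviate from this, so the figures are approximations. The function name is hypothetical.

```python
from fractions import Fraction


def fraction_of_cg_sites(site: str) -> Fraction:
    """Approximate fraction of CG dinucleotides that fall within `site`.

    Assumes uniform, independent base composition: each flanking position
    matches the required base with probability 1/4.
    """
    if site.count("CG") != 1:
        raise ValueError("site must contain exactly one CG dinucleotide")
    flanking_positions = len(site) - 2  # every position outside the CG
    return Fraction(1, 4 ** flanking_positions)


if __name__ == "__main__":
    for name, site in [("MspI/HpaII", "CCGG"), ("AsuII/Csp45I", "TTCGAA")]:
        print(name, site, fraction_of_cg_sites(site))
```

Under the same assumption, the six-base sites CCCGGG and TCCGGA each cover about 1/256 of CG dinucleotides, and because both contain CCGG they add nothing beyond the sites already monitored by MspI/HpaII.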
Another method to detect DNA methylation relies on a modification of the Maxam-Gilbert sequencing protocol called genomic sequencing (Church & Gilbert, Proc. Natl. Acad. Sci. USA, 81:1991-1995 (1984)). The method relies on the failure of m5C residues to be cleaved by the Maxam-Gilbert C reaction, resulting in a missing band in the sequence ladder where m5C had been present (Saluz and Jost, Gene, 42:151-157 (1986)). This method can be used only on small stretches of DNA, and for best results the sequence must be known, at least sufficiently for design of an oligonucleotide primer to prime synthesis of DNA from within a few hundred basepairs of the methylated position. The procedure is long and labor-intensive and is sensitive to reagent contamination.
A third approach relies on converting all C residues except those that are methylated (m5C) to U using sodium bisulfite, followed by amplification by PCR, cloning, and sequencing via the dideoxy chain-termination method (Frommer et al., Proc. Natl. Acad. Sci. USA, 89:1827-1831 (1992)). This procedure yields a positive display of m5C residues (only where m5C residues were present will a band appear in the C lane of the sequencing ladder), but otherwise it suffers from limitations similar to those of the genomic sequencing approach: it requires knowledge of the sequence beforehand, can only be used on short stretches, and is laborious.
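The principle of the bisulfite approach can be illustrated by a minimal simulation. It assumes complete conversion of unmethylated C and no conversion of m5C, and it represents methylation as a set of zero-based positions; the function names and this representation are hypothetical conveniences, not part of the cited procedure.

```python
from typing import Iterable, List


def bisulfite_convert(seq: str, methylated: Iterable[int]) -> str:
    """Return the sequence read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U, which is read as T after
    amplification; C at a methylated position (m5C) is left as C.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def called_m5c_positions(original: str, converted: str) -> List[int]:
    """Positions still reading C after conversion, i.e. inferred m5C."""
    return [
        i
        for i, (before, after) in enumerate(zip(original, converted))
        if before == "C" and after == "C"
    ]


if __name__ == "__main__":
    seq = "ACGTCCGA"
    converted = bisulfite_convert(seq, methylated={1})
    print(converted, called_m5c_positions(seq, converted))
```

The second function makes the stated limitation explicit: calling m5C positions requires the original sequence for comparison, which is why the sequence must be known beforehand.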
In short, better methods for detection of modification are desired.