The present invention relates to recombinant DNA that encodes the TspRI restriction endonuclease (TspRI endonuclease or TspRI) as well as the TspRI methyltransferase (TspRI methylase or M.TspRI), and to expression of TspRI endonuclease and methylase in E. coli cells containing the recombinant DNA.
TspRI endonuclease is found in the strain of Thermus species R (New England Biolabs' strain collection). It recognizes the double-stranded, palindromic DNA sequence 5′ NNCASTGNN↓ 3′ (SEQ ID NO:1) (S=C or G, ↓ indicates the cleavage position) and cleaves on both sides of the recognition sequence, generating a 9-base 3′ overhang. TspRI methylase (M.TspRI) is also found in the same strain. It recognizes the double-stranded DNA sequence 5′ CASTG 3′ (SEQ ID NO:2) and presumably modifies the cytosine at the C5 position on hemi-methylated or non-methylated TspRI sites.
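The recognition and cleavage geometry described above can be illustrated with a minimal Python sketch. It assumes a linear, upper- or lower-case DNA string given 5′ to 3′ on the top strand; the function name `find_tspri_sites` and the tuple layout are illustrative choices, not part of the disclosure. The bottom-strand cut is placed symmetrically at the start of the 9-base window, which is what a 9-base 3′ overhang on a palindromic site implies.

```python
import re

# Core recognition sequence CASTG, where S = C or G.
TSPRI_CORE = re.compile(r"CA[CG]TG")

def find_tspri_sites(seq):
    """Locate TspRI sites (NNCASTGNN) on the top strand of a linear sequence.

    Returns a list of (core_start, top_cut, bottom_cut, overhang) tuples using
    0-based top-strand coordinates. A cut value k means the strand is cleaved
    between bases k-1 and k. The overhang is the 9-base window NNCASTGNN.
    """
    seq = seq.upper()
    sites = []
    for match in TSPRI_CORE.finditer(seq):
        core = match.start()
        start, end = core - 2, core + 7   # 2 flanking bases on each side
        if start < 0 or end > len(seq):
            continue                      # flanking NN missing at a molecule end
        # Top strand is cut after the 9th base; bottom strand before the 1st.
        sites.append((core, end, start, seq[start:end]))
    return sites

if __name__ == "__main__":
    for site in find_tspri_sites("AAGGCACTGTTCC"):
        print(site)
```

Sites whose flanking bases run off the end of the molecule are skipped in this sketch, since the full NNCASTGNN window is needed to place both cuts.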
Type II restriction endonucleases are a class of enzymes that occur naturally in bacteria and in some viruses. When they are purified away from other bacterial/viral proteins, restriction endonucleases can be used in the laboratory to cleave DNA molecules into small fragments for molecular cloning and gene characterization.
Restriction endonucleases recognize and bind particular sequences of nucleotides (the 'recognition sequence') along the DNA molecules. Once bound, they cleave the molecule within (e.g. BamHI), to one side of (e.g. SapI), or to both sides (e.g. TspRI) of the recognition sequence. Different restriction endonucleases have affinity for different recognition sequences. More than 211 restriction endonucleases with unique specificities have been identified among the many hundreds of bacterial species that have been examined to date (Roberts and Macelis, Nucl. Acids Res. 27:312-313 (1999)).
Restriction endonucleases typically are named according to the bacteria from which they are discovered. Thus, the species Deinococcus radiophilus, for example, produces three different restriction endonucleases, named DraI, DraII and DraIII. These enzymes recognize and cleave the sequences 5′ TTT↓AAA 3′ (SEQ ID NO:3), 5′ PuG↓GNCCPy 3′ (SEQ ID NO:4) and 5′ CACNNN↓GTG 3′ (SEQ ID NO:5) respectively. Escherichia coli RY13, on the other hand, produces only one enzyme, EcoRI, which recognizes the sequence 5′ G↓AATTC 3′ (SEQ ID NO:6).
A second component of bacterial/viral restriction-modification (R-M) systems is the methylase. These enzymes co-exist with restriction endonucleases and provide the means by which bacteria are able to protect their own DNA and distinguish it from foreign DNA. Modification methylases recognize and bind to the same recognition sequence as the corresponding restriction endonuclease, but instead of cleaving the DNA, they chemically modify one particular nucleotide within the sequence by the addition of a methyl group (C5 methyl cytosine, N4 methyl cytosine, or N6 methyl adenine). Following methylation, the recognition sequence is no longer cleaved by the cognate restriction endonuclease. The DNA of a bacterial cell is always fully modified by the activity of its modification methylase. It is therefore completely insensitive to the presence of the endogenous restriction endonuclease. Only unmodified, and therefore identifiably foreign, DNA is sensitive to restriction endonuclease recognition and cleavage. During and after DNA replication, hemi-methylated DNA (DNA methylated on one strand) is usually also resistant to digestion by the cognate restriction endonuclease.
With the advancement of recombinant DNA technology, it is now possible to clone genes and overproduce the enzymes in large quantities. The key to isolating clones of restriction endonuclease genes is to develop an efficient method to identify such clones within genomic DNA libraries, i.e. populations of clones derived by 'shotgun' procedures, when they occur at frequencies as low as 10⁻³ to 10⁻⁴. Preferably, the method should be selective, such that the unwanted clones with non-methylase inserts are destroyed while the desirable rare clones survive.
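To give a sense of why a selective method matters at these frequencies, the following Python sketch estimates how many clones would have to be examined without selection. It assumes each clone is an independent draw with the stated frequency of carrying the desired gene; the function name `clones_to_screen` and the 99% confidence level are illustrative assumptions, not values from the disclosure.

```python
import math

def clones_to_screen(frequency, confidence=0.99):
    """Smallest N such that at least one desired clone is expected among N
    clones with probability >= confidence, assuming independent sampling.

    Derived from 1 - (1 - frequency)**N >= confidence.
    """
    if not 0 < frequency < 1 or not 0 < confidence < 1:
        raise ValueError("frequency and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - frequency))

if __name__ == "__main__":
    for freq in (1e-3, 1e-4):
        print(freq, clones_to_screen(freq))
```

Under these assumptions, several thousand to tens of thousands of clones would need individual screening, which is why a selection that destroys the unwanted clones is preferred.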
A large number of type II restriction-modification systems have been cloned. The first cloning method used bacteriophage infection as a means of identifying or selecting restriction endonuclease clones (EcoRII: Kosykh et al., Mol. Gen. Genet. 178:717-719, (1980); HhaII: Mann et al., Gene 3:97-112, (1978); PstI: Walder et al., Proc. Nat. Acad. Sci. 78:1503-1507, (1981)). Since the expression of restriction-modification systems in bacteria enables them to resist infection by bacteriophages, cells that carry cloned restriction-modification genes can, in principle, be selectively isolated as survivors from genomic DNA libraries that have been exposed to phage. However, this method has been found to have only a limited success rate. Specifically, it has been found that cloned restriction-modification genes do not always confer sufficient phage resistance to achieve selective survival.
Another cloning approach involves transferring systems initially characterized as plasmid-borne into E. coli cloning vectors (EcoRV: Bougueleret et al., Nucl. Acids. Res. 12:3659-3676, (1984); PaeR7: Gingeras and Brooks, Proc. Natl. Acad. Sci. USA 80:402-406, (1983); Theriault and Roy, Gene 19:355-359 (1982); PvuII: Blumenthal et al., J. Bacteriol. 164:501-509, (1985); Tsp45I: Wayne et al. Gene 202:83-88, (1997)).
A third approach is to select for active expression of methylase genes (methylase selection) (U.S. Pat. No. 5,200,333 and BsuRI: Kiss et al., Nucl. Acids. Res. 13:6403-6421, (1985)). Since restriction-modification genes are often closely linked together, both genes can often be cloned simultaneously. However, this selection does not always yield a complete restriction system; in some cases it yields only the methylase gene (BspRI: Szomolanyi et al., Gene 10:219-225, (1980); BcnI: Janulaitis et al., Gene 20:197-204 (1982); BsuRI: Kiss and Baldauf, Gene 21:111-119, (1983); and MspI: Walder et al., J. Biol. Chem. 258:1235-1241, (1983)).
A more recent method, the "endo-blue" method, has been described for direct cloning of thermostable restriction endonuclease genes into E. coli based on the indicator strain of E. coli containing the dinD::lacZ fusion (Fomenkov et al., U.S. Pat. No. 5,498,535; Fomenkov et al., Nucl. Acids Res. 22:2399-2403, (1994)). This method utilizes the E. coli SOS response signals following DNA damage caused by restriction endonucleases or non-specific nucleases. A number of thermostable nuclease genes (TaqI, Tth111I, BsoBI, Tf nuclease) have been cloned by this method (U.S. Pat. No. 5,498,535, 1996). The disadvantage of this method is that some positive blue clones containing a restriction endonuclease gene are difficult to culture due to the lack of the cognate methylase gene.
There are three major groups of DNA methylases based on the position and the base that is modified (C5 cytosine methylases, N4 cytosine methylases, and N6 adenine methylases). N4 cytosine and N6 adenine methylases are amino-methyltransferases (Malone et al. J. Mol. Biol. 253:618-632, (1995)). When a restriction site on DNA is modified (methylated) by the methylase, it is resistant to digestion by the cognate restriction endonuclease. Sometimes methylation by a non-cognate methylase can also render the DNA site resistant to restriction digestion. For example, Dcm methylase modification of 5′ CCWGG 3′ (SEQ ID NO:7) (W=A or T) can also make the DNA resistant to PspGI restriction digestion. Another example is that CpG methylase can modify the C in the CG dinucleotide and make the NotI site (5′ GCGGCCGC 3′ (SEQ ID NO:8)) refractory to NotI digestion (New England Biolabs' Catalog, 2000-01, page 220). Therefore methylases can be used as a tool to modify certain DNA sequences and make them uncleavable by restriction enzymes.
Because purified restriction endonucleases and modification methylases are useful tools for creating recombinant molecules in the laboratory, there is a strong commercial interest in obtaining, through recombinant DNA techniques, bacterial strains that produce large quantities of restriction enzymes. Such over-expression strains should also simplify the task of enzyme purification.
The present invention relates to isolated DNA coding for the TspRI restriction endonuclease as well as to a method for cloning the TspRI restriction gene, tspRIR, from Thermus species R into E. coli by direct PCR from genomic DNA using degenerate primers based on the N-terminus and internal amino acid sequences.
It proved extremely difficult to clone the TspRI endonuclease gene by conventional methods. At first, a Sau3AI partial genomic DNA library was constructed. After TspRI digestion of the plasmids in the library, methylase positive clones were identified among the surviving transformants. The entire tspRIM gene was sequenced and adjacent DNA sequences beyond the tspRIM gene were derived by inverse PCR. Four open reading frames (ORF1-ORF4) were found upstream and one ORF (ORF5) was found downstream. These ORFs were expressed in an M.TspRI pre-modified host, but no TspRI activity was detected in cell extracts prepared from the clones with inserts of ORF1-ORF4 or ORF5.
Since methylase selection and inverse PCR cloning did not yield any TspRI positive clones, another cloning method, the "endo-blue" method, was used to screen for clones containing nuclease genes. More than 40 blue colonies were found from the Sau3AI partial library using the dinD::lacZ indicator strain. However, no apparent TspRI endonuclease activity was detected in the cell extracts of blue colonies.
To find out whether TspRI methylase is a multi-specific methylase, the plasmid pUC-TspRIM was digested with many restriction enzymes whose cleavage is blocked by the C5 methylation introduced by their cognate C5 methylases. The plasmid pUC-TspRIM was cleaved by all the restriction enzymes tested except TspRI endonuclease, suggesting that TspRI methylase is not a multi-specific methylase.
In order to obtain the N-terminal and internal amino acid sequences, major efforts were made to purify the native TspRI endonuclease to homogeneity. The successful cloning strategy was to design degenerate primers based on the N-terminal and internal amino acid sequences and to amplify the TspRI coding sequence directly from genomic DNA by PCR. TspRI endonuclease was purified from the cell extract of the native Thermus strain by chromatography through Heparin hyper D, Source 15Q, Heparin tsk gel, Source 15S, and Heparin tsk columns, followed by a Sephadex 75 gel filtration column. The purified homogeneous TspRI protein, which has an apparent molecular mass of 58 kDa, was subjected to sequential degradation to obtain the N-terminal amino acid sequence. TspRI protein was also partially digested with CNBr, resulting in three peptides with apparent molecular masses of 6 kDa, 14 kDa, and 26 kDa. These were electro-blotted and sequenced to obtain internal amino acid sequences of the TspRI protein. Degenerate primers were made, and a ~260 bp PCR product was obtained in a PCR reaction using a forward primer (designed from the TspRI N-terminal amino acid sequence) and a reverse primer (designed from the amino acid sequence of the internal 6 kDa peptide). The PCR product was cloned, sequenced and proved to be the bona fide N-terminal TspRI coding sequence. The C-terminal coding sequence of TspRI was identified from the partial ORF (355 bp) downstream of the putative T-G mismatch repair gene, in that the predicted amino acid sequence matches the actual amino acid sequence of the CNBr-derived 14 kDa peptide. The entire tspRIR gene was amplified by PCR, ligated to the T7 expression vector pET21at, and transformed into the pre-modified expression host ER2566 [pACYC-TspRIM]. However, no desired insert was detected among the ApR CmR transformants. Therefore, the tspRIR gene was cloned and expressed in a low-copy-number T7 expression vector, pACYC-T7ter.
After clones with inserts were identified, the recombinant TspRI activity in cell extracts was detected by digestion of λ DNA. Both the tspRIR PCR product and the insert in pACYC-T7ter were sequenced and confirmed to encode the wild type amino acid sequence.
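The degenerate-primer strategy described above starts from a stretch of known amino acid sequence and back-translates it into a mixed oligonucleotide. The Python sketch below shows one way this could be done, assuming the standard genetic code and per-position IUPAC mixing; the function name `degenerate_primer` and the short peptides used as inputs are hypothetical, since the actual TspRI peptide and primer sequences are not given in this section. For a reverse primer, the reverse complement of the resulting sequence would be taken, which this sketch does not do.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)

# IUPAC nucleotide codes, keyed by the set of bases each code stands for.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}

def degenerate_primer(peptide):
    """Back-translate a peptide into an IUPAC degenerate sequence.

    Returns (sequence, degeneracy), where degeneracy is the number of distinct
    oligonucleotides in the mixed pool (product of the base choices at each
    position). Per-position mixing can cover more sequences than there are
    real codons, e.g. Leu becomes YTN (8 sequences for 6 codons).
    """
    letters, degeneracy = [], 1
    for aa in peptide.upper():
        codons = CODONS[aa]
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            letters.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(letters), degeneracy

if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    print(degenerate_primer("MKWFD"))
```

In practice, stretches rich in Met, Trp and two-codon residues are usually preferred for primer design because they keep the degeneracy of the pool low, while Leu, Ser and Arg increase it quickly.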