The present invention relates to a novel method for screening and identifying restriction endonucleases based on the proximity of their genes to the genes of their cognate methylases. A similar method for identifying isoschizomers of known endonucleases, which isoschizomers possess a desired physical property, is also provided. Related methods for producing and cloning such endonucleases or other cytotoxic proteins are provided, as are several novel M. jannaschii restriction endonucleases.
Nucleases are a class of enzymes which degrade or cut single- or double-stranded DNA. Restriction endonucleases are an important class of nucleases which recognize and bind to particular sequences of nucleotides (the ‘recognition sequence’) along the DNA molecule. Once bound, they cleave both strands of the molecule within, or to one side of, the recognition sequence. Different restriction endonucleases recognize different recognition sequences. Over two hundred restriction endonucleases with unique specificities have been identified among the many hundreds of bacterial and archaeal species that have been examined to date. Some have also been found to be encoded by eukaryotic viruses.
It is thought that in nature, restriction endonucleases, which comprise the first component of what are commonly referred to as restriction-modification (“RM”) systems, play a protective role in the welfare of the host cell. They enable bacteria and archaea to resist infection by foreign DNA molecules like viruses and plasmids that would otherwise destroy or parasitize them. They impart resistance by cleaving invading foreign DNA molecules when the appropriate recognition sequence is present. The cleavage that takes place disables many of the infecting genes and renders the DNA susceptible to further degradation by non-specific endonucleases.
The second component of these bacterial and archaeal protective systems comprises the modification methylases. These enzymes are complementary to the restriction endonucleases and they provide the means by which bacteria and archaea are able to protect their own DNA from cleavage and distinguish it from foreign, infecting DNA. Usually, modification methylases recognize and bind to the same nucleotide recognition sequence as the corresponding restriction endonuclease, but instead of cleaving the DNA, they chemically modify one or other of the nucleotides within the sequence by the addition of a methyl group. Following methylation, the recognition sequence is no longer bound or cleaved by the restriction endonuclease. The DNA of the host cell is always fully modified by virtue of the activity of the modification methylase. It is therefore completely insensitive to the presence of the endogenous restriction endonuclease. It is only unmodified, and therefore identifiably foreign, DNA that is sensitive to restriction endonuclease recognition and cleavage.
There are three kinds of restriction systems. The Type I systems are complex. They recognize specific sequences, but cleave randomly with respect to that sequence (Bickle, T. A., Nucleases [eds. Linn, S. M., Lloyd, S. L., and Roberts, R. J.], Cold Spring Harbor Laboratory Press, pp. 89-109, (1993)). The Type III enzymes, of which only five have been characterized biochemically, recognize specific sequences, cleave at a precise point away from that sequence, but rarely give complete digestion (ibid.). Neither of these two kinds of systems is suitable for genetic engineering, which is the sole province of the Type II systems. The latter recognize a specific sequence and cleave precisely either within or very close to that sequence. They typically require only Mg++ for their action.
The traditional approach to screening for restriction endonucleases, pioneered by Roberts et al. and others in the early to mid 1970s (e.g. Smith, H. O. and Wilcox, K. W., J. Mol. Biol. 51:379-391 (1970); Kelly, T. J. Jr. and Smith, H. O., J. Mol. Biol. 51:393-409, (1970); Middleton, J. H. et al., J. Virol. 10:42-50 (1972); and Roberts, R. J. et al., J. Mol. Biol. 91:121-123, (1975)), was to grow small cultures of individual strains, prepare cell extracts, and then test the crude cell extracts for their ability to produce specific fragments on small DNA molecules (see Schildkraut, I. S., “Screening for and Characterizing Restriction Endonucleases”, in Genetic Engineering, Principles and Methods, Vol. 6, pp. 117-140, Plenum Press (1984)). Using this approach, about 12,000 strains have been screened worldwide to yield the current harvest of almost 3,000 restriction endonucleases (Roberts, R. J. and Macelis, D., Nucl. Acids. Res. 26:338-350 (1998)). Roughly one in four of all strains examined using a biochemical approach shows the presence of a Type II restriction enzyme.
Beginning in 1978, investigators in a number of laboratories set about to clone the genes for some of the Type II restriction systems (Szomolanyi, I. et al., Gene 10:219-225 (1980)). This promised to be quite a successful enterprise because of the ease of selecting for methylase genes (Mann, M. B. et al., Gene 3:97-112 (1978); Kiss, A. M. et al., Nucl. Acids. Res. 13:6403-6420 (1985)). Basically, if an organism is known to contain a restriction system, then a shotgun library of the organism's DNA can be made and the resulting mixed population of plasmids can be grown as a single, mixed culture. This mixed population of plasmid DNAs is then isolated and cleaved in vitro with the restriction enzyme; only those plasmids that have both received and expressed the corresponding methylase gene will survive the digestion. Upon retransformation, any cells that grow are greatly enriched for the presence of the methylase gene. Because the methylase and restriction enzyme genes are usually adjacent, this method can yield both genes. Sometimes a single round of selection is sufficient, but routinely two rounds of selection yield the required methylase gene with high efficiency. Only when expression of the methylase gene is poor or coexpression of flanking sequences is lethal does the selection fail. Various tricks and alternative cloning methods have been developed to overcome such limitations (e.g. Brooks, J. E. et al., Nucl. Acids. Res. 17:979-997 (1989); Wilson, G. G. and Meda, M. M., U.S. Pat. No. 5,179,015 (1993)).
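By way of illustration only, the enrichment achieved by successive rounds of methylase selection can be sketched numerically. The model below is an assumption introduced for this example and is not taken from the cited references: it supposes that every methylase-expressing plasmid survives digestion, and that a small fraction of unmodified plasmids (`escape_rate`, a hypothetical parameter) escapes cleavage in each round. The starting frequency and escape rate used are arbitrary.

```python
def methylase_fraction(initial_fraction, escape_rate, rounds):
    """Fraction of surviving plasmids that carry an expressed methylase gene.

    initial_fraction: share of the library carrying the methylase gene (assumed).
    escape_rate: share of unmodified plasmids surviving one digestion (assumed).
    rounds: number of digest-and-retransform cycles.
    """
    protected = initial_fraction  # assumed to survive every round intact
    unprotected = (1.0 - initial_fraction) * escape_rate ** rounds
    return protected / (protected + unprotected)


# Arbitrary illustrative values: 1 clone in 1,000 carries the gene,
# and 1% of unmodified plasmids escape each digestion.
after_one_round = methylase_fraction(1e-3, 1e-2, 1)
after_two_rounds = methylase_fraction(1e-3, 1e-2, 2)
```

Under these assumed values the methylase clones remain a minority after one round but dominate after two, which is consistent with the observation that two rounds are routinely sufficient.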
As the skilled artisan will appreciate, restriction endonucleases are cytotoxic products. In general, genes encoding cytotoxic products are extremely difficult to clone, even when care has been taken to remove sequences that might enable their expression in the plasmid host. Generation of their mRNA can be due to ‘read-through’ transcription that originates at some point on the plasmid other than the toxic locus. Absent an identifiable Shine-Dalgarno (SD) consensus sequence upstream of an initiator codon, translation of the toxic protein may be initiated by a cryptic ribosome binding site (RBS) (by definition, not fitting the SD consensus, and usually non-obvious), or by abortive termination of an upstream ribosome-mRNA complex. Long mRNA concatamers can be generated from plasmid templates via ‘rolling circle transcription’. This may increase and/or stabilize the mRNA of the toxic allele, so that even rare translational initiation events can generate enough protein to impact cell viability negatively.
Attempting to clone a toxic gene into a plasmid designed to facilitate high expression is, in many cases, futile. Transcriptional repressors are often employed to down-regulate expression, and typically act by interfering with productive transcription. This type of regulation is dependent upon: 1) the molar ratio of repressor protein to its cognate binding site (operator), and 2) the affinity of the repressor protein for the operator sequence. In no case is it reasonable to expect 100% of the operator sites to be occupied 100% of the time. Thus, some expression of a cloned gene is unavoidable, creating a powerful selective pressure against cells that faithfully replicate the lethal gene. Those cells in which expression of the toxic gene has been mutagenically inactivated survive.
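By way of illustration only, the dependence of repression on repressor concentration and operator affinity can be sketched with a simple one-site equilibrium binding model. This model is an assumption made for the example, as are the function names and the nanomolar units; it further assumes that the free repressor concentration approximates the total, which becomes less accurate as operator copy number rises.

```python
def operator_occupancy(repressor_nM, kd_nM):
    """Equilibrium fraction of operator sites bound by repressor.

    Simple one-site binding isotherm (an illustrative assumption):
    occupancy = [R] / ([R] + Kd).
    """
    return repressor_nM / (repressor_nM + kd_nM)


def leak_fraction(repressor_nM, kd_nM):
    """Fraction of operator sites left unoccupied, and hence open to transcription."""
    return 1.0 - operator_occupancy(repressor_nM, kd_nM)


# Even a 1,000-fold excess of repressor over Kd leaves a non-zero unoccupied fraction.
leak = leak_fraction(1000.0, 1.0)
```

In this sketch the occupancy approaches but never reaches 1 for any finite repressor concentration, which is the quantitative form of the statement that some expression of the cloned gene is unavoidable.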
Genes encoding cytotoxic products must be actively and constitutively down-regulated, and any adventitious expression eliminated at both the transcriptional and translational levels.
This may be accomplished through the action of antisense RNAs (asRNA). The asRNA base pairs with a segment of mRNA and presumably inhibits translational initiation or elongation. The use of opposing promoters to modulate expression of a gene encoding a potentially toxic protein has been reported (O'Connor and Timmis, J. Bacteriol. 169(10):4457-4462 (1987)). Their system employed the endogenous E. coli RNA polymerase (“RNAP”), with the sense RNA (sRNA) generated from the λ-derived PL promoter, and asRNA initiating at the E. coli Plac promoter. Operator sequences for repressor proteins normally associated with these promoters, namely cI and LacI, were also present on the high copy plasmid (pUC8/18) backbone. A second copy of the LacI operator was inserted between PL and the gene of interest. The alleles encoding the cI857 and LacI repressor proteins were not part of the plasmid, but were provided either from the chromosome (cI857 λ prophage) or on the low copy plasmid pACYC184 (lacI).
This approach to cloning a cytotoxic gene, however, suffers from several shortcomings:
1) a high copy replicon significantly raises the dosage of the toxic allele, increasing the likelihood for undesired expression;
2) placement of operator sequences on a high copy replicon, while the genes encoding the repressor proteins are present at substantially lower copy number, does not provide optimal repression;
3) strong repression of gene expression and elective induction of gene expression are mutually exclusive.
While the idea of using opposing promoters to modulate gene expression has been previously demonstrated (Elledge and Davis, Genes & Develop. 3:185-197 (1988)), it has not been demonstrated as a successful method using a toxic gene. The Elledge, et al. system relies upon conditional expression of a gene encoding spectinomycin resistance. This approach proved to be a useful genetic selection for genes encoding proteins capable of exhibiting transcriptional repressor-like activity (Elledge et al., PNAS USA 86:3689-3693 (1989); Dorner and Schildkraut, Nucl. Acid. Res. 22(6):1068-1074 (1994)). These studies showed that transcriptional inactivation of a gene can be achieved with an antisense promoter.
It is imperative that stable clones of desired loci (including those encoding cytotoxic products) be established in the context of an inducible expression system, such as an E. coli expression system, for the following reasons:
a) to generate a physical archive of single genes encoding potentially novel biochemical activities (as opposed to phage or cosmid constructs containing many genes);
b) to allow for rapid and facile characterization and/or manipulation of the entire allele; and
c) to move rapidly from discovery to production.
It would therefore be desirable to develop a method for cloning genes encoding cytotoxic products, including restriction endonucleases, or other genes which cannot be stably cloned by traditional methods, in order to enable the generation of the above-mentioned archive.
Nonetheless, as a result of current cloning methods, more than 100 systems have been cloned and many have been sequenced (Wilson, G. G., Nucl. Acids. Res. 19:2539-2566 (1991)). Several conclusions have emerged. First, genes for restriction endonucleases that recognize unique sequences are usually different from one another and their sequences are unique within GenBank. Typically, similarity between restriction enzyme gene sequences has been found only when the two enzymes are isoschizomers (i.e. they recognize exactly the same sequence, but come from different microorganisms) or have closely related recognition sequences (e.g. Lubys, A. et al., Gene 141:85-89 (1994); Withers, B. E. et al., Nucl. Acids. Res. 20:6267-6273 (1992)). Second, among methylase gene sequences there is very strong similarity between enzymes that form 5-methylcytosine (m5C), such that they can readily be recognized by pattern matching algorithms (Posfai, J. et al., Nucleic Acids. Res. 17:2421-2435 (1989); Lauster, R. et al., J. Mol. Biol. 206:305-312 (1989)). The genes for methylases that form N6-methyladenine (N6A) or N4-methylcytosine (N4C) are also related to one another, but show fewer well-conserved similarities. At least three subfamilies of sequences can be recognized (Wilson, G. G., Meth. Enzymol. 216:259-279 (1992); Timinskas et al., Gene 157:3-11 (1995)). In this case, pattern matching algorithms do fairly well, but cannot provide conclusive evidence whether a newly sequenced gene encodes an N6A or an N4C methyltransferase. Third, and most significant, for virtually all known RM systems that have so far been cloned, the methylase gene and the restriction enzyme gene lie either adjacent or extremely close to one another (Wilson, G. G., Nucl. Acids. Res. 19:2539-2566 (1991)).
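Taken together, the second and third conclusions suggest a simple computational screen: find open reading frames (ORFs) that resemble methylases by pattern matching, then examine their immediate neighbors as candidate endonuclease genes. The following is a minimal sketch of that idea and not the method of any cited reference. The regular expressions are deliberately simplified stand-ins for the published motif profiles, and the `Orf` record, the `max_gap` threshold of 500 bp, and the toy ORFs are all hypothetical.

```python
import re
from dataclasses import dataclass

# Simplified, illustrative motifs (assumptions, not the published profiles):
# F-x-G-x-G followed at a distance by P-C loosely mimics m5C methylases;
# [DNS]-P-P-[YFW] loosely mimics N6A/N4C (amino) methylases.
M5C_PATTERN = re.compile(r"F.G.G.{20,120}PC")
AMINO_PATTERN = re.compile(r"[DNS]PP[YFW]")


@dataclass
class Orf:
    name: str
    start: int    # genome coordinate of the first base
    end: int      # genome coordinate of the last base
    protein: str  # translated amino acid sequence


def looks_like_methylase(orf):
    """Crude pattern match for a methylase-like protein sequence."""
    return bool(M5C_PATTERN.search(orf.protein) or AMINO_PATTERN.search(orf.protein))


def candidate_endonucleases(orfs, max_gap=500):
    """Pair each methylase-like ORF with non-methylase neighbors within max_gap bp."""
    ordered = sorted(orfs, key=lambda o: o.start)
    pairs = []
    for i, orf in enumerate(ordered):
        if not looks_like_methylase(orf):
            continue
        for j in (i - 1, i + 1):  # immediate upstream and downstream neighbors
            if not 0 <= j < len(ordered):
                continue
            neighbor = ordered[j]
            if looks_like_methylase(neighbor):
                continue
            if neighbor.start > orf.start:
                gap = neighbor.start - orf.end
            else:
                gap = orf.start - neighbor.end
            if gap <= max_gap:
                pairs.append((orf.name, neighbor.name))
    return pairs


# Hypothetical toy genome: orfB carries the m5C-like pattern, orfA lies 50 bp away.
toy_orfs = [
    Orf("orfA", 100, 1000, "MKTLLVAGE"),
    Orf("orfB", 1050, 2200, "MSFSGCGGL" + "A" * 30 + "PCQ"),
    Orf("orfC", 5000, 6000, "MGHKLLRTE"),
]
pairs = candidate_endonucleases(toy_orfs)
```

On the toy data only `orfA` is reported as a candidate, because `orfC` lies well beyond the assumed distance threshold. Any such candidate would still require biochemical confirmation.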
Within the last year, sequences have become available for many complete bacterial and archaeal genomes, including: Haemophilus influenzae (Fleischmann, R. D. et al., Science 269:496-512 (1995)), Mycoplasma genitalium (Fraser, C. M. et al., Science 270:397-403 (1995)), Methanococcus jannaschii (Bult, C. J. et al., Science 273:1058-1073 (1996)), Mycoplasma pneumoniae (Himmelreich, R. et al., Nucl. Acids. Res. 24:4420-4449 (1996)) and Synechocystis species (Kaneko, T. et al., DNA Res. 3:109-136 (1996)). H. influenzae and M. jannaschii were each known to encode two Type II RM systems (Roberts, R. J. and Macelis, D. M., supra (1998)). The complete sequences of their genomes have revealed a remarkable fact. In each case, these genomes appear to contain multiple RM systems, many of which have never been detected biochemically. The results of computer analysis of these sequences are compared with the biochemical results in Table 1:
TABLE 1

| Organism | RM Systems Detected by Computer | RM Systems Detected Biochemically |
|---|---|---|
| H. influenzae | 8 | 2 |
| M. genitalium | 2 | not tested |
| M. jannaschii | 12 | 2 |
| M. pneumoniae | 4 | not tested |
| Synechocystis species | 4 | not tested |
As mentioned earlier, more than two hundred different specificities are now known among Type II restriction enzymes. Table 2 shows the kinds of sequence patterns that are currently known to be recognized by restriction endonucleases. It lists the number of specific examples of each pattern presently in the database, compared with the theoretical number based on all possible sequence combinations.
In column 1 of this table, the pattern representation, n′, signifies the complement of n. Thus nnn′n′ in the first entry is used to represent the 16 possible tetranucleotide palindromes AATT, ACGT, AGCT etc.
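By way of illustration, the 16 tetranucleotide palindromes represented by nnn′n′ can be enumerated by pairing every possible dinucleotide with its reverse complement. The sketch below also counts the theoretically possible sequences for a pattern as 4 raised to the number of unprimed n positions, which appears to be how the "Possible" column of Table 2 is derived; that reading, and the function names, are assumptions made for this example.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindromes(half_length):
    """All palindromic sites formed by a half-site followed by its reverse complement."""
    halves = ("".join(bases) for bases in product("ACGT", repeat=half_length))
    return [half + reverse_complement(half) for half in halves]


def possible_count(pattern):
    """4 ** (number of unprimed 'n' positions) in a Table 2 style pattern.

    Primed positions (n′) are fixed by their partner, and degenerate letters
    (N, R, Y, W, ...) are treated as fixed by the pattern itself.
    """
    primes = "′'"
    free = sum(
        1
        for i, ch in enumerate(pattern)
        if ch == "n" and (i + 1 == len(pattern) or pattern[i + 1] not in primes)
    )
    return 4 ** free


tetramers = palindromes(2)  # begins AATT, ACGT, AGCT, ...
```

For example, `possible_count("nnn′n′")` gives 16 and `possible_count("nnnnnn")` gives 4096, in agreement with the corresponding entries of Table 2.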
It is clear that for some types of patterns, such as the simple hexanucleotide and tetranucleotide palindromes, we are very close to having all possible such enzymes. However, for many of the other patterns we are a long way away from the theoretically possible number. This suggests that there are many more specificities waiting to be discovered.
Accordingly, it would be desirable to provide an alternative method for screening for restriction endonucleases which would overcome the limitations associated with the traditional biochemical methods described above. Such an alternative method would facilitate the identification, characterization, and cloning of heretofore unknown restriction endonucleases as well as isoschizomers of known restriction endonucleases.
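TABLE 2
Sequence patterns recognized by Type II restriction enzymes

| Pattern | Specific Example: Rec. Sequence | Specific Example: Enzyme | Observed | Possible |
|---|---|---|---|---|
| nnn′n′ | AATT | TspEI | 14 | 16 |
| nnnn′n′n′ | AACGTT | AclI | 55 | 64 |
| nnnnn′n′n′n′ | ATTTAAAT | SwaI | 9 | 256 |
| nnnnn | ACGGC | BcefI | 18 | 1024 |
| nnnnnn | ACCTGC | BspMI | 25 | 4096 |
| nnNn′n′ | ACNGT | Tsp4CI | 7 | 16 |
| nDnn′Hn | GDGCHC | SduI | 1 | 16 |
| nKnnnn | GKGCCC | BmgI | 1 | 1024 |
| nMnn′Kn′ | CMGCKG | NspBII | 1 | 16 |
| nnBNNNNNVn′n′ | GABNNNNNVTC | Hin4I | 1 | 16 |
| nnMKn′n′ | GTMKAC | AccI | 1 | 16 |
| nnnn | CCGC | AciI | 2 | 256 |
| nnNNn′n′ | CCNNGG | SecI | 3 | 16 |
| nnnNn′n′n | CCTNAGG | SauI | 3 | 64 |
| nnnNnnn | CACCTGC | UbaEI | 3 | 4096 |
| nnnNNNn′n′n′ | CACNNNGTG | DraIII | 3 | 64 |
| nnnNNNNn′n′n′ | GAANNNNTTC | XmnI | 3 | 64 |
| nnnNNNNNn′n′n′ | CCANNNNNTGG | PflMI | 6 | 64 |
| nnNNNNNNNn′n′ | CGNNNNNNNGG | BsiYI | 2 | 16 |
| nnnNNNNNNn′n′n′ | ACCNNNNNNGGT | HgiEII | 3 | 64 |
| nnnnNNNNNn′n′n′n′ | GGCCNNNNNGGCC | SfiI | 1 | 256 |
| nnNNNNNnnnn | ACNNNNNCTCC | BsaXI | 2 | 4096 |
| nnnNNNNNNNnn | CGANNNNNNTGC | BcgI | 3 | 1024 |
| nnnnNNNNNNnnn | GAACNNNNNNTCC | UbaDI | 1 | 16384 |
| nnnNNNNNNNNNn′n′n′ | CCANNNNNNNNNTGG | XcmI | 1 | 64 |
| nnNNNNnnnYn | ACNNNNGTAYC | BaeI | 1 | 4096 |
| nnnRnn | CAARCA | Tth111II | 2 | 1024 |
| nnnWn′n′n′ | ACCWGGT | SexAI | 4 | 64 |
| nnRYn′n′ | ACRYGT | AflIII | 4 | 16 |
| nnSn′n′ | CCSGG | CauII | 3 | 16 |
| nnWn′n′ | CCWGG | EcoRII | 4 | 16 |
| nnWWn′n′ | CCWWGG | StyI | 1 | 16 |
| nnYNNNNRn′n′ | CAYNNNNRTG | MslI | 1 | 16 |
| nnYRn′n′ | CTYRAG | SmlI | 3 | 16 |
| nRnn′Yn′ | GRCGYC | AcyI | 2 | 16 |
| nRnnn′n′Yn′ | CRCCGGYG | SgrAI | 1 | 64 |
| nWnn′Wn′ | GWGCWC | HgiAI | 1 | 16 |
| nYnn′Rn′ | CYCGRG | AvaI | 1 | 16 |
| Rnn′Y | RGCY | CviJI | 1 | 4 |
| Rnnn′n′Y | RAATTY | ApoI | 5 | 16 |
| RnnNn′n′Y | RGGNCCY | DraII | 1 | 16 |
| RnnWn′n′Y | RGGWCCY | PpuMI | 1 | 16 |
| Wnnn′n′W | WCCGGW | BetI | 3 | 16 |
| Ynnn′n′R | YACGTR | BsaAI | 2 | 16 |
| Ynnnnn | CGGCCR | GdiII | 1 | 1024 |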