The isolation, cloning, transfer and recombination of DNA segments, including coding sequences and non-coding sequences, is most conveniently carried out using restriction endonuclease enzymes. Restriction endonucleases are especially useful because each one introduces a hydrolytic cleavage of a phosphodiester bond linking adjacent nucleotides of a DNA structure only at a specific, defined site. The site of cleavage is defined by a sequence of nucleotides surrounding or adjacent to the cleavage site. Over one hundred such enzymes are now known, and the sites at which they act, termed restriction sites, are defined. For the majority of such enzymes, the nucleotide sequence defining the restriction site and the bonds cleaved is specific for each enzyme. In some cases, different enzymes recognize the same site or a variant, such as the same sequence with a methylated base, or a shorter subsequence, and act to hydrolyze the same bond, or different bonds within or adjacent to the recognition sequence. Some restriction enzymes hydrolyze bonds adjacent to complementary bases on double-stranded DNA, producing "blunt ends." Others produce staggered cuts which result in the DNA having overlapping complementary or "sticky" ends.
The distribution of restriction sites throughout the genome of an organism is essentially random, except for areas where sequence repeats occur. The frequency of occurrence of a given restriction site is therefore primarily a function of the combinatorial likelihood of the proper nucleotides occurring in the proper sequence. Restriction sites having shorter nucleotide sequences therefore occur with greater frequency than those with longer sequences. Consequently, if DNA is incubated with a restriction endonuclease that recognizes a four-base restriction site sequence (a "four-base cutter"), the resulting DNA fragments are, on average, shorter than those produced by the action of an endonuclease recognizing a six-base restriction site sequence (a "six-base cutter"). Given a 50% average GC content, a given restriction site appears on average once every 4^n bp, where n is the number of bases that constitute the recognition site sequence (or more accurately, base pairs [bp], since the sites occur on double-stranded DNA). A site whose length is 4 bp will therefore occur with an average frequency of once every 4^4 = 256 bp of DNA, which is the average length of the fragments produced when DNA is incubated with a four-base cutter restriction endonuclease. Six-base cutters yield fragments whose calculated average length is 4^6 = 4096 bp. Since the distribution of restriction sites is essentially random, many DNA fragments much larger or smaller than 4096 bp are found when DNA is cut with a given six-base cutter endonuclease. By having a variety of six-base cutting endonucleases available, it is often possible to clone and transfer individual DNA segments much longer than 4096 bp. The need to clone and transfer larger DNA segments has been one factor driving the search for a greater variety of restriction endonucleases.
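The arithmetic above can be sketched in a few lines of Python. This is an illustrative model only: it assumes the four bases are equally frequent and independent at every position, and it treats fragment lengths as approximately geometrically distributed, which is an assumption of this sketch rather than a statement from the text. The function names are hypothetical.

```python
import math


def expected_spacing(site_length: int) -> int:
    """Average distance in bp between occurrences of a fully specified site,
    assuming equal, independent base frequencies (50% GC content)."""
    return 4 ** site_length


def fraction_longer_than(length_bp: int, site_length: int) -> float:
    """Approximate fraction of fragments longer than length_bp.

    Assumption of this sketch: each position starts a site independently
    with probability 1 / 4**site_length, so gaps between sites are
    approximately geometric.
    """
    p = 1 / expected_spacing(site_length)
    return (1 - p) ** length_bp


if __name__ == "__main__":
    for n in (4, 6):
        print(f"{n}-base cutter: one site per {expected_spacing(n)} bp on average")
    # Share of six-base-cutter fragments expected to exceed the 4096 bp average
    print(f"{fraction_longer_than(4096, 6):.3f}")
```

Under this model roughly 37% of six-base-cutter fragments exceed the 4096 bp average, which is consistent with the observation that fragments much larger or smaller than the average are common.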
The search has also yielded restriction enzymes with longer restriction site sequences. Of particular relevance herein is a set of endonucleases designated "rare-cutting" endonucleases, defined herein as those having a recognition site of 9 or more base pairs. The currently known rare-cutting endonucleases are encoded within introns, and their biological function appears to differ from that of the restriction endonucleases, which are also termed Type II endonucleases. The rare-cutting endonucleases are sometimes termed "homing endonucleases" in the literature. For general reviews, see Lambowitz, A. M. et al. (1993) Ann. Rev. Biochem. 62:587-622; Mueller, J. E. et al. (1993) "Homing endonucleases" in Nucleases (S. M. Linn, S. R. Lloyd and R. J. Roberts, Eds.) Cold Spring Harbor Press, Cold Spring Harbor, N.Y.; and Perler, F. B. et al. (1994) Nucl. Acids Res. 22:1125-1127.
U.S. Pat. No. 5,420,032 (Marshall et al.) discloses homing endonuclease I-CeuI, together with its recognition site, which is 19 bp in length. The recognition sequence is somewhat degenerate, so that the actual average distance between sites is somewhat less than the theoretical 4^19 = 2.7488 × 10^11 bp; nevertheless, I-CeuI sites are very rare.
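The effect of degeneracy on the calculated spacing can be illustrated as follows. This is a minimal sketch assuming equal base frequencies and a recognition site written in standard IUPAC ambiguity codes; the example patterns in the usage lines are illustrative placeholders, not the actual I-CeuI recognition sequence, and the function name is hypothetical.

```python
from fractions import Fraction

# Standard IUPAC nucleotide codes and the bases each one permits
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expected_spacing_degenerate(pattern: str) -> Fraction:
    """Average bp between matches of a possibly degenerate site.

    Each position contributes a factor of 4 / (number of permitted bases),
    so a fully specified position contributes 4 and an N contributes 1.
    """
    spacing = Fraction(1)
    for symbol in pattern.upper():
        spacing *= Fraction(4, len(IUPAC[symbol]))
    return spacing


if __name__ == "__main__":
    # Illustrative patterns only (hypothetical, for comparison)
    print(expected_spacing_degenerate("A" * 19))            # fully specified 19-mer
    print(expected_spacing_degenerate("A" * 17 + "RY"))     # two two-fold degenerate positions
```

Every degenerate position lowers the calculated spacing relative to 4^n, which is why a degenerate 19 bp site is expected to occur more often than once in 4^19 bp.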
U.S. Pat. No. 5,474,896 (Dujon et al.) discloses the endonuclease PI-SceI, together with DNA encoding the enzyme and characterization of the recognition sequence. The recognition sequence is 11 bp in length, making its calculated cutting frequency 1 in 4^11 = 4,194,304 bp.
Other rare-cutting endonucleases have been described, including I-PpoI (Muscarella, D. E. et al. [1990] Mol. Cell. Biol. 10:3386-3396); I-TliI (Perler, F. B. et al. [1992] Proc. Natl. Acad. Sci. USA 89:5577-5581); and I-PspI (Xu, M. Q. et al. [1993] Cell 75:1371-1377). All the foregoing have 10 bp recognition sites, with a calculated cutting frequency of 1 in 4^10 = 1,048,576 bp. All of the above-described rare-cutting endonucleases are currently available from commercial sources, including New England Biolabs, Beverly, Mass. and/or Promega Corp., Madison, Wis.
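To give a sense of scale, the calculated frequencies can be converted into an expected number of sites in a genome of a given size. The sketch below uses the recognition site lengths stated above and the same equal-base-frequency model; the genome size is a hypothetical round number chosen for illustration, and the calculation ignores degeneracy and the possibility of an asymmetric site occurring in either orientation.

```python
# Recognition site lengths (bp) as stated in the text above
SITE_LENGTHS = {
    "I-CeuI": 19,
    "PI-SceI": 11,
    "I-PpoI": 10,
    "I-TliI": 10,
    "I-PspI": 10,
}

# Hypothetical genome size in bp (an assumption for illustration only)
GENOME_BP = 3_000_000_000


def expected_site_count(genome_bp: int, site_length: int) -> float:
    """Expected number of occurrences of a fully specified site of the
    given length in a random sequence with equal base frequencies."""
    return genome_bp / 4 ** site_length


if __name__ == "__main__":
    for enzyme, n in SITE_LENGTHS.items():
        count = expected_site_count(GENOME_BP, n)
        print(f"{enzyme}: 1 site per {4 ** n:,} bp, about {count:.3g} sites expected")
```

Real genomes are not random sequences, so these figures are order-of-magnitude estimates at best.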
The described rare-cutting endonucleases generate staggered cuts in double-stranded DNA, resulting in cut ends which have 4-base overlapping, self-complementary single-stranded ends. A DNA segment cut with one of these nucleases can only be rejoined, using DNA ligase, to DNA cut in the same orientation, i.e., having the same complementary ends, except for I-PpoI, where the single-stranded ends generated by the enzyme have the same sequence regardless of the orientation of the restriction site sequence. Consequently, these endonucleases (except I-PpoI) have the property of controlling the orientation of insertion of DNA cut with a given enzyme at a site on a vector cut with the same enzyme (or one generating the same complementary ends). In this regard, the property of generating self-complementary ends allowing for oriented vector insertion is shared by many conventional (Type II) restriction endonucleases and is commonly exploited by those skilled in the art to control the orientation of a DNA segment inserted into a vector, with respect to other genes or markers on the vector. Further, it is well known that such oriented insertion can be bypassed, if desired, by known methods of removing the complementary ends ("blunt-ending") to allow rejoining by "blunt-end" ligation.
I-PpoI can be used to allow any DNA segment to be inserted into a vector in either orientation, with respect to markers on the vector. The enzymes I-PspI, I-TliI, PI-SceI and I-CeuI can be used to insert a DNA segment in a controlled orientation into a vector, depending on the orientation of the recognition sequence with respect to markers on the vector.
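The distinction between oriented and non-oriented insertion can be sketched as a check on the overhang sequence. If a single-stranded overhang reads the same as its own reverse complement, the cut ends are identical regardless of site orientation and an insert can ligate either way; otherwise only one orientation pairs correctly. The overhang sequences in the usage lines are illustrative examples rather than values taken from the text, and the function names are hypothetical.

```python
# Translation table for complementing DNA bases
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def insertion_orientations(overhang: str) -> int:
    """Number of orientations in which an insert cut by one enzyme can be
    ligated into a vector cut by the same enzyme.

    A palindromic overhang (equal to its own reverse complement) permits
    both orientations; a non-palindromic overhang permits only one.
    """
    return 2 if overhang.upper() == reverse_complement(overhang) else 1


if __name__ == "__main__":
    # Illustrative 4-base overhangs (assumed for this example)
    for overhang in ("TTAA", "CTAA"):
        print(overhang, reverse_complement(overhang), insertion_orientations(overhang))
```

The check covers only the simple case of one enzyme cutting both vector and insert; it does not model blunt-ending or compatible ends produced by different enzymes.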
The rare-cutting endonucleases have been recognized as useful for excising and inserting large segments of DNA, for example in large-scale mapping, purifying native continuous/non-overlapping DNA segments and the like (Block, C. A. et al. (1996) Biochem. Biophys. Res. Comm. 223:104-111). In addition, the rare-cutting endonuclease recognition sites have been incorporated into vectors for cloning and transferring DNA segments. Asselbergs, F. A. M. et al. (1996) BioTechniques 20:558-562 described vectors termed pI-LINK having a single copy of each of the PI-SceI, I-PpoI, I-CeuI and I-TliI sites interspersed with fourteen conventional restriction sites in a single multiple cloning site. Variants having the same multiple cloning site flanking a neomycin resistance gene were also described.