1. Field of the Invention
The present invention relates to the fields of biotechnology and molecular biology. In particular, the present invention relates to joining multiple nucleic acid molecules containing recombination sites, preferably using recombination sites having a unique specificity. The present invention also relates to cloning such joined nucleic acid molecules using recombinational cloning methods. The invention also relates to joining multiple peptides, and combinations of peptides and nucleic acid molecules through the use of recombination sites. Other molecules and compounds or combinations of molecules and compounds may also be joined through recombination sites according to the invention. Such peptides, nucleic acids and other molecules and/or compounds (or combinations thereof) may also be joined or bound through recombination to one or a number of supports or structures in accordance with the invention.
2. Related Art
Site-specific Recombinases
Site-specific recombinases are proteins that are present in many organisms (e.g., viruses and bacteria) and have been characterized as having both endonuclease and ligase properties. These recombinases (along with associated proteins in some cases) recognize specific sequences of bases in a nucleic acid molecule and exchange the nucleic acid segments flanking those sequences. The recombinases and associated proteins are collectively referred to as “recombination proteins” (see, e.g., Landy, A., Current Opinion in Biotechnology 3:699–707 (1993)).
Numerous recombination systems from various organisms have been described. See, e.g., Hoess, et al., Nucleic Acids Research 14(6):2287 (1986); Abremski, et al., J. Biol. Chem. 261(1):391 (1986); Campbell, J. Bacteriol. 174(23):7495 (1992); Qian, et al., J. Biol. Chem. 267(11):7794 (1992); Araki, et al., J. Mol. Biol. 225(1):25 (1992); Maeser and Kahnmann, Mol. Gen. Genet. 230:170–176 (1991); Esposito, et al., Nucl. Acids Res. 25(18):3605 (1997). Many of these belong to the integrase family of recombinases (Argos, et al., EMBO J. 5:433–440 (1986); Voziyanov, et al., Nucl. Acids Res. 27:930 (1999)). Perhaps the best studied of these are the Integrase/att system from bacteriophage λ (Landy, A. Current Opinions in Genetics and Devel. 3:699–707 (1993)), the Cre/loxP system from bacteriophage P1 (Hoess and Abremski (1990) In Nucleic Acids and Molecular Biology, vol. 4. Eds.: Eckstein and Lilley, Berlin-Heidelberg: Springer-Verlag; pp. 90–109), and the FLP/FRT system from the Saccharomyces cerevisiae 2 μ circle plasmid (Broach, et al., Cell 29:227–234 (1982)).
Transposons
Transposons are mobile genetic elements. Transposons are structurally variable, being described as simple or compound, but typically encode a transposition-catalyzing enzyme, termed a transposase, flanked by DNA sequences organized in inverted orientations. For a more thorough discussion of the characteristics of transposons, one may consult Mobile Genetic Elements, D. J. Sherratt, Ed., Oxford University Press (1995) and Mobile DNA, D. E. Berg and M. M. Howe, Eds., American Society for Microbiology (1989), Washington, D.C., both of which are specifically incorporated herein by reference.
Transposons have been used to insert DNA into target DNA. As a general rule, the insertion of transposons into target DNA is a random event. One exception to this rule is the insertion of transposon Tn7. Transposon Tn7 can integrate itself into a specific site in the E. coli genome as one part of its life cycle (Stellwagen, A. E., and Craig, N. L. Trends in Biochemical Sciences 23, 486–490, 1998 specifically incorporated herein by reference). This site specific insertion has been used in vivo to manipulate the baculovirus genome (Lucklow et al., J Virol. 67:4566–4579 (1993) specifically incorporated herein by reference). The site specificity of Tn7 is atypical of transposable elements whose hallmark is movement to random positions in acceptor DNA molecules. For the purposes of this application, transposition will be used to refer to random or quasi-random movement, unless otherwise specified, whereas recombination will be used to refer to site specific recombination events. Thus, the site specific insertion of Tn7 into the attTn7 site would be referred to as a recombination event while the random insertion of Tn7 would be referred to as a transposition event.
York, et al. (Nucleic Acids Research, 26(8):1927–1933, (1998)) disclose an in vitro method for the generation of nested deletions based upon an intramolecular transposition within a plasmid using Tn5. A vector containing a kanamycin resistance gene flanked by two 19 base pair Tn5 transposase recognition sequences and a target DNA sequence was incubated in vitro in the presence of purified transposase protein. Under the conditions of low DNA concentration employed, the intramolecular transposition reaction was favored and was successfully used to generate a set of nested deletions in the target DNA. The authors suggested that this system might be used to generate C-terminal truncations in a protein encoded by the target DNA by the inclusion of stop signals in all three reading frames adjacent to the recognition sequences. In addition, the authors suggested that the inclusion of a His tag and kinase region might be used to generate N-terminal deletion proteins for further analysis.
Devine, et al., (Nucleic Acids Research, 22:3765–3772 (1994) and U.S. Pat. Nos. 5,677,170 and 5,843,772, all of which are specifically incorporated herein by reference) disclose the construction of artificial transposons for the insertion of DNA segments into recipient DNA molecules in vitro. The system makes use of the insertion-catalyzing enzyme of yeast TY1 virus-like particles as a source of transposase activity. The DNA segment of interest is cloned, using standard methods, between the ends of the transposon-like element TY1. In the presence of the TY1 insertion-catalyzing enzyme, the resulting element integrates randomly into a second target DNA molecule.
Another class of mobile genetic elements are integrons. Integrons generally consist of a 5′- and a 3′-conserved sequence flanking a variable sequence. Typically, the 5′-conserved sequence contains the coding information for an integrase protein. The integrase protein may catalyze site-specific recombination at a variety of recombination sites including attI, attC as well as other types of sites (see Francia et al., J Bacteriology 181(21):6844–6849, 1999, and references cited therein).
Recombination Sites
Whether the reactions discussed above are termed recombination, transposition or integration and are catalyzed by a recombinase or integrase, they share the key feature of specific recognition sequences, often termed “recombination sites,” on the nucleic acid molecules participating in the reactions. These recombination sites are sections or segments of nucleic acid on the participating nucleic acid molecules that are recognized and bound by the recombination proteins during the initial stages of integration or recombination. For example, the recombination site for Cre recombinase is loxP which is a 34 base pair sequence comprised of two 13 base pair inverted repeats (serving as the recombinase binding sites) flanking an 8 base pair core sequence. (See FIG. 1 of Sauer, B., Curr. Opin. Biotech. 5:521–527 (1994).) Other examples of recognition sequences include the attB, attP, attL, and attR sequences which are recognized by the recombination protein λ Int. attB is an approximately 25 base pair sequence containing two 9 base pair core-type Int binding sites and a 7 base pair overlap region, while attP is an approximately 240 base pair sequence containing core-type Int binding sites and arm-type Int binding sites as well as sites for auxiliary proteins integration host factor (IHF), FIS and excisionase (Xis). (See Landy, Curr. Opin. Biotech. 3:699–707 (1993).)
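The loxP architecture described above (two 13 base pair inverted repeats flanking an 8 base pair core) can be illustrated with a minimal Python sketch. The sequence used here is the commonly cited wild-type loxP sequence, which is an assumption brought in for illustration and is not given in the text above; the function names are likewise hypothetical.

```python
# Commonly cited wild-type loxP sequence (assumption: not stated in the text above).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def split_lox_site(site: str, arm: int = 13, core: int = 8):
    """Split a lox-type site into (left repeat, core, right repeat)."""
    if len(site) != 2 * arm + core:
        raise ValueError(f"expected {2 * arm + core} bases, got {len(site)}")
    left = site[:arm]
    spacer = site[arm:arm + core]
    right = site[arm + core:]
    return left, spacer, right


def has_inverted_repeats(site: str) -> bool:
    """True if the right arm is the reverse complement of the left arm."""
    left, _, right = split_lox_site(site)
    return right == reverse_complement(left)


if __name__ == "__main__":
    left, spacer, right = split_lox_site(LOXP)
    print(left, spacer, right)
    print("inverted repeats:", has_inverted_repeats(LOXP))
```

Because the two arms are inverted repeats, the asymmetric 8 base pair core is the only part of the site that gives it an orientation.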
Stop Codons and Suppressor tRNAs
Three codons are used by both eukaryotes and prokaryotes to signal the end of a gene. When transcribed into mRNA, the codons have the following sequences: UAG (amber), UGA (opal) and UAA (ochre). Under most circumstances, the cell does not contain any tRNA molecules that recognize these codons. Thus, when a ribosome translating an mRNA reaches one of these codons, the ribosome stalls and falls off the mRNA, terminating translation of the mRNA. The release of the ribosome from the mRNA is mediated by specific factors (see S. Mottagui-Tabar, NAR 26(11), 2789, 1998). A gene with an in-frame stop codon (TAA, TAG, or TGA) will ordinarily encode a protein with a native carboxy terminus. However, suppressor tRNAs can result in the insertion of amino acids and continuation of translation past stop codons.
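The effect of a stop codon on translation, with and without a suppressor tRNA, can be sketched in Python as follows. This is a simplified, all-or-nothing illustration: the function name, the example mRNA and the treatment of suppression as complete are assumptions made for clarity, whereas suppression in a cell is only partial.

```python
# The three stop codons as they appear in mRNA, with their traditional names.
STOP_CODONS = {"UAG": "amber", "UGA": "opal", "UAA": "ochre"}


def codons_read(mrna: str, suppressed=frozenset()):
    """Return the codons a ribosome reads before terminating.

    `suppressed` holds the names of stop codons ("amber", "opal", "ochre")
    that a suppressor tRNA is assumed to read through. In this toy model
    suppression is complete, which is a deliberate simplification.
    """
    read = []
    # Step through the message one in-frame codon at a time.
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS and STOP_CODONS[codon] not in suppressed:
            break  # termination: the ribosome is released here
        read.append(codon)
    return read


if __name__ == "__main__":
    # Hypothetical message with an in-frame amber codon followed by an ochre codon.
    message = "AUGGCUUAGGGCUAA"
    print(codons_read(message))
    print(codons_read(message, suppressed={"amber"}))
```

With no suppressor, reading stops at the amber codon; with an amber suppressor, reading continues to the ochre codon, which yields a product extended at its carboxy terminus.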
Mutant tRNA molecules that recognize what are ordinarily stop codons suppress the termination of translation of an mRNA molecule and are termed suppressor tRNAs. A number of such suppressor tRNAs have been found. Examples include, but are not limited to, the supE, supP, supD, supF and supZ suppressors, which suppress the termination of translation of the amber stop codon; the supB, glT, supL, supN, supC and supM suppressors, which suppress the function of the ochre stop codon; and glyT, trpT and Su-9, which suppress the function of the opal stop codon. In general, suppressor tRNAs contain one or more mutations in the anti-codon loop of the tRNA that allow the tRNA to base pair with a codon that ordinarily functions as a stop codon. The mutant tRNA is charged with its cognate amino acid residue and the cognate amino acid residue is inserted into the translating polypeptide when the stop codon is encountered. For a more detailed discussion of suppressor tRNAs, the reader may consult Eggertsson, et al., (1988) Microbiological Reviews 52(3):354–374, and Engelberg-Kulka, et al. (1996) in Escherichia coli and Salmonella Cellular and Molecular Biology, Chapter 60, pp. 909–921, Neidhardt, et al. eds., ASM Press, Washington, D.C.
Mutations which enhance the efficiency of termination suppressors, i.e., increase the read through of the stop codon, have been identified. These include, but are not limited to, mutations in the uar gene (also known as the prfA gene), mutations in the ups gene, mutations in the sueA, sueB and sueC genes, mutations in the rpsD (ramA) and rpsE (spcA) genes and mutations in the rplL gene.
Under ordinary circumstances, host cells would not be expected to be healthy if suppression of stop codons is too efficient. This is because, of the thousands or tens of thousands of genes in a genome, a significant fraction will naturally have one of the three stop codons; complete read-through of these would result in a large number of aberrant proteins containing additional amino acids at their carboxy termini. If some level of suppressing tRNA is present, there is a race between the incorporation of the amino acid and the release of the ribosome. Higher levels of tRNA may lead to more read-through, although other factors, such as the codon context, can influence the efficiency of suppression.
Organisms ordinarily have multiple genes for tRNAs. Combined with the redundancy of the genetic code (multiple codons for many of the amino acids), mutation of one tRNA gene to a suppressor tRNA status does not lead to high levels of suppression. The TAA stop codon is the strongest, and most difficult to suppress. The TGA is the weakest, and naturally (in E. coli) leaks to the extent of 3%. The TAG (amber) codon is relatively tight, with a read-through of ~1% without suppression. In addition, the amber codon can be suppressed with efficiencies on the order of 50% with naturally occurring suppressor mutants.
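As a rough numerical illustration of the read-through figures quoted above, the following Python sketch estimates the fraction of translation events that pass one or more in-frame stop codons. It assumes, purely for illustration, that each stop codon is read through independently and with the same probability; codon context effects are ignored and the function name is hypothetical.

```python
def full_length_fraction(read_through_per_stop: float, n_stops: int = 1) -> float:
    """Estimate the fraction of ribosomes that read through every stop codon.

    Assumes each stop codon is read through independently with the same
    probability (an illustrative simplification, not a measured model).
    """
    if not 0.0 <= read_through_per_stop <= 1.0:
        raise ValueError("read-through must be a probability between 0 and 1")
    if n_stops < 0:
        raise ValueError("number of stop codons cannot be negative")
    return read_through_per_stop ** n_stops


if __name__ == "__main__":
    # Figures taken from the passage above.
    print("TGA, natural leak:     ", full_length_fraction(0.03))
    print("TAG, no suppressor:    ", full_length_fraction(0.01))
    print("TAG, ~50% suppression: ", full_length_fraction(0.50))
    # Two amber codons in one message, each suppressed at ~50%.
    print("two TAG, ~50% each:    ", full_length_fraction(0.50, n_stops=2))
```

Under these assumptions, a message carrying two amber codons would yield full-length product about a quarter of the time at 50% suppression per codon.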
Suppression has been studied for decades in bacteria and bacteriophages. In addition, suppression is known in yeast, flies, plants and other eukaryotic cells including mammalian cells. For example, Capone, et al. (Molecular and Cellular Biology 6(9):3059–3067, 1986) demonstrated that suppressor tRNAs derived from mammalian tRNAs could be used to suppress a stop codon in mammalian cells. A copy of the E. coli chloramphenicol acetyltransferase (cat) gene having a stop codon in place of the codon for serine 27 was transfected into mammalian cells along with a gene encoding a human serine tRNA which had been mutated to form an amber, ochre, or opal suppressor derivative of the gene. Successful expression of the cat gene was observed. An inducible mammalian amber suppressor has been used to suppress a mutation in the replicase gene of polio virus, and cell lines expressing the suppressor were successfully used to propagate the mutated virus (Sedivy, et al., (1987) Cell 50: 379–389). The context effects on the efficiency of suppression of stop codons by suppressor tRNAs have been shown to be different in mammalian cells as compared to E. coli (Phillips-Jones, et al., (1995) Molecular and Cellular Biology 15(12): 6593–6600, Martin, et al., (1993) Biochemical Society Transactions 21:). Since some human diseases are caused by nonsense mutations in essential genes, the potential of suppression for gene therapy has long been recognized (see Temple, et al. (1982) Nature 296(5857):537–40). The suppression of single and double nonsense mutations introduced into the diphtheria toxin A-gene has been used as the basis of a binary system for toxin gene therapy (Robinson, et al., (1995) Human Gene Therapy 6:137–143).
Conventional Nucleic Acid Cloning
The cloning of nucleic acid segments currently occurs as a daily routine in many research labs and as a prerequisite step in many genetic analyses. The purposes of these clonings vary; however, two general purposes can be considered: (1) the initial cloning of nucleic acid from large DNA or RNA segments (chromosomes, YACs, PCR fragments, mRNA, etc.), done in a relative handful of known vectors such as pUC, pGem, pBlueScript, and (2) the subcloning of these nucleic acid segments into specialized vectors for functional analysis. A great deal of time and effort is expended in the transfer of nucleic acid segments from the initial cloning vectors to the more specialized vectors. This transfer is called subcloning.
The basic methods for cloning have been known for many years and have changed little during that time. A typical cloning protocol is as follows:
(1) digest the nucleic acid of interest with one or two restriction enzymes;
(2) gel purify the nucleic acid segment of interest when known;
(3) prepare the vector by cutting with appropriate restriction enzymes, treating with alkaline phosphatase, gel purifying, etc., as appropriate;
(4) ligate the nucleic acid segment to the vector, with appropriate controls to eliminate background of uncut and self-ligated vector;
(5) introduce the resulting vector into an E. coli host cell;
(6) pick selected colonies and grow small cultures overnight;
(7) make nucleic acid minipreps; and
(8) analyze the isolated plasmid on agarose gels (often after diagnostic restriction enzyme digestion) or by PCR.
The specialized vectors used for subcloning nucleic acid segments are functionally diverse. These include but are not limited to: vectors for expressing nucleic acid molecules in various organisms; for regulating nucleic acid molecule expression; for providing tags to aid in protein purification or to allow tracking of proteins in cells; for modifying the cloned nucleic acid segment (e.g., generating deletions); for the synthesis of probes (e.g., riboprobes); for the preparation of templates for nucleic acid sequencing; for the identification of protein coding regions; for the fusion of various protein-coding regions; to provide large amounts of the nucleic acid of interest, etc. It is common that a particular investigation will involve subcloning the nucleic acid segment of interest into several different specialized vectors.
As known in the art, simple subclonings can be done in one day (e.g., the nucleic acid segment is not large and the restriction sites are compatible with those of the subcloning vector). However, many other subclonings can take several weeks, especially those involving unknown sequences, long fragments, toxic genes, unsuitable placement of restriction sites, high backgrounds, impure enzymes, etc. One of the most tedious and time-consuming types of subcloning involves the sequential addition of several nucleic acid segments to a vector in order to construct a desired clone. One example of this type of cloning is in the construction of gene targeting vectors. Gene targeting vectors typically include two nucleic acid segments, each identical to a portion of the target gene, flanking a selectable marker. In order to construct such a vector, it may be necessary to clone each segment sequentially, i.e., first one gene fragment is inserted into the vector, then the selectable marker and then the second fragment of the target gene. This may require a number of digestion, purification, ligation and isolation steps for each fragment cloned. Subcloning nucleic acid fragments is thus often viewed as a chore to be done as few times as possible.
Several methods for facilitating the cloning of nucleic acid segments have been described, e.g., as in the following references.
Ferguson, J., et al., Gene 16:191 (1981), disclose a family of vectors for subcloning fragments of yeast nucleic acids. The vectors encode kanamycin resistance. Clones of longer yeast nucleic acid segments can be partially digested and ligated into the subcloning vectors. If the original cloning vector conveys resistance to ampicillin, no purification is necessary prior to transformation, since the selection will be for kanamycin.
Hashimoto-Gotoh, T., et al., Gene 41:125 (1986), disclose a subcloning vector with unique cloning sites within a streptomycin sensitivity gene; in a streptomycin-resistant host, only plasmids with inserts or deletions in the dominant sensitivity gene will survive streptomycin selection.
Notwithstanding the improvements provided by these methods, traditional subclonings using restriction and ligase enzymes are time consuming and relatively unreliable. Considerable labor is expended, and if two or more days later the desired subclone cannot be found among the candidate plasmids, the entire process must then be repeated with alternative conditions attempted.
Recombinational Cloning
Cloning systems that utilize recombination at defined recombination sites have been previously described in the related applications listed above, and in U.S. application Ser. No. 09/177,387, filed Oct. 23, 1998; U.S. application Ser. No. 09/517,466, filed Mar. 2, 2000; and U.S. Pat. Nos. 5,888,732 and 6,143,557, all of which are specifically incorporated herein by reference. In brief, the GATEWAY™ Cloning System, described in this application and the applications referred to in the related applications section, utilizes vectors that contain at least one recombination site to clone desired nucleic acid molecules in vivo or in vitro. More specifically, the system utilizes vectors that contain at least two different site-specific recombination sites based on the bacteriophage lambda system (e.g., att1 and att2) that are mutated from the wild-type (att0) sites. Each mutated site has a unique specificity for its cognate partner att site (i.e., its binding partner recombination site) of the same type (for example attB1 with attP1, or attL1 with attR1) and will not cross-react with recombination sites of the other mutant type or with the wild-type att0 site. Different site specificities allow directional cloning or linkage of desired molecules, thus providing the desired orientation of the cloned molecules. Nucleic acid fragments flanked by recombination sites are cloned and subcloned using the GATEWAY™ system by replacing a selectable marker (for example, ccdB) flanked by att sites on the recipient plasmid molecule, sometimes termed the Destination Vector. Desired clones are then selected by transformation of a ccdB-sensitive host strain and positive selection for a marker on the recipient molecule. Similar strategies for negative selection (e.g., use of toxic genes) can be used in other organisms, for example with thymidine kinase (TK) in mammals and insects.
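The pairing rule described above, in which a mutated att site recombines only with its cognate partner of the same specificity, can be modeled with a short Python sketch. The site-naming scheme (for example "attB1") follows the text; the function names are hypothetical, and the product names assume the standard bacteriophage lambda scheme in which attB and attP yield attL and attR and vice versa, which the passage above does not spell out.

```python
import re

# Which pairs of site types react, and the site types they produce
# (assumed standard lambda integration/excision scheme).
PRODUCTS = {
    frozenset("BP"): ("L", "R"),
    frozenset("LR"): ("B", "P"),
}


def parse_att(name: str):
    """Split a site name such as 'attB1' into its type ('B') and specificity (1)."""
    match = re.fullmatch(r"att([BPLR])(\d+)", name)
    if match is None:
        raise ValueError(f"not a recognized att site name: {name!r}")
    return match.group(1), int(match.group(2))


def recombine(site_a: str, site_b: str):
    """Return the product site names, or None if the two sites do not react."""
    type_a, spec_a = parse_att(site_a)
    type_b, spec_b = parse_att(site_b)
    if spec_a != spec_b:
        return None  # different specificities (e.g., att1 vs att2) do not cross-react
    products = PRODUCTS.get(frozenset((type_a, type_b)))
    if products is None:
        return None  # not a cognate pair of types (e.g., attB with attL)
    return tuple(f"att{t}{spec_a}" for t in products)


if __name__ == "__main__":
    print(recombine("attB1", "attP1"))
    print(recombine("attL2", "attR2"))
    print(recombine("attL1", "attR2"))
```

A segment flanked by attL1 and attL2 can therefore enter a vector carrying attR1 and attR2 in only one orientation, since attL1 reacts only with attR1 and attL2 only with attR2.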
Mutating specific residues in the core region of the att site can generate a large number of different att sites. As with the att1 and att2 sites utilized in GATEWAY™, each additional mutation potentially creates a novel att site with unique specificity that will recombine only with its cognate partner att site bearing the same mutation and will not cross-react with any other mutant or wild-type att site. Novel mutated att sites (e.g., attB1-10, attP1-10, attR1-10 and attL1-10) are described in previous patent application Ser. No. 09/517,466, filed Mar. 2, 2000, which is specifically incorporated herein by reference. Other recombination sites having unique specificity (i.e., a first site will recombine with its corresponding site and will not recombine or not substantially recombine with a second site having a different specificity) may be used to practice the present invention. Examples of suitable recombination sites include, but are not limited to, loxP sites; loxP site mutants, variants or derivatives such as loxP511 (see U.S. Pat. No. 5,851,808); frt sites; frt site mutants, variants or derivatives; dif sites; dif site mutants, variants or derivatives; psi sites; psi site mutants, variants or derivatives; cer sites; and cer site mutants, variants or derivatives. The present invention provides novel methods using such recombination sites to join or link multiple nucleic acid molecules or segments and more specifically to clone such multiple segments (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.) into one or more vectors (e.g., two, three, four, five, seven, ten, twelve, etc.) containing one or more recombination sites (e.g., two, three, four, five, seven, ten, twelve, fifteen, twenty, thirty, fifty, seventy-five, one hundred, two hundred, etc.), such as any GATEWAY™ Vector including Destination Vectors.