Inteins are internal protein elements that self-excise from their host protein and catalyze ligation of the flanking sequences (exteins) with a peptide bond. Intein excision is a posttranslational process that does not require auxiliary enzymes or cofactors. This self-excision process is called “protein splicing,” by analogy to the splicing of RNA introns from pre-mRNA (Perler F et al., Nucl Acids Res. 22:1125-1127 (1994)). The segments are called “intein” for internal protein sequence, and “extein” for external protein sequence, with upstream exteins termed “N-exteins” and downstream exteins called “C-exteins.” The products of the protein splicing process are two stable proteins: the mature protein and the intein.
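At the level of primary sequence, the outcome of protein splicing can be sketched as a simple string operation. The following is a minimal illustration only: the function name `splice`, the index convention, and the toy one-letter sequence are assumptions made for this example and do not come from any cited work.

```python
from typing import Tuple


def splice(precursor: str, intein_start: int, intein_end: int) -> Tuple[str, str]:
    """Return (mature protein, excised intein) for a precursor sequence.

    The intein occupies precursor[intein_start:intein_end] (0-based,
    end-exclusive); everything before it is the N-extein and everything
    after it is the C-extein.
    """
    if not 0 <= intein_start < intein_end <= len(precursor):
        raise ValueError("intein coordinates must lie inside the precursor")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    # Splicing joins the two exteins and releases the intein.
    return n_extein + c_extein, intein


# Hypothetical toy precursor: N-extein "MKT", intein "CAAAHN", C-extein "SGG".
mature, excised = splice("MKTCAAAHNSGG", 3, 9)
print(mature, excised)
```

The sketch captures only the two products named above (mature protein and intein), not the chemistry that produces them.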
Structure of Mini-inteins and Large Inteins
Inteins are classified into two groups: large and minimal (mini) (Liu XQ, Ann Rev Genet 34:61-76 (2000)). Large inteins contain a homing endonuclease domain that is absent in mini-inteins. Splicing-efficient mini-inteins have been engineered from large inteins by deleting the central endonuclease domain, demonstrating that the endonuclease domain is not involved in protein splicing (Chong S. and Xu M., J Biol Chem. 272:15587-15589 (1997); Derbyshire V. et al., Proc Natl Acad Sci USA. 94:11466-11471 (1997); and Shingledecker K. et al. Gene. 207:187-195 (1998)).
All known inteins share a low degree of sequence similarity, with conserved residues only at the N- and C-termini. Most inteins begin with Ser or Cys and end in His-Asn or in His-Gln. The first amino acid of the C-extein is an invariant Ser, Thr, or Cys, but the residue preceding the intein at the N-extein is not conserved (Perler F. 2002, Nucl. Acids Res. 30: 383-384). However, residues proximal to the intein-splicing junction at both the N- and C-terminal exteins were recently found to accelerate or attenuate protein splicing (Amitai G et al. 2009, Proc. Natl. Acad. Sci. USA. 106:11005-11010).
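The terminal residue pattern described above can be expressed as a small check on one-letter amino acid sequences. This is a hedged sketch: the function name `check_splice_junction`, the result keys, and the toy sequences are hypothetical, and because the text says only that most inteins follow the pattern, a failed check does not rule out a functional intein.

```python
from typing import Dict


def check_splice_junction(intein: str, c_extein: str) -> Dict[str, bool]:
    """Compare an intein and its C-extein with the commonly conserved residues.

    Sequences are one-letter amino acid codes. Each flag reports one of the
    patterns stated in the text; none of them is a strict requirement.
    """
    intein = intein.upper()
    c_extein = c_extein.upper()
    return {
        # Most inteins begin with Ser or Cys.
        "intein_first_residue": intein[:1] in ("S", "C"),
        # Most inteins end in His-Asn or His-Gln.
        "intein_last_two_residues": intein[-2:] in ("HN", "HQ"),
        # The first C-extein residue is Ser, Thr, or Cys.
        "c_extein_first_residue": c_extein[:1] in ("S", "T", "C"),
    }


# Hypothetical toy sequences, not real inteins.
print(check_splice_junction("CAAAHN", "SGG"))
print(check_splice_junction("AAAAHN", "GGG"))
```

The N-extein is deliberately not checked, because the residue preceding the intein is not conserved.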
Cis- and Trans-splicing Mechanisms of Inteins
Inteins can be classified by their splicing mechanism. Class 1 inteins, the most studied group, splice through a rapid sequence of four nucleophilic attacks mediated by three of the four conserved splice junction residues. In step 1, splicing begins with an acyl shift at the serine or cysteine residue located at the first position of the N-terminal splicing domain. This forms a (thio)ester bond at the N-extein/intein junction. In step 2, the (thio)ester bond is attacked by the OH or SH group of the first residue in the C-extein (Cys, Ser, or Thr). This transesterification transfers the N-extein to the side chain of the first residue of the C-extein. In step 3, cyclization of the conserved Asn or Gln residue located at the last position of the C-terminal splicing domain releases the intein and leaves the exteins linked by a (thio)ester bond. Finally, in step 4, the (thio)ester bond rearranges to a peptide bond by a spontaneous S-to-N or O-to-N acyl shift. The important amino acids involved directly or indirectly in the splicing reaction are shown in FIG. 3A.
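The four steps of class 1 splicing can be laid out as a short trace that records, after each step, which group carries the N-extein and what kind of bond holds it. This is an illustrative sketch only: the `Step` type and its field names are hypothetical, and the `intein_attached` flag reflects a reading of step 3 in which Asn or Gln cyclization frees the intein.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: int
    event: str
    n_extein_carrier: str   # group the N-extein is bonded to after the step
    bond: str               # "(thio)ester" or "peptide"
    intein_attached: bool   # assumption: the intein leaves at step 3


def class1_splicing_trace() -> List[Step]:
    """Return the state reached after each of the four class 1 splicing steps."""
    return [
        Step(1, "acyl shift at intein residue 1 (Ser or Cys)",
             "side chain of intein residue 1", "(thio)ester", True),
        Step(2, "transesterification by C-extein residue 1 (Cys, Ser, or Thr)",
             "side chain of C-extein residue 1", "(thio)ester", True),
        Step(3, "cyclization of the terminal Asn or Gln of the intein",
             "side chain of C-extein residue 1", "(thio)ester", False),
        Step(4, "S-to-N or O-to-N acyl shift",
             "backbone amine of C-extein residue 1", "peptide", False),
    ]


for step in class1_splicing_trace():
    print(f"step {step.number}: {step.event} -> {step.bond} bond, "
          f"N-extein on {step.n_extein_carrier}")
```

Only the last step produces a peptide bond between the exteins; the three preceding states are (thio)ester intermediates.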
Site-specific cleavage of the intein/extein junctions in class 1 inteins can be achieved by mutation of the conserved intein residues. Mutation of the Asn or Gln residue at the intein C-terminus abolishes steps 3 and 4 of the splicing reaction and results in N-terminal cleavage only. Since step 1 still occurs, the (thio)ester bond can spontaneously hydrolyze, separating the N-extein from the intein/C-extein portion. The serine or cysteine residue located at the first position of the N-terminal splicing domain is required for N-terminal cleavage (see FIG. 3C). Mutation of this conserved first residue of the intein abolishes steps 1, 2, and 4 of the splicing reaction and leads to C-terminal cleavage only. In such a mutated intein, Asn cyclization (step 3) still occurs and separates the C-extein from the N-extein/intein portion. The Asn (or Gln) and His residues, located respectively at the last (XN) and penultimate (XN−1) positions of the C-terminal splicing domain, are required for C-terminal cleavage (see FIG. 3B). Controllable cleavage of modified cis-splicing inteins has been adapted for a wide range of useful applications in molecular biology and biotechnology.
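The relationship between these mutations and the resulting cleavage behavior can be summarized as a lookup over the four splicing steps. The sketch below is illustrative: the mutation labels, `ABOLISHED_STEPS`, and `cleavage_outcome` are hypothetical names, and the result for a double mutant is an extrapolation from combining the two single-mutant cases rather than something stated in the text.

```python
from typing import Dict, FrozenSet, Iterable

ALL_STEPS: FrozenSet[int] = frozenset({1, 2, 3, 4})

# Steps abolished by each mutation, as described in the text.
ABOLISHED_STEPS: Dict[str, FrozenSet[int]] = {
    "c_terminal_asn_or_gln": frozenset({3, 4}),
    "first_residue": frozenset({1, 2, 4}),
}


def cleavage_outcome(mutations: Iterable[str]) -> str:
    """Map a set of mutation labels to the expected reaction outcome."""
    remaining = set(ALL_STEPS)
    for mutation in mutations:
        remaining -= ABOLISHED_STEPS[mutation]  # KeyError on unknown labels
    if remaining == ALL_STEPS:
        return "splicing"
    if 1 in remaining:
        # The (thio)ester formed in step 1 can hydrolyze.
        return "N-terminal cleavage"
    if 3 in remaining:
        # Asn or Gln cyclization still releases the C-extein.
        return "C-terminal cleavage"
    # Assumption: with both residues mutated no step remains.
    return "no reaction"


print(cleavage_outcome([]))
print(cleavage_outcome(["c_terminal_asn_or_gln"]))
print(cleavage_outcome(["first_residue"]))
```

This table covers only the general class 1 behavior; individual inteins can deviate from it.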
Natural Split Inteins
Inteins can also exist as two fragments encoded by two separately transcribed and translated genes. These so-called split inteins self-associate and catalyze protein-splicing activity in trans.
Split inteins have been identified in diverse cyanobacteria and archaea (Caspi et al., Mol Microbiol. 50:1569-1577 (2003); Choi J. et al., J Mol Biol. 356:1093-1106 (2006); Dassa B. et al., Biochemistry. 46:322-330 (2007); Liu X. and Yang J., J Biol Chem. 278:26315-26318 (2003); Wu H. et al., Proc Natl Acad Sci USA. 95:9226-9231 (1998); and Zettler J. et al., FEBS Letters. 583:909-914 (2009)), but have not been found in eukaryotes thus far. Recently, a bioinformatic analysis of environmental metagenomic data revealed 26 different loci with a novel genomic arrangement. At each locus, a conserved enzyme coding region is interrupted by a split intein, with a free-standing endonuclease gene inserted between the sections coding for intein subdomains. Among them, five loci were completely assembled: DNA helicases (gp41-1, gp41-8); inosine-5′-monophosphate dehydrogenase (IMPDH-1); and ribonucleotide reductase catalytic subunits (NrdA-2 and NrdJ-1). This fractured gene organization appears to be present mainly in phages (Dassa et al. Nucleic Acids Research. 37:2560-2573 (2009)).
The split intein Npu DnaE was characterized as having the highest rate reported for the protein trans-splicing reaction. In addition, the Npu DnaE protein splicing reaction is considered robust and high-yielding with respect to different extein sequences, temperatures from 6 to 37 °C, and the presence of up to 6 M urea (Zettler J. et al., FEBS Letters. 583:909-914 (2009); Iwai H. et al., FEBS Letters 580:1853-1858 (2006)). As expected, when the Cys1Ala mutation was introduced into the N-domain of these inteins, the initial N-to-S acyl shift, and therefore protein splicing, was blocked. Unfortunately, the C-terminal cleavage reaction was also almost completely inhibited. This dependence of asparagine cyclization at the C-terminal splice junction on the acyl shift at the N-terminal scissile peptide bond seems to be a unique property common to the naturally split DnaE intein alleles (Zettler J. et al. FEBS Letters. 583:909-914 (2009)).
Applications of Inteins in Biotechnology
Inteins are valuable tools in a wide range of biotechnological applications. The ligation of peptides and proteins using the natural splicing activity of inteins is known as intein-mediated protein ligation (IPL), or expressed protein ligation (EPL), and is well established in molecular biology and biotechnology methods (Evans T. et al., Biopolymers 51:333-342 (1999); Muir T. et al., Proc Natl Acad Sci USA. 95:6705-6710 (1998); and Severinov K. and Muir T., J Biol Chem. 273:16205-16209 (1998)). Furthermore, inteins have been used for protein purification by site-specific cleavage only at the intein-target protein border (Lu W. et al., J Chromatography A. 1218:2553-2560 (2011)). The use of intein-mediated procedures in bioseparation is well established at the laboratory scale and is attracting increasing interest in large-scale biotechnology. The potential of these protein purification techniques for large-scale protein production is clear, but intein-mediated protein purification systems that work under industrial, scaled-up conditions must still be developed. Other applications are segmental labeling of proteins for NMR analysis, cyclization of proteins, controlled expression of toxic proteins, conjugation of quantum dots to proteins, and incorporation of non-canonical amino acids (Arnold U., Biotechnol Lett. 31:1129-1139 (2009); Charalambous A. et al., J Nanobiotechnology 7:9 (2009); Oeemig J. et al., FEBS Letters 583:1451-1456 (2009); Seyedsayamdost M. et al., Nat Protoc. 2:1225-1235 (2007); Züger S. and Iwai H., Nat Biotechnol. 23:736-740 (2005); and Evans T. et al., Annu Rev Plant Biol. 56:375-392 (2005)). In basic research studies, inteins have been used to monitor in vivo protein-protein interactions, specifically translocation of proteins into cellular organelles, ligation of exogenous polypeptides to membrane proteins on living cells, or photocontrol of protein activity (Chong S. and Xu M., Homing endonucleases and inteins. Vol 16. Springer, Berlin Heidelberg, New York, 273-292 (2005); Ozawa T. and Umezawa Y., Homing endonucleases and inteins. Vol 16. Springer, Berlin Heidelberg, New York, 307-323 (2005); Ozawa T. et al., Nat Biotechnol. 21:287-293 (2003); Dhar T. and Mootz H., Chem Commun. 47:3063-3065 (2011); and Binschik J. et al., Angewandte Chemie International Ed. 50(14):3249-3252 (2011)). Most of the inteins used in biotechnology are derived from prokaryotic organisms, or are engineered variants of the S. cerevisiae VMA1 intein (Elleuche & Pöggeler 2010 Appl. Microbiol Biotechnol 78:479-489).
In order to make use of such techniques in large-scale biological processes, inteins with robust properties and methods of using the same must be identified. The inteins and methods of using such inteins that are described herein address this need by providing highly active inteins that function in a large temperature range, in the presence of salts, and when fused to polypeptides of variable sequences.