Several publications and patent documents are cited throughout the specification in order to describe the state of the art to which this invention pertains. Full citations of these references can be found throughout the specification. Each of these citations is incorporated herein by reference as though set forth in full.
Functional genomic studies have been hampered by the inability to uniformly express and purify biologically active proteins in heterologous expression systems (Ryan and Patterson (2002) Trends Biotechnol., 20:S45-51). Despite the use of identical transcriptional and translational signals in a given expression vector, expressed protein levels have been observed to vary dramatically (Weickert et al. (1996) Curr. Opin. Biotechnol., 7:494-9). For this reason, several strategies have been developed to express heterologous proteins in bacteria, yeast, mammalian and insect cells as gene-fusions (Ecker et al. (1989) J. Biol. Chem., 264:7715-9; Butt et al. (1989) Proc. Natl. Acad. Sci., 86:2540-4; Kapust and Waugh (1999) Protein Sci., 8:1668-74; Ikonomou et al. (2003) Appl. Microbiol. Biotechnol., 62:1-20).
The expression of heterologous genes in bacteria is by far the simplest and least expensive means available for research or commercial purposes. However, some heterologous gene products fail to attain their correct three-dimensional conformation in E. coli while others become sequestered in large insoluble aggregates or “inclusion bodies” when overproduced (Jonasson et al. (2002) Biotechnol. Appl. Biochem., 35:91-105; Georgiou and Valax (1999) Methods Enzymol., 309:48-58). Denaturant-induced solubilization, followed by removal of the denaturant under conditions that favor refolding, is often required to produce a reasonable yield of the recombinant protein.
Selection of open reading frames (ORFs) for structural genomics projects has also shown that only about 20% of the genes expressed in E. coli render proteins that are soluble or correctly folded (Waldo et al. (1999) Nat. Biotechnol., 17:691-5). These numbers are startlingly disappointing, especially given that most scientists rely on E. coli for initial attempts to express gene products. Several systems for expressing proteins by conjugation to a tag such as NusA, maltose binding protein (MBP), glutathione S-transferase (GST), and thioredoxin (TRX) have been developed (Jonasson et al. (2002) Biotechnol. Appl. Biochem., 35:91-105). All of these systems have certain drawbacks, ranging from inefficient expression to inconsistent cleavage from the desired structure.
Ubiquitin (Ub) and ubiquitin-like proteins (Ubls) have been described in the literature (Jentsch and Pyrowolakis (2000) Trends Cell Biol., 10:335-42; Yeh et al. (2000) Gene, 248:1-14; Larsen and Wang (2002) J. Proteome Res., 1:411-9). The SUMO system has also been characterized (Muller et al. (2001) Nat. Rev. Mol. Cell. Biol., 2:202-10). SUMO (small ubiquitin-related modifier) is a Ubl that is also known as Sentrin, SMT3, PIC1, GMP1 and UBL1 in published literature. The SUMO pathway is present throughout the eukaryotic kingdom and SUMO proteins are highly conserved from yeast to humans (Kim et al. (2002) J. Cell. Physiol., 191:257-68). Although overall sequence homology between ubiquitin and SUMO is only 18%, structure determination by nuclear magnetic resonance (NMR) reveals that the two proteins possess a common three-dimensional structure characterized by a tightly packed globular fold with β-sheets wrapped around one α-helix (Bayer et al. (1998) J. Mol. Biol., 280:275-86; Kim et al. (2000) J. Biol. Chem., 275:14102-6). Examination of the chaperoning properties of SUMO reveals that its attachment to the N-terminus of a labile protein can act as a nucleus for folding and protect the protein from aggregation.
All SUMO genes encode precursor proteins with a short C-terminal sequence that extends beyond the conserved C-terminal Gly-Gly motif (Muller et al. (2001) Nat. Rev. Mol. Cell. Biol., 2:202-10). The extension sequence varies in length and is typically 2-12 amino acids. SUMO proteases (known also as hydrolases) remove the C-terminal extensions prior to sumoylation in the cell (Coloma et al. (1992) J. Immunol. Methods, 152:89-104). Conjugation of the C-terminus of SUMO to the ε-amino groups of lysine residues of a target protein is known as sumoylation. Sumoylation of cellular proteins has been proposed to regulate nuclear transport, signal transduction, stress response, and cell cycle progression (Kretz-Remy and Tanguay (1999) Biochem. Cell. Biol., 77:299-309). It is very likely that SUMO signals the translocation of proteins among various cell compartments; however, the precise mechanistic details of this function of SUMO are not known. The similarity between the SUMO pathway and the ubiquitin pathway is remarkable, given the different effects that these two protein modifications permit (Goettsch and Bayer (2002) Front. Biosci., 7:a148-62).
NusA is another fusion tag that promotes solubility of partner proteins, presumably due to its large size (Davis et al. (1999) Biotechnol. Bioeng., 65:382-8). Glutathione S-transferase (GST) (Smith and Johnson (1988) Gene, 67:31-40) and maltose binding protein (MBP) (diGuan et al. (1988) Gene, 67:21-30) fusion tags have been proposed to enhance expression and yield of fusion partners as well. However, enhanced expression is not always observed when GST is used, as it forms dimers and can reduce protein solubility. Another problem with all of these fusion systems is that the desired protein may have to be removed from the fusion. To circumvent this problem, protease sites, such as Factor Xa, thrombin, enterokinase or TEV protease sites, are often engineered downstream of the fusion tag. However, inappropriate cleavage is often observed because these proteases recognize a short specific amino acid sequence that might also be present within the fusion/target protein (Jonasson et al. (2002) Biotechnol. Appl. Biochem., 35:91-105). The present invention circumvents these problems. Further, unlike SUMO proteases, TEV protease is a sequence-specific protease that leaves an undesirable sequence at the N-terminus of the protein of interest after cleavage of a fusion protein. In contrast, SUMO proteases cleave any sequence from the C-terminus of SUMO to generate the desired N-terminus in the fused protein (except when the N-terminal residue is proline).
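The inappropriate-cleavage problem described above can be illustrated with a minimal sketch that scans a target protein sequence for internal copies of short protease recognition motifs. The motifs below are commonly cited consensus sequences and are assumptions made for illustration only; they are not taken from this specification, real protease specificity is broader than a single pattern, and the function name `find_internal_sites` and the example sequence are hypothetical.

```python
import re

# Simplified consensus recognition motifs (illustrative assumptions only;
# actual protease specificity is less strict than these patterns suggest).
PROTEASE_SITES = {
    "Factor Xa": r"I[ED]GR",
    "thrombin": r"LVPR",
    "enterokinase": r"DDDDK",
    "TEV protease": r"ENLYFQ[GS]",
}


def find_internal_sites(sequence):
    """Return {protease name: [1-based start positions]} for every motif
    found in a one-letter amino acid sequence. Proteases with no match
    are omitted from the result."""
    hits = {}
    for name, pattern in PROTEASE_SITES.items():
        # A lookahead is used so that overlapping matches are also reported.
        positions = [
            m.start() + 1
            for m in re.finditer(f"(?=({pattern}))", sequence.upper())
        ]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical target sequence that happens to contain two such motifs.
example = "MKTIEGRAAADDDDKLL"
print(find_internal_sites(example))
# → {'Factor Xa': [4], 'enterokinase': [11]}
```

Any hit inside the target protein marks a position where the corresponding protease could cut in addition to the engineered site downstream of the tag, which is the risk the paragraph above describes for short sequence-specific recognition sites.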