Obtaining sufficient amounts of soluble, well-folded recombinant proteins for downstream applications remains a significant bottleneck in many fields that apply protein expression technologies (Makrides 1996; Baneyx 1999; Fahnert, Lilie et al. 2004), including structural genomics projects (Yokoyama 2003; Goh, Lan et al. 2004; Terwilliger 2004). Current approaches for maximizing soluble protein include screening large numbers of protein variants (mutants, fragments, fusion tags, folding partners), and testing many expression or refolding conditions (Armstrong, de Lencastre et al. 1999; Fahnert, Lilie et al. 2004). Several methods have recently been developed to screen proteins for soluble expression (Waldo 2003), including antibody detection of polyhistidine-tagged proteins in dot-blots (Knaust and Nordlund 2001), but these approaches require multiple steps and do not work in vivo. Proteins tagged with the lacZα fragment can be detected after structural complementation with lacZΩ (Ullmann, Jacob et al. 1967; Nixon and Benkovic 2000; Wigley, Stidham et al. 2001; Wehrman, Kleaveland et al. 2002), but the lacZα fragment is relatively large (52 amino acids) (Wigley, Stidham et al. 2001), and there have been no detailed studies regarding its effects on fusion partner folding and solubility. Proteins tagged with the 15 amino acid S-peptide (Kim and Raines 1993) can be quantified in vitro using a sensitive fluorogenic substrate (Kelemen, Klink et al. 1999) (FretWorks®, Novagen, Madison, Wis.) after complementation with the S-protein (Richards and Vithayathil 1959), but the assay cannot be used to assess soluble protein expression in vivo in E. coli. 
GFP and its numerous related fluorescent proteins are now in widespread use as protein tagging agents (for review, see Verkhusha et al., 2003, GFP-like fluorescent proteins and chromoproteins of the class Anthozoa. In: Protein Structures: Kaleidoscope of Structural Properties and Functions, Ch. 18, pp. 405-439, Research Signpost, Kerala, India). In addition, GFP has been used as a solubility reporter of terminally fused test proteins (Waldo et al., 1999, Nat. Biotechnol. 17:691-695; U.S. Pat. No. 6,448,087, entitled ‘Method for Determining and Modifying Protein/Peptide Solubility’). GFP-like proteins are an expanding family of homologous, 25-30 kDa polypeptides sharing a conserved 11 beta-strand “barrel” structure. The GFP-like protein family currently comprises some 100 members, cloned from various Anthozoa and Hydrozoa species, and includes red, yellow and green fluorescent proteins and a variety of non-fluorescent chromoproteins (Verkhusha et al., supra). A wide variety of fluorescent protein labeling assays and kits are commercially available, encompassing a broad spectrum of GFP spectral variants and GFP-like fluorescent proteins, including DsRed and other red fluorescent proteins (Clontech, Palo Alto, Calif.; Amersham, Piscataway, N.J.).
Various strategies for improving the solubility of GFP and related proteins have been documented, and have resulted in the generation of numerous mutants having improved folding, solubility and perturbation tolerance characteristics. Stemmer and coworkers applied directed evolution to screen for mutants or variants of GFP that exhibited increased fluorescence and folding yield in E. coli (see, e.g., Crameri et al., Nat. Biotechnol. 14:315-319, 1996). They identified a mutant that exhibited increased folding ability. This version of GFP, termed cycle-3 or GFP3, contains the mutations F99S, M153T and V163A. GFP3 is relatively insensitive to the expression environment and folds well in a wide variety of hosts, including E. coli. GFP3 folds equally well at 27° C. and 37° C. Thus, the GFP3 mutations also appear to eliminate potential temperature-sensitive folding intermediates that occur during folding of wild type GFP.
GFP3 can be made to misfold by expression as a fusion protein with another poorly folded polypeptide. GFP3 has been used to report on the “folding robustness” of N-terminally fused proteins during expression in E. coli (Waldo et al., 1999, supra). In this method, the sequence of the reporter, e.g., the GFP3 domain, remains constant and a poorly folded upstream domain (domain X) is mutated. Better folded variants of domain X are identified by increased fluorescence.
Existing protein tagging and detection platforms are powerful but have drawbacks. Split protein tags can perturb protein solubility (Ullmann, Jacob et al. 1967; Nixon and Benkovic 2000; Fox, Kapust et al. 2001; Wigley, Stidham et al. 2001; Wehrman, Kleaveland et al. 2002) or may not work in living cells (Richards and Vithayathil 1959; Kim and Raines 1993; Kelemen, Klink et al. 1999). Green fluorescent protein fusions can misfold (Waldo, Standish et al. 1999) or exhibit altered processing (Bertens, Heijne et al. 2003). Fluorogenic biarsenical FlAsH or ReAsH substrates (Adams, Campbell et al. 2002) overcome many of these limitations, but require a polycysteine tag motif, a reducing environment, and cell transfection or permeabilization (Adams, Campbell et al. 2002).
GFP fragment reconstitution systems have been described, mainly for detecting protein-protein interactions, but none are capable of unassisted self-assembly into a correctly folded, soluble and fluorescent reconstituted GFP, and no general split GFP folding reporter system has emerged from these approaches. For example, Ghosh et al., 2000, reported that two GFP fragments, corresponding to amino acids 1-157 and 158-238 of the GFP structure, could be reconstituted to yield a fluorescent product, in vitro or by coexpression in E. coli, when the individual fragments were fused to coiled-coil sequences capable of forming an antiparallel leucine zipper (Ghosh et al., 2000, Antiparallel leucine zipper-directed protein reassembly: application to the green fluorescent protein. J. Am. Chem. Soc. 122: 5658-5659). Likewise, U.S. Pat. No. 6,780,599 describes the use of helical coils capable of forming antiparallel leucine zippers to join split fragments of the GFP molecule. The patent specification establishes that reconstitution does not occur in the absence of complementary helical coils attached to the GFP fragments. In particular, the specification notes that control experiments in which GFP fragments without leucine zipper pairs “failed to show any green colonies, thus emphasizing the requirement for the presence of both NZ and CZ leucine zippers to mediate GFP assembly in vivo and in vitro.”
Similarly, Hu et al., 2002, showed that the interacting proteins bZIP and Rel, when fused to two fragments of GFP, can mediate GFP reconstitution by their interaction (Hu et al., 2002, Visualization of interactions among bZIP and Rel family proteins in living cells using bimolecular fluorescence complementation. Mol. Cell 9: 789-798). Nagai et al., 2001, showed that fragments of yellow fluorescent protein (YFP) fused to calmodulin and M13 could mediate the reconstitution of YFP in the presence of calcium (Nagai et al., 2001, Circularly permuted green fluorescent proteins engineered to sense Ca2+. Proc. Natl. Acad. Sci. USA 98: 3197-3202). In a variation of this approach, Ozawa et al. fused calmodulin and M13 to two GFP fragments via self-splicing intein polypeptide sequences, thereby mediating the covalent reconstitution of the GFP fragments in the presence of calcium (Ozawa et al., 2001, A fluorescent indicator for detecting protein-protein interactions in vivo based on protein splicing. Anal. Chem. 72: 5151-5157; Ozawa et al., 2002, Protein splicing-based reconstitution of split green fluorescent protein for monitoring protein-protein interactions in bacteria: improved sensitivity and reduced screening time. Anal. Chem. 73: 5866-5874). One of these investigators subsequently reported application of this splicing-based GFP reconstitution system to cultured mammalian cells (Umezawa, 2003, Chem. Rec. 3: 22-28). More recently, Zhang et al., 2004, showed that the helical coil split GFP system of Ghosh et al., 2000, supra, could be used to reconstitute GFP (as well as YFP and CFP) fluorescence when coexpressed in C. elegans, and demonstrated the utility of this system in confirming coexpression in vivo (Zhang et al., 2004, Combinatorial marking of cells and organelles with reconstituted fluorescent proteins. Cell 119: 137-144).
Although the aforementioned GFP reconstitution systems provide advantages over the use of two spectrally distinct fluorescent protein tags, they are limited by the size of the fragments and correspondingly poor folding characteristics (Ghosh et al., Hu et al., supra), the requirement for chemical ligation or fused interacting partner polypeptides to force reconstitution (Ghosh et al., 2000, supra; Ozawa et al., 2001, 2002 supra; Zhang et al., 2004, supra), and the need for co-expression or co-refolding to produce detectable folded and fluorescent GFP (Ghosh et al., 2000; Hu et al., 2002, supra). Poor folding characteristics limit the use of these fragments to applications in which the fragments are simultaneously expressed or simultaneously refolded together. Such fragments are not useful for in vitro assays requiring the long-term stability and solubility of the respective fragments prior to complementation. One application for which such split protein fragments are not useful is the quantification of polypeptides that are tagged with one member of the split protein pair and subsequently detected by the addition of the complementary fragment.
An ideal protein tag would be genetically encoded, would work both in vivo and in vitro, would provide a sensitive analytical signal, and would not require external chemical reagents or substrates. However, to date, no split fluorescent protein tagging system has been described that does not rely on fused heterologous polypeptide domains to drive reconstitution of the fluorescent reporter activity. A split fluorescent protein tagging system in which the fragments spontaneously self-associate without the need for fused interacting protein domains, remain soluble prior to association, and do not change the solubility of fused target proteins is needed and is addressed by this invention.