With the development of genetic recombination techniques, numerous target proteins are produced using animal cells, yeasts and prokaryotic systems including E. coli, and such proteins are widely used in the bioengineering industry, including medical fields. In particular, owing to its high growth rate and its relatively well-characterized genetic structure compared to other organisms, the bacterium E. coli is routinely used as a host cell for production of target proteins by genetic recombination techniques.
However, E. coli has a serious disadvantage: compared with eukaryotic cells, it lacks many of the intracellular elements required for maturation of proteins. Specifically, post-translational modification, disulfide bond formation, glycosylation and compartmentation of proteins, which are achieved in eukaryotic cells, are not performed in E. coli. In addition, when a target protein is expressed on a large scale in E. coli, the expressed protein frequently accumulates in the cytoplasm, forming insoluble protein aggregates referred to as inclusion bodies. Inclusion bodies are easily isolated and resistant to proteinase digestion. However, to obtain active proteins from them, the inclusion bodies must be solubilized using a high concentration of urea or guanidine HCl, which unfolds the proteins they contain to their primary structure, and the resulting proteins must then be refolded into a biologically active conformation during or after removal of the chemical reagent. Since the mechanisms involved in protein refolding are still not accurately understood, and refolding conditions vary from protein to protein, finding effective refolding conditions requires much time and cost. Because recombinant proteins have low refolding rates, costly apparatus is necessary for scaling up their industrial production, and most proteins having a high molecular weight are hard or impossible to refold, which makes industrialization of such proteins difficult.
Although biologically active proteins are thermodynamically stable, inclusion bodies are often formed during their expression in the E. coli system. Their formation is driven by intermolecular aggregation between folding intermediates during the folding process (Mitraki, A. and King, J., Bio/Technology, 7: 690-697, 1989) (Reaction Formula 1).

wherein, U is a protein in an unfolded state, F is a protein in a folded state, and I is a folding intermediate.
Typically, conditions for refolding a protein into an active form are found empirically, and refolding is not always successful, which makes large-scale production of a recombinant protein difficult. In addition, it is difficult to obtain high-molecular-weight antibodies, tissue plasminogen activator and factor VIII in active forms by the above-mentioned refolding process.
To overcome the problems encountered when target proteins are expressed as inclusion bodies, it is desirable to express a target protein in a soluble form in E. coli. To date, the following three methods have been used to express a target protein in a soluble form.
First, a target protein can be obtained in a soluble form by linking a signal sequence to the N-terminus of the target protein to allow its secretion to the periplasm of E. coli (Stader, J. A. and Silhavy, T. J., Methods in Enzymol., 185: 166-187, 1990). However, such a method is not industrially practical owing to the low expression level of the target protein.
Second, a target protein can be produced in a soluble form by co-expression with a chaperone gene, such as the groES, groEL or dnaK gene (Goloubinoff et al., Nature, 337: 44-47, 1989). Molecular chaperones assist folding of target proteins by directly shielding hydrophobic residues of folding intermediates (Hartl, F. U. and Hayer-Hartl, M., Science, 295: 1852-1858, 2002).
However, this method is effective only for specific proteins, and so cannot be used as a general means of preventing formation of inclusion bodies.
Third, a soluble target protein can be obtained by selecting a protein highly expressed in E. coli and then fusing a target protein to the C-terminus of the selected protein. Such fusion of the target protein with the C-terminus of a fusion partner protein allows effective use of translation initiation signals of the fusion partner, as well as increasing solubility of the target protein linked to the fusion partner, thereby leading to large-scale expression of the target protein in a soluble form in E. coli. 
Among the prior art methods for expressing a recombinant protein in a soluble form, the most successful one is to express the recombinant protein as a fusion protein using a highly soluble protein as a fusion partner. To produce a fusion protein in E. coli, the LacZ or TrpE protein is conventionally used as a fusion partner protein. However, fusion proteins with the LacZ or TrpE protein are mostly produced as inclusion bodies, and thus it is hard to obtain a protein of interest in an active form. In this regard, many attempts have been made to find new fusion partner proteins. As a result, several proteins or peptides were developed as fusion partner proteins: glutathione-S-transferase (Smith, D. B. and Johnson, K. S., Gene, 67: 31-40, 1988), maltose-binding protein (Bedouelle, H. and Duplay, P., Eur. J. Biochem., 171: 541-549, 1988), protein A (Nilsson et al., Nucleic Acids Res., 13: 1151-1162, 1985), Z domain of protein A (Nilsson et al., Prot. Eng., 1: 107-113, 1987), protein Z (Nygren et al., J. Mol. Recog., 1: 69-74, 1988), and thioredoxin (Lavallie et al., Bio/Technology, 11: 187-193, 1993).
It has been reported that the factors determining solubility of proteins include, in order of importance, average charge, fraction of turn-forming residues, cysteine fraction, proline fraction, hydrophilicity and total number of residues. It has also been reported that average net charge and the fraction of turn-forming residues are especially important (Wilkinson, D. L. and Harrison, R. G., Bio/Technology, 9: 443-448, 1991). Using these two most important parameters, a model formula for solubility of a protein is defined as follows (Davis et al., Biotechnol. Bioeng., 65: 382-388, 1999):
<Model Formula> $CV = \lambda_1 \left( \frac{N+G+P+S}{n} \right) + \lambda_2 \left| \frac{(R+K)-(D+E)}{n} - 0.03 \right|$
wherein CV is a canonical variable; n is the number of amino acids in the protein; N, G, P and S are the numbers of asparagine (N), glycine (G), proline (P) and serine (S) residues, respectively; R, K, D and E are the numbers of arginine (R), lysine (K), aspartic acid (D) and glutamic acid (E) residues, respectively; and λ1 and λ2 are coefficients of 15.43 and −29.56, respectively. If CV−CV′ is positive, a protein is predicted to be insoluble. If CV−CV′ is negative, a protein is predicted to be soluble.
In the above formula, the probability of solubility or insolubility is given by 0.4934+0.276|CV−CV′|−0.0392(CV−CV′)^2, where CV′ is a discriminant number of 1.71. That is, solubility of a protein is determined by average charge and folding rate, where the higher the content of turn-forming residues including Asn, Gly, Pro and Ser is, the lower the folding rate is. Using the above formula, the E. coli protein NusA was developed as a fusion partner (Davis et al., Biotechnol. Bioeng., 65: 382-388, 1999).
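The model formula and the probability expression above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the function names and the example sequences are hypothetical, the input is assumed to be a one-letter amino acid sequence, and the handling of the case CV−CV′ = 0 (which the text does not define) is an arbitrary choice. The probability value is returned exactly as the expression gives it, without clamping.

```python
from collections import Counter

# Coefficients and discriminant as stated in the text (Davis et al., 1999).
LAMBDA_1 = 15.43
LAMBDA_2 = -29.56
CV_PRIME = 1.71


def canonical_variable(sequence):
    """Return CV for a one-letter amino acid sequence (hypothetical helper)."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    # Fraction of turn-forming residues: Asn, Gly, Pro, Ser.
    turn_fraction = sum(counts[aa] for aa in "NGPS") / n
    # Average net charge: (Arg + Lys) - (Asp + Glu), per residue.
    net_charge = (counts["R"] + counts["K"] - counts["D"] - counts["E"]) / n
    return LAMBDA_1 * turn_fraction + LAMBDA_2 * abs(net_charge - 0.03)


def predict_solubility(sequence):
    """Return (CV, predicted class, probability expression value)."""
    cv = canonical_variable(sequence)
    diff = cv - CV_PRIME
    probability = 0.4934 + 0.276 * abs(diff) - 0.0392 * diff ** 2
    # Positive difference -> insoluble, negative -> soluble (per the text).
    # Assumption: a difference of exactly zero is reported as "soluble".
    label = "insoluble" if diff > 0 else "soluble"
    return cv, label, probability


if __name__ == "__main__":
    # Made-up toy sequences, not real proteins.
    for name, seq in [("turn-rich", "NGPSNGPSAA"), ("acidic", "DDDDEEEEAA")]:
        cv, label, prob = predict_solubility(seq)
        print(f"{name}: CV={cv:.4f}, predicted {label}, expression value={prob:.4f}")
```

In this sketch a sequence rich in turn-forming residues gives a large positive CV and is classified as insoluble, whereas a strongly charged sequence gives a negative CV and is classified as soluble, consistent with the sign rule stated above.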
As described above, among the prior art methods for expressing a recombinant protein in a soluble form, the most successful one is to express the recombinant protein as a fusion protein using a protein having high solubility as a fusion partner. The conventional fusion partner proteins include maltose-binding protein, thioredoxin, glutathione-S-transferase, NusA, LysN (N-terminal domain of E. coli lysine tRNA synthetase), and LysS (Korean Pat. No. 203919). A fusion partner protein improves solubility of a target protein according to Reaction Formula 2, below.

wherein U is an unfolded state; F is a folded state; p is a fusion partner; and t is a target protein.
As is apparent from the above Reaction Formula 2, the fusion partner increases the overall solubility of the target protein by stabilizing folding intermediates through its own high solubility.
Molecular chaperones are protein molecules known to help folding of proteins by temporarily binding to partially folded proteins and thus preventing their aggregation. Referring to the above Reaction Formula 2, a fusion partner is considered to serve as a chaperone. Because it is linked to a target protein, the fusion partner can be referred to as a molecular chaperone. In the conventional concept of molecular chaperones, a prosequence of a protein, for example, that of subtilisin, which is cleaved after assisting folding of the protein, is called a molecular chaperone (Shinde, U. and Inouye, M., J. Mol. Biol. 247(3): 390-395, 1995). There is a difference between the prosequence and the fusion partner: the former is limited to assisting folding of only one protein, while the latter helps folding of a broad range of target proteins. It has also been reported that the ribosome or the ribosomal component 23S rRNA helps refolding of proteins (Das et al., Eur. J. Biochem., 235: 613-621, 1996; Chattopadhyay et al., Proc. Natl. Acad. Sci. U.S.A., 93: 8284-8287, 1996). The utility of this process is very limited, however. The in vitro refolding process still requires chemical agents such as urea or guanidine HCl for unfolding of target proteins. The chemical reagents must be diluted and removed after the refolding process, which is time-consuming and laborious. Moreover, the 23S rRNA does not interact efficiently with most proteins, and therefore the repertoire of proteins that can be folded by this process is extremely limited.
The ability of fusion partner proteins to promote folding of the fused target proteins may basically depend on a rapid folding rate and a high average net charge. The most urgent problem to be solved in the post-genome era is to identify the functions of proteins. To solve this problem, proteins must first be produced in a soluble, active form. In this regard, development of fusion partner proteins having excellent properties is very important in basic research and industrial processes. Fusion partner proteins have so far been discovered empirically or by the simple method mentioned above, as in the discovery of NusA.