Large quantities of biologically active proteins are required for studies of protein structure-function relationships and for the development and use of proteins in medical or industrial applications. Recombinant DNA technology enables the expression of proteins to unusually high levels in various cell types. In bacterial recombinant protein expression systems, protein over-expression is typically accomplished by cloning a nucleic acid sequence (gene or cDNA) encoding the desired protein into a suitable plasmid expression vector to form an expression construct, transforming the bacterial cells with the expression construct, and culturing the transformants under conditions suitable for expression of the cloned gene.
Expression vectors are well known in the art. Typical bacterial expression vectors are designed to contain regulatory sequences, e.g., promoters, ribosome binding sites, termination signals, and the like, which provide for vigorous transcription of the cloned DNA and translation of the corresponding mRNA into the desired protein. To facilitate cloning of nucleic acid sequences encoding the protein or polypeptide of interest, expression vectors further comprise a multiple cloning site (MCS), which typically is a sequence of several unique restriction endonuclease sites, i.e., sites present only within the MCS and nowhere else in the vector. In expression vectors, the MCS is located downstream of an RNA polymerase promoter sequence and is designed so that transcribed RNAs will contain the proper signals for ribosome binding, such that translation of the RNA will initiate at the proper position. Expression vectors also generally include one or more selectable marker genes (e.g., an antibiotic resistance gene), so that cells which have been successfully transformed with the expression vector can be identified and separated from cells which have not been transformed.
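By way of illustration only, the requirement that each MCS restriction site be unique within the vector can be checked computationally. The following is a minimal sketch in Python, not a definitive implementation. The recognition sequences listed are the well-known sites for the named enzymes; the toy vector sequence, the MCS coordinates, and the function names `site_positions` and `unique_mcs_sites` are hypothetical and are assumed here solely for the example.

```python
# Recognition sequences of a few common restriction endonucleases.
# All five are palindromic, so scanning one strand is sufficient.
SITES = {
    "NdeI": "CATATG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}


def site_positions(vector, site):
    """Return 0-based start positions of `site` in a circular sequence."""
    n = len(vector)
    # Append the first len(site) - 1 bases so matches may span the origin.
    wrapped = vector + vector[: len(site) - 1]
    return [i for i in range(n) if wrapped.startswith(site, i)]


def unique_mcs_sites(vector, mcs_start, mcs_end, sites=SITES):
    """Return enzymes whose site occurs exactly once in the vector,
    with that single occurrence lying inside [mcs_start, mcs_end)."""
    unique = []
    for name, site in sites.items():
        hits = site_positions(vector, site)
        inside = len(hits) == 1 and mcs_start <= hits[0] and hits[0] + len(site) <= mcs_end
        if inside:
            unique.append(name)
    return unique


# Hypothetical toy "vector" (assumption for illustration, not a real plasmid).
BACKBONE_5 = "ATGCCGTTAGCAAGCTTACG"  # carries a HindIII site outside the MCS
MCS = "CATATGGGATCCGAATTCAAGCTTCTCGAG"  # NdeI, BamHI, EcoRI, HindIII, XhoI
BACKBONE_3 = "TTGACCGTAACGGTCA"
VECTOR = BACKBONE_5 + MCS + BACKBONE_3

MCS_START = len(BACKBONE_5)
MCS_END = MCS_START + len(MCS)

if __name__ == "__main__":
    print(unique_mcs_sites(VECTOR, MCS_START, MCS_END))
```

In this toy sequence the HindIII site also occurs in the backbone, so HindIII is not reported as a usable unique cloning site, whereas the other four enzymes are.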
One of the most powerful bacterial expression systems, with respect to the amounts of the protein of interest that can be produced, is the T7 expression system (Studier, F. W. and Moffatt, B. A., J. Mol. Biol. 189:113-130 (1986)). In this system, a gene or cDNA sequence encoding the protein or polypeptide of interest is cloned into an MCS located downstream of the T7 RNA polymerase promoter in “T7 expression vectors”, such as pET vectors (see Studier, et al., Meth. Enzymol. 185:60-89 (1990) and EMD Biosciences, Novagen; Stratagene; and Invitrogen product catalogs) and others, to form a recombinant T7 expression construct. Bacterial cells containing a gene for the T7 RNA polymerase (e.g., T7 expression system host strains such as BL21(DE3), EMD Biosciences, Novagen and others) are transformed with such recombinant T7 expression constructs. In the transformed cells, the T7 RNA polymerase specifically recognizes the T7 RNA polymerase promoter and rapidly generates extraordinarily large amounts of the corresponding mRNA transcript, leading to over-expression of the protein or polypeptide within the host cells. In this and similar bacterial expression systems, the desired protein or polypeptide may very quickly become the predominant proteinaceous species in the host cell.
One of the most troublesome issues related to the expression of such large amounts of a recombinant protein is that many over-expressed proteins are unable to adopt a native, biologically active conformation and thus become misfolded within the bacterial host cell. Generally, misfolded proteins exhibit poor solubility and either accumulate in cells as insoluble aggregates (inclusion bodies) or are degraded by host cell proteases. By way of example of the generality of this problem, a recent simple search of the US patent database using the Specification search terms “inclusion body” or “inclusion bodies” in combination with the terms “clone” and “expression” generated a list of 3,435 issued US patents. In most of these patents, recombinant proteins formed inclusion bodies during expression in various bacterial expression systems. The first patent in the hit list issued in February 1987, and the most recent as of the date the search was run issued on Dec. 29, 2009. Thus, for over 22 years the failure of proteins to adopt native conformations during recombinant expression has been an issue for production of recombinant proteins. In other words, since the time recombinant protein expression became a more or less routinely practiced art, protein misfolding and lack of solubility have been problematic in the generation of recombinant proteins.
Although most recombinant proteins that misfold are those that are non-native to the expression host cell, even native bacterial proteins can misfold and form insoluble aggregates during over-expression in bacterial recombinant protein expression systems.
The expression of the protein or polypeptide of interest as a fusion protein has been proposed as a method for averting protein misfolding and inclusion body formation (see Snavely, U.S. Pat. No. 6,077,689; Mascarenhas, et al., U.S. Pat. No. 5,563,046; Lima, et al., U.S. Pat. No. 6,872,551; and Harrison, et al., U.S. Pat. Nos. 5,989,868 and 6,207,420). Considerable effort has also been devoted to the development of various fusion partners to either protect the protein or polypeptide of interest from degradation by host cell proteases or to provide a facile means of purification of the protein or polypeptide of interest (reviewed by Ford, et al., (1991) Prot. Exp. and Purif. 2:95-107). It has been suggested that such fusion elements may serve both functions: enhancing the solubility and proper folding of the recombinant protein of interest, and providing a means for its isolation and purification.
A significant drawback of such fusion systems for enhancing the solubility of recombinant proteins in the host cells is that none of the systems has demonstrated applicability to all or even a wide variety of proteins of interest. Those skilled in the art commonly must try several fusion partners for each target protein or polypeptide of interest to find one that produces the desired outcome. Another drawback arises when the fusion partner is a large polypeptide: overall expression is decreased, and because the fusion partner accounts for much of the mass of the fusion product, the yield of the protein or polypeptide of interest is decreased in both absolute and relative terms. Yet another drawback is the need to engineer a specific cleavage site into the fusion protein so that the protein or polypeptide of interest can be separated from its fusion partner. The cost of the specific agent used to effect that cleavage can be prohibitive.
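The relative decrease in yield caused by a large fusion partner follows from simple mass arithmetic, which may be sketched as follows. This is an illustrative calculation only. The molecular masses and the 100 mg fusion yield are hypothetical values assumed for the example, not measured data, and the function names are likewise assumed. Losses during cleavage and subsequent purification are ignored, so the result is an upper bound.

```python
def target_mass_fraction(target_kda, partner_kda):
    """Fraction of the fusion protein's mass contributed by the target."""
    return target_kda / (target_kda + partner_kda)


def max_target_yield_mg(fusion_yield_mg, target_kda, partner_kda):
    """Upper bound on recoverable target protein, assuming complete
    cleavage and no purification losses (an idealization)."""
    return fusion_yield_mg * target_mass_fraction(target_kda, partner_kda)


if __name__ == "__main__":
    # Hypothetical values: a 15 kDa target fused to a 40 kDa partner,
    # with 100 mg of purified fusion protein obtained.
    fraction = target_mass_fraction(15.0, 40.0)
    yield_mg = max_target_yield_mg(100.0, 15.0, 40.0)
    print(f"target fraction of fusion mass: {fraction:.3f}")
    print(f"maximum target yield: {yield_mg:.1f} mg")
```

Under these assumed values, only about 27% of the fusion product by mass is the protein of interest, so at most about 27 mg of target could be recovered from 100 mg of fusion protein.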
Accordingly, there remains a need for expression methodologies that ameliorate the problems associated with poor solubility and misfolding during the over-expression of proteins in high-yielding protein expression systems.