Many peptides and proteins can be produced recombinantly in a variety of expression systems, e.g., various strains of bacteria or fungi, or mammalian or insect cells. However, when bacteria are used as host cells for heterologous gene expression, several problems frequently occur.
For example, heterologous genes encoding small peptides are often poorly expressed in bacteria. Because of their size, most small peptides are unable to adopt stable, soluble conformations and are subject to intracellular degradation by proteases and peptidases present in the host cell. Those small peptides which do manage to accumulate when directly produced in E. coli or other bacterial hosts are usually found in the insoluble or "inclusion body" fraction, an occurrence which renders them almost useless for screening purposes in biological or biochemical assays.
Moreover, even when small peptides are not deposited in inclusion bodies, producing them recombinantly as candidate drugs or enzyme inhibitors raises further problems. Because of their high degree of conformational freedom, even small peptides can adopt an enormous number of potential structures. Thus a small peptide can have the "desired" amino-acid sequence and yet show very low activity in an assay, because the "active" conformation is only one of the many alternative structures the peptide can adopt in free solution. This is a further obstacle to producing small heterologous peptides recombinantly for effective research and therapeutic use.
Inclusion body formation is also frequently observed when the genes for heterologous proteins are expressed in bacterial cells. Such inclusion bodies usually require further manipulation to solubilize and refold the heterologous protein, under conditions that must be determined empirically for each protein and whose success is uncertain.
If these additional procedures fail, little or no bioactive protein can be recovered from the host cells. Moreover, these procedures are often technically difficult and prohibitively expensive for the practical production of recombinant proteins for therapeutic, diagnostic or other research uses.
To overcome these problems, the art has employed certain peptides or proteins as fusion "partners" with a desired heterologous peptide or protein, enabling small peptides or larger proteins to be produced and/or secreted as fusion proteins in bacterial expression systems. Such fusion partners include the LacZ and TrpE proteins, maltose-binding protein and glutathione-S-transferase [See, generally, Current Protocols in Molecular Biology, Vol. 2, suppl. 10, publ. John Wiley and Sons, New York, N.Y., pp. 16.4.1-16.8.1 (1990); and Smith et al., Gene 67:31-40 (1988)]. As another example, U.S. Pat. No. 4,801,536 describes the fusion of a bacterial flagellin gene to a desired gene, enabling a heterologous protein to be produced in a bacterial cell and secreted into the culture medium as a fusion protein.
However, fusing desired peptides or proteins to the amino- or carboxyl-termini of other proteins (i.e., fusion partners) often has other potential disadvantages. Experience in E. coli has shown that a crucial factor in obtaining high levels of gene expression is the efficiency of translation initiation. Translation initiation in E. coli is very sensitive to the nucleotide sequence surrounding the initiating methionine codon of the desired heterologous peptide or protein sequence, although the rules governing this sensitivity are not clear. For this reason, fusing sequences to the amino-terminus of many fusion partner proteins can affect expression levels unpredictably. In addition, E. coli contains numerous amino- and carboxy-peptidases that degrade amino- or carboxyl-terminal peptide extensions of fusion partner proteins, so that a number of the known fusion partners have a low success rate for producing stable fusion proteins.
The purification of proteins produced in recombinant expression systems is often a serious challenge. Certain purification schemes, e.g., that disclosed in Haymore et al., U.S. Pat. No. 5,115,102 (filed Jul. 21, 1989, issued May 19, 1992), require the introduction of metal-chelating amino acid sequences into the protein of interest at positions dictated by its secondary structure, e.g., by locating α-helix, β-strand and β-hairpin regions in the protein's structure and introducing two selected histidine residues, separated by 3, 2 or 1 amino acid residues respectively, into one of these regions. These modifications give the protein an affinity for metal-chelate columns, which can then be used as a purification tool. Unfortunately, introducing such modifications as the method teaches can destroy the biological activity of the protein of interest, particularly where a non-conservative substitution alters a ligand-binding site, an active site or another functional site, and/or disrupts important tertiary structural relationships in the protein. Moreover, certain of the introduced changes could cause the protein of interest to misfold. The location of these vital protein features must therefore be considered when making such modifications. Because the Haymore et al. approach teaches that the metal-chelating amino acids must be positioned very precisely with respect to each other within the same element of secondary structure, only a limited number of places in any one protein can be so modified, and the number of potential metal-chelating sites shrinks further once the important functional regions of the protein are excluded.
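The spacing constraint described above can be made concrete with a small sketch that enumerates the residue pairs eligible to receive the two introduced histidines. It assumes, hypothetically, that the secondary-structure elements and the functional residues to be protected are already annotated; the function name candidate_his_pairs and the input format are illustrative only, and the spacing values are taken from the description above rather than checked against the Haymore et al. patent itself. It also ignores other practical constraints such as side-chain orientation and solvent exposure.

```python
from typing import Dict, List, Set, Tuple

# Number of intervening residues between the two introduced histidines,
# as stated in the description above for each secondary-structure element.
SPACING: Dict[str, int] = {
    "alpha_helix": 3,
    "beta_strand": 2,
    "beta_hairpin": 1,
}


def candidate_his_pairs(
    elements: List[Tuple[str, int, int]],
    functional: Set[int],
) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (i, j) residue pairs that satisfy the
    spacing rule within a single secondary-structure element and avoid
    every residue listed as functional.

    elements: (element_type, first_residue, last_residue), 1-based, inclusive.
    functional: residue numbers that must not be modified (e.g., an
    annotated active site or ligand/receptor-binding site).
    """
    pairs: List[Tuple[int, int]] = []
    for kind, start, end in elements:
        gap = SPACING[kind] + 1  # j - i when SPACING[kind] residues lie between
        # Both residues of the pair must fall inside the same element.
        for i in range(start, end - gap + 1):
            j = i + gap
            if i not in functional and j not in functional:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    elements = [("alpha_helix", 10, 20), ("beta_strand", 30, 36)]
    all_sites = candidate_his_pairs(elements, functional=set())
    kept = candidate_his_pairs(elements, functional={12, 13, 14, 33})
    print(len(all_sites), len(kept))  # → 11 5
```

In this toy example, protecting only four functional residues removes more than half of the eleven eligible pairs, illustrating how quickly the available sites shrink under the same-element spacing rule.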
For instance, in proteins whose α-helical region(s) are confined to the active site or to the receptor-binding site of the molecule, such a region could not be modified while retaining functionality. Furthermore, the method considers only chelating sites formed by residues positioned close together in the primary sequence, and makes no allowance for metal-chelating sites formed by residues more than 9 residues apart in the primary sequence. However, two residues distant from each other in the primary sequence of a protein could in fact be adjacent in the folded tertiary structure, and could thus be suitable places for introducing metal-chelating amino acids.
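Finding residue pairs that are distant in the primary sequence but adjacent in the folded structure is a simple geometric search once three-dimensional coordinates are available. The sketch below assumes, hypothetically, that one Cα coordinate per residue is supplied (e.g., from a solved structure) and uses an illustrative 6.5 Å distance cutoff and a minimum sequence separation of 10 residues; neither value comes from the source, and a real design would also weigh side-chain geometry. The function name distant_but_adjacent is hypothetical.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def distant_but_adjacent(
    ca_coords: List[Coord],
    min_seq_sep: int = 10,
    max_dist: float = 6.5,
) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: return (i, j, distance) for residue pairs at least
    `min_seq_sep` apart in the primary sequence whose C-alpha atoms lie within
    `max_dist` angstroms of each other.

    Residue numbers in the output are 1-based. The default cutoff is an
    illustrative assumption, not a validated criterion for a chelating site.
    """
    hits: List[Tuple[int, int, float]] = []
    n = len(ca_coords)
    for i in range(n):
        # Only pairs far apart in sequence are considered.
        for j in range(i + min_seq_sep, n):
            d = math.dist(ca_coords[i], ca_coords[j])
            if d <= max_dist:
                hits.append((i + 1, j + 1, round(d, 2)))
    return hits


if __name__ == "__main__":
    # Toy hairpin: residues 1-6 run along +x at y = 0; residues 7-12 run back at y = 5.
    strand_out = [(3.8 * k, 0.0, 0.0) for k in range(6)]
    strand_back = [(3.8 * (5 - k), 5.0, 0.0) for k in range(6)]
    print(distant_but_adjacent(strand_out + strand_back))
    # → [(1, 11, 6.28), (1, 12, 5.0), (2, 12, 6.28)]
```

In the toy hairpin, residues 1 and 12 are eleven positions apart in sequence yet only 5 Å apart in space, exactly the kind of pair a sequence-spacing rule alone would never consider.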
Thus, although there is a continuing need for new and easier methods to produce homogeneous preparations of recombinant proteins for research, diagnostic and therapeutic applications, modifying the sequence of the desired protein for purification purposes raises many problems, such as those outlined above. These problems can be avoided by a fusion protein approach in which the fusion partner protein can bind to an affinity matrix and the desired protein is left unaltered. Many fusion partners currently used in the art possess no inherent properties that would facilitate purification. However, the present invention provides, inter alia, the modification of a fusion partner protein, e.g., thioredoxin, so that it binds to a metal chelate affinity matrix, providing an additional convenient purification tool for fusion proteins. The technique is also applicable to other proteins, including other fusion partner proteins and proteins that are not fusion protein constructs.