Recent advances in molecular biology have made it possible to produce large amounts of heterologous proteins and polypeptides in bacterial, yeast, mammalian and other hosts. These processes rely on the construction of vectors comprising a DNA sequence coding for the desired protein or polypeptide operatively linked to expression control sequences. Suitable hosts are then transformed with these vectors to permit production of the desired product by fermentation under appropriate conditions. A further improvement of the above technology has made it possible to obtain secretion of the selected protein or polypeptide by forming a hybrid gene consisting of a DNA fragment which codes for the selected protein or polypeptide and a DNA sequence from an extracellular or periplasmic protein that is secreted.
To isolate the desired protein or polypeptide when it is not secreted from the host, the host cells must be disrupted and the protein or polypeptide isolated from other intracellular and extracellular proteins, cellular debris and other contaminants. Although a protein or polypeptide that is secreted is separated from intracellular proteins and cell debris, it must still be recovered from the culture medium or periplasmic space. Recovery of the desired protein or polypeptide in either situation generally involves a purification scheme that is time-consuming and more complex than desired. Such purification schemes also often result in loss of product or activity.
In particular, such purification schemes are generally empirical. For instance, when one of the various column separation techniques is used, all of the fractions must be assayed for the protein or polypeptide of interest. Also, many of the purification procedures are not specific, so a combination of methods must be used, resulting in numerous steps. Activity and product may be lost due to the number of steps and the time involved in such procedures.
One method utilized in purification schemes involves using recombinant DNA techniques to produce a fusion protein comprising the protein or polypeptide of interest linked to a reporter protein. Assay of the reporter protein is used to follow purification of the fusion protein or to provide a means of isolating the fusion protein.
Although numerous reporter proteins have been used, the paradigm of the method is fusion to β-galactosidase. Beta-galactosidase fusion proteins can be purified by conventional separation techniques based on charge, size, etc., with the progress of the separation being monitored by assaying for β-galactosidase activity, by assaying for the ability of the fusion protein to complex with a second defective β-galactosidase resulting in β-galactosidase activity, or by detecting β-galactosidase antigenic determinants by reaction with anti-β-galactosidase antibodies. Silhavy and Beckwith, Microbiol. Rev., 49, 398-418 (1985); Ullman and Perrin, in The Lactose Operon (Beckwith and Zipser, eds., 1970, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Beta-galactosidase fusion proteins can also be purified on columns of immobilized anti-β-galactosidase antibodies or, if an active site is retained, on columns of an immobilized substrate analog. Silhavy and Beckwith, Microbiol. Rev., 49, 398-418 (1985); Ullman, Gene, 29, 27-31 (1984).
Fusion to reporter proteins other than β-galactosidase often better facilitates purification since the reporter proteins can be chosen so that specific antibodies are not required. Examples of such fusions are constructs in which the protein of interest is fused to protein A, which binds to the Fc portion of IgG. Such fusions can be separated on columns of IgG. Nilsson et al., The EMBO J., 4, 1075-80 (1985).
A complication of the methods for purification of the β-galactosidase and protein A fusion proteins using antibody, immunoglobulin or substrate columns is that harsh conditions are needed to disrupt the protein-protein or enzyme-substrate complexes retained on the purification columns. These conditions would be expected to at least partially denature the desired protein or polypeptide segment of the fusion protein. See Nilsson et al., The EMBO J., 4, 1075-80 (1985); Ullman, Gene, 29, 27-31 (1984); Ullman and Perrin, in The Lactose Operon (Beckwith and Zipser, eds., 1970, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.).
Biotin is a small coenzyme (vitamin H) synthesized by plants, most bacteria and some fungi, which occurs primarily in a protein-bound state within the cell. Biotinated proteins play enzymatic roles in many essential metabolic carboxylation and decarboxylation reactions. Wood and Barden, Ann. Rev. Biochem., 46, 385-413 (1977).
Biotin is bound to acceptor proteins by a covalent amide linkage between the biotin carboxyl group and a unique lysine amino group. Id. Biotin addition is a two-step reaction catalyzed by biotin ligase (also called biotin holoenzyme synthetase) (see FIG. 1). Biotin is first converted to biotinoyl-AMP, which then reacts with the epsilon-amino group of the specific lysine residue of the acceptor protein to form biocytin. Biotination is a post-translational modification.
The sequences of the carboxyl terminal portions of biotin proteins from diverse biological sources show substantial homology, and biotin ligases will biotinate acceptor proteins from very different biological sources (e.g., bacteria versus higher eukaryotes). Murtif and Samols, J. Biol. Chem., 262, 11813-16 (1987); Schwarz et al., J. Biol. Chem., 263, 9640-45 (1988); McAllister and Coon, J. Biol. Chem., 241, 2855 (1966). Of particular note in these sequences are: 1) the highly conserved tetrapeptide containing the biocytin, Samols et al., J. Biol. Chem., 263, 6461-64 (1988); 2) the presence of a proline residue or short proline-rich region upstream of the biocytin, Id., Schwarz et al., J. Biol. Chem., 263, 9640-45 (1988); and 3) the fact that the lysine residues of the proteins to which biotin binds are generally located 34 or 35 residues from the carboxyl terminal amino acid, although a few biotinated proteins have the coenzyme attached at sites farther away from the carboxyl terminus, Samols et al., J. Biol. Chem., 263, 6461-64 (1988); Bai et al., Eur. J. Biochem., 182, 239 (1989); Takai et al., J. Biol. Chem., 263, 2651 (1988).
FIG. 2 shows the amino acid sequences of the carboxyl terminal portions of several biotin proteins which have been compiled from published reports. The sequences are aligned at the lysine residue that becomes biotinated (arrow). The sequences shown are: Escherichia coli biotin carboxyl carrier protein (EC BCCP, a subunit of acetyl-CoA carboxylase); the 1.3S subunit of Propionibacterium shermanii transcarboxylase (PS 3S); Saccharomyces cerevisiae pyruvate carboxylase (YPYC); human pyruvate carboxylase (HPYC); and a sequence from tomato (TOM). The identity of the protein from tomato containing the biotination site is unknown. The segment was isolated by its biotin acceptor activity and homology to the P. shermanii sequence. Hoffman et al., Nucleic Acids Research, 15, 3928 (1987).
In FIG. 2, the boxed residues are those residues which are conserved among the proteins. Additional comparisons of the sequences of biotinated proteins may be found in Samols et al., J. Biol. Chem., 263, 6461-64 (1988) and Schwarz et al., J. Biol. Chem., 263, 9640-45 (1988).
Studies have been made of the roles in biotination of certain sequences and amino acids located in the carboxyl terminal portions of biotin proteins. See Murtif and Samols, J. Biol. Chem., 262, 11813-16 (1987); Samols et al., J. Biol. Chem., 263, 6461-64 (1988). In particular, the 1.3S subunit of Propionibacterium shermanii transcarboxylase has been studied. It is 123 amino acids long. Biotin is attached to a lysine residue located 34 residues from the carboxyl terminus. A truncated 1.3S subunit polypeptide containing residues 19-123 is biotinated, while deletion of the penultimate amino acid (number 122) prevents biotination of the protein. Murtif and Samols, J. Biol. Chem., 262, 11813-16 (1987); Samols et al., J. Biol. Chem., 263, 6461-64 (1988). Also, the methionine residues flanking the biocytin site are not necessary for biotination. Shenoy et al., FASEB J., 2, 2505-2511 (1988).
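The residue numbering in the preceding paragraph can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and it assumes that "located 34 residues from the carboxyl terminus" means that 34 residues lie between the lysine and the end of the chain, counting the terminal residue itself. Under a different counting convention the result would shift by one.

```python
def position_from_c_terminus(chain_length: int, residues_from_c_terminus: int) -> int:
    """Return the 1-based position of a residue counted back from the C-terminus.

    Assumed convention (not stated in the text): the C-terminal residue itself
    is 0 residues from the terminus, so position = length - distance.
    """
    return chain_length - residues_from_c_terminus


def renumber_in_fragment(position: int, fragment_start: int, fragment_end: int) -> int:
    """Return the 1-based position of a residue within a truncated fragment."""
    if not fragment_start <= position <= fragment_end:
        raise ValueError("residue is not contained in the fragment")
    return position - fragment_start + 1


SUBUNIT_LENGTH = 123           # length of the 1.3S subunit, from the text
FRAGMENT = (19, 123)           # truncated polypeptide that is still biotinated

lysine_position = position_from_c_terminus(SUBUNIT_LENGTH, 34)
fragment_length = FRAGMENT[1] - FRAGMENT[0] + 1
lysine_in_fragment = renumber_in_fragment(lysine_position, *FRAGMENT)

print(lysine_position)     # 89 under the assumed convention
print(fragment_length)     # 105
print(lysine_in_fragment)  # 71
```

Under this convention the biotinated lysine is residue 89 of the full-length subunit, and the truncated polypeptide of residues 19-123 retains it together with the entire carboxyl terminal region.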
In addition to the covalent binding discussed above, biotin is non-covalently bound very tightly (K_D of 10^-15 M) and specifically by the proteins avidin and streptavidin. Streptavidin fusion proteins have been developed which exploit this non-covalent binding to biotin to purify the fusion protein. In particular, PCT applications WO 87/05026 and WO 86/02077 disclose that DNA sequences that code for streptavidin have been isolated, cloned and used to prepare recombinant DNA sequences coding for fusion proteins comprising a protein or polypeptide of interest fused to streptavidin. WO 86/02077 and WO 87/05026 further teach that the fusion protein may be isolated by contacting the fusion protein with biotin or a biotin derivative or analog. Other proteins or contaminants which do not bind to biotin can be washed away, and the fusion protein eluted from the biotin.
However, the conditions described in these applications for elution of the fusion protein from biotin or biotin derivatives are extremely harsh and would cause at least partial loss of activity and antigenic properties of the protein or polypeptide of interest. Also, streptavidin fusion proteins can be extremely lethal to the host cells producing them because of their binding to intracellular biotin and metabolically essential biotinated proteins. See Sano and Cantor, Proc. Nat'l Acad. Sci. U.S.A., 87, 142-146 (1990).
Lipoylation is another post-translational modification. Lipoic acid is bound to acceptor proteins by means of a covalent amide linkage between the carboxyl group of the lipoic acid and an epsilon-amino group of a lysine residue of the protein. Stephens et al., Eur. J. Biochem., 133, 481-89 (1983). This covalent attachment is catalyzed by the enzyme lipoate ligase.
The amino acid sequences of several lipoated proteins are known, and the amino acid sequences of the lipoylation sites of these proteins are substantially homologous throughout nature (see Table I below). It has also been shown that the lipoate ligase from one bacterium can lipoate the acceptor protein from unrelated bacteria both in vitro and in vivo.
TABLE I
______________________________________
COMPARISON OF AMINO ACID SEQUENCE OF VARIOUS LIPOYLATED PROTEINS

Source                 Enzyme            Lipoylated Protein Sequence  Ref.
______________________________________
                                                 +
E. coli                E2p* lip1          LITVEGDKASMEVP              a
                            lip2          LITVEGDKASMEVP              a
                            lip3          LITVEGDKASMEVP              a
                       E2o**              LVEIETDKVVLEVP              b
B. stearothermophilus  E2p                LCEVQNDKAVVEIP              c
A. vinelandii          E2p lip1           LVVLESAKASMEVP              d
                           lip2           LIVLESDKASMEIP              d
                           lip3           LIVLESDKASMEIP              d
                       E2o               LIVDLETDKVVMEVL              e
Bovine                 E2p                   VETDKATVGF               f
Rat                    E2p                   IETDKATIGFE              g
Human                  E2p lip1              VETDKATVGFE              h
                           lip2              IETDKATIGFE              h
Chicken                Glycine cleavage      LESVKAASEL               i
______________________________________
+ indicates lipoyllysine residue
*E2p = dihydrolipoamide acetyltransferase from pyruvate dehydrogenase
**E2o = dihydrolipoamide succinyltransferase from alpha-ketoglutarate dehydrogenase
a Stephens, Darlison, Lewis and Guest, Eur. J. Biochem., 133, 155-162 (1983).
b Spencer, Darlison, Stephens, Duckenfield and Guest, Eur. J. Biochem., 141, 361-374 (1984).
c Packman, Borges and Perham, Biochem. J., 252, 79-86 (1988).
d Hanemaaijer, Janssen, Kok and Veeger, Eur. J. Biochem., 174, 593-599 (1988).
e Westphal and Kok, Eur. J. Biochem., 187, 235-239 (1990).
f Bradford, Howell, Aitken, James and Yeaman, Biochem. J., 245, 919-922 (1987).
g Gershwin, Mackay, Sturgess and Coppel, J. Immunol., 138, 3525-3531 (1987).
h Coppel, McNeilage, Surh, VandeWater, Spithill, Whittingham and Gershwin, Proc. Natl. Acad. Sci. USA, 85, 7317-7321 (1988).
i Fujiwara, Okamura-Ikeda and Motokawa, J. Biol. Chem., 261, 8836-8841 (1986).
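The homology summarized in Table I can be made concrete by lining the sequences up on the lipoylated lysine and tallying the neighboring residues. The sketch below is illustrative only and is not part of the cited work: the dictionary labels and function names are hypothetical, identical sequences from Table I are listed once per table row group, and the alignment relies on the observation that each listed segment contains exactly one lysine (K).

```python
from collections import Counter

# Sequences transcribed from Table I; the keys are shorthand labels for this sketch.
TABLE_I = {
    "E. coli E2p lip1-3": "LITVEGDKASMEVP",
    "E. coli E2o": "LVEIETDKVVLEVP",
    "B. stearothermophilus E2p": "LCEVQNDKAVVEIP",
    "A. vinelandii E2p lip1": "LVVLESAKASMEVP",
    "A. vinelandii E2p lip2-3": "LIVLESDKASMEIP",
    "A. vinelandii E2o": "LIVDLETDKVVMEVL",
    "Bovine E2p": "VETDKATVGF",
    "Rat E2p / Human E2p lip2": "IETDKATIGFE",
    "Human E2p lip1": "VETDKATVGFE",
    "Chicken glycine cleavage": "LESVKAASEL",
}


def align_on_lysine(seqs):
    """Pad each sequence with '-' so that its single lysine falls in one column."""
    for name, seq in seqs.items():
        if seq.count("K") != 1:
            raise ValueError(f"{name}: expected exactly one lysine")
    left = max(seq.index("K") for seq in seqs.values())
    right = max(len(seq) - seq.index("K") - 1 for seq in seqs.values())
    aligned = {}
    for name, seq in seqs.items():
        k = seq.index("K")
        aligned[name] = "-" * (left - k) + seq + "-" * (right - (len(seq) - k - 1))
    return aligned, left


def residues_at_offset(seqs, offset):
    """Collect the residues found `offset` positions from the lysine (negative = N-terminal side)."""
    found = []
    for seq in seqs.values():
        i = seq.index("K") + offset
        if 0 <= i < len(seq):
            found.append(seq[i])
    return found


aligned, lysine_column = align_on_lysine(TABLE_I)
for name, row in aligned.items():
    print(f"{row}  {name}")

# Tally of the residue immediately preceding the lipoylated lysine.
print(Counter(residues_at_offset(TABLE_I, -1)))
```

In this tally, aspartate (D) immediately precedes the lysine in eight of the ten distinct segments, the exceptions being A. vinelandii E2p lip1 (alanine) and the chicken glycine cleavage protein (valine).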
The dihydrolipoamide acetyltransferase (E2p) component of the pyruvate dehydrogenase complex of E. coli contains three highly homologous sequences of about 100 amino acids each that are tandemly repeated to form the N-terminal half of the polypeptide chain. Id.; Guest et al., J. Mol. Biol., 185, 743-54 (1985). All three of these sequences include a lysine that is a site for lipoylation, and the three sequences appear to form independently folded functional domains. Id. Each repeated sequence contains the lipoylation site in an invariant eighteen-residue sequence which is: ##STR1## Id.; Stephens et al., Eur. J. Biochem., 133, 481-89 (1983). The three repeating sequences of E2p also contain lengthy C-terminal regions of about 20 to 30 amino acids that are unusually rich in alanine, proline and charged amino acids, and these regions provide conformational flexibility to the polypeptide. Radford et al., J. Biol. Chem., 264, 767-75 (1989); Guest et al., J. Mol. Biol., 185, 743-54 (1985).