The present invention relates to methods of producing in optimal quantities soluble fusion proteins which comprise a heterologous polypeptide which is normally insoluble and/or suboptimally expressed when expressed in a cell. More particularly, the present invention relates to soluble fusion proteins which comprise an optimally large heterologous polypeptide, such as a membrane protein, fused to an optimally small soluble carrier polypeptide, which can be expressed by recombinant host cells in amounts and with a solubility, purifiability and stability under crystallization conditions enabling their high-grade crystallization. The present invention further relates to polynucleotides encoding such fusion proteins, to expression vectors for expression of such fusion proteins, to cloning vectors for generating such expression vectors, to kits which comprise such cloning vectors, to host cells transformed with such polynucleotides/vectors, and to methods of generating such fusion proteins.
The capacity to produce polypeptides in a soluble form and quantity enabling their high-grade (highly ordered and homogeneous) crystallization enables solution of their 3D atomic structure via X-ray crystallography. Such crystallography is proving to be crucial for understanding and regulating the biological functions of polypeptides, and, as such, is playing an increasingly vital role in the advancement of biomedical science and biotechnology, in particular in the realm of drug design. For example, computationally assisted drug design/identification based on the solved X-ray crystallographic structures of key proteins involved in disease pathogenesis has been successfully used to design critical breakthrough drugs such as HIV-1 protease inhibitors for treating AIDS (Wlodawer A. and Vondrasek J., 1998. Annu Rev Biophys Biomol Struct. 27:249), tyrosine kinase inhibitors for treating leukemia (Wong S. and Witte ON., 2004. Annu Rev Immunol. 22:247-306), and influenza virus neuraminidase inhibitors for treating influenza (Wilson J C. and von Itzstein M., 2003. Curr Drug Targets. 4:389-408). Further industrial applications of high-grade polypeptide crystals include their use as catalysts on a commercial scale, in bioremediation and green chemistry applications, purification-related applications, such as enantioselective chromatography of pharmaceuticals and high-grade chemicals, and development of adjuvant-less vaccines (Margolin A L. and Navia M A., 2001. Angewandte Chemie International Edition 40:2204).
Although polypeptide crystals are clearly tremendously and uniquely useful, their crystallization generally remains highly challenging, in particular in the case of heterologous polypeptides, such as membrane proteins, which are normally insoluble and/or suboptimally expressed when expressed in a cell. The difficulty in crystallizing membrane proteins and determining their 3D structures via X-ray diffraction is amply demonstrated by the fact that out of 28,000 high resolution protein structures solved to date, a mere 88 are of known membrane proteins. So far, only 4 heterologous recombinant mammalian membrane proteins have been crystallized and their 3D structures solved. Three of these are mouse cyclooxygenase-2 overexpressed in a baculovirus/insect cell system (Kurumbail, R. G. et al., 1996. Nature 384:644-648); monoamine oxidase B, a mitochondrial membrane protein which includes alpha-helices anchored to the membrane at its carboxyl terminus, overexpressed in yeast (Pichia pastoris; Binda, C. et al., 2002. Nature Struct. Biol. 9:22-6); and fatty acid amide hydrolase (FAAH) expressed in E. coli (Bracey et al., 2002). Cyclooxygenase-2 and monoamine oxidase B are monotopic membrane proteins, i.e., proteins which cross only one section of the membrane lipid bilayer. The fourth heterologous mammalian membrane protein crystallized is the potassium channel Kv1.2, which is a transmembrane protein. The channel was overexpressed in the yeast Pichia pastoris (Long et al., 2005).
In general, techniques for growing polypeptide crystals currently rely substantially on empirical processes for which only general rules of thumb are available and which frequently require adaptations tailored to accommodate the peculiarities of individual polypeptides. Several factors contribute to the difficulty in obtaining high-grade polypeptide crystals. Although contacts between crystallized polypeptide molecules are of comparable energy to those between small molecules, the significantly smaller number of intermolecular contacts per molecular weight of crystallized polypeptide molecules renders these contacts very fragile (Carugo O. and Argos P., 1997. Protein Science 6:2261). Furthermore, due to their inherent complexity, polypeptide molecules can assume numerous conformations, a phenomenon which tends to prevent formation of highly ordered crystals. Moreover, aggregated polypeptides are able to form many different types of intermolecular contacts of which only a restricted number will generate highly ordered crystals. Hence, crystallization conditions must be carefully fine-tuned so as to induce the proper molecular conformation and packing orientation of each molecule accreted during the process of crystallization. Such conditions are difficult to obtain since small variations in physico-chemical parameters, such as pH, ionic strength, temperature or contaminants, will strongly influence the process of crystallization in a way that is unique for each polypeptide due to the diversity of the chemical groups and possible configurations thereof involved in the formation of intermolecular contacts (Giege R. et al., Acta Crystallographica Section D-Biological Crystallography 1994. 50:339; Durbin S D. and Feher G., 1996. Annu Rev Phys Chem. 47:171; Weber P C., Overview of protein crystallization methods, in Macromolecular Crystallography, Pt a. 1997. p. 13-22; Chernov A A., Physics Reports-Review Section of Physics Letters 1997. 288:61; Rosenberger F., Theoretical and Technological Aspects of Crystal Growth 1998. p. 241; Wiencek J M., 1999. Annu Rev Biomed Eng. 1:505).
Thus, a widely employed method for empirically determining conditions required for polypeptide crystal growth involves performing automated high-throughput crystallization assays (Morris, D W. et al., 1989. Biotechniques 7:522; Zuk W M. and Ward K B., 1991. Journal of Crystal Growth 110:148; Heinemann U. et al., 2000. Progress in Biophysics & Molecular Biology 73:347). Such high throughput methods employ the sparse-matrix protein crystallization method, in which a series of crystallization conditions are tested in parallel, the most promising ones being iteratively refined until crystallization is achieved (Jancarik J. and Kim S H., 1991. Journal of Applied Crystallography 24:409; Cudney B., et al., 1994. Acta Crystallographica Section D-Biological Crystallography 50:414; Hennessy D. et al., 2000. Acta Crystallographica Section D-Biological Crystallography 56:817). However, due to its empirical nature, this approach is inherently inefficient and time-consuming, and requires large amounts of pure polypeptides, which are expensive, and may be difficult or impossible to obtain.
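By way of illustration only, the parallel testing of a sparse subset of crystallization conditions described above may be sketched as follows. The factor names and values, the function name `sparse_matrix_screen`, and the use of uniform random sampling are assumptions made for this sketch; published sparse-matrix screens are typically biased toward conditions previously found to yield crystals, which this simplified example does not attempt to reproduce.

```python
import itertools
import random

# Hypothetical screening factors; the values are illustrative only and are
# not taken from any published screen.
FACTORS = {
    "pH": [4.5, 5.5, 6.5, 7.5, 8.5],
    "precipitant": ["PEG 4000", "ammonium sulfate", "MPD", "sodium citrate"],
    "salt_mM": [0, 100, 200],
    "temperature_C": [4, 18],
}


def sparse_matrix_screen(factors, n_conditions, seed=0):
    """Draw a random subset of the full factorial grid of conditions.

    Uniform sampling is a simplifying assumption; it stands in for the
    knowledge-based selection used in practice.
    """
    names = list(factors)
    # Enumerate every combination of factor levels (the full factorial grid).
    full_grid = list(itertools.product(*(factors[name] for name in names)))
    rng = random.Random(seed)
    # Sample without replacement so that no condition is tested twice.
    chosen = rng.sample(full_grid, n_conditions)
    return [dict(zip(names, combo)) for combo in chosen]


# Full grid has 5 * 4 * 3 * 2 = 120 conditions; only 24 are set up in parallel.
screen = sparse_matrix_screen(FACTORS, n_conditions=24)
for condition in screen[:3]:
    print(condition)
```

In such a scheme, the conditions that give the most promising results would then be varied more finely around their values in subsequent rounds, each of which consumes additional purified polypeptide.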
The capacity to routinely produce polypeptides, such as membrane proteins, in a soluble form and quantity enabling their crystallization is highly desirable since membrane proteins, which constitute nearly 30 percent of the proteins encoded by the eukaryotic genome, function as signal-transducing biological receptors, ion/metabolite channels/transporters, adhesion molecules, and the like, and as a consequence play a pivotal role in the maintenance of health, and in the pathogenesis of a vast range of diseases. For example, major diseases whose pathogenesis is associated with membrane protein functionality include viral diseases, cancer, cardiovascular diseases, neurodegenerative diseases, diabetes, cystic fibrosis, and multi-drug resistance. Accordingly, membrane proteins represent about 70 percent of all drug targets. Thus, high-grade membrane protein crystals could be used to generate vital 3D crystallography data with which to perform computationally assisted design/identification of optimally effective and specific pharmacological agents for treating such diseases. However, membrane protein crystallization is particularly difficult due to the fact that, unlike soluble polypeptides, which tend to have hydrophilic surfaces and non-polar cores, thereby facilitating their expression in bacteria in a soluble form and quantity enabling their crystallization, membrane proteins include large hydrophobic surfaces with which they interact with membrane lipids, as well as hydrophilic portions. As a result, membrane proteins are not readily soluble in either polar or non-polar solvents, and are difficult to express in soluble form by transformed host bacteria, a process generally necessary to produce sufficient protein for crystallization, due to the tendency of such hydrophobic polypeptides to accumulate and overload at the cell membrane, which is also hydrophobic. Furthermore, membrane proteins are inherently present at low abundance in the cell.
The capacity to produce proteins, such as membrane proteins, at high levels is highly desirable for numerous applications, including production of drugs, diagnostic agents and immunogens, and crystallization. An optimal means to obtain polypeptides is via recombinant expression in E. coli, due to high expression levels, the variety of plasmids and strains available for expression, the short time needed for cloning, and growth achievable in large quantities and at low cost. However, expression of membrane proteins in bacteria is difficult to achieve for the following reasons.
1. In order for the membrane protein to reach the membrane it must have specific signal sequences to be recognized by the bacterial translocon system. However, processing of overexpressed recombinant proteins overloads the translocon system at the expense of processing of vital endogenous proteins, resulting in host cell death. In most cases, alternate systems target the recombinantly expressed membrane protein to the bacterial membrane, leading to overloading of the bacterial membrane with recombinant membrane protein, and concomitantly resulting in host cell death as well.
2. Elements in the 3′ or 5′ region of the eukaryotic gene can destabilize mRNA leading to low expression levels.
3. Codon usage of prokaryotes is different from that of eukaryotes thus preventing adequate translation or even stopping it completely.
4. Various membrane proteins require interactions with chaperones or other proteins which are not available in the bacteria, leading to misfolded/degraded heterologous protein.
5. Bacteria are rich in proteases which cleave foreign proteins.
6. Bacteria cannot perform posttranslational modifications such as glycosylation and phosphorylation, which have a vital role in the activity, folding, stability and proper membranal anchoring of the protein (Grisshammer, R. and Tate, C. G., 1995. Quar. Rev. Biophys. 28: 315).
7. The lipid composition of prokaryotic membranes is significantly different from that of eukaryotic membranes and may be an inadequate environment for uptake of heterologous membrane proteins.
8. Bacteria tend to incorporate overexpressed proteins in insoluble inclusion bodies (Grisshammer, R. and Tate, C. G., 1995. Quar. Rev. Biophys. 28: 315).
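By way of illustration of item 3 above, a coding sequence may be inspected for codons which are poorly used by the prokaryotic host. The following is a minimal sketch only: the set of codons listed is an assumption based on codons commonly cited as rare in E. coli, the sequence `toy_cds` is hypothetical, and the function name `rare_codon_report` is introduced purely for this example.

```python
# Codons commonly cited as rare in E. coli, written in DNA form.
# This set is an assumption for illustration and is not exhaustive.
RARE_ECOLI_CODONS = {"AGA", "AGG", "ATA", "CTA", "CCC", "GGA"}


def rare_codon_report(cds):
    """Report which codons of an in-frame coding sequence are in the rare set."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 nt")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    rare_positions = [i for i, codon in enumerate(codons)
                      if codon in RARE_ECOLI_CODONS]
    return {
        "n_codons": len(codons),
        "rare_positions": rare_positions,  # zero-based codon indices
        "rare_fraction": len(rare_positions) / len(codons),
    }


# Hypothetical 7-codon sequence: ATG AGA GGA CTG CCC AAA TAA
toy_cds = "ATGAGAGGACTGCCCAAATAA"
report = rare_codon_report(toy_cds)
print(report)
```

A high fraction of such codons, or runs of consecutive rare codons, would be one indication that translation of the eukaryotic gene in the bacterial host may be inadequate.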
Some of the problems related to the differences between eukaryotic and prokaryotic translation systems are partially answered by the E. coli strain C43. This strain has several mutations in different proteases and a stable membrane. It can grow and be induced to express heterologous proteins at 18 degrees centigrade, thereby enhancing protein translation and stability upon exit from the ribosome (Miroux and Walker, 1996. J. Mol. Biol. 260: 289-298). Problems related to quality of expressed proteins and expression in inclusion bodies (or insoluble aggregates) have not yet been resolved. There are several examples of expression of eukaryotic proteins in active form in the E. coli system, namely mouse multi-drug resistance-1 protein (Bibi et al., 1993. Proc. Natl. Acad. Sci. 90: 9209-9213), erythrocyte glucose transporter (Sarkar, H. K. et al., 1988. Proc. Natl. Acad. Sci. 85: 5463-5467), human mitochondrial glutamate transporter (Fiermonte et al., 2002) and Arabidopsis ethylene response receptor (Voet-van-Vormizeele, J. and Groth, G., 2003. Protein. Expr. Purif. 32:89-94).
One potentially optimal strategy which has been proposed for obtaining heterologous polypeptides, such as membrane proteins, which are normally insoluble and/or suboptimally expressed when expressed in a cell, in a soluble and purifiable, and hence crystallizable, form involves complexing or fusing such polypeptides with carrier molecules so as to generate complexes/conjugates having such desired properties.
Various prior art approaches have been attempted for obtaining heterologous polypeptides which are normally insoluble and/or suboptimally expressed when expressed in a cell, in a soluble and purifiable, and hence crystallizable, form by combining these with carrier molecules so as to generate complexes/conjugates having the desired characteristics.
One approach involves the use of detergents which interact with the hydrophobic surfaces of the membrane protein in an attempt to generate soluble/crystallizable mixed detergent:protein micelles, and crystallizing such micelles as a two-dimensional (2D) lattice by reconstitution in an artificial lipid bilayer, allowing 2D structural determination via electron microscopy. While such 2D crystals have been obtained, the use of electron microscopy for determining molecular structure has the significant drawback of generating structural information with poor resolution in directions orthogonal to the 2D lattice, thus preventing structural determination at high resolution (Stowell M H. et al., 1998. Curr Opin Struct Biol. 8:595). An additional factor contributing to the difficulty of determining the structure of detergent-associated membrane proteins at high resolution is that crystal contacts made between detergent micelles tend to be disordered, resulting in poorly diffracting crystals. Although the use of helical crystals and advanced image processing can obviate some of these drawbacks, it is only with X-ray crystallography of 3D crystals that high resolution determination of 3D protein structure can be achieved. This is essential, for example, to generate detailed pictures of molecular target sites when designing drugs specifically interacting with such sites.
Various prior art approaches involve joining an insoluble heterologous polypeptide to a lipid carrier molecule in an attempt to generate a crystallizable composition.
One carrier lipid-based approach involves binding of insoluble heterologous polypeptides to divalent metal ion-chelated lipids or electrostatically charged lipids via specific surface histidine residues or via complementarily charged residues, respectively. While planar layers of such lipids have been employed to generate 2D protein crystals (Frey W. et al., 1996. Proc. Natl. Acad. Sci. U. S. A. 93:4937), such crystals can only be analyzed by electron microscopy, as opposed to X-ray diffraction, and consequently can only be used to generate crystallographic structure data of limited resolution and dimensionality.
Another carrier lipid-based approach involves using lipid nanotubes to generate helical crystals of membrane proteins (Wilson-Kubalek, E. et al., Proc. Natl. Acad. Sci. U. S. A. 1998, 95:8040). These crystals, however, can only be used to determine 3D protein structure at low resolution using electron microscopy and thus cannot be employed to solve molecular structure at atomic resolution, as is the case with X-ray crystallography.
A further approach involves complexing membrane proteins with antibody fragments in an attempt to generate complexes having enhanced solubility, and hence crystallizability, and improved capacity to form crystal contacts relative to the non-complexed membrane proteins (Hunte, C. and Michel, H., 2002. Curr Opin Struct Biol. 12: 503-508; Hunte C., 2001. FEBS Lett. 504:126-32; Lange C. and Hunte C., 2002. Proc Natl Acad Sci U S A. 99:2800-5; Ostermeier C. and Michel H., 1997. Curr Opin Struct Biol. 7:697; Ostermeier C. et al., 1997. Proc Natl Acad Sci U S A. 94:10547-53). This approach, however, is expensive and impractically complex, time-consuming and inefficient since it must be specifically tailored for each individual membrane protein, in particular due to the need to employ antibodies having different specificities for each individual membrane protein.
Yet a further approach involves expressing a fusion protein which comprises the E. coli-derived carrier protein NusA (495 amino acid residue length), GrpE, or bacterioferritin fused to an insoluble heterologous polypeptide which is normally produced in the form of inclusion bodies (Davis, G. D. et al., 1999. Biotechnol. Bioeng. 65: 382-388). Such an approach, however, employs excessively large carrier proteins, and fails to demonstrate optimally broad applicability with respect to diverse heterologous polypeptides.
An additional approach involves expressing a fusion protein which comprises the E. coli-derived carrier protein maltose binding protein (MBP, 370 amino acid residue length), glutathione S-transferase (GST), or thioredoxin fused to a heterologous polypeptide which is normally insoluble and/or suboptimally expressed when expressed in a cell (Kapust, R. B., Waugh, D. S., 1999. Protein Sci. 8:1668-1674). Such an approach, however, has the critical disadvantage of employing carrier proteins which are excessively large and/or suboptimally effective for generating fusion proteins which are soluble.
Still a further approach involves expressing a fusion protein which comprises a heterologous polypeptide translationally fused to an E. coli carrier protein conferring upon the fusion protein enhanced expressibility in soluble/crystallizable form by bacterial host cells relative to the native heterologous polypeptide (U.S. Pat. Nos. 6,207,420 and 5,989,868). Such an approach is associated with various critical disadvantages, however. Namely, such an approach is only applicable to facilitating solubilization/production of very small polypeptides, since the largest polypeptide of interest demonstrably expressed fused to a carrier polypeptide by this approach has a molecular weight of only 21.6 kilodaltons. Additionally, such an approach has the critical drawback of employing a carrier polypeptide having a molecular weight which is at least as high as that of the heterologous polypeptide.
Yet still a further approach involves expressing a fusion protein which comprises the heterologous polypeptide bovine cytochrome b5 (134 amino acid residue length; 16.5 kilodaltons) fused to the carrier polypeptide E. coli thioredoxin (109 amino acid residue length; 12 kilodaltons; Begum, R. R. et al., 2000. J. Chromatogr. B Biomed. Sci. Appl. 737:119-30). Such an approach, however, has the critical disadvantages of employing a carrier polypeptide which is at least approximately three-quarters the size of the heterologous polypeptide, and of being only applicable to facilitating solubilization/production of very small polypeptides, since the largest polypeptide of interest demonstrably expressed fused to a carrier polypeptide by this approach has a molecular weight of only 16.5 kilodaltons. Furthermore, this approach has failed to demonstrate general applicability with respect to diverse heterologous polypeptides.
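The relative size of the carrier polypeptide in the foregoing approach can be checked by simple arithmetic from the figures given above. The following sketch, in which the function name `carrier_to_target_ratio` is introduced purely for illustration, computes the ratio both by molecular weight and by residue count.

```python
def carrier_to_target_ratio(carrier_size, target_size):
    """Return the size of the carrier relative to the heterologous polypeptide."""
    if target_size <= 0:
        raise ValueError("target size must be positive")
    return carrier_size / target_size


# Figures as stated in the text: thioredoxin (carrier) versus cytochrome b5.
mw_ratio = carrier_to_target_ratio(12.0, 16.5)      # kilodaltons
residue_ratio = carrier_to_target_ratio(109, 134)   # amino acid residues

print(f"by molecular weight: {mw_ratio:.2f}")
print(f"by residue count:    {residue_ratio:.2f}")
```

By molecular weight the ratio is about 0.73, consistent with the carrier being approximately three-quarters the size of the heterologous polypeptide; by residue count the ratio is somewhat higher, about 0.81.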
Prior art soluble fusion proteins which are formed using carrier polypeptides which have a molecular weight which is at least approximately three-quarters that of the heterologous polypeptide to which they are fused will tend to distort the native conformation of the heterologous polypeptide to an excessively large extent via correspondingly large steric and electrostatic effects. This is highly undesirable since this will prevent generation of fusion protein crystals capable of generating crystallographic data defining the native 3D atomic structure of membrane proteins with optimal accuracy. Furthermore, the excessively large size of the carrier polypeptide inherently results in inefficient production yields of the heterologous polypeptide. The excessive conformational distortion of the heterologous polypeptide is furthermore highly undesirable for its use, in the form of the fusion protein, as a therapeutic/diagnostic reagent, or as an immunogen for raising antibodies specific for native conformational epitopes thereof. Critically, such an approach additionally fails to demonstrate general applicability with respect to a significantly diverse range of heterologous polypeptides.
Thus, the prior art fails to provide a generally applicable method of producing, in a satisfactorily/optimally soluble, purifiable, and crystallizable form, heterologous polypeptides, such as membrane proteins, which are normally insoluble and/or suboptimally expressed when expressed in a cell.
There is thus a widely recognized need for, and it would be highly advantageous to have, a method devoid of the above limitations.