The present invention relates generally to improving the solubility of proteins/peptides and, more particularly, to a method for identifying more or less soluble proteins/peptides from libraries of mutants thereof generated from the directed evolution of genes which express these proteins/peptides. This invention was made with government support under Contract No. W-7405-ENG-36 awarded by the U.S. Department of Energy to The Regents of the University of California. The government has certain rights in the invention.
Protein insolubility constitutes a significant problem in basic and applied bioscience, in many situations limiting the rate of progress in these areas. Protein folding and solubility have been the subject of considerable theoretical and empirical research. However, there still exists no general method for improving intrinsic protein solubility. Such a method would greatly facilitate protein structure-function studies, drug design, de novo peptide and protein design and associated structure-function studies, industrial process optimization using bioreactors and microorganisms, and many disciplines in which a process or application depends on the ability to tailor or improve the solubility of proteins, screen or modify the solubility of large numbers of unique proteins about which little or no structure-function information is available, or adapt the solubility of proteins to new environments when the structure and function of the protein(s) are poorly understood or unknown.
Overexpression of cloned genes using an expression host, for example E. coli, is the principal method of obtaining proteins for most applications. Unfortunately, many such cloned foreign proteins are insoluble or unstable when overexpressed. There are two sets of approaches currently in use which deal with such insoluble proteins. One set of approaches modifies the environment of the protein in vivo and/or in vitro. For example, proteins may be expressed as fusions with more soluble proteins, or directed to specific cellular locations. Chaperones may be coexpressed to assist folding pathways. Insoluble proteins may be purified from inclusion bodies using denaturants and the protein subsequently refolded in the absence of the denaturant. Modified growth media and/or growth conditions can sometimes improve the folding and solubility of a foreign protein. However, these methods are frequently cumbersome, unreliable, ineffective, or lacking in generality. A second set of approaches changes the sequence of the expressed protein. Rational approaches employ site-directed mutation of key residues to improve protein stability and solubility. Alternatively, a smaller, more soluble fragment of the protein may be expressed. These approaches require a priori knowledge about the structure of the protein, knowledge which is generally unavailable when the protein is insoluble. Furthermore, rational design approaches are best applied when the problem involves only a small number of amino-acid changes. Finally, even when the structure is known, the changes required to improve solubility may be unclear. Thus, many thousands of possible combinations of mutations may have to be investigated, leading to what is essentially an "irrational" or random mutagenesis approach. Such an approach requires a method for rapidly determining the solubility of each version.
Random or "irrational" mutagenesis redesign of protein solubility carries the possibility that the native function of the protein may be destroyed or modified by the inadvertent mutation of residues which are important for function, but not necessarily related to solubility. However, protein solubility is strongly influenced by interaction with the environment through surface amino acid residues, while catalytic activities and/or small substrate recognition often involve partially buried or cleft residues distant from the surface residues. Thus, in many situations, rational mutation of proteins has demonstrated that the solubility of a protein can be modified without destroying the native function of the protein. Modification of the function of a protein without affecting its solubility has also been frequently observed. Furthermore, spontaneous mutants of proteins bearing only 1 or 2 point mutations have been serendipitously isolated which have converted a previously insoluble protein into a soluble one. This suggests that the solubility of a protein can be optimized with a low level of mutation and that protein function can be maintained independently of enhancements or modifications to solubility. Furthermore, a screen for function may be applied concomitantly after each round of solubility selection during the directed evolution process.
In the absence of a screen for function, for example when the function is unknown, the final version of the protein can be backcrossed against the wild type in vitro to remove nonessential mutations. This approach has been successfully applied by Stemmer in "Rapid Evolution Of A Protein In Vitro By DNA Shuffling," by W. P. C. Stemmer, Nature 370, 389 (1994), and in "DNA Shuffling By Random Fragmentation And Reassembly: In Vitro Recombination For Molecular Evolution," by W. P. C. Stemmer, Proc. Natl. Acad. Sci. USA 91, 10747 (1994) to problems in which the function of a protein had been optimized and it was desired to remove nonessential mutations accumulated during directed evolution. The development of highly specialized protein variants by directed, in vitro evolution, which exerts unidirectional selection pressure on organisms, is further discussed in: "Searching Sequence Space: Using Recombination To Search More Efficiently And Thoroughly Instead Of Making Bigger Combinatorial Libraries," by Willem P. C. Stemmer, Biotechnology 13, 549 (1995); in "Directed Evolution: Creating Biocatalysts For The Future," by Frances H. Arnold, Chemical Engineering Science 51, 5091 (1996); in "Directed Evolution Of A Fucosidase From A Galactosidase By DNA Shuffling And Screening," by Ji-Hu Zhang et al., Proc. Natl. Acad. Sci. USA 94, 4504 (1997); in "Functional And Nonfunctional Mutations Distinguished By Random Combination Of Homologous Genes," by Huimin Zhao and Frances H. Arnold, Proc. Natl. Acad. Sci. USA 94, 7007 (1997); and in "Strategies For The In Vitro Evolution of Protein Function: Enzyme Evolution By Random Recombination of Improved Sequences," by Jeff Moore et al., J. Mol. Biol. 272, 336-346 (1997).
Therein, efficient strategies for engineering new proteins by multiple generations of random mutagenesis and recombination coupled with screening for improved variants are described. However, there are no teachings concerning the use of directed evolutionary processes to improve solubility of proteins; rather, the mutagenesis was directed to improvement of protein function. It should be mentioned, however, that in order for the protein to function properly in any environment, it must at least be correctly folded.
Finally, for structural determination it is often not necessary or even desirable to have a fully functional version of the protein. If the mutational rate is low (ensured by molecular backcrossing), it is likely that the structures of the wild-type and solubility-optimized versions of a protein will be similar. As long as the protein is soluble, and a structure can be obtained, it should then be possible to redesign the solubility of the protein using rational methods, if desired.
Green fluorescent protein (GFP) has become a widely used reporter of gene expression and regulation. DNA shuffling has been used to obtain a mutant having a whole cell fluorescence 45 times greater than the standard, commercially available plasmid GFP. See, e.g., "Improved Green Fluorescent Protein By Molecular Evolution Using DNA Shuffling," by Andreas Crameri et al., Nature Biotechnology 14, 315 (1996). The screening process optimizes the function of GFP (green fluorescence), and thus uses a functional screen. Although the screening process coincidentally optimizes the solubility of the GFP, in that the GFP is only fluorescent when properly folded, there is no mention of using soluble GFP as a tag to monitor solubility of other proteins; that is, the function of the protein and not its solubility is being modified. In "Wavelength Mutations And Post-translational Auto-oxidation Of Green Fluorescent Protein," by Roger Heim et al., Proc. Natl. Acad. Sci. USA 91, 12501 (1994), GFP was mutagenized and screened for variants with altered absorption or emission spectra. The authors mention that in place of proteins labeled with fluorescent tags to detect location and sometimes their conformational changes both in vitro and in intact cells, a possible strategy would be to concatenate the gene for the nonfluorescent protein of interest with the gene for a naturally fluorescent protein and express the fusion product. However, the focus of this paper is the extension of the usefulness of GFP by enabling visualization of differential gene expression and protein localization and measurement of protein association by fluorescence resonance energy transfer, by making available two visibly distinct colors. There is no mention of the use of the gene construct for solubility determinations. The paper further discusses the expression of GFP in E. coli under the control of a T7 promoter, and reports that the bacteria contained inclusion bodies consisting of protein indistinguishable from jellyfish or soluble recombinant protein on denaturing gels, but that this material was completely nonfluorescent, lacked the visible absorbance bands of the chromophore, and did not become fluorescent when solubilized and subjected to protocols that renature GFP, as opposed to the soluble GFP in the bacteria which undergoes correct folding and, therefore, fluoresces.
Chun Wu et al. in "Novel Green Fluorescent Protein (GFP) Baculovirus Expression Vectors," Gene 190, 157 (1997), describe the construction of baculovirus expression vectors which contain GFP as a reporter gene. The authors follow the production and purification of a protein of interest by in-frame cloning of the gene that expresses the protein in insect cells with the GFP open reading frame, thereby permitting visualization of the produced GFP-fusion protein using UV light. However, the purified GFP-XylE fusion protein was found to be insoluble after harvest. The authors did not correlate the level of fluorescence of the cells expressing the GFP-XylE fusion protein with the solubility of the XylE protein expressed alone. Therefore, this reference does not teach the use of the fusion protein fluorescence as an indicator of the solubility of the specific protein XylE or of the solubility of other proteins.
In "Application Of A Chimeric Green Fluorescent Protein To Study Protein-Protein Interactions," by N. Garamszegi et al., Biotechniques 23, 864 (1997), the authors discuss the fusion between GFP and human calmodulin-like protein (CLP) and show that this protein retains fluorescence and the known characteristics of CLP. That is, the GFP portion remains responsible for efficient fluorescent signals with little or no influence on the properties of the fused protein of interest. The authors maintain that the exhibited GFP fluorescence provides information concerning the maintenance of the GFP structural integrity in the chimeric protein, but does not provide information about the integrity of the entire fusion protein and, in particular, does not allow any statements concerning the maintenance of CLP function or integrity. From these statements, it is clear that this paper does not contemplate the use of the GFP as a solubility reporter for the CLP.
It has been demonstrated that improving the apparent functionality of a protein can sometimes increase the concomitant solubility of the protein, as in: "Redesigning enzyme topology by directed evolution," by G. Macbeath, P. Kast, and D. Hilvert, Science 279, 1958-1961 (1998); "Expression of an antibody fragment at high levels in the bacterial cytoplasm," by P. Martineau, P. Jones, and G. Winter, J. Mol. Biol. 280, 117-127 (1998); "Antibody scFv fragments without disulfide bonds made by molecular evolution," by K. Proba, A. Worn, A. Honegger, and A. Pluckthun, J. Mol. Biol. 275, 245-253 (1998); and "Functional Expression of Horseradish Peroxidase in E. coli by Directed Evolution," by Lin Zhanglin, Todd Thorsen, and Frances H. Arnold, Biotechnol. Prog. 15, 467-471 (1999). In each case, the driving force for the directed evolution was the functionality of the protein of interest. For example, if the protein was an enzyme, the assay for improved function was the turnover of a chromogenic analog of the enzyme's natural substrate; if the protein was an antibody, it was the recognition of the target antigen by the antibody. For cytoplasmic expression of antibodies, the recognition was linked to cell survival (binding of the antibody to a selectable protein marker which was an antigen for the antibody of interest providing selection for functional antibodies); in the case of phage displayed antibodies without disulfide bonds, the recognition was transduced to successful binding of the displayed phage to the target antigen of the displayed antibody in a biopanning protocol. The authors expressed the proteins in E. coli, and noted an apparent increase in the amount of protein expressed in the soluble fraction relative to the unselected target proteins, noting that the apparent increase in activity of desirable mutants during the evolution was due at least in part to an increase in the number of correctly folded (and hence functional) protein molecules, and not exclusively to an increase in the specific activity of a given protein molecule. However, the driving force for the selection or screening process during the directed evolution depended on the functionality of (and a functional assay for) the protein of interest. Many proteins have no easily detectable functional assay, and thus identification of proteins with improved folding yield by an increase in apparent activity due to a larger number of correctly folded molecules is not a general method for improving folding by directed evolution. Furthermore, even when functional assays are available, apparent increases in activity can also be due to increases in the specific activity (activity of an individual protein molecule) even when the total number of correctly folded molecules remains the same. Thus, increases in apparent activity do not necessarily translate to increases in the solubility of proteins. Furthermore, functional assays are protein-specific, and thus must be developed on a case-by-case basis for each new protein. Functional assays therefore lack the generality needed to identify proteins which are soluble, or to find genetic variants (mutants and fragments) of proteins with improved solubility, in a high-throughput manner for proteomics or functional genomics wherein large numbers of different proteins, about which little or no functional/structural information is known, are to be solubly expressed.
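The distinction between apparent activity and folding yield can be written compactly. The symbols below are introduced here purely for illustration and do not appear in the cited references: $A_{\mathrm{app}}$ is the apparent activity measured for a culture, $N_{\mathrm{folded}}$ is the number of correctly folded molecules, and $a_{\mathrm{sp}}$ is the specific activity per correctly folded molecule.

```latex
% Illustrative notation only (not from the source).
% Apparent activity is the product of folding yield and per-molecule activity:
A_{\mathrm{app}} = N_{\mathrm{folded}} \, a_{\mathrm{sp}}

% For a variant (primed) relative to the parent, the fold change factors as:
\frac{A'_{\mathrm{app}}}{A_{\mathrm{app}}}
  = \frac{N'_{\mathrm{folded}}}{N_{\mathrm{folded}}}
    \cdot \frac{a'_{\mathrm{sp}}}{a_{\mathrm{sp}}}
```

Under this simplified picture, a given fold increase in apparent activity may come entirely from the second factor while the first factor stays at 1, which is why a functional assay alone cannot be read as a measure of solubility.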
Information relevant to the present invention is disclosed in "Rapid Protein-Folding Assay Using Green Fluorescent Protein" by Geoffrey S. Waldo et al., Nature Biotechnology 17, 691-695 (1999), the teachings of which publication are hereby incorporated by reference herein.
Accordingly, it is an object of the present invention to provide a solubility reporter for rapidly identifying soluble forms of proteins.
Another object of the invention is to provide a method for modifying the solubility of proteins by generating large numbers of mutants of the gene which encodes the protein to be solubilized, such that the mutants can be expressed and the resulting proteins screened for solubility.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the foregoing and other objects, and in accordance with the purposes of the present invention, as embodied and broadly described herein, the method for determining the solubility of a protein, P, of this invention may include the steps of: fusing a DNA fragment, [P], which codes for protein P with the DNA fragment, [R], which codes for a reporter protein, R, forming thereby a fusion DNA fragment, [P-R], which codes for the protein, P-R; ligating the [P-R] fragment into an expression vector to form a plasmid DNA; introducing the plasmid DNA into an expression host such that the fusion protein is overexpressed therein; and detecting protein R in fusion protein P-R, whereby the detection of protein R in fusion protein P-R is an indication that protein P is soluble.
Preferably, the DNA fragment [P] is fused with the DNA fragment [L] which codes for a flexible linker peptide, L, which has been fused with the DNA fragment [R], forming thereby either fusion DNA fragment [P-L-R] or fusion DNA fragment [R-L-P], such that the detection of R in the fusion proteins encoded by [P-L-R] or [R-L-P] is an indication that protein P is soluble.
Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [P] to yield the DNA fusions [P-L-R] or [R-L-P] as part of the vectors, thereby enabling a host cell to express either the fusion protein P-L-R or the fusion protein R-L-P, such that the detection of R in the expressed fusion protein is an indication that protein P is soluble.
It is also preferred that the linker peptide is short, flexible, hydrophilic and soluble.
Preferably also, the reporter protein includes green fluorescent protein.
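The in-frame fusion of [P], [L] and [R] described above can be illustrated with a minimal sketch. It assumes each fragment is supplied as a coding sequence in frame 0, and it uses short toy sequences that are placeholders, not real protein, linker, or GFP sequences. The function names (`strip_stop`, `has_internal_stop`, `fuse_in_frame`) are hypothetical and are not part of the disclosed method.

```python
# Toy illustration of building [P-L-R] or [R-L-P] in frame.
# All sequences and function names here are hypothetical placeholders.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def strip_stop(cds: str) -> str:
    """Return the coding sequence without a trailing stop codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    return cds


def has_internal_stop(cds: str) -> bool:
    """True if any codon other than the last one is a stop codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 3, 3)]
    return any(codon in STOP_CODONS for codon in codons)


def fuse_in_frame(p_cds: str, linker_cds: str, reporter_cds: str,
                  reporter_first: bool = False) -> str:
    """Concatenate P, L and R in frame and append a single stop codon.

    reporter_first=False gives [P-L-R]; reporter_first=True gives [R-L-P].
    """
    p = strip_stop(p_cds)
    linker = strip_stop(linker_cds)
    r = strip_stop(reporter_cds)
    parts = [r, linker, p] if reporter_first else [p, linker, r]
    fusion = "".join(parts) + "TAA"
    if has_internal_stop(fusion):
        raise ValueError("fusion contains a premature stop codon")
    return fusion


# Toy inputs (placeholders only).
P_CDS = "ATGGCTAAATAA"       # Met-Ala-Lys-stop
L_CDS = "GGTTCTGGT"          # Gly-Ser-Gly, a short hydrophilic linker
R_CDS = "ATGTCTAAAGGTTAA"    # Met-Ser-Lys-Gly-stop, stands in for a reporter

plr = fuse_in_frame(P_CDS, L_CDS, R_CDS)
rlp = fuse_in_frame(P_CDS, L_CDS, R_CDS, reporter_first=True)
print(plr)
print(rlp)
```

Removing the stop codon from every fragment except the last is what lets translation read through P and L into R. Ligation into a vector and expression in a host are outside the scope of this sketch.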
In another aspect of the present invention, in accordance with its objects and purposes, the method for modifying the solubility of a protein, P, hereof may include the steps of: introducing mutations into the DNA fragment [P] which codes for protein P, thereby generating a combinatorial library of mutated variants, [X]; in-frame fusing individual [X] variants with a DNA construct such as a plasmid vector which includes a DNA fragment which codes for a reporter protein, [R], forming thereby a set of DNA constructs containing [X-R] which code for the fusion proteins X-R such that the detection of R in any of the X-R fusion proteins is an indication that the variant protein X contained therein is soluble; introducing each of the DNA constructs into an expression host such that each host cell expresses a unique variant X as a fusion protein X-R therein; and detecting R in X-R, whereby an increase in the detection of R in a host expressing a variant X-R fusion protein relative to that of a host expressing the P-R fusion protein, is an indication that the solubility of variant protein X has increased relative to the solubility of protein P.
Preferably, the DNA fragment [X] is fused with the DNA fragment which codes for a flexible linker peptide, [L], which has been fused with the DNA fragment [R], thereby forming either fusion DNA fragment [X-L-R] or fusion DNA fragment [R-L-X], such that an increase in the detection of R in the fusion proteins expressed by the [X-L-R] or the [R-L-X] is an indication that the solubility of variant protein X has increased relative to the solubility of protein P.
Preferably also, the DNA fragment bearing [L-R] or [R-L] is part of an expression vector and/or transfection/transformation vector enabling the fusion of [X] to yield the DNA fusions [X-L-R] or [R-L-X] as part of said vectors, thus enabling a host cell to express either the fusion protein X-L-R or the fusion protein R-L-X, such that an increase in the detection of R in the fusion protein is an indication that the solubility of protein X has increased relative to the solubility of protein P.
It is preferred that the linker peptide is short, flexible, hydrophilic and soluble.
Preferably also, the reporter protein includes green fluorescent protein.
It is also preferred that the step of introducing mutations into [P] generating thereby a combinatorial library of mutated variants [X] is achieved using gene shuffling and directed evolution.
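The comparison step of the screening method, in which the reporter signal from each variant X-R is compared with that from the parent P-R, can be sketched as follows. This is a minimal illustration under stated assumptions: the reporter signal is a single fluorescence reading per clone, the readings are invented numbers rather than data from the source, and the function name `rank_variants` and its `min_fold` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_variants(fluorescence: Dict[str, float],
                  parent_signal: float,
                  min_fold: float = 1.0) -> List[Tuple[str, float]]:
    """Rank variants by reporter signal relative to the parent P-R fusion.

    Returns (variant, fold_change) pairs whose fold change exceeds
    min_fold, sorted from brightest to dimmest.
    """
    if parent_signal <= 0:
        raise ValueError("parent signal must be positive")
    folds = {name: signal / parent_signal
             for name, signal in fluorescence.items()}
    hits = [(name, fold) for name, fold in folds.items() if fold > min_fold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Invented readings for four hypothetical clones (arbitrary units).
readings = {"X1": 120.0, "X2": 480.0, "X3": 60.0, "X4": 240.0}

# Parent P-R fusion measured under the same conditions (also invented).
ranked = rank_variants(readings, parent_signal=120.0)
print(ranked)  # [('X2', 4.0), ('X4', 2.0)]
```

Clones that rank highest would be candidates to carry into the next round of mutagenesis. In practice the readings would need to be taken under matched expression conditions for the comparison to be meaningful, and that normalization is not modeled here.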
Benefits and advantages of the present invention include its general applicability and the enhancement of the solubility of proteins of interest without having to individually test the solubility of each protein modification generated (such as by large-scale growth of each mutant in question followed by cell lysis, fractionation and sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE)).