The invention described herein relates to a method for selecting proteins which have high stability. Specifically, it relates to a system with which it is possible to screen out and to isolate the most stable proteins from a large number of mutants thereof.
Proteins are widely used as enzymes or biocatalysts in industrial biotechnological processes. As antibodies, receptors, vaccines or hormones, they have great potential for use in medical diagnosis and therapy. However, their use in both industry and medicine is limited, in particular because the stability of the proteins is too low (Martinek, K. and Mozhanev, V. V., 1993). Biotechnological processes often run under reaction conditions that the enzymes employed survive for only a short time. Organic solvents, extreme pH values or high temperatures, which are advantageous or even necessary for many reactions, may lead to rapid inactivation of the enzymes (Gupta, S. and Gupta, M. N., 1993). Increased stability is also advantageous in the preparation of proteins: in general, the yield and purity of a prepared protein can be improved if it is more stable. Increased stability also extends the shelf life of proteins and simplifies their storage and transport (Brems, D. N. et al., 1992). It is therefore of great interest to increase the stability of proteins.
First, a distinction must be made between different types of stability. Thermal stability (toward denaturation and aggregation), conformational stability (toward organic solvents) and chemical stability (toward oxidation and modification of side groups) are particularly important for use as biocatalysts. Resistance to proteases is important in the medical sector. Thermodynamic stability is a measure of the equilibrium between the folded, native (active) protein and its unfolded, denatured (inactive) form; an increase in thermodynamic stability thus means a shift of the equilibrium toward the native protein. The various types of stability correlate reasonably well, which means that a protein whose conformation has been thermodynamically stabilized has generally also been stabilized according to the other criteria. This is because irreversible inactivation (e.g. aggregation, degradation) mostly originates from the unfolded protein, and because inactivating alterations (e.g. chemical modification) can be averted more effectively when the protein is stabilized (Imoto, T., 1997).
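The equilibrium view of thermodynamic stability can be made concrete with the standard two-state model, which is an assumption here (real proteins may populate intermediates). The sketch below computes the fraction of unfolded molecules from the free energy of unfolding ΔG_U; the function name and the example ΔG_U values are illustrative and not taken from the source.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def fraction_unfolded(dg_unfold_kj_per_mol: float, temp_k: float) -> float:
    """Fraction of molecules in the unfolded state (two-state model N <-> U).

    dg_unfold_kj_per_mol is dG_U = G(U) - G(N); larger values mean a more stable protein.
    K_U = [U]/[N] = exp(-dG_U / RT) and f_U = K_U / (1 + K_U).
    """
    k_u = math.exp(-dg_unfold_kj_per_mol * 1000.0 / (R * temp_k))
    return k_u / (1.0 + k_u)


# Hypothetical example: two variants at 37 deg C (310.15 K) differing by 10 kJ/mol in dG_U.
for dg in (10.0, 20.0):
    print(f"dG_U = {dg:4.1f} kJ/mol -> f_U = {fraction_unfolded(dg, 310.15):.5f}")
```

With these assumed values, a gain of 10 kJ/mol lowers the unfolded fraction from about 2% to about 0.04%, which is the kind of shift toward the native protein referred to above.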
There are in principle two ways of increasing the stability of a protein. On the one hand, proteins can be stabilized by external factors (e.g. solvent, immobilization, chemical modification) (Gray, C. J., 1993, Tyagi, R. and Gupta, M. N., 1993, Cabral, J. M. S. and Kennedy, J. F., 1993). On the other hand, the intrinsic stability of a protein can be increased by altering its amino acid sequence through mutations; this latter approach is also referred to as protein engineering. Whereas external stabilization quickly reaches its limits owing to the conditions under which the protein is used, the advantage of protein engineering is that it can stabilize proteins for applications under a variety of conditions. It should be noted in this connection that proteins in their physiological environment have been optimized not only for thermodynamic stability but also for folding rate, flexibility and degradability in the cell (Shoichet, B. K. et al., 1995). It must therefore be assumed that there is sufficient potential for stabilization by mutations.
Protein engineering, in turn, can proceed in two ways: stabilizing mutations can be introduced deliberately, or they can be selected from a large number of randomly generated ones. Targeted mutagenesis requires extensive knowledge about the stabilizing interactions in a protein to have a reasonable chance of success. It is true that the principal types of interactions are known (e.g. Pace, C. N. et al., 1996), and computer-assisted algorithms sometimes come quite close to predicting protein structures from the amino acid sequence (Fischer, D. and Eisenberg, D., 1996, Bowie, J. U. and Eisenberg, D., 1993). However, the effects of individual mutations still cannot be predicted satisfactorily. Even if the three-dimensional structure of the protein to be modified is known, it is scarcely possible to predict or calculate the effects of mutations on protein stability, because knowledge about the denatured state of the protein and about alternative conformations is still lacking. Targeted mutagenesis can, however, be applied if homologs of the protein are already known from thermophilic organisms. Sequence comparisons can then identify positions at which the protein may be stabilized by directed mutagenesis.
Generating a large number of random mutations and then selecting for the desired property is referred to as directed evolution. The advantage of directed evolution is that it does not require knowledge about the structure of a protein or about the interactions important for its folding and stability. In line with Darwin's "survival of the fittest", the mutants that come closest to the property being selected for accumulate. Two important preconditions must be met for directed evolution to be applied. First, the property being selected for must actually be selectable and, second, the selected protein variants must be coupled to the nucleotide sequences encoding them. If it is possible to select for an activity necessary for growth of a microorganism (e.g. antibiotic resistance), both conditions are met, because only those microorganisms that have developed this activity multiply. To stabilize proteins from mesophilic organisms, they can be introduced into a thermophilic organism, which is then grown at an appropriately elevated temperature. However, this method is applicable to only a few proteins, because it requires the protein to have an activity that is distinctly advantageous or even necessary for growth of the thermophilic organism. If the proteins to be stabilized do not have such an activity (which applies to most biocatalysts and, in particular, to proteins used in medical therapy and diagnosis), the selection must be carried out indirectly or in vitro. Since in this case the proteins are separated from the organisms producing them, the first major problem is to link them to the nucleotide sequences encoding them. Various systems have been developed to make this coupling possible: phage display (Smith, G. P., 1991, Patent WO 92/01047), cell surface display (Georgiou, G. et al., 1997), ribosome display (Mattheakis, L. C. et al., 1996, Hanes, J. and Plückthun, A., 1997), repressor display (Cull, M. G. et al., 1992) and selectively infectious phage display (SIP) (Krebber, C. et al., 1995, patent application EP 94102334). However, these systems are essentially confined to the selection of binding properties, in particular those of single-chain antibody fragments.
It is an object of the present invention to make the potential of directed evolution available for stabilizing proteins. Thus, the aim is to develop a method which makes it possible to select out of a large number of randomly generated mutants of any protein those having increased stability.
We have found that this object is achieved by the method presented here, which makes it possible to use directed evolution as a method for stabilizing proteins. The selection criterion used for thermodynamic stability is the resistance of a protein to proteolytic degradation, i.e. its stability toward proteases. As described above, the various types of stability are closely interconnected; in particular, the correlation between protease resistance and thermodynamic stability has been shown explicitly (Parsell, D. A. and Sauer, R. T., 1989). Coupling of the protein to its coding sequence is ensured by presenting the protein on the surface of an infectious replicable gene package (e.g. a filamentous bacteriophage) (Krebber, C. et al., 1995, patent application EP 94102334).
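One common way to rationalize the correlation between protease resistance and thermodynamic stability, stated here as an assumption rather than as part of the source, is to suppose that the protease cleaves essentially only the unfolded form and that folding and unfolding equilibrate faster than cleavage. The observed first-order cleavage rate then scales with the unfolded fraction:

```latex
k_{\mathrm{obs}} \approx k_{\mathrm{U}}\,[E]\,f_{\mathrm{U}},
\qquad
f_{\mathrm{U}} = \frac{K_{\mathrm{U}}}{1 + K_{\mathrm{U}}},
\qquad
K_{\mathrm{U}} = \exp\!\left(-\frac{\Delta G_{\mathrm{U}}}{RT}\right)
```

Here [E] is the protease concentration, k_U a second-order rate constant for cleavage of the unfolded chain, and ΔG_U the free energy of unfolding. For a stable protein (K_U ≪ 1), k_obs ≈ k_U[E] exp(−ΔG_U/RT), so under this model each increase of RT ln 10 in ΔG_U (about 5.9 kJ/mol at 37 °C) slows cleavage roughly tenfold.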
The invention described herein makes it possible to select, from a repertoire of specifically modified phages that present a particular protein on their surface, those phages that present the most stable variants of this protein.
More generally stated, the invention described herein relates to a novel method for selecting a gene, in which the gene's ability to replicate is coupled to the stability of the protein (PT) it encodes. Gene and protein form part of an infectious replicable gene package (IRG), which can be replicated after it has infected a suitable host organism. An IRG can be, for example, a filamentous bacteriophage that infects a bacterial cell. The term "stability" in "stability of the protein" relates primarily to the thermodynamic stability of the protein. It further includes other forms of stability, such as resistance to thermal or solvent-dependent inactivation of any enzymic activity of the protein, resistance to aggregation, resistance to proteolytic cleavage by proteases and stability of a complex with a ligand.
The invention comprises the following:
1.) An IRG is modified in such a way that a protein (PT) is incorporated between domains of a protein necessary for its infectivity, specifically in such a way that when said domains are separated from one another, e.g. by proteolytic cleavage of the incorporated protein (PT), the infectivity of the IRG is lost (FIG. 1). If the IRG is a filamentous bacteriophage, this can be achieved by incorporating a PT between the C-terminal domain and the N-terminal domains in all copies of the gene III protein necessary for the infection.
2.) Standard mutagenesis methods are used to generate variants of the genetic material of the IRGs which differ specifically by alteration in the sequence (both nucleotide sequence and amino acid sequence resulting therefrom) of PT. Generation of a large repertoire of variants (gene library) is typical in this connection.
3.) The genetic material is expressed in recombinant host organisms to produce IRGs which harbor PT as fusion proteins within the proteins necessary for the infection.
4.) The IRGs are incubated under particular solvent conditions under which some of the PTs are partly denatured. This incubation takes place in the presence of protease or, alternatively, is immediately followed by a further incubation with protease. In the former case, the protease used must be active under the particular conditions; if the incubation is divided into two steps, the protease need not be active under the conditions of the first incubation. The solvent conditions concern, for example, pH, temperature, salt concentration, proportion of organic solvent or concentration of possible ligands of PT. The PT repertoire contains variants that, under the particular solvent conditions, are denatured to a greater extent than other variants and are therefore better substrates for proteases. The system can be adjusted very finely by slightly varying the solvent conditions or the concentration or nature of the protease, so that even small differences in the stability of different PTs lead to distinct differences in the rate of PT cleavage. For example, the incubation can be carried out at 37 °C in 100 mM potassium phosphate, pH 8.0, 0.4 mM CaCl2, with 2.5 µM chymotrypsin for 30 min.
5.) Host organisms are infected with the IRGs treated as in step 4 and are multiplied. Only IRGs whose PTs have not been cleaved by proteases retain their infectivity and can be replicated. The host organisms produce new IRGs, which can go through the cycle again, so that the IRGs whose PTs show the greatest stabilization are enriched (FIG. 2).
6.) The genes which encode PTs can alternatively be isolated by standard methods such as, for example, by PCR of the genome with appropriate primers. These genes can, on the one hand, be incorporated, with or without modification (e.g. mutagenesis, gene shuffling (Stemmer, W. P. C., 1994)), into the genetic material of the IRGs again, and the latter can be subjected to a new selection. On the other hand, they can also be used for PT sequence analysis.
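The cycle of steps 4 and 5 can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the method itself: it assumes a two-state model in which only unfolded PT is cut, a hypothetical cleavage rate constant `k_cleave`, complete loss of infectivity upon cleavage, amplification of surviving IRGs in proportion to their numbers, and no new mutations between rounds. The protease concentration (2.5 µM) and incubation time (30 min) follow the example conditions of step 4; the library composition and ΔG_U values are invented for illustration.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def fraction_unfolded(dg_kj: float, temp_k: float) -> float:
    """Two-state model: fraction of PT in the unfolded (protease-sensitive) state."""
    k_u = math.exp(-dg_kj * 1000.0 / (R * temp_k))
    return k_u / (1.0 + k_u)


def survival(dg_kj, temp_k, k_cleave, protease_m, time_s):
    """Fraction of IRGs whose PT is still uncleaved (hence infective) after incubation.

    Assumes first-order cleavage with k_obs = k_cleave * [E] * f_U.
    """
    k_obs = k_cleave * protease_m * fraction_unfolded(dg_kj, temp_k)
    return math.exp(-k_obs * time_s)


def select(library, rounds, temp_k=310.15, k_cleave=1e5, protease_m=2.5e-6, time_s=1800.0):
    """Simulate repeated protease challenge and reinfection.

    library maps variant name -> (initial frequency, dG_U in kJ/mol).
    k_cleave (1/(M*s)) is a hypothetical value. Returns the frequencies before
    selection and after each round.
    """
    freqs = {name: freq for name, (freq, _) in library.items()}
    history = [dict(freqs)]
    for _ in range(rounds):
        # Only uncleaved IRGs remain infective; reinfection amplifies them proportionally.
        weighted = {
            name: freq * survival(library[name][1], temp_k, k_cleave, protease_m, time_s)
            for name, freq in freqs.items()
        }
        total = sum(weighted.values())
        freqs = {name: w / total for name, w in weighted.items()}
        history.append(dict(freqs))
    return history


# Hypothetical library: a rare, highly stable variant among less stable ones.
LIBRARY = {
    "wild_type": (0.980, 10.0),
    "variant_A": (0.019, 15.0),
    "variant_B": (0.001, 20.0),
}

for i, freqs in enumerate(select(LIBRARY, rounds=3)):
    print(i, {name: round(f, 3) for name, f in freqs.items()})
```

With these assumed numbers, the rarest but most stable variant (variant_B) starts at 0.1% and becomes the most frequent variant after three rounds (about 62%), while the wild type is depleted almost completely. In this model, raising the protease concentration or incubation time increases the selection pressure in favor of the more stable variants.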
The invention further relates to the use of the method for assessing the stability of a protein by determining the proteolysis-dependent rate of loss of infectivity of an IRG that has incorporated this protein on its surface as described above.
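Assessing stability in this way amounts to estimating a first-order rate constant for the loss of infectivity. A minimal sketch, assuming that the infective titer decays exponentially with incubation time, fits ln(titer) against time by ordinary least squares; the function names and the synthetic titers in the example are hypothetical, not measured data.

```python
import math


def infectivity_loss_rate(times_min, titers):
    """First-order rate constant k (per min) from titer(t) = titer(0) * exp(-k t).

    Fits ln(titer) vs time by ordinary least squares and returns minus the slope.
    """
    if len(times_min) != len(titers) or len(times_min) < 2:
        raise ValueError("need at least two paired (time, titer) measurements")
    n = len(times_min)
    logs = [math.log(v) for v in titers]
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    return -sxy / sxx


def half_life(k):
    """Incubation time after which half of the IRGs have lost infectivity."""
    return math.log(2) / k


# Synthetic titers for two hypothetical PT variants (generated with k = 0.05 and 0.01 per min).
times = [0, 10, 20, 30]
variant_1 = [1e10 * math.exp(-0.05 * t) for t in times]
variant_2 = [1e10 * math.exp(-0.01 * t) for t in times]
for name, titers in (("variant_1", variant_1), ("variant_2", variant_2)):
    k = infectivity_loss_rate(times, titers)
    print(f"{name}: k = {k:.3f} per min, half-life = {half_life(k):.1f} min")
```

The variant with the smaller rate constant (here variant_2) is the more protease-resistant one and, given the correlation between protease resistance and thermodynamic stability, presumably the more stable one.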
It is preferred in the invention described herein for said IRGs to be filamentous bacteriophages and for the protein necessary for infection to be the gene III protein, in which case the proteins (PT) are inserted between the C-terminal domain and the two N-terminal domains of the gene III protein. Preferred filamentous bacteriophages are specifically those of class I (fd, M13, f1, If1, IKe, ZJ/2 or Ff) or of class II (Xf, Pf1 or Pf3). Preferred IRGs also include those whose genome contains no sequence segments that favor recombination events by which the incorporated genes coding for PT are eliminated. Further preferred IRGs are those whose genome contains genes that, when expressed in the host organism, confer a growth advantage on it; for example, genes for antibiotic resistance can be present in the genome of the IRGs.
Further preferred PTs are globular proteins. If the PTs to be investigated are too stable for the selection conditions to discriminate between variants, they can be destabilized by targeted mutations (resulting in PT′). Stabilizing mutations selected on the basis of PT′ can then be incorporated into PT, since stabilizing contributions are in most cases additive (Skinner, M. M. and Terwilliger, T. C. (1996), Wells, J. A. (1990)). Preferably, the destabilizing mutations remove disulfide bridges.
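The additivity assumption can be written as a simple sum of free-energy changes. In the sketch below, ΔΔG_i is the change in the free energy of unfolding caused by mutation i, measured in the destabilized background PT′, and PT_{1…n} denotes PT carrying mutations 1 to n; transferring the ΔΔG_i values from PT′ to PT is exactly the additivity assumption.

```latex
\Delta\Delta G_i = \Delta G_{\mathrm{U}}(\mathrm{PT}'_{i}) - \Delta G_{\mathrm{U}}(\mathrm{PT}'),
\qquad
\Delta G_{\mathrm{U}}(\mathrm{PT}_{1\ldots n}) \approx \Delta G_{\mathrm{U}}(\mathrm{PT}) + \sum_{i=1}^{n} \Delta\Delta G_i
```

For example, if two mutations selected in the PT′ background contribute hypothetical values of +3 and +4 kJ/mol, the additive estimate for PT carrying both is ΔG_U(PT) + 7 kJ/mol.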
Preference is further given to proteases which recognize side groups of amino acids and have specific cleavage sites. Preferred in this connection are proteases which specifically cut at aromatic and/or aliphatic amino acid residues.
The generation of variants of the genetic material of the IRGs, specifically of the genes of PTs, preferably takes place by using standard methods, either of random mutagenesis or of site-specific mutagenesis.
The invention further relates to a kit for selecting or screening for genes that encode the most stable variants of proteins, which kit comprises a specifically constructed vector. This vector can be used to produce IRGs and should have one or more suitable cloning sites into which DNA can be inserted. "Suitable cloning site" here refers to a region of the vector containing at least one restriction cleavage site that can be used to insert DNA. This region should moreover be located in the coding sequence of a protein of the IRG that is necessary for infection of a host organism by the IRG. For example, the vector pFD4Anl (see Example) can be used for this purpose. The vector should additionally be capable of being packaged as an IRG, such that the protein (PT) or collection of proteins (PT) encoded by the DNA inserted into the suitable cloning site is incorporated as a fusion protein into a protein necessary for infection of a host organism.
Further constituents of the invention described herein are defined in the claims.