The present invention relates to a method for obtaining nucleic acid sequences encoding (poly)peptides which increase the expression yields of periplasmic proteins in functional form upon co-expression of said (poly)peptides and said periplasmic proteins. The invention also provides a method for the identification of said (poly)peptides. Furthermore, the present invention relates to a method for increasing the expression yields of periplasmic proteins in functional form by co-expressing (poly)peptides, for example Skp, FkpA, or a homolog of Skp or FkpA, in bacteria.
Expression in the bacterial periplasm is the most convenient route to express foreign recombinant proteins, especially proteins containing disulphides, since the bacterial disulphide forming and isomerization machinery (Bardwell, 1994) can be utilised. Nevertheless, not all proteins can be produced with high functional yield in the E. coli periplasm, and no general method for optimizing the expression in functional form of poorly folding proteins secreted into the periplasm exists.
Another field in which the correct folding of proteins in the periplasm is of crucial importance is phage display. This method has been used over the last decade to screen libraries not only of peptides but also of a large variety of proteins (Dunn, 1996; McGregor, 1996). The displayed proteins are fused to a phage coat protein, e.g. to the N-terminus of the whole gene-3-protein (g3p) or to its C-terminal domain. These proteins therefore fold in the periplasm, while remaining anchored to the inner membrane by the C-terminal hydrophobic extension of g3p, before being incorporated into the phage coat. Consequently, the g3p fusion proteins will almost certainly fold in the same environment and use the same machinery as periplasmically expressed proteins. Poorly folding proteins will most likely be lost over multiple screening rounds irrespective of their binding properties.
Co-expression of the cytoplasmic chaperonins GroEL and GroES during M13 phage assembly for Fab display was reported to lead to a 200-fold increase in phage titer (Söderlind, 1993). However, the relative amount of functional antibody fragments being displayed by the phage particles was not affected. It was speculated that GroEL/GroES assist in phage packing and assembly, although these steps take place in the periplasm. A general method for increasing the functional display of proteins on phage is not yet available.
Consequently, there has been great interest in the question of the existence of periplasmic chaperones. However, unlike the well-characterized cytoplasmic machinery of E. coli, DnaK/DnaJ/GrpE and GroEL/GroES and possibly others (Makrides, 1996; Martin and Hartl, 1997; Buchner, 1996; EP 0 774 512 A3), the chaperone composition of the periplasm has remained poorly understood (Wall and Plückthun, 1995; Missiakas et al., 1996). While progress in elucidating the signal transduction of periplasmic stress has been made (Missiakas and Raina, 1997), the ultimate effector molecules controlling periplasmic folding have remained obscure, although some proteins, such as FkpA or SurA, were believed to act as general periplasmic folding catalysts (Missiakas et al., 1996). FkpA was first described as very similar to the eukaryotic FK506 binding proteins (FKBPs) (Horne and Young, 1995), a class of well-characterized peptidyl-prolyl cis-trans isomerases (PPIs), which have been shown to be inhibited by the macrolide FK506. Missiakas and co-workers showed that the mature FkpA is located in the periplasm and assayed its activity (Missiakas et al., 1996). The estimated kcat/Km of the cis-trans isomerization of the Ala-Pro peptidyl-prolyl bond using succinyl-Ala-Ala-Pro-Phe-4-nitroanilide (SEQ ID NO: 1) as substrate was 90 mM⁻¹ s⁻¹. FkpA is directly regulated by σE, which binds in its promoter region (Danese and Silhavy, 1997). The σE pathway is induced by heat stress and by conditions that lead to misfolding or misassembly of outer membrane proteins (OMPs), such as over-expression of OMPs or inactivation of the surA gene.
Another protein which has been discussed in the context of periplasmic folding and protein transport is Skp. Skp is a very basic protein, which at first led to its misassignment as a DNA-binding protein (Holck et al., 1987) and later as an outer membrane associated protein (Hirvas et al., 1990; Koski et al., 1990; Koski et al., 1989), and a variety of synonyms (OmpH, HlpA) attest to its unclear function. Homologs have been found in Salmonella typhimurium (Koski et al., 1990; Koski et al., 1989), Yersinia enterocolitica (Hirvas et al., 1991), Yersinia pseudotuberculosis (Vuorio et al., 1991), Haemophilus influenzae (Fleischmann et al., 1995) and Pasteurella multocida (Delamarche et al., 1995). Müller and co-workers (Thome et al., 1990) showed that this protein stimulates the in vitro import of E. coli proteins into membrane vesicles and subsequently established its periplasmic location (Thome and Müller, 1991), consistent with its soluble nature and the presence of a signal sequence. More recently, it was proposed to be involved in the transport of outer membrane proteins (Chen and Henning, 1996), and when its promoter region was interrupted by a Tn10 transposon, the extreme heat shock factor σE (σ24) dependent response was induced (Missiakas et al., 1996). However, it remained unclear whether this is an effect of the absence of Skp or a polar effect on other proteins located downstream of skp. The heat shock response was probably induced indirectly via a change in the concentration of outer membrane proteins, which is known (Missiakas et al., 1996) to induce σE (σ24).
However, attempts to increase the expression of antibody fragments in functional form by over-expressing E. coli disulphide isomerase DsbA and/or proline cis-trans isomerase PPIase A did not significantly change the folding limit (Knappik et al., 1993). It was concluded that aggregation steps in the periplasm compete with periplasmic folding, and that they may occur before disulphide formation and/or proline cis-trans isomerization take place and be independent of their extent.
In summary, no protein has up to now been identified, which unambiguously acts as a periplasmic chaperone and which could be used to optimize the expression yield of a periplasmic protein in functional form.
Thus, the technical problem underlying the present invention is to identify factors which increase the expression yield of periplasmic proteins in functional form in bacteria and to apply these factors to the optimization of expression of periplasmic proteins. The solution to the above technical problem is achieved by providing the embodiments characterized in the claims. Accordingly, the present invention makes it possible to identify and to apply nucleic acid sequences encoding (poly)peptides which increase the expression yield of periplasmic proteins in functional form, and/or to identify and apply the (poly)peptides. The technical approach of the present invention, i.e. the co-expression of a collection of (poly)peptides with said periplasmic protein in a collection of host cells to screen or select for such nucleic acid sequences and/or (poly)peptides, is neither provided nor suggested by the prior art.
Thus, the present invention relates to a method for obtaining a nucleic acid sequence comprising a (poly)peptide coding sequence, which increases the expression yield of a periplasmic protein in functional form in bacteria upon co-expression of said periplasmic protein and said (poly)peptide, comprising the steps of:
(a) providing a collection of host cells wherein each cell contains
(i) a first nucleic acid sequence out of a collection of nucleic acid sequences, and
(ii) a second nucleic acid sequence encoding said periplasmic protein;
(b) causing or allowing expression of
(i) (poly)peptides expressible from said collection of nucleic acid sequences, and
(ii) said periplasmic protein expressible from said second nucleic acid sequence;
(c) screening or selecting for a host cell expressing said periplasmic protein with increased functional yield;
(d) optionally, repeating step (c) one or more times;
(e) obtaining said first nucleic acid sequence contained in said host cell.
The term “obtaining a nucleic acid sequence” as used herein includes the at least partial identification of the nucleic acid molecule, e.g. by sequencing, and/or the collection of the nucleic acid molecules by biochemical techniques, for example, comprised in a vector.
In the context of the present invention, the term “(poly)peptide” relates to molecules consisting of one or more chains of multiple, i.e. two or more, amino acids linked via peptide bonds. The term “protein” refers to (poly)peptides where at least part of the (poly)peptide has or is able to acquire a defined three-dimensional arrangement by forming secondary, tertiary, or quaternary structures within and/or between its (poly)peptide chain(s). This definition comprises proteins such as naturally occurring or at least partially artificial proteins, as well as fragments or domains of whole proteins, as long as these fragments or domains have a defined three-dimensional arrangement as described above. The term “periplasmic protein” relates to proteins which, after biosynthesis in the cytoplasm, are transported across the inner membrane into the periplasm. This definition comprises proteins which remain in soluble or associated form in the periplasm, which are inserted in the inner or outer membrane, which are further secreted into the medium or which are assembled into complex structures such as filamentous phage particles which are then secreted. The periplasmic proteins will normally, but not necessarily, have at least a transport signal which directs the protein to the periplasm. The term “periplasmic protein in functional form” relates to a periplasmic protein which has a defined function, and which folds during and after expression in a way which leads to a defined three-dimensional arrangement required for the protein to be functional. A “defined function” according to the present invention is any feature of the protein which depends on the correctly folded three-dimensional arrangement, and which can be detected or determined.
This comprises functions such as enzymatic activity or binding to a target or binding partner, such as in the case of receptor/ligand or antibody/antigen pairs. In addition, in the context of the present invention, said “feature” referred to hereinabove may be the presence of the correctly folded three-dimensional arrangement itself, detected or determined, for example, by an antibody recognizing the correctly assembled three-dimensional arrangement of the protein, or by measuring physico-chemical properties such as fluorescence or α-helix content in fluorescence or CD spectra, respectively.
The term “expression yield of a periplasmic protein in functional form” relates to the amount of a periplasmic protein being produced in functional form on expression. The term “(poly)peptides expressible from said first nucleic acid sequences” relates to (poly)peptides for which open reading frames (ORFs) exist on said first nucleic acid sequences and where preferably the operator elements necessary for expression are present on the corresponding vectors comprising said nucleic acid sequences. In the case that said nucleic acid sequences comprise fragments of genomic DNA, more than one ORF may be comprised in any one of said nucleic acid sequences. The term “functional yield” relates to the amount of said periplasmic protein being produced in functional form. Methods of designing, creating or obtaining nucleic acid sequences for expression, of constructing appropriate vectors, inserting nucleic acid sequences into vectors, choosing appropriate host cells, introducing vectors into host cells, causing or allowing expression of (poly)peptides or proteins, isolating nucleic acids from host cells or identifying nucleic acid sequences and corresponding protein sequences are standard methods (Sambrook et al., 1989) which are well known to anyone of ordinary skill in the art.
In a preferred embodiment, the method of the present invention further comprises the step of identifying a (poly)peptide coding sequence comprised in said first nucleic acid sequence. The term “identifying a (poly)peptide coding sequence comprised in said first nucleic acid sequence” relates to the situation referred to hereinabove, where more than one ORF is present in said first nucleic acid sequence. When more than one ORF is found, the identification optionally further comprises the analysis of individual ORFs and, if necessary, further testing such as repeating steps (a) to (e) with a set of nucleic acid sequences separately representing the individual ORFs. Said further testing can be performed by anyone of ordinary skill in the art.
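By way of illustration only, the analysis of individual ORFs on an obtained first nucleic acid sequence can be supported computationally. The following Python sketch lists candidate ORFs on both strands of a DNA fragment. The function names `reverse_complement` and `find_orfs`, the `min_codons` cut-off, and the restriction to ATG as the only start codon are assumptions made for this example; they are not part of the method described herein, and alternative bacterial start codons are ignored for simplicity.

```python
# Illustrative sketch only: names and thresholds are assumptions of this example.
START = "ATG"                      # simplification: only ATG is treated as a start codon
STOPS = {"TAA", "TAG", "TGA"}      # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """List ORFs in all six reading frames of a DNA fragment.

    Each hit is (strand, start, end, orf), where start and end are 0-based
    offsets on the scanned strand, end is exclusive and includes the stop
    codon, and min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i                      # first start codon opens an ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None                   # look for the next ORF in this frame
    return orfs
```

Each ORF reported in this way could then be subcloned separately and subjected again to steps (a) to (e), as described above. An ORF that lacks a stop codon within the fragment is not reported by this sketch.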
In a further preferred embodiment, said periplasmic protein is not expressible in functional form, or is expressible only in very low yields, when expressed under standard conditions, i.e. without the co-expression of said (poly)peptides.
In another embodiment, the present invention relates to a method, wherein said periplasmic protein is a resistance marker, a nutritional marker, a reporter protein, a transactivator of transcription of marker genes or reporter genes, or a protein binding to a target. As has been stated hereinabove in step (c), the functional yield is determined by screening or selecting for an increase in protein function. If the protein is a periplasmic resistance marker, such as β-lactamase or a zeocin resistance marker, which causes resistance to a certain antibiotic when functionally present in the periplasm, a selection is possible by culturing the host cells in the presence of said antibiotic. Host cells expressing the marker in functional form will be selected for. If the protein is a periplasmic nutritional marker such as maltose-binding protein or an amino-acid-binding protein, a selection is possible by using auxotrophic host cells and by culturing the cells in the presence of maltose, or the amino acid, respectively. Host cells expressing the marker in functional form will be selected for. If the protein is a periplasmic reporter protein such as alkaline phosphatase, a screening is possible by culturing the host cells in the presence of the corresponding substrate resulting in a colour reaction. Host cells expressing the reporter protein in functional form can thereby be identified. If the protein is a secreted protein having enzymatic activity or binding to a target, a screening of the supernatant of individual cell cultures or of a collection of host cells on a plate can be performed by adding the appropriate substrate or target, respectively, to the medium and measuring or determining the amount of functional protein being secreted. It will be possible for a person of ordinary skill in the art, without undue burden, to identify and adapt existing screening or selection protocols, e.g. based on the various ELISA formats known, to arrive at protocols which are suitable for the individual proteins and the corresponding function to be screened or selected for.
In a further preferred embodiment, said first nucleic acid sequence is or is derived from genomic DNA or mRNA of an organism, or cDNA.
Further preferred is a method, wherein said genomic DNA is randomly fragmented. Genomic DNA can be fragmented by use of restriction enzymes or DNA cleaving enzymes, chemical cleavage, mechanical shearing or sonication. These are standard procedures well known to anyone of ordinary skill in the art (Sambrook et al., 1989).
In a yet further preferred embodiment of the present invention, said first nucleic acid sequence comprises an at least partially randomized sequence. Such at least partially randomized sequences can be generated in various ways well known to the practitioner in the field, e.g. by random DNA synthesis using mixtures of mononucleotides or trinucleotides (Virnekäs et al., 1994). There are numerous examples of collections of nucleic acid sequences encoding random peptide or antibody libraries which could be used in accordance with the present invention.
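The difference between randomization with mononucleotide mixtures and with trinucleotide mixtures can be illustrated in silico. The following Python sketch draws sequences from a template written with degenerate base codes (one mixture per position) or from a defined set of whole codons. The function names, the example template and the codon set are assumptions made for this illustration; the sketch models only the resulting sequence space and not the chemistry of the synthesis.

```python
import random

# Degenerate base codes used in this example (a subset of the IUPAC codes).
MIXTURES = {"A": "A", "C": "C", "G": "G", "T": "T",
            "N": "ACGT", "K": "GT", "S": "CG"}


def sample_mononucleotide(template, rng):
    """Draw one sequence from a template; each position is an independent base mixture."""
    return "".join(rng.choice(MIXTURES[code]) for code in template.upper())


def sample_trinucleotide(n_codons, codons, rng):
    """Draw one sequence built from whole codons taken from a defined codon set."""
    return "".join(rng.choice(codons) for _ in range(n_codons))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that the example is reproducible
    # Hypothetical template: three randomized NNK codons between fixed flanking codons.
    for _ in range(3):
        print(sample_mononucleotide("GCTNNKNNKNNKGGT", rng))
    # Hypothetical codon set: one codon each for Ala, Ser, Tyr and Trp.
    print(sample_trinucleotide(3, ["GCT", "TCT", "TAT", "TGG"], rng))
```

In this illustration, a position-wise mixture such as NNK can still yield a stop codon (TAG), whereas a defined codon set excludes stop codons by construction and fixes which amino acids can occur.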
In a further preferred embodiment, the present invention relates to a method, wherein
(a) said first nucleic acid sequence is comprised in a vector which can be packaged in a filamentous phage particle, and
(b) said periplasmic protein is a fusion protein of at least part of a filamentous phage coat protein and a further protein;
and wherein in the course of said expression a collection of filamentous phage particles displaying said further protein is produced from said collection of host cells.
The term “filamentous phage particles displaying said further protein” refers to particles prepared by the phage display method which has been developed and used extensively in the past 10 years. In said method, a foreign (poly)peptide or protein is genetically fused to a coat protein of a phage, in most cases of a filamentous phage such as M13, f1 or fd, whereby said phage displays said foreign (poly)peptide or protein at its surface. Many important aspects of phage display are summarized in various publications (e.g. Kay et al., 1996).
In one further embodiment of the present invention, the vector wherein said first nucleic acid sequence is comprised is a phage vector or a phagemid vector. In the latter case, a helper phage will be used to supply phage proteins not encoded on the phagemid vector.
In another embodiment of the present invention, the phage coat protein is gVIp, gVIIIp or, preferably, gIIIp.
In a preferred embodiment of the present invention, binding of the displayed protein to a cognate binding partner is screened or selected for. If the protein is an antibody, the cognate binding partner is the corresponding antigen (and vice versa). In the case of a receptor, the cognate binding partner is its ligand (and vice versa). The particular advantage of this embodiment of the method of the present invention is that rare events leading to an increase in functional yield can be selected for since the selected phage particles can be used for infection of host cells and can thus be amplified.
In yet another embodiment, said screening or selection is for activity of the displayed further protein.
If the activity is an enzymatic activity, the supernatant of individual host cell cultures can be used to assay for the enzymatic activity.
In a still further embodiment, said further protein comprises at least a domain of the immunoglobulin superfamily, and preferably of the immunoglobulin family. In the context of the present invention, the term immunoglobulin superfamily (IgSF) refers to a family of proteins which are characterized by having at least a domain with the immunoglobulin fold, said superfamily comprising the immunoglobulins or antibodies, and various other proteins such as T-cell receptors or integrins. In a most preferred embodiment, said further protein is an immunoglobulin fragment taken from the list of Fv, scFv, disulphide-linked Fv, and Fab fragments. In this context, the term “Fv” refers to a fragment comprising the VL (variable light) and VH (variable heavy) portions of the antibody molecule. A “single-chain Fv” is a fragment in which the VL and VH chains are joined, in either a VL-VH or a VH-VL orientation, by a peptide linker. A “disulphide-linked Fv” is a fragment stabilized by an inter-domain disulphide bond. This is a structure which can be made by engineering into each chain a single cysteine residue, wherein said cysteine residues from two chains become linked through oxidation to form a disulphide. The term “Fab” refers to a complex comprising the VL-CL (variable and constant light) and VH-CH1 (variable and first constant heavy) portions of the antibody molecule.
In yet a further preferred embodiment, the invention relates to the method wherein said first and second nucleic acid sequences are located on the same or on different vectors.
In a still further embodiment, the present invention relates to a method for identifying a (poly)peptide which increases the expression yield of a periplasmic protein in functional form in bacteria upon co-expression of said periplasmic protein and said (poly)peptide, comprising the steps of:
(a) identifying a nucleic acid sequence or a (poly)peptide coding sequence according to a method of the invention as outlined hereinabove, and
(b) deducing a (poly)peptide therefrom.
The deduction of a (poly)peptide can be achieved by translating the (poly)peptide encoding sequence into an amino acid sequence. By comparing the deduced (poly)peptide sequence with published protein sequences, or by comparing the (poly)peptide coding sequence identified as described above with published nucleic acid sequences, larger (poly)peptides, or (poly)peptide coding sequences, respectively, can be deduced and identified in cases where said first nucleic acid sequence did not comprise the full-length nucleic acid coding sequence of a protein. In addition to the method described hereinabove, the (poly)peptide may be identified directly by known methods from the host cells screened or selected for. For example, said (poly)peptide may be expressed as a fusion with a detection or labelling tag. The tagged (poly)peptide may be isolated and identified by amino acid sequencing.
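The translation step referred to above can be sketched as follows. This Python example assumes the standard genetic code and a coding sequence that starts in frame; the names `CODON_TABLE` and `translate` and the use of "X" for codons that cannot be resolved are choices made for this illustration only.

```python
# Standard genetic code, with codons enumerated in the order T, C, A, G
# for the first, second and third base; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_sequence, to_stop=True):
    """Translate an in-frame DNA (or RNA) coding sequence into one-letter amino acid code.

    Codons containing ambiguous bases are written as "X". With to_stop=True,
    translation ends at the first stop codon, which is not included.
    """
    seq = coding_sequence.upper().replace("U", "T")
    residues = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*" and to_stop:
            break
        residues.append(amino_acid)
    return "".join(residues)
```

For instance, `translate("ATGAAATTTTAG")` gives `"MKF"`. The deduced sequence can then be compared with published protein sequences as described above.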
In a most preferred embodiment, the present invention relates to a method for increasing the expression of a periplasmic protein in functional form in a bacterial host cell, characterized by co-expressing said periplasmic protein and a (poly)peptide identified by a method according to the present invention. Preferably, said bacterial host cells are E. coli cells.
In a further preferred embodiment, said periplasmic protein is not expressible in functional form, or is expressible only in very low yields, when expressed under standard conditions, i.e. without the co-expression of said (poly)peptides.
In a yet further preferred embodiment, said periplasmic protein is a member of a collection of periplasmic proteins expressed in a collection of host cells. Several methods such as the phage display technology referred to hereinabove provide libraries of proteins for screening or selection procedures. However, the success of the procedures is limited by differences in expression yields of functional library members. For example, in the case of antibody fragments, it is known that the expression yields of fragments in functional form vary to a large extent. A high percentage of fragments comprised in antibody fragment libraries derived from immunoglobulin repertoires is found not to be expressible, or to be expressible only in very low yield, when expressed under standard conditions, i.e. without the co-expression of said (poly)peptides.
When expressing periplasmic proteins with yet unknown biological function, or a collection of periplasmic proteins for the identification of a member with a certain property (e.g. when expressing an antibody fragment library with the goal to identify a fragment which binds to a pre-defined target), the term “expression . . . in functional form” refers to structural features rather than to a defined biological function. In that context, a protein can be called “functional” when it folds into a three-dimensional arrangement representative for that kind of protein. For example, when expressing a collection of antibody molecules or fragments thereof, “expression . . . in functional form” is achieved when said molecules or fragments are expressed in a correctly folded form, the so-called immunoglobulin fold, since correct folding of an antibody binding site is a prerequisite for its function, i.e. the binding to a target.
In a further preferred embodiment, said (poly)peptide is the E. coli protein Skp or a homolog thereof.
In a further preferred embodiment, said (poly)peptide is the E. coli protein FkpA or a homolog thereof.
Proteins are termed homologous if the percentage of the sum of identical and/or similar residues exceeds a defined threshold. This threshold is commonly regarded by those skilled in the art as being exceeded when at least 15% of the amino acids in the aligned sequences are identical, and at least a further 30% are similar. Similarity in that context refers to the physico-chemical properties of the amino acids, such as size, polarity, or charge.
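The threshold described above can be checked on a pairwise alignment, for example as in the following Python sketch. The 15% and 30% values are taken from the preceding paragraph. The grouping of amino acids into similarity classes, the exclusion of gap positions from the count, and the function names are assumptions made for this illustration; other groupings based on size, polarity or charge could equally be used.

```python
# Assumed similarity classes (illustrative only): aliphatic, aromatic,
# polar uncharged, basic, acidic. C, G and P are similar only to themselves.
SIMILARITY_CLASSES = ("AVLIM", "FWY", "STNQ", "KRH", "DE")
CLASS_OF = {aa: index for index, members in enumerate(SIMILARITY_CLASSES) for aa in members}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (% identical, % similar but not identical) for two aligned sequences.

    Both sequences must have the same length and use "-" for gaps; columns
    containing a gap are ignored (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(x == y for x, y in columns)
    similar = sum(x != y and x in CLASS_OF and CLASS_OF.get(x) == CLASS_OF.get(y)
                  for x, y in columns)
    return 100.0 * identical / len(columns), 100.0 * similar / len(columns)


def is_homologous(aligned_a, aligned_b, min_identity=15.0, min_similarity=30.0):
    """Apply the threshold: at least 15% identical and at least a further 30% similar."""
    identity, similarity = identity_and_similarity(aligned_a, aligned_b)
    return identity >= min_identity and similarity >= min_similarity
```

For the short aligned pair `"MKVLAD"` and `"MRILSD"`, three of six positions are identical and two further positions (K/R and V/I) are similar under the assumed classes, so that the pair satisfies the threshold in this sketch.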
Proteins which are homologous to Skp are known from organisms such as Salmonella typhimurium (Koski et al., 1990; Koski et al., 1989), Yersinia enterocolitica (Hirvas et al., 1991), Yersinia pseudotuberculosis (Vuorio et al., 1991), Haemophilus influenzae (Fleischmann et al., 1995) and Pasteurella multocida (Delamarche et al., 1995). Proteins which are homologous to FkpA are present e.g. in many pathogenic bacteria (Horne and Young, 1995). In Legionella pneumophila the corresponding protein showing PPI activity is called MipA.
In a yet further preferred embodiment, the invention relates to a method wherein said periplasmic protein is a fusion protein of at least part of a filamentous phage coat protein and a further protein.
Still further preferred is a method wherein said further protein comprises at least a domain of the immunoglobulin superfamily, and preferably of the immunoglobulin family.
Most preferably, the invention relates to a method wherein the further protein is an immunoglobulin fragment taken from the list of Fv, scFv, disulphide-linked Fv, and Fab fragments.
In yet a further preferred embodiment, the invention relates to the method wherein the nucleic acid sequence encoding said (poly)peptide, preferably Skp, FkpA, or a homolog of Skp or FkpA, and the gene encoding said periplasmic protein are encoded on the same or on different vectors, or wherein the nucleic acid sequence encoding the (poly)peptide, preferably Skp, FkpA, or a homolog of Skp or FkpA, is integrated in the genome of the bacterial host.