The invention relates to the signal peptide of a protein from Bordetella pertussis that can direct heterologous proteins into the periplasmic space between the inner and outer membranes of Gram-negative bacteria. The invention further relates to DNA sequences coding for this signal peptide, to plasmids containing a gene structure of this type, and to host organisms carrying such plasmids. The invention also relates to plasmid vectors with which the efficiency of known and new signal sequences can be determined and compared. Such comparative studies make it possible to identify particularly efficient signal sequences, to clone them, and to use them in all three possible translation reading frames for the expression of heterologous proteins.
In principle, two types of signal sequence can be distinguished: a "hydrophobic" type and a "hydrophilic" type. Signal sequences of the "hydrophobic" group usually comprise about 13-30 amino acids, whereas those of the "hydrophilic" group comprise about 12-70 amino acids. A "hydrophobic" signal sequence can be divided into three structural elements: a relatively hydrophilic NH2 terminus carrying one or two basic amino acids, an apolar, mostly hydrophobic block of seven or eight amino acids, and a relatively hydrophilic COOH terminus that ends in an amino acid with a small side chain. Such "hydrophobic" signal sequences guide proteins through the membrane of the endoplasmic reticulum (ER) and through bacterial membranes. Although bacterial and ER signal sequences differ slightly from one another, they are functionally interchangeable. The "hydrophilic" type differs greatly in structure from the "hydrophobic" type: it contains no long uninterrupted stretches of hydrophobic amino acids, but usually many basic and hydroxylated amino acids and few or no acidic amino acids. Signal sequences of the "hydrophilic" type guide proteins into mitochondria, chloroplasts and possibly also peroxisomes. They are of no significance for the present invention.
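The three-part structure of the "hydrophobic" type can be illustrated with a crude segmentation heuristic. The sketch below is an illustrative assumption, not a method from this document: it takes the longest uninterrupted run of apolar residues as the hydrophobic block (counting Gly as apolar, also an assumption), everything before that run as the NH2-terminal region and everything after it as the COOH-terminal region. The names `split_signal_peptide`, `basic_count`, `APOLAR` and `BASIC` are hypothetical; published predictors rely on position-specific statistics rather than a single run.

```python
# Crude n/h/c segmentation of a "hydrophobic"-type signal sequence.
# Assumption: the apolar block is the longest uninterrupted run of residues
# from APOLAR; real signal-peptide predictors use statistical models instead.

APOLAR = set("AVLIMFWCG")  # assumption: Gly counted as apolar
BASIC = set("KR")          # assumption: only Lys and Arg counted as basic


def split_signal_peptide(seq: str) -> tuple[str, str, str]:
    """Return (n_region, h_region, c_region) for a one-letter protein sequence."""
    seq = seq.upper()
    best_start, best_len = 0, 0
    run_start = None
    for i, aa in enumerate(seq):
        if aa in APOLAR:
            if run_start is None:
                run_start = i  # a new apolar run begins here
            if i - run_start + 1 > best_len:
                best_start, best_len = run_start, i - run_start + 1
        else:
            run_start = None  # a polar or charged residue ends the run
    end = best_start + best_len
    return seq[:best_start], seq[best_start:end], seq[end:]


def basic_count(region: str) -> int:
    """Number of basic residues (Lys, Arg) in a region."""
    return sum(aa in BASIC for aa in region)


if __name__ == "__main__":
    n, h, c = split_signal_peptide("MKKWFVAAGIGAAGLMLSSAA")
    print(n, h, c, basic_count(n))  # MKK WFVAAGIGAAGLML SSAA 2
```

Applied to the B. pertussis sequence MKKWFVAAGIGAAGLMLSSAA, the heuristic returns an NH2 region with two basic residues, an apolar block and a COOH region ending in Ala. Because that sequence is rich in Ala and Gly, the block found here is longer than the typical seven or eight residues, which shows how sensitive such a simple rule is to the chosen residue set.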
Although, as shown above, "hydrophobic" signal sequences of prokaryotic and eukaryotic origin share common characteristics and may be functionally interchangeable, they also show observable differences: compared with eukaryotic signal sequences of the "hydrophobic" (= ER) type, most prokaryotic signal sequences known to date have a lower hydrophobicity in the apolar section and, usually, an additional basic amino acid in the NH2-terminal region. This may be why the natural signal sequence of a heterologous protein is usually recognized and processed less efficiently in microorganisms than a bacterial signal sequence placed in front of the same protein.
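The comparison of hydrophobicity in the apolar section can be made numerical with a hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, which this document does not specify, and reports the highest mean hydropathy over any window of eight residues (matching the seven-to-eight-residue apolar block described earlier) together with the number of basic residues near the NH2 terminus. The function names and the default window and region lengths are illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982); the choice of
# this scale is an assumption made for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle value of a one-letter amino acid segment."""
    if not segment:
        raise ValueError("empty segment")
    return sum(KD[aa] for aa in segment.upper()) / len(segment)


def max_window_hydropathy(seq: str, window: int = 8) -> float:
    """Highest mean hydropathy over all windows of the given length."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    return max(
        mean_hydropathy(seq[i:i + window]) for i in range(len(seq) - window + 1)
    )


def n_terminal_basic(seq: str, n_len: int = 5) -> int:
    """Count Lys/Arg among the first n_len residues (assumed NH2-region length)."""
    return sum(aa in "KR" for aa in seq[:n_len].upper())


if __name__ == "__main__":
    sp = "MKKWFVAAGIGAAGLMLSSAA"  # B. pertussis signal sequence from this document
    print(f"max 8-residue hydropathy: {max_window_hydropathy(sp):.2f}")
    print(f"basic residues in first 5: {n_terminal_basic(sp)}")
```

A higher window score indicates a more hydrophobic apolar block; following the observation above, ER-type eukaryotic signal sequences would be expected to score higher on average than most prokaryotic ones, though a single scale and window size is only a rough proxy.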
In E. coli, a heterologous protein is usually secreted by transport through the inner membrane into the periplasmic space; only a few cases are known in which heterologous proteins are secreted into the surrounding medium. Transport of a heterologous protein into the periplasmic space of E. coli corresponds functionally, in essence, to the transport of a protein into the lumen of the endoplasmic reticulum of eukaryotic cells. As a consequence, proteins can be correctly folded, and their intramolecular disulfide bridges correctly formed, in E. coli as well. The signal sequence is removed proteolytically by specific signal peptidases, so that the mature, "processed" heterologous protein is produced in E. coli.
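The removal of the signal sequence by signal peptidase can be sketched as a check on the residues just before the cleavage site, followed by cutting off the signal. The sketch assumes a simplified form of von Heijne's (-3,-1) rule in which only Ala, Gly, Ser, Cys or Thr are accepted at positions -1 and -3; the real preferences are position-specific and somewhat broader at -3. The function names and the mature-protein placeholder are hypothetical.

```python
# Simplified (-3, -1) rule: small residues at positions -1 and -3
# relative to the cleavage site. Assumption: one shared set for both positions.
SMALL = set("AGSCT")


def cleavage_site_ok(signal: str) -> bool:
    """True if the last residue (-1) and the residue two before it (-3) are small."""
    return len(signal) >= 3 and signal[-1] in SMALL and signal[-3] in SMALL


def process_precursor(precursor: str, signal_len: int) -> str:
    """Return the mature protein after removing the first signal_len residues."""
    signal = precursor[:signal_len]
    if not cleavage_site_ok(signal):
        raise ValueError(f"no acceptable cleavage site after residue {signal_len}")
    return precursor[signal_len:]


if __name__ == "__main__":
    signal = "MKKWFVAAGIGAAGLMLSSAA"  # B. pertussis signal sequence from this document
    mature = "DEFGHIKLMN"              # hypothetical mature-protein placeholder
    print(process_precursor(signal + mature, len(signal)))  # DEFGHIKLMN
```

The B. pertussis sequence ends in Ser-Ala-Ala, so it places Ser at -3 and Ala at -1 and passes even this strict version of the rule, consistent with the small-side-chain COOH terminus described for the "hydrophobic" type.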
Some proteins are unstable when expressed in the cytoplasm of bacteria such as Escherichia coli and are very rapidly degraded by proteases. One way, among others, to prevent this degradation is to place a very efficient signal sequence in front of such a protein so that it is rapidly secreted into the periplasmic space. The object was therefore to isolate particularly efficient signal sequences and to devise suitable processes for doing so.
Hoffman and Wright (Proc. Natl. Acad. Sci. USA (1985) 82, 5107-5111) describe plasmids that code for the periplasmic alkaline phosphatase of E. coli (PhoA, EC 3.1.3.1) without its own signal sequence. When this truncated PhoA is fused in vitro to partners that carry their own signal sequence, active alkaline phosphatase is secreted as a fusion protein, whereas without a fused signal sequence the enzyme remains in the cytoplasm and no alkaline phosphatase activity can be detected. Manoil and Beckwith (Proc. Natl. Acad. Sci. USA (1985) 82, 8129-8133) extended this work by placing the DNA coding for PhoA, lacking its signal sequence and the 5 amino acids that follow it, within the transposon Tn5 (loc. cit.), and were thus able to show that fusions not only with secreted proteins but also with membrane proteins yield active PhoA. This construct, "TnPhoA", is therefore suitable for identifying signal sequences and structures resembling signal sequences.
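The selection logic behind TnPhoA can be summarized in a few lines: a fusion shows alkaline phosphatase activity only if the PhoA portion is joined in the correct reading frame and the fusion junction lies on the periplasmic side of the inner membrane, as it does after a signal sequence or within a periplasmic domain of a membrane protein. The sketch below encodes this as a predicate over hypothetical fusion records; the `Fusion` fields, the gene names and the convention that an offset divisible by three means "in frame" are assumptions for illustration, not details of the original construct.

```python
from dataclasses import dataclass


@dataclass
class Fusion:
    gene: str           # hypothetical gene name
    offset_nt: int      # nt from the first base of the start codon to the phoA junction
    junction_side: str  # "periplasm" or "cytoplasm" (assumed known topology)


def phoa_active(f: Fusion) -> bool:
    """PhoA is active only if fused in frame and exported to the periplasm."""
    in_frame = f.offset_nt % 3 == 0  # junction falls on a codon boundary
    return in_frame and f.junction_side == "periplasm"


if __name__ == "__main__":
    candidates = [
        Fusion("geneA", 90, "periplasm"),   # in frame, after a signal sequence
        Fusion("geneB", 91, "periplasm"),   # out of frame
        Fusion("geneC", 120, "cytoplasm"),  # in frame, but PhoA stays cytoplasmic
    ]
    hits = [f.gene for f in candidates if phoa_active(f)]
    print(hits)  # ['geneA']
```

Screening for active fusions therefore enriches for insertions downstream of export signals, which is why the construct can be used to find signal sequences.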
By means of TnPhoA mutagenesis, S. Knapp and J. Mekalanos (J. Bacteriology (1988) 170, 5059-5066) generated mutants of Bordetella pertussis that are affected by modulating signals (in this case nicotinic acid and MgSO4): most of these mutants are repressed by the signals and some are activated, which suggests that there are at least two trans-acting regulatory genes.
We have found that the mutant SK6 mentioned therein contains a new and very efficient signal sequence.
This new signal sequence belongs to a secretory protein from Bordetella pertussis and has the following sequence (cf. Tab. 2 and 3):

MKKWFVAAGIGAAGLMLSSAA
Also described are PhoA-containing plasmids that are, on the one hand, very well suited as "signal-sequence cloning vectors" and, on the other hand, allow various signal sequences to be compared quantitatively in terms of their "secretion efficiency". The vector pTrc99C-PhoA (FIG. 1, Tab. 1 and Example 2) is particularly useful for both purposes. It was constructed from pTrc99C (Amann et al. Gene 69 (1988) 301-315) and from a PhoA DNA that was modified for this purpose and lacks the signal peptide sequence, in such a way that the PhoA structural gene lies in the correct reading frame relative to the translation initiation codon of pTrc99C and an NcoI cleavage site is created directly at the 5' end of the PhoA structural gene (without signal sequence).
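The reading-frame requirement can be checked mechanically. NcoI recognizes CCATGG, whose internal ATG can serve as a start codon, so a coding sequence joined at this site stays in frame when its first codon begins a multiple of three nucleotides downstream of that ATG; likewise, a signal-sequence fragment inserted between the ATG and the PhoA gene preserves the frame only if its length is a multiple of three. The sketch below uses a hypothetical DNA string, does not reproduce the actual pTrc99C-PhoA sequence, and its function names are illustrative.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence (cuts C^CATGG)


def ncoi_start_codon(dna: str) -> int:
    """Index of the ATG inside the first NcoI site, or -1 if there is none."""
    pos = dna.upper().find(NCOI_SITE)
    return -1 if pos == -1 else pos + 2


def in_frame(atg_index: int, orf_start: int) -> bool:
    """True if a coding sequence starting at orf_start shares the ATG's reading frame."""
    return (orf_start - atg_index) % 3 == 0


def frame_shift(insert_len: int) -> int:
    """Frame shift (0, 1 or 2 nt) caused by inserting insert_len nt upstream of phoA."""
    return insert_len % 3


if __name__ == "__main__":
    dna = "AAGGAGATATACCATGGCGGACACC"  # hypothetical sequence around the NcoI site
    atg = ncoi_start_codon(dna)
    print(atg, in_frame(atg, atg + 3))  # 13 True
    for n in (63, 64, 65):              # three possible fragment lengths
        print(n, frame_shift(n))        # shifts 0, 1 and 2
```

Under this model, cloning the same signal-sequence fragment in variants that differ by one or two nucleotides would give one construct in each of the three reading frames.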