In recent years, developments in recombinant DNA technology have made it possible to express a wide range of cloned foreign genes in host organisms such as bacteria and yeasts. Two main approaches have been employed.
In one approach, expression of the foreign gene has been placed under the direct control of host expression control sequences, e.g. an E. coli promoter and Shine-Dalgarno sequence, to yield non-fused foreign protein and polypeptide products. However, this approach has various shortcomings.
High-level expression in E. coli of many eucaryotic genes has proved difficult even when a strong promoter, such as the E. coli λ P_L or Trp promoter, and the Shine-Dalgarno sequence from a highly expressed E. coli gene have been used in front of the foreign gene sequence. These difficulties apparently arise because the secondary structure of the mRNA in the vicinity of the Shine-Dalgarno sequence affects the accessibility of the mRNA to the ribosome and consequently the translational efficiency. Since the secondary structure depends on the sequence which follows the initiation codon, i.e. the foreign gene sequence, such constructions often result in poor translational efficiency.
Also, many proteins expressed in E. coli have an extra methionine amino acid residue at their N-terminus, arising from the ATG initiation codon at the 5' end of the foreign gene which is required to initiate translation. The presence of this extra N-terminal methionine is undesirable as it may affect the stability and activity of the protein and, if the protein is to be used clinically, may cause antigenicity problems.
Furthermore, directly expressed foreign gene products, in particular when they are relatively small polypeptides such as some hormones, are often subject to proteolytic turnover within the host organism cells. This leads to very low levels of accumulation of the foreign gene product within the host cells.
In an alternative approach, many eucaryotic proteins have been produced in large amounts in E. coli in the form of hybrid fusion proteins obtained by fusing the foreign gene sequence to the coding sequence of a highly expressed E. coli gene, such as the lacZ, tufB, bla, λCII and λN genes. In such constructions, run-on of translation from the bacterial gene provides high translational efficiency. Furthermore, the presence of bacterial protein fused to the foreign gene product may render the fusion protein resistant to proteolytic turnover and may also provide for compartmentalisation of the fusion protein within the host cells or its secretion therefrom. Also, by fusion protein expression, potentially biohazardous materials such as peptide hormones may be produced in an inactive "pro-form" which may then be activated subsequently in vitro by specific cleavage.
However, such hybrid fusion proteins themselves are not normally suitable as end products, e.g. for clinical use, and it is necessary to cleave the fusion protein specifically to release the foreign gene product in native form. Specific single or double amino acid cleavage sites have been provided within fusion proteins at the junction between the E. coli protein and the eucaryotic protein. For instance, cyanogen bromide chemical treatment has been used to cleave at single methionine amino acid cleavage sites, and trypsin enzymatic treatment has been used to cleave at single arginine or lysine or double arginine-arginine or lysine-lysine cleavage sites. However, such single or double repeated amino acid cleavage sites are of only limited applicability: if the cleavage site amino acids are also present within the foreign gene product amino acid sequence, cleavage treatment will lead to unwanted cleavage of the foreign protein as well as cleavage at the junction of the fusion protein.
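The limitation described above can be illustrated with a minimal sketch. It assumes deliberately simplified cleavage rules (cyanogen bromide cutting after every methionine, trypsin cutting after every lysine or arginine) and uses short made-up sequences; the names RULES, cleavage_positions and internal_sites are hypothetical and are not taken from any cited work.

```python
import re

# Simplified, illustrative rules (an assumption): each pattern matches the
# residue after which the peptide bond is taken to be cut.
RULES = {
    "cyanogen_bromide": "M",  # cut after methionine
    "trypsin": "[KR]",        # cut after lysine or arginine
}


def cleavage_positions(sequence, rule):
    """Return 1-based positions of residues after which the chain is cut.

    A match on the final residue is ignored, since nothing follows it.
    """
    return [
        m.end()
        for m in re.finditer(RULES[rule], sequence)
        if m.end() < len(sequence)
    ]


def internal_sites(carrier, foreign, rule):
    """Return cut positions (numbered along the fusion protein) that fall
    inside the foreign product, i.e. beyond the intended junction cut."""
    junction = len(carrier)
    return [p for p in cleavage_positions(carrier + foreign, rule) if p > junction]


# Made-up example: the carrier ends in Met so that cyanogen bromide cuts at
# the junction, but the foreign product also contains an internal Met.
carrier = "AAAM"
foreign = "GSMTPK"
print(cleavage_positions(carrier + foreign, "cyanogen_bromide"))
print(internal_sites(carrier, foreign, "cyanogen_bromide"))
```

Any position reported by internal_sites corresponds to a cut that would fragment the foreign product itself, which is the situation in which a single-residue cleavage site cannot be used.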
EP-A-0 035 384 (The Regents of the University of California) describes the use of specific cleavage linkers at the junction between host and foreign DNA sequences in the construction of recombinant DNA sequences which code for fusion proteins. These include cleavage linkers which code for extended specific cleavage sequences which comprise a sequence of at least two different amino acids which provide a specific enzyme cleavage site. The greater the number of amino acid residues in the specific cleavage sequence, the smaller is the probability of a similar sequence occurring within the foreign gene product amino acid sequence, and thus the lower is the risk of there being unwanted cleavage of the foreign protein. EP-A-0 035 384 specifically describes the use of a cleavage linker having the sequence X-(Asp)_n-Lys-Y, where n = 2 to 4, which is cleaved on the carboxyl side of Lys specifically by enterokinase. However, the cleavage sites described in EP-A-0 035 384 are not completely satisfactory for use in the cleavage of fusion proteins. For instance, it has been shown (Anderson et al, Biochemistry 16, 3354- (1977)) that enterokinase cleaves prococoonase at the peptide bond following the sequence Gly-Gly-Lys, and thus it appears that enterokinase cleavage is not uniquely dependent upon the sequence X-(Asp)_n-Lys-Y.
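The relationship between the length of a cleavage sequence and the chance of it recurring within the foreign product can be made concrete with a rough estimate. The sketch below assumes that each position in the protein is independently and uniformly one of the 20 common amino acids, which real proteins do not satisfy, and that the enzyme recognises exactly one sequence; the function name expected_chance_sites is hypothetical.

```python
def expected_chance_sites(protein_length, site_length, alphabet_size=20):
    """Expected number of chance occurrences of one specific sequence of
    site_length residues in a protein of protein_length residues.

    Assumes independent, uniformly distributed residues (a simplification).
    """
    windows = max(protein_length - site_length + 1, 0)
    return windows / alphabet_size ** site_length


# Compare cleavage sequences of one to five residues in a 200-residue protein.
for k in range(1, 6):
    print(k, expected_chance_sites(200, k))
```

Under these assumptions a single-residue site is expected about ten times in a 200-residue protein, whereas a four-residue site is expected roughly once per thousand such proteins. The estimate says nothing about an enzyme that also accepts related sequences, which is the difficulty the prococoonase observation points to.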