The ability to clone and express a polypeptide of interest in large amounts has become increasingly important. The ability to produce and purify high levels of proteins is particularly important in the human pharmaceutical and biotechnological fields, for example for producing protein pharmaceuticals, as well as in basic research, for example for crystallizing proteins to allow the determination of their three-dimensional structure. Proteins that are otherwise difficult to obtain in quantity can be over-expressed in a host cell and subsequently isolated and purified.
The choice of an expression system for the production of recombinant proteins depends on many factors, including cell growth characteristics, expression levels, intracellular and extracellular expression, post-translational modifications and biological activity of the protein of interest, as well as regulatory issues and economic considerations in the production of therapeutic proteins. Key advantages of mammalian cells over other expression systems such as bacteria or yeast are the ability to carry out proper protein folding, complex N-linked glycosylation and authentic O-linked glycosylation, as well as a broad spectrum of other post-translational modifications. Due to the described advantages, eukaryotic and in particular mammalian cells are currently the expression system of choice for producing complex therapeutic proteins such as monoclonal antibodies.
The most common approach to obtaining high-expressing host cells (also called high producers) is to first generate an appropriate expression vector for expressing the product of interest. The expression vector drives the expression of the polynucleotide encoding the product of interest in the host cell and usually comprises at least one selectable marker for generating the recombinant cell line. Besides the polynucleotide encoding the protein of interest, expression vectors used for expressing a polypeptide in a host cell usually comprise transcriptional control elements suitable to drive transcription, such as promoters, enhancers, polyadenylation signals, and transcription pausing or termination signals, as elements of an expression cassette. Furthermore, suitable translational control elements, such as appropriate 5′UTRs and 3′UTRs, are usually included and operably linked to the polynucleotides to be expressed.
To increase the efficiency of such an expression system, different elements are optimized, especially the DNA sequences that contribute to the efficiency of transcription and translation, protein synthesis, correct folding in the endoplasmic reticulum (ER) and protein secretion. High-yielding expression systems without optimized translation and secretion components could potentially lead to mRNA instability, insufficient protein secretion, and accumulation of misfolded (inactive) protein in the cell cytosol or membrane. Therefore, expression systems enabling stable and consistent translation and secretion of correctly folded proteins into the cell culture medium are of particular interest. Such secretory systems offer the advantages of stable and efficient mRNA translation, correct protein folding and efficient secretion, simple and fast product purification procedures, as well as an increased yield compared to cytosolic systems. However, the product yields of the majority of the available secretory systems are not yet fully optimized. To improve productivity and secretion efficiency, one aim is to optimize the secretion signals (also referred to herein as signal peptides or secretory leader sequences), as well as their combination with different 5′UTR sequences, in order to obtain a combination of genetic elements that results in the desired high-level expression.
The majority of secreted and membrane-bound proteins from either prokaryotic or eukaryotic organisms possess an amino-terminal leader peptide (also referred to as secretory leader sequence or signal peptide) that is cleaved from the nascent precursor polypeptide during biosynthesis. Secretory leader peptides are usually 5 to 60 amino acids long. This sequence is necessary and sufficient for secretion. Analysis of a large number of these secretory leader peptides has revealed a common structural motif that occurs in the absence of significant amino acid sequence homology [Von Heijne, 1981; Perlman et al, 1983]. In general, a secretory leader sequence consists of a positively charged amino terminus (n), a hydrophobic core (h) and a more polar carboxy terminus (c) that defines the signal peptidase cleavage site. The “n” region of the secretory leader peptide is about 5 to 8 amino acids long and is characterized by the presence of basic residues. The “h” region contains 8 to 12 non-polar amino acids that are composed on average of 37% leucine, 15% alanine, 10% valine, 10% phenylalanine, 7% isoleucine and 21% other hydrophobic amino acids such as glycine, methionine, proline or tryptophan. This region has a high propensity for alpha-helix formation, a conformation which may facilitate interaction with the interior of the lipid bilayer. Studies on the structural features of secretory leader peptides, primarily based on bacterial proteins, have suggested that the “h” region is critical to signal sequence function [Gierasch, 1990]. Disruption of the “h” region by deletion or by replacement of hydrophobic residues with hydrophilic or charged amino acids leads to loss of signal function, whereas alterations to the “n” region have little effect [Bird et al, 1990]. The carboxy terminus, or cleavage region, is typically about 6 amino acids long. This region is involved in signal peptidase recognition and cleavage, which is usually required to achieve final folding and secretion of the protein.
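The n/h/c organization described above can be illustrated with a minimal sketch. It is only a simple heuristic, not an established prediction method: it assumes that the “h” region is the longest uninterrupted run of the hydrophobic residues listed above, and it assigns everything before that run to the “n” region and everything after it to the “c” region. The function name and the example peptide are hypothetical and serve illustration only.

```python
# Residues named in the text as making up the "h" region:
# Leu, Ala, Val, Phe, Ile, Gly, Met, Pro, Trp (one-letter codes).
HYDROPHOBIC = frozenset("LAVFIGMPW")


def split_signal_peptide(peptide):
    """Split a signal peptide into (n, h, c) regions.

    Heuristic assumption: the h region is the longest contiguous run of
    residues from HYDROPHOBIC; the first such run wins in case of a tie.
    """
    peptide = peptide.upper()
    best_start, best_len = 0, 0
    run_start = None
    # A trailing non-residue sentinel closes a run that reaches the end.
    for i, residue in enumerate(peptide + "-"):
        if residue in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    h_end = best_start + best_len
    return peptide[:best_start], peptide[best_start:h_end], peptide[h_end:]


if __name__ == "__main__":
    # Hypothetical leader peptide, not taken from any real protein.
    n, h, c = split_signal_peptide("MKKTRLLALLVLAFLSSAQA")
    print(n, h, c)
```

For the hypothetical peptide above, the sketch yields an “n” region of MKKTR (three basic residues), an “h” region of LLALLVLAFL (10 residues, within the 8 to 12 range given above) and a “c” region of SSAQA. Real leader sequences may contain polar residues inside the hydrophobic core, in which case this simple run-based split would underestimate the “h” region.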
The 5′UTR (5′ untranslated region) is the part of a DNA sequence that is transcribed into mRNA but not translated into protein. It usually begins at the transcription start site and ends one nucleotide before the start codon. A 5′UTR may contain sequences that regulate translation efficiency or mRNA stability, binding sites for proteins, regulatory elements, and sequences that promote the initiation of translation. 5′UTR sequences vary in length and may comprise a few tens of nucleotides up to a few hundred or even several thousand nucleotides. In eukaryotes, the median length of the 5′UTR is approximately 100 to 200 nt. The specific role of the 5′UTR and its elements has not yet been fully elucidated, in part because the sequence is not translated into functional protein. However, it is known that the combination of a specific 5′UTR and a specific secretory leader sequence can strongly improve the translation and secretion efficiency of the production system and thus may increase the expression yield. Yet, as a plethora of 5′UTRs and secretory leader sequences is available, it is a challenge to obtain an efficient combination that indeed improves the expression. Thus, despite the plethora of available expression cassettes and expression vectors, obtaining robust polypeptide/protein production with a high yield in eukaryotic cells is still challenging.
Therefore, it is the object of the present invention to provide an expression cassette that enables the secretion of a polypeptide of interest with high yield when said expression cassette is introduced into a host cell. Furthermore, it is an object of the present invention to provide an expression vector that allows for the expression of a polypeptide of interest with high yield. Furthermore, it is an object of the present invention to provide a method suitable for expressing a polypeptide of interest with high yield.