A number of heterologous cell expression systems have been developed to express and secrete recombinant proteins. In general, systems based on eukaryotic host cells are employed for the expression of eukaryotic proteins that require proper folding and post-translational modifications, thus allowing for production of “native-like” proteins. Four primary eukaryotic host cell types are typically utilized: fungal (including yeast), insect, plant, and mammalian.
The choice of a recombinant protein expression system depends on the desired application. The system of choice must meet key criteria such as proper folding and processing, consistency, and productivity (cost effectiveness) of the desired protein product (Schmidt, Appl. Microbiol. Biotechnol. (2004) 65:363-372). Insect cell-based expression systems have the potential to meet capacity requirements based on ease of culture, higher tolerance to osmolality and by-product concentrations during large scale culture, and generally higher expression levels (Ikonomou et al., Appl. Microbiol. Biotechnol. (2003) 62:1-20). Recently the use of expression systems based on insect cells has become more common. These systems provide most of the characteristics desired of eukaryotic systems, but have added benefits such as lower cost of goods. Insect cell systems are based either on infection of host cells with insect virus vectors (e.g. baculovirus) or on the generation of stable cell lines by integration of expression plasmids into the genome of the host cells.
The baculovirus expression system (BES) has emerged as the primary insect cell culture system utilized for recombinant protein expression. This system is based on the use of vectors derived from the insect viruses known as baculoviruses. These vectors are used to generate recombinant viruses that encode the desired protein product. The recombinant viruses are used to infect host insect cells that then express the desired recombinant proteins. While there are advantages to this system with regard to ease of cloning and “time to product”, there are also several disadvantages. The primary challenge in the use of BES is that it is based on the viral infection of the host cells. This results in cellular lysis and cell death 72-96 hours post-infection (Farrell et al., Biotechnol. Bioeng. (1998) 60:656-663; Deo and Park, Biotechnol. Appl. Biochem. (2006) 43:129-135). As a result, during the late stages of infection the processing machinery of the insect cells is compromised to the extent that the processing of the desired product is also compromised. This limits the time that the cells can produce product and, possibly more importantly, leads to altered forms of the product being produced. Furthermore, the lysis of cells releases cellular enzymes that can also affect the quality of the desired product.
The use of stably transformed insect cells for the expression of recombinant proteins is an alternative to the use of BES. Expression systems based on stably transformed insect cell lines are non-lytic and provide for steady long-term production of secreted products that require proper folding and post-translational modifications. The secretion of the product into the culture medium provides a cleaner starting material for the purification process and allows for the final protein product to be purified with basic methods. This leads to products that are of higher quality (Kirkpatrick and Shatzman in Gene Expression Systems: Using Nature for the Art of Expression (1999) pp 289-330).
The Drosophila melanogaster cell expression system (“Drosophila expression system”) is an established heterologous protein expression system based on the use of expression vectors containing Drosophila promoters and Drosophila S2 cells (“S2 cells”) (Schneider, J. Embryol. Exp. Morph. (1972) 27:353-365). S2 cells are transformed with these vectors in order to establish stable cell lines expressing proteins corresponding to the heterologous sequences introduced into the vector (Johansen, H. et al., Genes Dev. (1989) 3:882-889; Ivey-Hoyle, M., Curr. Opin. Biotechnol. (1991) 2:704-707; Culp, J. S., et al., Biotechnology (NY) (1991) 9:173-177; U.S. Pat. Nos. 5,550,043; 5,681,713; 5,705,359; 6,046,025). This insect cell expression system has been shown to successfully produce a number of proteins from different sources. Examples of proteins that have been successfully expressed in the Drosophila S2 cell system include HIV gp120 (Culp, J. S., et al., Biotechnology (NY) (1991) 9:173-177; Ivey-Hoyle, M., Curr. Opin. Biotechnol. (1991) 2:704-707), human dopamine β-hydroxylase (Bin et al., Biochem. J. (1996) 313:57-64), and human vascular cell adhesion protein (Bernard et al., Cytotechnol. (1994) 15:139-144). In each of these examples, expression levels were greater than those obtained with other expression systems that had been previously utilized.
In addition to high levels of expression, the Drosophila expression system has been shown to be able to express heterologous proteins that maintain native-like biological function (Bin et al., Biochem. J. (1996) 313:57-64; Incardona and Rosenberry, Mol. Biol. Cell. (1996) 7:595-611). More recent examples have shown by means of X-ray crystallography studies that this expression system is capable of producing molecules with native-like structure (Modis et al., Proc. Natl. Acad. Sci. USA (2003) 100:6986-6991; Modis et al., Nature (2004) 427:313-319; Xu et al., Acta Crystallogr. D Biol. Crystallogr. (2005) 61:942-950). Two other recent publications have also demonstrated the ability of the Drosophila expression system to produce high quality products. In the first report, Schmetzer et al. (J. Immun. (2005) 174:942-952) compare baculovirus-expressed EpCAM protein to Drosophila-expressed EpCAM protein for protein folding and native conformation. Specifically, BES-expressed EpCAM and Drosophila-expressed EpCAM were compared to denatured Drosophila-expressed EpCAM. It was determined that the BES-expressed EpCAM was in a partially folded state relative to the non-denatured and denatured Drosophila-expressed EpCAM protein. This indicates that the BES-expressed protein is in an incompletely folded state. The Drosophila-expressed EpCAM protein, on the other hand, adopted a more completely folded state. The authors of this paper considered the Drosophila-expressed protein to be in the “natural” state while the baculovirus-expressed protein was not. In the second report, Gardsvoll et al. (Prot. Exp. Purif. (2004) 34:284-295) demonstrate that the expression of the urokinase-type plasminogen activator receptor (uPAR) in S2 cells results in a more homogeneous product with regard to glycosylation (5 N-linked sites) than uPAR expressed in CHO cells.
Based on the body of work utilizing Drosophila S2 cells as host cells for the expression of heterologous proteins, it is clear that these cells have many characteristics that are desirable in an expression system. In the published reports utilizing these cells to produce recombinant proteins, the expression level of the protein products is typically in the range of 5 to 50 μg/ml. To have a protein production system that can meet the demanding needs of biotech manufacturing, higher levels of expression are desirable. Achieving consistently higher expression levels would require the optimization of any or all of the following: the host cell line, the growth medium, or the expression vectors.
The development of any heterologous protein expression system requires the assembly of various regulatory control elements (hereafter, alternatively called “regulatory elements” or “control elements”, or simply “elements”, including the singular form of each) into expression vectors that drive the expression of the desired recombinant protein product. These regulatory control elements include transcriptional activators and enhancers, transcriptional initiator and termination elements, translational start and stop elements, and secretion signal leader sequences. The five main regulatory control elements of an expression vector for the secretion of recombinant protein products are 1) the proximal promoter, 2) the core promoter, 3) the 5′ untranslated region, 4) the secretion signal peptide, and 5) the 3′ untranslated region. For each one of these elements any optimization must first be done independently. Once individual elements are optimized, they must be assembled and tested to ensure that the assembly of given elements is compatible and capable of directing efficient expression of the desired recombinant protein products. In the assembly of the various combinations of elements it is also important that the elements are “operably linked”. “Operably linked” refers to functional linkage between the various elements in a manner that retains the function of each individual element as well as the function of the combined elements, as many transcriptional and translational functions are the result of the processing from one element to the next. Therefore, “operably linked” means that the nucleic acid sequences of the various elements are linked and contiguous and, where necessary, are linked contiguously to maintain an appropriate protein encoding reading frame.
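The reading-frame aspect of “operably linked” can be illustrated with a minimal Python sketch. It assumes the five elements named above plus a coding sequence are supplied as plain DNA strings; the element names, the `assemble` and `orf_in_frame` helpers, and the toy sequences in the usage note are all hypothetical and are not taken from any cited vector. The check covers only contiguity and frame, not whether the assembled elements are functionally compatible, which still has to be tested experimentally.

```python
# Hypothetical element order for a secretion expression cassette (illustrative only).
ELEMENT_ORDER = [
    "proximal_promoter",
    "core_promoter",
    "utr5",
    "signal_peptide",
    "cds",
    "utr3",
]
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble(elements):
    """Join the elements contiguously in the fixed 5' to 3' order."""
    missing = [name for name in ELEMENT_ORDER if name not in elements]
    if missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    return "".join(elements[name].upper() for name in ELEMENT_ORDER)


def orf_in_frame(signal_peptide, cds):
    """Return True if signal peptide + coding sequence form one open reading frame.

    The fused sequence must start with ATG, have a length divisible by three,
    end with a stop codon, and contain no earlier in-frame stop codon.
    """
    orf = (signal_peptide + cds).upper()
    if len(orf) % 3 != 0 or not orf.startswith("ATG"):
        return False
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[:-1])
```

For example, `orf_in_frame("ATGAAGCTGCTG", "GCTGAAGGTTAA")` accepts a toy fusion, whereas deleting a single nucleotide from the signal sequence shifts the frame and the check fails.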
While many successful expression vectors have been developed, fully optimized systems are less common. Furthermore, most developed systems are based on the use of naturally occurring sequences, i.e. the various regulatory elements are taken from existing (naturally occurring) gene sequences. Over the past several years synthetic sequences have been employed as a means to develop optimized expression vectors.
A “promoter” is composed of two basic parts: the core promoter and the proximal promoter. The promoter is located upstream of the coding sequence of a given gene. The core promoter is defined as the minimal nucleotide sequence that is capable of directing accurate transcriptional initiation of a given gene. The core promoter in eukaryotes is responsible for directing initiation by the RNA polymerase II complex. The core promoter is generally delineated as the sequence spanning the transcription initiation site (INR), more specifically the sequence 35 to 45 nucleotides upstream and downstream of the INR (total length of 70 to 90 nucleotides). The region bounded by the core promoter may contain one or more of the following conserved regulatory motifs: TFIIB recognition element (BRE), TATA box, initiator (INR), motif ten element (MTE), downstream promoter element (DPE), and downstream core element (DCE). Although certain nucleotide sequences within a regulatory control element have art-recognized names that include the word “element”, such sequences are herein called, in general terms, “motifs”; specific art-recognized names (e.g., BRE, MTE, DPE, and DCE) have the same meaning herein as in the cited references, even though they are called “motifs” herein, e.g., “MTE motif”. The role and composition of core promoters and their constituent individual motifs have been reviewed by Ohler et al. (Genome Biol. (2002) 3:1-12), Smale and Kadonaga (Ann. Rev. Biochem. (2003) 72:449-479), FitzGerald et al. (Genome Biol. (2006) 7:R53), Gershenzon et al. (BMC Genomics (2006) 7:161) and Juven-Gershon et al. (Biochem. Soc. Trans. (2006) 34:1047-1050). Studies defining the DPE motif (Kutach and Kadonaga, Mol. Cell. Biol. (2000) 20:4754-4764) and the MTE motif (Lim et al., Genes Dev. (2004) 18:1606-1617) have also been reported.
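As an illustration of how a candidate core promoter sequence might be surveyed for the motifs named above, the following Python sketch scans a DNA string for IUPAC consensus patterns. The consensus strings in `MOTIFS` are assumptions drawn from commonly quoted Drosophila consensus sequences in the general literature, not from this document, and should be checked against the cited reviews before use; the DCE is omitted because it is not a single contiguous pattern. The sketch reports only where a pattern occurs and does not verify the position of a motif relative to the INR, which matters for motifs such as the DPE.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed, illustrative consensus patterns (verify against the cited reviews).
MOTIFS = {
    "BRE": "SSRCGCC",
    "TATA": "TATAWAAR",
    "INR": "TCAKTY",
    "MTE": "CSARCSSAACGS",
    "DPE": "RGWYV",
}


def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def scan_core_promoter(sequence, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for every match.

    A lookahead is used so that overlapping matches are all reported.
    """
    sequence = sequence.upper()
    hits = {}
    for name, consensus in motifs.items():
        pattern = re.compile("(?=(" + iupac_to_regex(consensus) + "))")
        hits[name] = [match.start() for match in pattern.finditer(sequence)]
    return hits
```

Short patterns such as the assumed DPE consensus match frequently by chance, so hits from a scan like this are candidates for experimental testing rather than evidence of function.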
While the core promoters of most genes contain one or more of these motifs in various combinations, a small percentage of genes do not contain any of these motifs. Surveys of core promoters from several organisms make clear that no universal core promoter, or universal subset of motifs comprising a core promoter, exists; however, the INR motif is the most common motif found in core promoters (FitzGerald et al., Genome Biol. (2006) 7:R53).
The proximal promoter in eukaryotes is generally defined as the sequence that is immediately upstream of the core promoter. The length of the proximal promoter is highly variable. The proximal promoter is composed of transcriptional activator motifs that recruit transcription factors, which in turn activate the polymerase initiation complex bound to the core promoter region. The nature and number of the transcriptional activator motifs vary widely from promoter to promoter.
The optimization of promoters can be accomplished by systematic substitution and testing of various elements or motifs, the use of synthetic promoter libraries, or random substitution of individual regulatory elements or motifs. Examples of these different approaches for the development of optimized or synthetic promoters have been reported. Li et al. (Nature Biotechnol. (1999) 17:241-245) reported an approach of evaluating randomly assembled transcription factor binding sites to drive transcription of muscle specific promoters. Edelman et al. (Proc. Natl. Acad. Sci. USA (2000) 97:3038-3043) describe a high-throughput selection procedure to select synthetic proximal promoters that enhance the transcriptional activity of a core promoter. Tornoe et al. (Gene (2002) 297:21-32) built a set of synthetic promoters for use in mammalian cells that combined viral and human promoter elements. The synthetic mammalian promoters were further optimized by substituting consensus sequences and randomizing other non-consensus sequences to obtain promoters with variable activity. In the development of synthetic promoters for gene expression in Lactobacillus, Rud et al. (Microbiology (2006) 152:1011-1019) utilized consensus sequences for regulatory elements along with randomization of the spacer sequences between the regulatory elements to improve the performance of the promoter. In this manner they were able to identify synthetic promoters that had increased activity. In work by Juven-Gershon et al. (Nature Methods (2006) 3:917-922), an optimized core promoter was developed by combining core promoter motifs from different Drosophila and viral genes. This work centered on the use of the MTE motif (Lim et al., Genes Dev. (2004) 18:1606-1617) of the core promoter. This strategy resulted in core promoters with increased transcriptional activity.
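The library approach described above, in which consensus motifs are held fixed while the spacer sequences between them are randomized, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited report: the function name `make_spacer_library`, the uniform choice of nucleotides, and the toy motifs in the usage note are all hypothetical.

```python
import random


def make_spacer_library(motifs, spacer_lengths, n_variants, seed=0):
    """Generate promoter variants with fixed motifs and randomized spacers.

    motifs         -- motif sequences in 5' to 3' order, kept unchanged
    spacer_lengths -- length of the spacer between each adjacent motif pair
    n_variants     -- number of library members to generate
    seed           -- seed for the random generator, for reproducibility
    """
    if len(spacer_lengths) != len(motifs) - 1:
        raise ValueError("need exactly one spacer length per adjacent motif pair")
    rng = random.Random(seed)
    library = []
    for _ in range(n_variants):
        parts = []
        for index, motif in enumerate(motifs):
            parts.append(motif.upper())
            if index < len(spacer_lengths):
                # Each spacer position is drawn uniformly from the four bases.
                spacer = "".join(rng.choice("ACGT") for _ in range(spacer_lengths[index]))
                parts.append(spacer)
        library.append("".join(parts))
    return library
```

A call such as `make_spacer_library(["TATAAAAG", "TCAGTC"], [20], 5)` yields five 34-nucleotide variants that share both motifs and differ only in the 20-nucleotide spacer. Keeping spacer lengths fixed preserves motif spacing, so any difference in measured activity between variants can be attributed to spacer sequence.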
Based on these reports, it is clear that there is a need to experimentally determine what regulatory elements, and constituent motifs, constitute a functional promoter. This includes which combinations of motifs are used, in which order the motifs are assembled and in which orientations the motifs and elements are inserted. This is necessary whether the motifs or elements utilized are based on synthetic sequences or based on optimized sequences. While the optimization of promoters has resulted in improvements in expression, the promoter only represents one aspect of the regulatory elements needed in a fully functional and optimized expression vector.
The sequence of the 5′ untranslated region (5′UTR) of messenger RNA (mRNA) plays an important role in post-transcriptional regulation of gene expression from eukaryotic mRNA. The variability of sequences and the importance of various motifs and characteristics of the 5′UTR of mRNA have been documented (Kozak, J. Mol. Biol. (1994) 235:95-110; Kozak, Gene (2005) 361:13-37). The nature of the 5′UTR plays a role in message stability and translation efficiency. The stability and translatability of a given mRNA will impact the ability to effectively express recombinant proteins. Therefore, the design of expression vectors for the optimal production of recombinant proteins requires that the 5′UTR be evaluated for its ability to function in an efficient manner in a given host cell system. The 5′UTR as described is an important part of the mRNA and is encoded by the DNA sequence contained in the 3′ end of gene promoters. Hence, the DNA sequence of the promoter is generally defined to include the 5′UTR sequence (the region from the INR to the initiator methionine codon).
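Taking the definition above (the region from the INR to the initiator methionine codon), a first-pass computational look at a candidate 5′UTR might resemble the following Python sketch. The function name `utr5_features` and its position arguments are hypothetical, and the features reported (length, GC fraction, and number of upstream ATG triplets) are illustrative assumptions about what may be worth inspecting; they are not a validated predictor of stability or translation efficiency, which the text above notes must be evaluated in the host cell system.

```python
def utr5_features(sequence, inr_pos, start_codon_pos):
    """Summarize the 5'UTR between the INR and the initiator ATG.

    sequence        -- DNA sequence of the promoter region plus start codon
    inr_pos         -- 0-based index of the transcription initiation site
    start_codon_pos -- 0-based index of the A of the initiator ATG
    """
    sequence = sequence.upper()
    if not 0 <= inr_pos <= start_codon_pos:
        raise ValueError("INR must lie at or upstream of the start codon")
    if sequence[start_codon_pos:start_codon_pos + 3] != "ATG":
        raise ValueError("no ATG at the stated start codon position")
    utr = sequence[inr_pos:start_codon_pos]
    gc_count = utr.count("G") + utr.count("C")
    gc_fraction = gc_count / len(utr) if utr else 0.0
    # Count ATG triplets that lie entirely within the 5'UTR.
    upstream_atg = sum(1 for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG")
    return {
        "utr": utr,
        "length": len(utr),
        "gc_fraction": gc_fraction,
        "upstream_atg": upstream_atg,
    }
```

For the toy input `utr5_features("GGTCAGTCGCATGCAACCATGGCT", 4, 18)`, the 5′UTR is the 14 nucleotides between the assumed INR position and the initiator ATG, and it contains one upstream ATG triplet.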
The sequences of the 3′ untranslated region (3′UTR) of mRNA, along with the 5′UTR, play important roles in post-transcriptional regulation of gene expression. The nature of the 3′UTR plays a role in message stability, transport from the nucleus to the cytoplasm, and sub-cellular localization. Each of these factors can have an impact on the efficiency of translation of a given message and ultimately on the level of protein expression. Therefore, the design of expression vectors for the optimal production of recombinant proteins requires that the 3′UTR be evaluated for its ability to function in an efficient manner in a given host cell system.
Most proteins that are secreted from cells contain an N-terminal signal sequence that directs the protein into the cell's secretion pathway. In eukaryotic cells, the secretion signal or signal peptide interacts with the endoplasmic reticulum membrane to initiate the secretion process. The eukaryotic signal sequence has been divided into three structural regions, basic, hydrophobic, and polar, starting from the N-terminus and proceeding to the C-terminus respectively (von Heijne, Nuc. Acids Res. (1986) 14:4683-4690; Bendtsen et al., J. Mol. Biol. (2004) 340:783-795). Over the years numerous secretion signals have been identified and used to direct the secretion of recombinant proteins. Although many different signal sequences have been used and shown to be functional, few studies have been reported that define optimal sequences for a given cell type. The general characteristics and rules related to the three structural regions are well established, as detailed by von Heijne (Nuc. Acids Res. (1986) 14:4683-4690) and by Bendtsen et al. (J. Mol. Biol. (2004) 340:783-795); however, little comparative experimental data exist as to what constitutes an optimal secretion signal. Most published reports deal with the characterization and optimization of Gram-positive bacterial or yeast secretion signals (Le Loir et al., Microb. Cell Fact. (2005) 4:2; Hofmann and Schultz, Gene (1991) 101:105-111). One report that describes the optimization of the IL-2 secretion signal clearly demonstrates the benefits of optimization (Zhang et al., J. Gene Med. (2005) 7:354-365).
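The three-region description of a signal sequence given above (basic, hydrophobic, polar) can be made concrete with a deliberately crude Python sketch. It takes the longest uninterrupted run of hydrophobic residues as the hydrophobic region and treats everything before it as the basic N-terminal region and everything after it as the polar C-terminal region. The residue sets, the function name `split_signal_peptide`, and the toy peptide in the usage note are assumptions made for illustration; this is not the method of von Heijne or of Bendtsen et al., and it does not predict cleavage sites.

```python
# Assumed residue classes for this illustration (one-letter amino acid codes).
HYDROPHOBIC = set("AVLIMFWC")
POSITIVE = set("KR")
NEGATIVE = set("DE")


def split_signal_peptide(peptide):
    """Split a signal peptide into basic, hydrophobic, and polar regions.

    The hydrophobic region is taken to be the longest uninterrupted run of
    hydrophobic residues (the first such run if several are equally long).
    """
    peptide = peptide.upper()
    best_start, best_len = 0, 0
    run_start = None
    # A trailing sentinel closes a hydrophobic run that reaches the C-terminus.
    for index, residue in enumerate(peptide + "-"):
        if residue in HYDROPHOBIC:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            if index - run_start > best_len:
                best_start, best_len = run_start, index - run_start
            run_start = None
    basic = peptide[:best_start]
    hydrophobic = peptide[best_start:best_start + best_len]
    polar = peptide[best_start + best_len:]
    net_charge = sum(r in POSITIVE for r in basic) - sum(r in NEGATIVE for r in basic)
    return {
        "basic": basic,
        "hydrophobic": hydrophobic,
        "polar": polar,
        "basic_net_charge": net_charge,
    }
```

Applied to the hypothetical peptide `MKRSLLLLVLLAVSGASA`, the sketch returns `MKRS` as the basic region, `LLLLVLLAV` as the hydrophobic region, and `SGASA` as the polar region. Comparing such region lengths and charges across candidate signals is one simple way to organize the kind of comparative data the text notes is scarce.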
The development of optimized expression vectors for use in insect cells to generate stable cell lines capable of producing large quantities of high quality recombinant proteins requires the identification of appropriate regulatory elements, including but not limited to the core promoter element, that can be used to drive transcription and translation of the heterologous proteins to be expressed. Furthermore, synthetic regulatory elements can be designed and utilized to further optimize the functionality of the expression vectors. The disclosures of Lim et al., Genes Dev. (2004) 18:1606-1617, Kutach and Kadonaga, Mol. Cell. Biol. (2000) 20:4754-4764, and Juven-Gershon et al., Biochem. Soc. Trans. (2006) 34:1047-1050 are limited to the core promoter element, and specifically to novel or heterologous sequences and/or spacing of the TATA box, INR, MTE, and DPE motifs. Full optimization of regulatory control of expression of recombinant proteins requires that multiple regulatory elements, not just motifs in the core promoter, be optimized.
While many regulatory elements are known for Drosophila as well as other insects, what constitutes optimal elements for the expression of heterologous proteins in insect cells is not known. Current technology and methods provide the potential to assemble regulatory elements into an expression vector based on the current body of knowledge. While the potential exists, it is common knowledge that not all attempts to do so result in success. The recombinant expression regulatory elements that work in one cell type do not always work in another cell type. For example, Olsen et al. (Cytotechnol. (1992) 10:157-167) evaluated the ability of a series of promoters to drive the expression of a heterologous protein in S2 cells and found that only the Drosophila MtnA promoter resulted in microgram yields of product, despite the fact that all of the promoters tested had been shown to work in other cell types. In another example, a Bombyx mori expression vector based on the IE promoters, which works well in lepidopteran cells (Farrell et al., Biotechnol. Bioeng. (1998) 60:656-663), fails to adequately drive expression of heterologous proteins in S2 cells (unpublished data). Therefore, a systematic evaluation is required to determine the potential to consistently express high levels of high quality heterologous proteins using S2 cells. In the biotechnology field, the ability to efficiently produce recombinant proteins at a favorable cost of goods is key to success. In order to achieve this goal using Drosophila S2 cells, further development of suitable expression vectors is needed.
The combination of multiple regulatory elements in an appropriate manner such that an additive benefit is achieved can further enhance the utility of the expression vector. Therefore, the technical problems to be solved are: (1) identification of regulatory elements for inclusion in expression vectors that are capable of driving expression of large quantities of high quality recombinant proteins in insect cells, (2) the design of synthetic versions of functional regulatory elements that have improved function, and (3) determining the optimal combination of multiple regulatory elements such that the combination results in an additive increase in the productivity of the protein expression. Further improvements in stable insect expression systems could potentially provide new platforms for the manufacture of proteins where large quantities of high quality protein are needed, such as in cell based systems for production of subunit vaccines against infectious diseases, for example influenza, or organisms with bioterrorism potential.