The present invention relates to an expression system which provides heterologous proteins expressed by a non-native host organism but which have native-protein-like biological activity and/or structure.
Advances during the past decade in molecular biology and genetic engineering have made it possible to produce large amounts of protein products using heterologous expression systems.
The use of heterologous hosts for production of, for example, therapeutic proteins, can lead, however, to differences in the biological and/or structural properties of the recombinant product. Amongst the biochemical modifications that commonly occur to proteins during or following their synthesis in the cell, the formation of disulphide bonds is of relevance since this modification is coupled to the correct folding or assembly of disulphide-bonded proteins (reviewed by J. C. A. Bardwell and J. Beckwith, Cell, 74: 769-771, 1993; R. B. Freedman, in Protein Folding, T. E. Creighton (ed.), W. H. Freeman and Co., New York, pp. 455-539, 1992).
In bacteria and other host cells, for example, under certain conditions, some heterologous proteins are precipitated within cells as “refractile” or “inclusion” bodies. Such refractile or inclusion bodies consist of dense masses of partially folded, reduced heterologous protein which is often in a form which is not biologically active (S. B. Storrs et al., Protein Folding: American Chemical Society Symposium Series 470, Chapter 15: 197-204, 1991). It is believed that the biological inactivity of natively-disulphide-bonded refractile or inclusion heterologous proteins is due to incorrect protein folding or assembly brought about by the non-formation or misformation of the disulphide bonds within the proteins. The biological inactivity of refractile or inclusion heterologous proteins due to the process of incorrect protein folding or assembly is believed to occur either before or after intracellular precipitation or during isolation of the proteins.
Moreover, very often the biological function of a protein is regulated or at least influenced by the state of oxidation of its sulphydryl groups. This is the case for some enzymatic activities where the reversibility and timing of oxidation of sulphydryl groups has been proposed as a physiological control mechanism.
There are numerous examples of disulphide-bonded proteins in the literature. For instance, most viral glycoproteins and some growth factors are known to be disulphide-bonded. In general, disulphide bonds are essential to correct protein folding. Examples of disulphide-bonded recombinant heterologous proteins that have been shown to be misfolded when expressed in, for example, yeast cells include hepatitis B virus large surface protein (Biemans et al., DNA Cell Biol., 10: 191-200, 1991), α-1-antitrypsin (Moir and Dumais, Gene, 56: 209-217, 1987), and erythropoietin (Elliott et al., Gene, 79: 167-180, 1989). Examples of recombinant proteins expressed in, for example, insect or mammalian cells, for which disulphide bonds have been shown to be essential for correct protein folding, include granulocyte/macrophage colony stimulating factor (GM-CSF) (Kaushansky et al., Proc. Natl. Acad. Sci. USA, 86: 1213-1217, 1989), Friend erythroleukaemia virus (SFFV) glycoprotein gp55 (Gliniak et al., J. Biol. Chem., 266: 22991-22997, 1991), glycoprotein of vesicular stomatitis virus (VSV-G) (Grigera et al., J. Virol., 66: 3749-3757, 1992), pulmonary surfactant protein D (Crouch et al., J. Biol. Chem., 269: 15808-15813, 1994), low density lipoprotein (LDL) receptor (Bieri et al., Biochemistry, 34: 13059-13065, 1995), insulin-like growth factor (Nahri et al., Biochemistry, 32: 5214-5221, 1993), and angiotensin-converting enzyme (ACE) (Sturrock et al., Biochemistry, 35: 9560-9566, 1996). It should be noted that in all these cases, heterologous protein expression in particular host cells was only used to produce sufficient quantities of the protein concerned to enable structural studies to be carried out.
Several protein factors which catalyze disulphide bond formation have been characterized. Protein disulphide isomerase (PDI) is an abundant, multifunctional protein found in the lumen of the endoplasmic reticulum (ER) that promotes proper formation of disulphide bonds in secretory and cell surface proteins (LaMantia et al., Proc. Natl. Acad. Sci. USA, 88: 4453-4457, 1991; Farquhar et al., Gene, 108: 81-89, 1991; Freedman, Cell, 57: 1069-1072, 1989; Laboissière et al., J. Biol. Chem., 270: 28006-28009, 1995).
A similar function, but in a different cellular compartment, has been ascribed to another small, ubiquitous protein, thioredoxin (TRX) (Gan, J. Biol. Chem., 266: 1692-1696, 1991; Muller, J. Biol. Chem., 266: 9194-9202, 1991; Chivers et al., EMBO J., 15: 2659-2667, 1996), that has an active-site sequence similar to that of PDI. Thioredoxins are cytosolic polypeptides capable of catalyzing the reduction of disulphides using glutathione as a reductant (Holmgren, J. Biol. Chem., 264: 13963-13966, 1989). It has been postulated that thioredoxin may also be involved in the reduction of prematurely formed disulphides in proteins that have entered the ER. Since the biological activity of a number of key enzymes involved in crucial metabolic pathways depends on the cytosolic redox system, it is plausible that TRX plays a relevant role in the modification of proteins involved in folding in cellular compartments other than the cytosol.
In the numerous organisms, for example, bacteria, yeast, mammalian cells and insect cells, which have been genetically manipulated to (over)express heterologous proteins, the problem encountered with most expression systems is the inability to express proteins which are biologically active.
It now appears that, owing to the absence or insufficient amounts of the enzymes necessary for correct folding or assembly of heterologous proteins in non-native expression hosts, such expressed heterologous proteins are often not biologically active and/or have an incorrect protein structure.
The present invention overcomes this problem and allows for the expression of biologically active and/or correctly structured heterologous proteins in non-native expression hosts.
It has now been found that expression cassettes encoding PDI or TRX can be used to transform a host organism thereby making it capable of overexpressing PDI or TRX. Preferably, the host organism is yeast. Yeast cells, for example, overexpressing these proteins can subsequently or simultaneously be transformed with expression vectors encoding one or more desirable heterologous proteins. The heterologous proteins expressed in such PDI/TRX-transformed yeast cells are in a properly-folded, biologically active form due to the disulphide bond formation activity of the PDI or TRX enzymes co-expressed in the same cell.
Such systems for producing biologically active heterologous proteins can be advantageously used for the production of, for example, proteins for human or veterinary therapeutic and/or diagnostic use or other proteins of commercial or research interest. The correct and optimum biological activity effected by the methods of the present invention is paramount in producing, for example, effective drugs and diagnostic reagents.
PDI overexpression in Saccharomyces cerevisiae has been found to enhance the secretion of human platelet-derived growth factor B homodimer (PDGF-BB) into the culture medium (Robinson et al., Bio/Technology, 12: 381-384, 1994).
In the present invention an increased level of heterologous protein folding efficiency in cells has been demonstrated.
According to a first aspect of the invention there is provided a vector comprising an expression cassette comprising a DNA sequence encoding a protein capable of catalyzing disulphide bond formation.
Such a vector desirably results in (over)expression of the protein in a transformed host cell, thus providing the conditions for correct heterologous protein folding.
The protein may be any protein capable of catalyzing disulphide bond formation and is preferably protein disulphide isomerase (PDI) or thioredoxin (TRX) or a combination thereof. The genes encoding PDI and TRX are preferably obtained from yeast, more preferably from S. cerevisiae. However, the sources of PDI and TRX can also include, for example, human wild-type and mutant cDNA sequences.
As used herein, the term “expression cassette” connotes a DNA sequence comprising at least a structural DNA sequence encoding a protein and appropriate expression and optionally control sequences to facilitate expression of the structural DNA sequence.
The vector may comprise more than one DNA sequence encoding a protein which is capable of catalyzing disulphide bond formation, in one or more expression cassettes. The DNA sequences may be repeats of the same DNA sequence or may encode different proteins capable of catalyzing disulphide bond formation. Providing multiple copies of DNA sequences of the same or different proteins capable of catalyzing disulphide bond formation represents one way of achieving the desirable overexpression of the proteins and thus achieving the advantageous and inventive technical effect.
The expression cassette may also comprise a DNA sequence encoding a leader peptide for secretion fused to the 5′ end of the gene coding for the protein capable of catalyzing disulphide bond formation. In so doing, the construct desirably results in (1) (over)expression of the protein in the transformed cell, and (2) localisation of the (over)expressed protein in the ER, and/or other secretory compartments, where it can exert its function, thus providing the conditions for correct heterologous protein folding.
The vector of the first aspect of the invention may also further comprise an expression cassette comprising DNA sequence(s) encoding one or more heterologous proteins. Preferably, the heterologous protein is hepatitis C virus (HCV) E2715 envelope glycoprotein or human c-fos-induced growth factor (FIGF).
The vector of the first aspect of the invention may be integrative or episomal when transformed into a host organism. Preferably the vector is capable of integration into the host organism.
According to a second aspect of the invention there is provided a host organism transformed with a vector according to the first aspect of the invention.
Again, multiple copies of the vector may be present either as episomal vectors or integrated in the host organism genome to assist (over)expression of the protein capable of catalyzing disulphide bond formation.
According to a third aspect of the invention there is provided a host organism of the second aspect of the invention further transformed with a vector comprising an expression cassette comprising DNA sequence(s) encoding one or more heterologous proteins. This further vector may be integrative or episomal when transformed into the host organism.
The host organism may be co-transfected with more than one vector which comprises an expression cassette comprising DNA sequence(s) encoding one or more heterologous proteins.
The host organism may be any host organism in which the expression of a heterologous protein is prone to incorrect disulphide bond formation.
The heterologous protein may be any protein not normally produced in the host organism and which would, in the absence of (over)expression of the protein capable of catalyzing disulphide bond formation, be produced in a form with incorrect disulphide bonds. Preferably, the heterologous protein is HCV E2715 envelope glycoprotein or human FIGF.
Preferably, the host organism according to the second and third aspects of the invention is yeast and more preferably S. cerevisiae. 
According to a fourth aspect of the invention there is provided a method of producing a host organism according to the second aspect of the invention, comprising transforming a host organism with a vector of the first aspect of the invention.
According to a fifth aspect of the invention there is provided a method of producing a host organism according to the third aspect of the invention, comprising further transforming a host organism of the second aspect of the invention either subsequently or simultaneously with a vector comprising an expression cassette comprising DNA sequence(s) encoding one or more heterologous proteins.
According to a sixth aspect of the invention there is provided a method for expressing biologically active and/or correctly structured heterologous protein(s) in a host organism, comprising the steps of:
(a) transforming a host organism with one or more vectors according to the first aspect of the present invention;
(b) further transforming the host organism of step (a) either subsequently or simultaneously with one or more vectors comprising DNA sequence(s) encoding one or more heterologous proteins; and
(c) culturing the host organism of step (b) in conditions suitable for expression of the one or more heterologous proteins.
According to a seventh aspect of the invention there is provided a method for expressing biologically active and/or correctly structured heterologous protein(s) in a host organism, comprising:
(a) transforming a host organism according to the second aspect of the invention either subsequently or simultaneously with one or more vectors comprising DNA sequence(s) encoding one or more heterologous proteins; and
(b) culturing the host organism of step (a) in conditions suitable for expression of the one or more heterologous proteins.
According to an eighth aspect of the invention there is provided a method for expressing biologically active and/or correctly structured heterologous protein(s) in a host organism, comprising the step of culturing a host organism transformed with one or more vectors according to the third aspect of the invention in conditions suitable for expression of the one or more heterologous proteins.
According to a ninth aspect of the invention there is provided a method for expressing HCV E2715 envelope glycoprotein or human FIGF in a host organism.
According to a tenth aspect of the invention there is provided a method for the preparation of an immunogenic composition, comprising bringing HCV E2715 envelope glycoprotein or human FIGF produced by the method according to the ninth aspect of the invention into association with a pharmaceutically acceptable carrier and optionally an adjuvant.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are within the skill of the art. Such techniques are explained fully in the literature (see, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, 1989; D. N. Glover (ed.), DNA Cloning, Volumes I and II, 1985; M. J. Gait (ed.), Oligonucleotide Synthesis, 1984; B. D. Hames and S. J. Higgins (eds.), Nucleic Acid Hybridization, 1984; B. D. Hames and S. J. Higgins (eds.), Transcription and Translation, 1984; R. I. Freshney (ed.), Animal Cell Culture, 1986; Immobilized Cells and Enzymes, IRL Press, 1986; B. Perbal, A Practical Guide to Molecular Cloning, 1984; the series Methods in Enzymology, Academic Press, Inc.; J. H. Miller and M. P. Calos (eds.), Gene Transfer Vectors for Mammalian Cells, Cold Spring Harbor Laboratory, 1987; Wu and Grossman (eds.) and Wu (ed.), Methods in Enzymology, Volumes 154 and 155, respectively; Mayer and Walker (eds.), Immunochemical Methods in Cell and Molecular Biology, Academic Press, London, 1987; Scopes, Protein Purification: Principles and Practice, Second Edition, Springer-Verlag, New York, 1987; and D. M. Weir and C. C. Blackwell (eds.), Handbook of Experimental Immunology, Volumes I-IV, 1986).
As mentioned above, examples of the protein capable of catalyzing disulphide bond formation that can be used in the present invention include polypeptides with minor amino acid variations from the amino acid sequence of the PDI or TRX protein specifically described.
A significant advantage of producing heterologous proteins by recombinant DNA techniques rather than by isolating and purifying a protein from natural sources is that equivalent quantities of the protein can be produced by using less starting material than would be required for isolating the protein from a natural source. Producing the protein by recombinant techniques also permits the protein to be isolated in the absence of some molecules normally present in cells. Indeed, protein compositions entirely free of any trace of human protein contaminants can readily be produced because the only human protein produced by the recombinant non-human host is the recombinant protein at issue. Potential viral agents from natural sources and viral components pathogenic to humans are also avoided.
Pharmaceutically acceptable carriers include any carrier that does not itself induce the production of antibodies harmful to the individual receiving the composition. Suitable carriers are typically large, slowly metabolized macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, lipid aggregates (such as oil droplets or liposomes) and inactive virus particles. Such carriers are well known to those of ordinary skill in the art. Additionally, these carriers may function as immunostimulating agents (adjuvants).
Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: aluminium salts (alum) such as aluminium hydroxide, aluminium phosphate, aluminium sulphate etc., oil emulsion formulations, with or without other specific immunostimulating agents such as muramyl peptides or bacterial cell wall components, such as for example (1) MF59 (Published International patent application WO-A-90/14837), containing 5% Squalene, 0.5% Tween® 80, 0.5% Span® 85 (optionally containing various amounts of MTP-PE (see below), although not required) formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, Mass. 02164, USA), (2) SAF, containing 10% squalene, 0.4% Tween 80, 5% pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion or vortexed to generate a larger particle size emulsion, and (3) RIBI™ adjuvant system (RAS) (Ribi Immunochem, Hamilton, Mont., USA) containing 2% Squalene, 0.2% Tween® 80 and one or more bacterial cell wall components from the group consisting of monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL+CWS (Detox™), muramyl peptides such as N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP), N-acetyl-muramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1′-2′-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (MTP-PE) etc., and cytokines, such as interleukins (IL-1, IL-2 etc.), macrophage colony stimulating factor (M-CSF), tumour necrosis factor (TNF) etc. Additionally, saponin adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, Mass., USA) may be used or particles generated therefrom such as ISCOMS (immunostimulating complexes). Furthermore, Complete Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA) may be used.
Alum and MF59 are preferred.
The immunogenic compositions (e.g. the antigen, pharmaceutically acceptable carrier and adjuvant) typically will contain diluents, such as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.
Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for enhanced adjuvant effect as discussed above under pharmaceutically acceptable carriers.
Immunogenic compositions used as vaccines comprise an immunologically effective amount of the antigenic polypeptides, as well as any other of the above-mentioned components, as needed. By “immunologically effective amount”, it is meant that the administration of that amount to an individual, either in a single dose or as part of a series, is effective for treatment or prevention. This amount varies depending upon the health and physical condition of the individual to be treated, the taxonomic group of individual to be treated (e.g., nonhuman primate, primate, etc.), the capacity of the individual's immune system to synthesize antibodies, the degree of protection desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation, and other relevant factors. It is expected that the amount will fall in a relatively broad range that can be determined through routine trials.
The immunogenic compositions are conventionally administered parenterally, e.g. by injection either subcutaneously or intramuscularly. Additional formulations suitable for other modes of administration include oral and pulmonary formulations, suppositories and transdermal applications. Dosage treatment may be a single dose schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other immunoregulatory agents.
The term “recombinant polynucleotide” as used herein intends a polynucleotide of genomic, cDNA, semisynthetic, or synthetic origin which, by virtue of its origin or manipulation: (1) is not associated with all or a portion of a polynucleotide with which it is associated in nature, (2) is linked to a polynucleotide other than that to which it is linked in nature, or (3) does not occur in nature.
The term “polynucleotide” as used herein refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA and RNA. It also includes known types of modifications, for example, labels which are known in the art, methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analogue, internucleotide modifications such as, for example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoamidates, carbamates, etc.) and with charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), those containing pendant moieties, such as, for example proteins (including, for example, nucleases, toxins, antibodies, signal peptides, poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids, etc.), as well as unmodified forms of the polynucleotide.
A “replicon” is any genetic element, e.g., a plasmid, a chromosome, a virus, a cosmid, etc. that behaves as an autonomous unit of polynucleotide replication within a cell; i.e., capable of replication under its own control. This may include selectable markers.
A “vector” is a replicon in which another polynucleotide segment is attached, so as to bring about the replication and/or expression of the attached segment.
“Control sequence” refers to polynucleotide sequences which are necessary to effect the expression of coding sequences to which they are ligated. The nature of such control sequences differs depending upon the host organism; in prokaryotes, such control sequences generally include promoter, ribosomal binding site, and transcription termination sequence; in eukaryotes, generally, such control sequences include promoters and transcription termination sequence. The term “control sequences” is intended to include, at a minimum, all components whose presence is necessary for expression, and may also include additional components whose presence is advantageous, for example, leader sequences and fusion partner sequences.
“Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. A control sequence “operably linked” to a coding sequence is ligated in such a way that expression of the coding sequence is achieved under conditions compatible with the control sequences.
An “open reading frame” (ORF) is a region of a polynucleotide sequence which encodes a polypeptide; this region may represent a portion of a coding sequence or a total coding sequence.
A “coding sequence” is a polynucleotide sequence which is translated into a polypeptide, usually via mRNA, when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a translation start codon at the 5′-terminus and a translation stop codon at the 3′-terminus. A coding sequence can include, but is not limited to, cDNA, and recombinant polynucleotide sequences.
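By way of illustration only, the boundary rule in the preceding definition can be sketched in Python as follows. The function name find_coding_sequence and the example sequence are hypothetical and do not form part of the specification; the sketch assumes the standard stop codons (TAA, TAG, TGA), a DNA-sense input string, and an ATG start codon, and simply returns the first ATG-to-stop region it finds.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_coding_sequence(dna):
    """Return (start, end, cds) for the first ATG-to-stop region, or None.

    start is the index of the A of the ATG start codon (5' boundary);
    end is the index just past the in-frame stop codon (3' boundary).
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon in the reading frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3, seq[start:i + 3]
        # No in-frame stop for this ATG; try the next ATG downstream.
        start = seq.find("ATG", start + 1)
    return None


# Hypothetical example: the out-of-frame TGA at index 3 is ignored.
example = "GGATGAAATGCTAAGG"
result = find_coding_sequence(example)
```

In this hypothetical example the out-of-frame TGA overlapping the start codon is skipped, because only codons read in the frame established by the ATG are tested against the stop set.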
“PCR” refers to the technique of polymerase chain reaction as described in Saiki et al., Nature, 324: 163, 1986; Scharf et al., Science, 233: 1076-1078, 1986; U.S. Pat. No. 4,683,195; and U.S. Pat. No. 4,683,202.
As used herein, x is “heterologous” with respect to y if x is not naturally associated with y in the identical manner; i.e., x is not associated with y in nature or x is not associated with y in the same manner as is found in nature.
“Homology” refers to the degree of similarity between x and y. The correspondence between the sequence from one form to another can be determined by techniques known in the art. For example, they can be determined by a direct comparison of the sequence information of the polynucleotide. Alternatively, homology can be determined by hybridization of the polynucleotides under conditions which form stable duplexes between homologous regions (for example, those which would be used prior to S1 digestion), followed by digestion with single-stranded specific nuclease(s), followed by size determination of the digested fragments.
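As a non-limiting illustration of the direct comparison of sequence information mentioned above, the following Python sketch computes the percentage of identical positions between two sequences. The function name percent_identity is hypothetical, and the sketch assumes that the two sequences have already been aligned to equal length; percent identity is only one possible measure of similarity and is not prescribed by the specification.

```python
def percent_identity(x, y):
    """Percentage of matching positions between two pre-aligned sequences.

    Assumption: x and y are already aligned and of equal length; no
    alignment or gap handling is attempted in this sketch.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    if not x:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(x.upper(), y.upper()))
    return 100.0 * matches / len(x)


# Hypothetical example: one mismatch in eight positions.
identity = percent_identity("ACGTACGT", "ACGTTCGT")
```

A full comparison of unaligned sequences would first require an alignment step, which is deliberately omitted here.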
As used herein, the term “polypeptide” refers to a polymer of amino acids and does not refer to a specific length of the product; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide. This term also does not refer to or exclude post expression modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like. Included within the definition are, for example, polypeptides containing one or more analogues of an amino acid (including, for example, unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other modifications known in the art, both naturally occurring and non-naturally occurring.
A polypeptide or amino acid sequence “derived from” a designated nucleic acid sequence refers to a polypeptide having an amino acid sequence identical to that of a polypeptide encoded in the sequence, or a portion thereof wherein the portion consists of at least 3-5 amino acids, and more preferably at least 8-10 amino acids, and even more preferably at least 11-15 amino acids, or which is immunologically identifiable with a polypeptide encoded in the sequence. This terminology also includes a polypeptide expressed from a designated nucleic acid sequence.
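The sequence-identity limb of the preceding definition can be illustrated, purely by way of example, with the Python sketch below. The function name is_derived_portion and the example sequences are hypothetical; the sketch assumes one-letter amino acid codes, treats a "portion" as a contiguous stretch of the encoded polypeptide, and does not attempt to model the alternative criterion of immunological identifiability.

```python
def is_derived_portion(candidate, encoded_polypeptide, min_length=3):
    """True if candidate is a contiguous portion of the encoded polypeptide
    and is at least min_length residues long.

    Assumption: min_length=3 reflects the lowest bound stated in the
    definition; 8 or 11 may be passed for the more preferred bounds.
    """
    candidate = candidate.upper()
    return (len(candidate) >= min_length
            and candidate in encoded_polypeptide.upper())


# Hypothetical example sequences in one-letter code.
encoded = "MKTAYIAK"
ok = is_derived_portion("KTAY", encoded)        # 4 residues, present
too_short = is_derived_portion("KT", encoded)   # present but only 2 residues
```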
The protein may be used for producing antibodies, either monoclonal or polyclonal, specific to the protein. The methods for producing these antibodies are known in the art.
“Recombinant host cells”, “host cells”, “cells”, “cell cultures”, and other such terms denote, for example, microorganisms, insect cells, and mammalian cells, that can be, or have been, used as recipients for recombinant vector or other transfer DNA, and include the progeny of the original cell which has been transformed. It is understood that the progeny of a single parental cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. Examples for mammalian host cells include Chinese hamster ovary (CHO) and monkey kidney (COS) cells.
Specifically, as used herein, “cell line” refers to a population of cells capable of continuous or prolonged growth and division in vitro. Often, cell lines are clonal populations derived from a single progenitor cell. It is further known in the art that spontaneous or induced changes can occur in karyotype during storage or transfer of such clonal populations. Therefore, cells derived from the cell line referred to may not be precisely identical to the ancestral cells or cultures, and the cell line referred to includes such variants. The term “cell line” also includes immortalized cells. Preferably, cell lines include nonhybrid cell lines or hybridomas to only two cell types.
As used herein, the term “microorganism” includes prokaryotic and eukaryotic microbial species such as bacteria and fungi, the latter including yeast and filamentous fungi.
“Transformation”, as used herein, refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion, for example, direct uptake, transduction, f-mating or electroporation. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host genome.
By “genomic” is meant a collection or library of DNA molecules which are derived from restriction fragments that have been cloned in vectors. This may include all or part of the genetic material of an organism.
By “cDNA” is meant a complementary DNA sequence that hybridizes to a complementary strand of DNA.
By “purified” and “isolated” is meant, when referring to a polypeptide or nucleotide sequence, that the indicated molecule is present in the substantial absence of other biological macromolecules of the same type. The term “purified” as used herein preferably means at least 75% by weight, more preferably at least 85% by weight, more preferably still at least 95% by weight, and most preferably at least 98% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 1000, can be present).
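The weight-percentage thresholds in the preceding definition can be illustrated with the following Python sketch. The function name purity_tier, the tier labels, and the example masses are hypothetical; the sketch assumes that both masses are expressed in the same unit and that the total counts only biological macromolecules of the same type, so that water, buffers, and other small molecules are excluded as the definition allows.

```python
# Thresholds (percent by weight) taken from the definition above,
# checked from the most stringent downwards. Labels are illustrative.
PURITY_TIERS = [
    (98.0, "most preferred"),
    (95.0, "more preferred still"),
    (85.0, "more preferred"),
    (75.0, "preferred"),
]


def purity_tier(target_mass, same_type_total_mass):
    """Return (percent, label) for a preparation.

    same_type_total_mass is the mass of all macromolecules of the same
    type as the target (target included); small molecules are excluded.
    """
    if same_type_total_mass <= 0:
        raise ValueError("total mass must be positive")
    if not 0 <= target_mass <= same_type_total_mass:
        raise ValueError("target mass must lie between 0 and the total")
    percent = 100.0 * target_mass / same_type_total_mass
    for threshold, label in PURITY_TIERS:
        if percent >= threshold:
            return percent, label
    return percent, "below the 75% threshold"


# Hypothetical example: 96 mg of target in 100 mg of total protein.
percent, label = purity_tier(96, 100)
```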
Once the appropriate coding sequence is isolated, it can be expressed in a variety of different expression systems; for example those used with mammalian cells, baculoviruses, bacteria, and yeast.
i. Mammalian Systems
Mammalian expression systems are known in the art. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3′) transcription of a coding sequence (e.g. structural gene) into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5′ end of the coding sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element, usually located within 100 to 200 bp upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation (Sambrook et al., “Expression of Cloned Genes in Mammalian Cells”, in: Molecular Cloning: A Laboratory Manual, 2nd ed., 1989).
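The positional relationship described above, in which the TATA box usually lies 25-30 bp upstream of the transcription initiation site, can be illustrated with the Python sketch below. The function name tata_box_positions and the synthetic promoter are hypothetical. As simplifying assumptions, the sketch searches only for the exact motif TATAAA and measures distance from the initiation site to the first base of the motif; real promoters show sequence variation that this sketch does not capture.

```python
def tata_box_positions(seq, tss_index, motif="TATAAA", min_up=25, max_up=30):
    """Return the upstream distances (bp) at which motif starts in the window.

    A distance d means the motif begins at seq[tss_index - d], where
    tss_index is the 0-based index of the transcription initiation site.
    """
    seq = seq.upper()
    hits = []
    for d in range(min_up, max_up + 1):
        start = tss_index - d
        if start >= 0 and seq[start:start + len(motif)] == motif:
            hits.append(d)
    return hits


# Synthetic, hypothetical promoter: motif at index 10, initiation site at 40.
promoter = "G" * 10 + "TATAAA" + "C" * 24 + "A" + "G" * 10
hits = tata_box_positions(promoter, tss_index=40)
```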
Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences encoding mammalian viral genes provide particularly useful promoter sequences. Examples include the SV40 early promoter, mouse mammary tumour virus LTR promoter, adenovirus major late promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences. Expression may be either constitutive or regulated (inducible); depending on the promoter, expression can be induced with glucocorticoid in hormone-responsive cells.
The presence of an enhancer element (enhancer), combined with the promoter elements described above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed upstream or downstream from the transcription initiation site, in either normal or flipped orientation, or at a distance of more than 1000 nucleotides from the promoter (Maniatis et al., Science, 236: 1237, 1987; Alberts et al., Molecular Biology of the Cell, 2nd ed., 1989). Enhancer elements derived from viruses may be particularly useful, because they usually have a broader host range. Examples include the SV40 early gene enhancer (Dijkema et al., EMBO J., 4: 761, 1985) and the enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus (Gorman et al., Proc. Natl. Acad. Sci. USA, 79: 6777, 1982b) and from human cytomegalovirus (Boshart et al., Cell, 41: 521, 1985). Additionally, some enhancers are regulatable and become active only in the presence of an inducer, such as a hormone or metal ion (Sassone-Corsi and Borelli, Trends Genet., 2: 215, 1986; Maniatis et al., Science, 236: 1237, 1987).
A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.
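The statement that a directly expressed protein begins with methionine follows from the standard genetic code, in which ATG encodes Met. A minimal Python sketch is given below, assuming the standard nuclear code and a coding-strand DNA input; the function name and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
                "LLLLPPPPHHQQRRRR"
                "IIIMTTTTNNKKSSRR"
                "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate_orf(dna):
    """Translate a coding sequence that starts at its ATG start codon.

    Translation stops at the first stop codon or at the end of the input.
    """
    dna = dna.upper()
    if not dna.startswith("ATG"):
        raise ValueError("coding sequence must begin with the ATG start codon")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical coding sequence: ATG GCT TGC TAA
peptide = translate_orf("ATGGCTTGCTAA")
```

For this input, `peptide` is `"MAC"`: the N-terminal residue is methionine (M) because the construct starts at ATG.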
Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein in mammalian cells.
Usually, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3′ to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3′ terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation (Birnstiel et al., Cell, 41: 349, 1985; Proudfoot and Whitelaw, “Termination and 3′ end processing of eukaryotic RNA”, in: Transcription and Splicing (eds. B. D. Hames and D. M. Glover), 1988; Proudfoot, Trends Biochem. Sci., 14: 105, 1989). These sequences direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals include those derived from SV40 (Sambrook et al., “Expression of cloned genes in cultured mammalian cells”, in: Molecular Cloning: A Laboratory Manual, 1989).
Some genes may be expressed more efficiently when introns (also called intervening sequences) are present. Several cDNAs, however, have been efficiently expressed from vectors that lack splicing signals (also called splice donor and acceptor sites) (see, for example, Gething and Sambrook, Nature, 293: 620, 1981). Introns are intervening noncoding sequences within a coding sequence that contain splice donor and acceptor sites. They are removed by a process called “splicing,” following polyadenylation of the primary transcript (Nevins, Ann. Rev. Biochem., 52: 441, 1983; Green, Ann. Rev. Genet., 20: 671, 1986; Padgett et al., Ann. Rev. Biochem. 55: 1119, 1986; Krainer and Maniatis, “RNA splicing”, in: Transcription and Splicing (eds. B. D. Hames and D. M. Glover), 1988).
Usually, the above-described components, comprising a promoter, polyadenylation signal, and transcription termination sequence are put together into expression constructs. Enhancers, introns with functional splice donor and acceptor sites, and leader sequences may also be included in an expression construct, if desired. Expression constructs are often maintained in a replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable maintenance in a host, such as mammalian cells or bacteria. Mammalian replication systems include those derived from animal viruses, which require trans-acting factors to replicate. For example, plasmids containing the replication systems of papovaviruses, such as SV40 (Gluzman, Cell, 23: 175, 1981) or polyomaviruses, replicate to extremely high copy number in the presence of the appropriate viral T antigen. Additional examples of mammalian replicons include those derived from bovine papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems, thus allowing it to be maintained, for example, in mammalian cells for expression and in a prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle vectors include pMT2 (Kaufman et al., Mol. Cell. Biol., 9: 946, 1989) and pHEBO (Shimizu et al., Mol. Cell. Biol., 6: 1074, 1986).
The transformation procedure used depends upon the host to be transformed. Methods for introduction of heterologous polynucleotides into mammalian cells are known in the art and include dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei.
Mammalian cell lines available as hosts for expression are known in the art and include many immortalized cell lines available from the American Type Culture Collection (ATCC), including but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (e.g., Hep G2), and a number of other cell lines.
ii. Baculovirus Systems
The polynucleotide encoding the protein can also be inserted into a suitable insect expression vector, and is operably linked to the control elements within that vector. Vector construction employs techniques which are known in the art.
Generally, the components of the expression system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus genome, and a convenient restriction site for insertion of the heterologous gene or genes to be expressed; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and growth media.
After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the wild-type viral genome are transfected into an insect host cell where the vector and viral genome are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques are identified and purified. Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, inter alia, Invitrogen, San Diego, Calif., USA (“MaxBac” kit). These techniques are generally known to those skilled in the art and fully described in Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555, 1987 (hereinafter “Summers and Smith”).
Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above-described components, comprising a promoter, leader (if desired), coding sequence of interest, and transcription termination sequence, are usually assembled into an intermediate transplacement construct (transfer vector). This construct may contain a single gene and operably linked regulatory elements; multiple genes, each with its own set of operably linked regulatory elements; or multiple genes, regulated by the same set of regulatory elements. Intermediate transplacement constructs are often maintained in a replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable maintenance in a host, such as a bacterium. The replicon will have a replication system, thus allowing it to be maintained in a suitable host for cloning and amplification.
Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is pAc373. Many other vectors, known to those of skill in the art, have also been designed. These include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and Summers, Virology, 17: 31, 1989).
The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al., Ann. Rev. Microbiol., 42: 177, 1988) and a prokaryotic ampicillin-resistance (amp) gene and origin of replication for selection and propagation in Escherichia coli. 
Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream (5′ to 3′) transcription of a coding sequence (e.g. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A baculovirus transfer vector may also have a second domain called an enhancer, which, if present, is usually distal to the structural gene. Expression may be either regulated or constitutive.
Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly useful promoter sequences. Examples include sequences derived from the gene encoding the viral polyhedrin protein (Friesen et al., “The Regulation of Baculovirus Gene Expression”, in: The Molecular Biology of Baculoviruses (ed. Walter Doerfler), 1986; and EPO Publ. Nos. 127 839 and 155 476) and the gene encoding the p10 protein (Vlak et al., J. Gen. Virol., 69: 765, 1988).
DNA encoding suitable signal sequences can be derived from genes for secreted insect or baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al., Gene, 73: 409, 1988). Alternatively, since the signals for mammalian cell posttranslational modifications (such as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by insect cells, and the signals required for secretion and nuclear accumulation also appear to be conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as those derived from genes encoding human α-interferon (Maeda et al., Nature, 315: 592, 1985), human gastrin-releasing peptide (Lebacq-Verheyden et al., Mol. Cell. Biol., 8: 3129, 1988), human IL-2 (Smith et al., Proc. Natl. Acad. Sci. USA, 82: 8404, 1985), mouse IL-3 (Miyajima et al., Gene, 58: 273, 1987) and human glucocerebrosidase (Martin et al., DNA, 7: 99, 1988), can also be used to provide for secretion in insects.
A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused foreign proteins usually requires heterologous genes that ideally have a short leader sequence containing suitable translation initiation signals preceding an ATG start signal. If desired, methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with cyanogen bromide.
Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for secretion of the foreign protein in insects. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the translocation of the protein into the endoplasmic reticulum.
After insertion of the DNA sequence and/or the gene encoding the expression product precursor of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer vector and the genomic DNA of wild-type baculovirus, usually by co-transfection. The promoter and transcription termination sequence of the construct will usually comprise a 2-5 kb section of the baculovirus genome. Methods for introducing heterologous DNA into the desired site in the baculovirus are known in the art (see Summers and Smith, supra; Smith et al., Mol. Cell. Biol., 3: 2156, 1983; and Luckow and Summers, supra). For example, the insertion can be into a gene such as the polyhedrin gene, by homologous double crossover recombination; insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene (Miller et al., Bioessays, 4: 91, 1989). The DNA sequence, when cloned in place of the polyhedrin gene in the expression vector, is flanked both 5′ and 3′ by polyhedrin-specific sequences and is positioned downstream of the polyhedrin promoter.
The newly formed baculovirus expression vector is subsequently packaged into an infectious recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1% and about 5%); thus, the majority of the virus produced after co-transfection is still wild-type virus. Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein, which is produced by the native virus, is produced at very high levels in the nuclei of infected cells at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile, giving them a bright shiny appearance that is readily visualized under the light microscope. Cells infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by techniques known to those skilled in the art. Namely, the plaques are screened under the light microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant virus) of occlusion bodies (Ausubel et al. (eds.), “Current Protocols in Microbiology”, Vol. 2 at 16.8 (Supp. 10), 1990; Summers and Smith, supra; Miller et al., supra).
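Because only about 1% to about 5% of progeny virus is recombinant, the visual screen must cover enough plaques to make finding a recombinant likely. The following Python sketch estimates that number under a simplifying assumption not stated in the source: each plaque is treated as an independent trial that is recombinant with probability equal to the recombination frequency. The function names and the 95% confidence target are illustrative.

```python
import math


def prob_at_least_one_recombinant(frequency, plaques):
    """Probability that at least one of `plaques` screened is recombinant,
    assuming independent plaques (a simplifying assumption)."""
    return 1.0 - (1.0 - frequency) ** plaques


def plaques_needed(frequency, confidence=0.95):
    """Smallest number of plaques giving at least `confidence` probability
    of seeing one or more recombinant (occlusion-negative) plaques."""
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


low_end = plaques_needed(0.01)   # recombination frequency of about 1%
high_end = plaques_needed(0.05)  # recombination frequency of about 5%
```

Under these assumptions, roughly 299 plaques would need to be screened at a 1% frequency and roughly 59 at a 5% frequency to reach a 95% chance of observing at least one recombinant plaque.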
Recombinant baculovirus expression vectors have been developed for infection into several insect cells. For example, recombinant baculoviruses have been developed for, inter alia, Aedes aegypti, Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni (PCT Pub. No. WO 89/046699; Carbonell et al., J. Virol., 56: 153, 1985; Wright, Nature, 321: 718, 1986; Smith et al., Mol. Cell. Biol., 3: 2156, 1983; and see generally, Fraser et al., In Vitro Cell. Dev. Biol., 25: 225, 1989).
Cells and cell culture media are commercially available for both direct and fusion expression of heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally known to those skilled in the art (see, e.g., Summers and Smith, supra).
The modified insect cells may then be grown in an appropriate nutrient medium, which allows for stable maintenance of the plasmid(s) present in the modified insect host. Where the expression product gene is under inducible control, the host may be grown to high density, and expression induced. Alternatively, where expression is constitutive, the product will be continuously expressed into the medium and the nutrient medium must be continuously circulated, while removing the product of interest and augmenting depleted nutrients. The product may be purified by such techniques as chromatography, e.g., HPLC, affinity chromatography, ion exchange chromatography, etc.; electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate, the product may be further purified, as required, so as to remove substantially any insect proteins which are also secreted in the medium or result from lysis of insect cells, so as to provide a product which is at least substantially free of host debris, e.g., proteins, lipids and polysaccharides.
In order to obtain protein expression, recombinant host cells derived from the transformants are incubated under conditions which allow expression of the recombinant protein encoding sequence. These conditions will vary, dependent upon the host cell selected. However, the conditions are readily ascertainable to those of ordinary skill in the art, based upon what is known in the art.
iii. Bacterial Systems
Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence capable of binding bacterial RNA polymerase and initiating the downstream (3′) transcription of a coding sequence (e.g. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region usually includes an RNA polymerase binding site and a transcription initiation site. A bacterial promoter may also have a second domain called an operator, that may overlap an adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence of negative regulatory elements, such as the operator. In addition, positive regulation may be achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5′) to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite activator protein (CAP), which helps initiate transcription of the lac operon in E. coli (Raibaud et al., Ann. Rev. Genet., 18: 173, 1984). Regulated expression may therefore be either positive or negative, thereby either enhancing or reducing transcription.
Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose (lac) (Chang et al., Nature, 198: 1056, 1977), and maltose. Additional examples include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) (Goeddel et al., Nuc. Acids Res., 8: 4057, 1980; Yelverton et al., Nuc. Acids Res., 9: 731, 1981; U.S. Pat. No. 4,738,921; and EPO Publ. Nos. 036 776 and 121 775). The β-lactamase (bla) promoter system (Weissmann, “The cloning of interferon and other mistakes”, in: Interferon 3 (ed. I. Gresser), 1981), and bacteriophage lambda PL (Shimatake et al., Nature, 292: 128, 1981) and T5 (U.S. Pat. No. 4,689,406) promoter systems also provide useful promoter sequences.
In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. For example, transcription activation sequences of one bacterial or bacteriophage promoter may be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a synthetic hybrid promoter (U.S. Pat. No. 4,551,433). For example, the tac promoter is a hybrid trp-lac promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac repressor (Amann et al., Gene, 25: 167, 1983; de Boer et al., Proc. Natl. Acad. Sci. USA, 80: 21, 1983). Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA polymerase/promoter system is an example of a coupled promoter system (Studier et al., J. Mol. Biol., 189: 113, 1986; Tabor et al., Proc. Natl. Acad. Sci. USA, 82: 1074, 1985). In addition, a hybrid promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO Publ. No. 267 851).
In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon (Shine et al., Nature, 254: 34, 1975). The SD sequence is thought to promote binding of mRNA to the ribosome by the pairing of bases between the SD sequence and the 3′ end of E. coli 16S rRNA (Steitz et al., “Genetic signals and nucleotide sequences in messenger RNA”, in: Biological Regulation and Development: Gene Expression (ed. R. F. Goldberger), 1979). To express eukaryotic genes and prokaryotic genes with weak ribosome-binding site (Sambrook et al., “Expression of cloned genes in Escherichia coli”, in: Molecular Cloning: A Laboratory Manual, 1989).
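The geometry described above (a short purine-rich sequence 3-11 nucleotides upstream of the initiation codon) can be checked with a minimal Python sketch. Several choices here are assumptions for illustration: the consensus "AGGAGG" is used as the SD core, the spacing is measured as the gap between the end of the match and the ATG, and partial matches down to 3 nucleotides are accepted. None of these choices comes from the source text.

```python
def find_sd_sites(seq, consensus="AGGAGG", min_len=3, min_gap=3, max_gap=11):
    """Return (atg_index, matched_motif, gap) for each ATG preceded by an
    SD-like motif ending `min_gap`..`max_gap` nucleotides upstream."""
    seq = seq.upper()
    # All consensus substrings of at least min_len, longest first
    # (ties broken alphabetically so the result is deterministic).
    motifs = sorted(
        {consensus[i:j]
         for i in range(len(consensus))
         for j in range(i + min_len, len(consensus) + 1)},
        key=lambda m: (-len(m), m),
    )
    results = []
    atg = seq.find("ATG")
    while atg != -1:
        best = None
        for motif in motifs:
            for gap in range(min_gap, max_gap + 1):
                sd_start = atg - gap - len(motif)
                if sd_start >= 0 and seq[sd_start:sd_start + len(motif)] == motif:
                    best = (motif, gap)
                    break
            if best:
                break
        if best:
            results.append((atg, best[0], best[1]))
        atg = seq.find("ATG", atg + 1)
    return results


# Hypothetical construct: SD core, a 7-nt spacer, then the start codon.
sites = find_sd_sites("CCCC" + "AGGAGG" + "TTTTTTT" + "ATG" + "GCTTAA")
```

In this example the start codon at position 17 is reported together with the full "AGGAGG" match and a 7-nucleotide spacer.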
A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO Publ. No. 219 237).
Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5′ end of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5′ terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene (Nagai et al., Nature, 309: 810, 1984). Fusion proteins can also be made with sequences from the lacZ (Jia et al., Gene, 60: 197, 1987), trpE (Allen et al., J. Biotechnol., 5: 93, 1987; Makoff et al., J. Gen. Microbiol., 135: 11, 1989), and cheY (EPO Publ. No. 324 647) genes. The DNA sequence at the junction of the two amino acid sequences may or may not encode a cleavable site. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (e.g. ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated (Miller et al., Bio/Technology 7: 698, 1989).
Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion of the foreign protein in bacteria (U.S. Pat. No. 4,336,336). The signal sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria). Preferably there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the signal peptide fragment and the foreign gene.
DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, such as the E. coli outer membrane protein gene (ompA) (Masui et al., in: Experimental Manipulation of Gene Expression, 1983; Ghrayeb et al., EMBO J., 3: 2437, 1984) and the E. coli alkaline phosphatase signal sequence (phoA) (Oka et al., Proc. Natl. Acad. Sci. USA, 82: 7212, 1985). As an additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains can be used to secrete heterologous proteins from B. subtilis (Palva et al., Proc. Natl. Acad. Sci. USA, 79: 5582, 1982; EPO Publ. No. 244 042).
Usually, transcription termination sequences recognized by bacteria are regulatory regions located 3′ to the translation stop codon, and thus together with the promoter flank the coding sequence. These sequences direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Transcription termination sequences frequently include DNA sequences of about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription. Examples include transcription termination sequences derived from genes with strong promoters, such as the trp gene in E. coli as well as other biosynthetic genes.
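A sequence can form a stem loop when one stretch is followed, after a short spacer, by its own reverse complement. The Python sketch below searches for such inverted repeats. The stem length of 6 and the loop range of 3-8 nucleotides are hypothetical parameters chosen for illustration; the source only states that terminators are about 50 nucleotides long and capable of forming stem loops, and this sketch does not model thermodynamic stability.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_stem_loops(seq, stem_len=6, min_loop=3, max_loop=8):
    """Return (left_arm_start, loop_length) for every perfect inverted
    repeat of `stem_len` nucleotides separated by an allowed loop."""
    seq = seq.upper()
    hits = []
    for left in range(len(seq) - 2 * stem_len - min_loop + 1):
        arm = reverse_complement(seq[left:left + stem_len])
        for loop in range(min_loop, max_loop + 1):
            right = left + stem_len + loop
            if right + stem_len > len(seq):
                break  # right arm would extend past the 3' end
            if seq[right:right + stem_len] == arm:
                hits.append((left, loop))
    return hits


# Hypothetical terminator-like sequence: GC-rich stem, 4-nt loop, T-rich tail.
terminator = "AAAA" + "GCCGCC" + "TTTT" + "GGCGGC" + "TTTTTTTT"
stem_loops = find_stem_loops(terminator)
```

For the example sequence, the result includes the designed hairpin whose left arm starts at position 4 with a 4-nucleotide loop; overlapping inverted repeats in the flanking sequence may be reported as well.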
Usually, the above described components, comprising a promoter, signal sequence (if desired), coding sequence of interest, and transcription termination sequence, are put together into expression constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will have a replication system, thus allowing it to be maintained in a prokaryotic host either for expression or for cloning and amplification. In addition, a replicon may be either a high or low copy number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will preferably contain at least about 10, and more preferably at least about 20 plasmids. Either a high or low copy number vector may be selected, depending upon the effect of the vector and the foreign protein on the host.
Alternatively, the expression constructs can be integrated into the bacterial genome with an integrating vector. Integrating vectors usually contain at least one sequence homologous to the bacterial chromosome that allows the vector to integrate. Integrations appear to result from recombinations between homologous DNA in the vector and the bacterial chromosome. For example, integrating vectors constructed with DNA from various Bacillus strains integrate into the Bacillus chromosome (EPO Publ. No. 127 328). Integrating vectors may also be comprised of bacteriophage or transposon sequences.
Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for the selection of bacterial strains that have been transformed. Selectable markers can be expressed in the bacterial host and may include genes which render bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline (Davies et al., Ann. Rev. Microbiol., 32: 469, 1978). Selectable markers may also include biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways.
Alternatively, some of the above described components can be put together in transformation vectors. Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon or developed into an integrating vector, as described above.
Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors, have been developed for transformation into many bacteria. For example, expression vectors have been developed for, inter alia, the following bacteria: B. subtilis (Palva et al., Proc. Natl. Acad. Sci. USA, 79: 5582, 1982; EPO Publ. Nos. 036 259 and 063 953; PCT Publ. No. WO 84/04541), E. coli (Shimatake et al., Nature, 292: 128, 1981; Amann et al., Gene, 40: 183, 1985; Studier et al., J. Mol. Biol., 189: 113, 1986; EPO Publ. Nos. 036 776, 136 829 and 136 907), Streptococcus cremoris (Powell et al., Appl. Environ. Microbiol., 54: 655, 1988); Streptococcus lividans (Powell et al., Appl. Environ. Microbiol., 54: 655, 1988), and Streptomyces lividans (U.S. Pat. No. 4,745,056).
Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent cations and DMSO. DNA can also be introduced into bacterial cells by electroporation. Transformation procedures usually vary with the bacterial species to be transformed (see, e.g., Masson et al., FEMS Microbiol. Lett., 60: 273, 1989; Palva et al., Proc. Natl. Acad. Sci. USA, 79: 5582, 1982; EPO Publ. Nos. 036 259 and 063 953; PCT Publ. No. WO 84/04541 [Bacillus], Miller et al., Proc. Natl. Acad. Sci. USA, 8: 856, 1988; Wang et al., J. Bacteriol., 172: 949, 1990 [Campylobacter], Cohen et al., Proc. Natl. Acad. Sci. USA, 69: 2110, 1973; Dower et al., Nuc. Acids Res., 16: 6127, 1988; Kushner, “An improved method for transformation of Escherichia coli with ColE1-derived plasmids”, in: Genetic Engineering: Proceedings of the International Symposium on Genetic Engineering (eds. H. W. Boyer and S. Nicosia), 1978; Mandel et al., J. Mol. Biol., 53: 159, 1970; Taketo, Biochim. Biophys. Acta, 949: 318, 1988 [Escherichia], Chassy et al., FEMS Microbiol. Lett., 44: 173, 1987 [Lactobacillus], Fiedler et al., Anal. Biochem, 170: 38, 1988 [Pseudomonas], Augustin et al., FEMS Microbiol. Lett., 66: 203, 1990 [Staphylococcus], Barany et al., J. Bacteriol., 144: 698, 1980; Harlander, “Transformation of Streptococcus lactis by electroporation”, in: Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III), 1987; Perry et al., Infec. Immun., 32: 1295, 1981; Powell et al., Appl. Environ. Microbiol. 54: 655, 1988; Somkuti et al., Proc. 4th Eur. Cong. Biotechnology, 1: 412, 1987 [Streptococcus]).
iv. Yeast Expression
Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3′) transcription of a coding sequence (e.g. structural gene) into mRNA. A promoter will have a transcription initiation region which is usually placed proximal to the 5′ end of the coding sequence. This transcription initiation region usually includes an RNA polymerase binding site (the “TATA Box”) and a transcription initiation site. A yeast promoter may also have a second domain called an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene. The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or reducing transcription.
Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples include alcohol dehydrogenase (ADH) (EPO Publ. No. 284 044), enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase (GAP or GAPDH), hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO Publ. No. 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences (Miyanohara et al., Proc. Natl. Acad. Sci. USA, 80: 1, 1983).
In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For example, UAS sequences of one yeast promoter may be joined with the transcription activation region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid promoters include the ADH regulatory sequence linked to the GAP transcription activation region (U.S. Pat. Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes, combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or PyK (EPO Publ. No. 164 556). Furthermore, a yeast promoter can include naturally occurring promoters of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription. Examples of such promoters are described in, inter alia, Cohen et al., Proc. Natl. Acad. Sci. USA, 77: 1078, 1980; Henikoff et al., Nature, 283: 835, 1981; Hollenberg et al., Curr. Topics Microbiol. Immunol., 96: 119, 1981; Hollenberg et al., “The Expression of Bacterial Antibiotic Resistance Genes in the Yeast Saccharomyces cerevisiae”, in: Plasmids of Medical, Environmental and Commercial Importance (eds. K. N. Timmis and A. Puhler), 1979; Mercereau-Puijalon et al., Gene, 11: 163, 1980; Panthier et al., Curr. Genet., 2: 109, 1980.
A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide.
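By way of illustration only, the relationship between the ATG start codon and the N-terminal methionine can be sketched in Python. The coding sequence shown is a hypothetical placeholder and is not a sequence of the invention; the function name translate is likewise an assumption made for this sketch, which uses the standard genetic code.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate a DNA open reading frame up to the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[pos:pos + 3].upper()]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical coding sequence (illustration only): ATG GCT TGT AAA TAA
example_orf = "ATGGCTTGTAAATAA"
print(translate(example_orf))  # prints MACK; the first residue is methionine
```

Because the reading frame begins with ATG, the first residue of the translated product is methionine (M) regardless of the downstream sequence.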
Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal portion of an endogenous yeast protein, or other stable protein, is fused to the 5′ end of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be linked at the 5′ terminus of a foreign gene and expressed in yeast. The DNA sequence at the junction of the two amino acid sequences may or may not encode a cleavable site (see, e.g., EPO Publ. No. 196 056). Another example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably retains a site for a processing enzyme (e.g. ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign protein. Through this method, therefore, native foreign protein can be isolated (see, e.g., PCT Publ. No. WO 88/024066).
Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell.
DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, such as the yeast invertase gene (EPO Publ. No. 012 873; JPO Publ. No. 62,096,086) and the A-factor gene (U.S. Pat. No. 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that also provide for secretion in yeast (EPO Publ. No. 060 057).
A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor gene, which contains both a “pre” signal sequence, and a “pro” region. The types of alpha-factor fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid residues) (U.S. Pat. Nos. 4,546,083 and 4,870,008; EPO Publ. No. 324 274). Additional leaders employing an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (see, e.g., PCT Publ. No. WO 89/02463).
Usually, transcription termination sequences recognized by yeast are regulatory regions located 3′ to the translation stop codon, and thus together with the promoter flank the coding sequence. These sequences direct the transcription of an mRNA which can be translated into the polypeptide encoded by the DNA. Examples of transcription terminator sequences and other yeast-recognized termination sequences, such as those coding for glycolytic enzymes, are well known.
Usually, the above described components, comprising a promoter, leader (if desired), coding sequence of interest, and transcription termination sequence, are put together into expression constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast for expression and in a prokaryotic host for cloning and amplification. Examples of such yeast-bacteria shuttle vectors include YEp24 (Botstein, et al., Gene, 8: 17-24, 1979), pCl/1 (Brake, et al., Proc. Natl. Acad. Sci. USA, 81: 4642-4646, 1984), and YRp17 (Stinchcomb, et al., J. Mol. Biol., 158: 157, 1982). In addition, a replicon may be either a high or low copy number plasmid. A high copy number plasmid will generally have a copy number ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy number plasmid will preferably have at least about 10, and more preferably at least about 20. Either a high or low copy number vector may be selected, depending upon the effect of the vector and the foreign protein on the host (see, e.g., Brake et al., supra).
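By way of illustration only, the 5′ to 3′ ordering of the components of an expression construct described above may be sketched as a simple Python data structure. The class name ExpressionConstruct, the method name parts_5_to_3, and the part labels used below are hypothetical placeholders chosen for this sketch; they do not denote any particular vector or sequence.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ExpressionConstruct:
    """Hypothetical model of an expression construct (illustration only)."""
    promoter: str
    coding_sequence: str
    terminator: str
    leader: Optional[str] = None  # secretion leader, included only if desired

    def parts_5_to_3(self) -> List[Tuple[str, str]]:
        """Return the components in 5' to 3' order."""
        parts = [("promoter", self.promoter)]
        if self.leader is not None:
            # The leader is fused upstream of the coding sequence.
            parts.append(("leader", self.leader))
        parts.append(("coding_sequence", self.coding_sequence))
        parts.append(("terminator", self.terminator))
        return parts


# Placeholder labels, not actual sequences.
construct = ExpressionConstruct(
    promoter="hybrid ADH2/GAP promoter",
    leader="alpha-factor leader",
    coding_sequence="coding sequence of interest",
    terminator="transcription terminator",
)
for role, label in construct.parts_5_to_3():
    print(role, "->", label)
```

Omitting the leader argument models a construct intended for intracellular expression, in which the promoter is linked directly to the coding sequence.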
Alternatively, the expression constructs can be integrated into the yeast genome with an integrating vector. Integrating vectors usually contain at least one sequence homologous to a yeast chromosome that allows the vector to integrate, and preferably contain two homologous sequences flanking the expression construct. Integrations appear to result from recombinations between homologous DNA in the vector and the yeast chromosome (Orr-Weaver et al., Methods in Enzymol., 101: 228-245, 1983). An integrating vector may be directed to a specific locus in yeast by selecting the appropriate homologous sequence for inclusion in the vector (see Orr-Weaver et al., supra). One or more expression constructs may integrate, possibly affecting levels of recombinant protein produced (Rine et al., Proc. Natl. Acad. Sci. USA, 80: 6750, 1983). The chromosomal sequences included in the vector can occur either as a single segment in the vector, which results in the integration of the entire vector, or two segments homologous to adjacent segments in the chromosome and flanking the expression construct in the vector, which can result in the stable integration of only the expression construct.
Usually, extrachromosomal and integrating expression constructs may contain selectable markers to allow for the selection of yeast strains that have been transformed.
Selectable markers may include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2, and TRP1, as well as ALG7 and the G418 resistance gene, which confer resistance in yeast cells to tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide yeast with the ability to grow in the presence of toxic compounds, such as metals. For example, the presence of CUP1 allows yeast to grow in the presence of copper ions (Butt et al., Microbiol. Rev., 51: 351, 1987).
Alternatively, some of the above described components can be put together into transformation vectors. Transformation vectors are usually comprised of a selectable marker that is either maintained in a replicon or developed into an integrating vector, as described above.
Expression and transformation vectors, either extrachromosomal replicons or integrating vectors, have been developed for transformation into many yeasts. For example, expression vectors have been developed for, inter alia, the following yeasts: Candida albicans (Kurtz et al., Mol. Cell. Biol., 6: 142, 1986), Candida maltosa (Kunze et al., J. Basic Microbiol., 25: 141, 1985), Hansenula polymorpha (Gleeson et al., J. Gen. Microbiol., 132: 3459, 1986; Roggenkamp et al., Mol. Gen. Genet., 202: 302, 1986), Kluyveromyces fragilis (Das et al., J. Bacteriol., 158: 1165, 1984), Kluyveromyces lactis (De Louvencourt et al., J. Bacteriol., 154: 737, 1983; van den Berg et al., Bio/Technology, 8: 135, 1990), Pichia guilliermondii (Kunze et al., J. Basic Microbiol., 25: 141, 1985), Pichia pastoris (Cregg et al., Mol. Cell. Biol., 5: 3376, 1985; U.S. Pat. Nos. 4,837,148 and 4,929,555), Saccharomyces cerevisiae (Hinnen et al., Proc. Natl. Acad. Sci. USA, 75: 1929, 1978; Ito et al., J. Bacteriol., 153: 163, 1983), Schizosaccharomyces pombe (Beach and Nurse, Nature, 300: 706, 1981), and Yarrowia lipolytica (Davidow et al., Curr. Genet., 10: 39, 1985; Gaillardin et al., Curr. Genet., 10: 49, 1985).
Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations. Transformation procedures usually vary with the yeast species to be transformed (see, e.g., Kurtz et al., Mol. Cell. Biol., 6: 142, 1986; Kunze et al., J. Basic Microbiol., 25: 141, 1985 [Candida], Gleeson et al., J. Gen. Microbiol., 132: 3459, 1986; Roggenkamp et al., Mol. Gen. Genet., 202: 302, 1986 [Hansenula], Das et al., J. Bacteriol., 158: 1165, 1984; De Louvencourt et al., J. Bacteriol., 154: 737, 1983; van den Berg et al., Bio/Technology, 8: 135, 1990 [Kluyveromyces], Cregg et al., Mol. Cell. Biol., 5: 3376, 1985; Kunze et al., J. Basic Microbiol., 25: 141, 1985; U.S. Pat. Nos. 4,837,148 and 4,929,555 [Pichia], Hinnen et al., Proc. Natl. Acad. Sci. USA, 75: 1929, 1978; Ito et al., J. Bacteriol., 153: 163, 1983 [Saccharomyces], Beach and Nurse, Nature, 300: 706, 1981 [Schizosaccharomyces], and Davidow et al., Curr. Genet., 10: 39, 1985; Gaillardin et al., Curr. Genet., 10: 49, 1985 [Yarrowia]).