The methylotrophic yeast Pichia pastoris has been developed into a widely used host organism for recombinant protein production. Most of the reported examples concern heterologous proteins that are secreted into the growth medium of Pichia pastoris. In some cases, such as human serum albumin and murine gelatins, exceptionally high yields have been obtained, whereas the secretion levels of many other proteins have been significantly lower; see, e.g., J. M. Cregg et al., Recombinant protein expression in Pichia pastoris, Mol. Biotechnol. 16 (2000) 23-52.
Potential Bottlenecks for Protein Secretion Include:
    1. Codon usage of the expressed gene, e.g., N. S. Outchkourov et al., Optimization of the expression of equistatin in Pichia pastoris, Protein Expr. Purif. 24 (2002) 18-24;
    2. Copy number of the gene, e.g., A. Vassileva et al., Expression of hepatitis B surface antigen in the methylotrophic yeast Pichia pastoris using the GAP promoter, J. Biotechnol. 88 (2001) 21-35; A. Vassileva et al., Effect of copy number on the expression levels of hepatitis B surface antigen in the methylotrophic yeast Pichia pastoris, Protein Expr. Purif. 21 (2001) 71-80;
    3. The efficiency and strength of promoters, e.g., I. B. Sears et al., A versatile set of vectors for constitutive and regulated gene expression in Pichia pastoris, Yeast 14 (1998) 783-90;
    4. Translation signals, e.g., D. R. Cavener et al., Eukaryotic start and stop translation sites, Nucleic Acids Res. 19 (1991) 3185-92;
    5. Signal peptides, e.g., L. Briand et al., Optimization of the production of a honeybee odorant-binding protein by Pichia pastoris, Protein Expr. Purif. 15 (1999) 362-9; Z. I. Crawford K. et al., Pichia secretory leader for protein expression, U.S. Pat. No. 6,107,057 (2000); R. J. Raemaekers et al., Functional phytohemagglutinin (PHA) and Galanthus nivalis agglutinin (GNA) expressed in Pichia pastoris. Correct N-terminal processing and secretion of heterologous proteins expressed using the PHA-E signal peptide, Eur. J. Biochem. 265 (1999) 394-403; N. Koganesawa et al., Construction of an expression system of insect lysozyme lacking thermal stability: the effect of selection of signal sequence on level of expression in the Pichia pastoris expression system, Protein Eng. 14 (2001) 705-10;
    6. Processing and folding in the endoplasmic reticulum (ER) and Golgi, e.g., J. M. Kowalski et al., Protein folding stability can determine the efficiency of escape from endoplasmic reticulum quality control, J. Biol. Chem. 273 (1998) 19453-8;
    7. Extracellular secretion, e.g., D. Rossini et al., In Saccharomyces cerevisiae, protein secretion into the growth medium depends on environmental factors, Yeast 9 (1993) 77-84; and
    8. Protein turnover by proteolysis, e.g., J. M. Cregg et al., Recombinant protein expression in Pichia pastoris, Mol. Biotechnol. 16 (2000) 23-52.
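Codon usage, the first bottleneck listed above, can be quantified by tallying the codons in the reading frame of a coding sequence and comparing the result with the host's preferred codons. The following is a minimal Python sketch, assuming an in-frame DNA coding sequence without introns; the function name `codon_usage` and the example sequence are hypothetical and are not taken from the cited work.

```python
from collections import Counter


def codon_usage(cds: str) -> dict[str, float]:
    """Return the relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Split into consecutive, non-overlapping triplets starting at the first base.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical sequence: Met-Gly-Gly-Gly-stop, using two different glycine codons.
usage = codon_usage("ATGGGTGGTGGCTAA")
print(usage)  # {'ATG': 0.2, 'GGT': 0.4, 'GGC': 0.2, 'TAA': 0.2}
```

Comparing such frequencies against a codon usage table for the host is one way to flag rarely used codons; the table itself is not part of this sketch.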
To overcome the problems encountered in protein expression, the influencing factors must be properly considered. A practical approach is to identify the major bottleneck of the production system, which is generally both host-strain-dependent and product-dependent.
Because the bottlenecks in producing different heterologous proteins remain case-specific, there is a need in the art, of considerable economic importance, for techniques that facilitate high-yield protein production in yeasts, including Pichia pastoris.
Term Definitions
The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.
In accordance with the present invention, conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art may be employed. Such techniques are explained fully in the literature.
A “polynucleotide” is a single- or double-stranded polymer of deoxyribonucleotide or ribonucleotide bases; the sequence of the polynucleotide is the actual sequence of the bases read from the 5′ to the 3′ end of the polymer. Polynucleotides include RNA and DNA, and may be isolated from natural sources, synthesized in vitro, or prepared from a combination of natural and synthetic molecules.
A “nucleic acid” or “nucleotide sequence” refers to the phosphate ester polymeric form of ribonucleosides (adenosine, guanosine, uridine or cytidine; “RNA molecules”) or deoxyribonucleosides (deoxyadenosine, deoxyguanosine, deoxythymidine, or deoxycytidine; “DNA molecules”) in either single-stranded form or as a double-stranded helix. Double-stranded DNA-DNA, DNA-RNA and RNA-RNA helices are possible. The term nucleic acid, and in particular DNA or RNA molecule, refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary or quaternary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear or circular DNA molecules (e.g., restriction fragments), plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA). A “recombinant DNA” is a DNA molecule that has undergone a molecular biological manipulation.
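The strand convention described above can be made concrete: if a sequence is given 5′ to 3′ along the nontranscribed (coding) strand, the mRNA has the same sequence with U in place of T, and the transcribed (template) strand, also written 5′ to 3′, is its reverse complement. A minimal Python sketch; the function names and the example sequence are hypothetical.

```python
# Watson-Crick complement for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(coding_5to3: str) -> str:
    """Return the transcribed (template) strand, also written 5'->3'."""
    return coding_5to3.upper().translate(_COMPLEMENT)[::-1]


def mrna_from_coding(coding_5to3: str) -> str:
    """The mRNA matches the nontranscribed strand, with uracil in place of thymine."""
    return coding_5to3.upper().replace("T", "U")


coding = "ATGGCTTAA"  # hypothetical nontranscribed-strand sequence
print(template_strand(coding))   # TTAAGCCAT
print(mrna_from_coding(coding))  # AUGGCUUAA
```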
A DNA “coding sequence” or an is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in a cell in vitro or in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and even synthetic DNA sequences. If the coding sequence is intended for expression in a eukaryotic cell, a polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence.
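The boundaries of a coding sequence, a start codon at the 5′ end and an in-frame stop codon at the 3′ end, can be located with a simple scan. The sketch below is a simplification under stated assumptions: it reads only the given strand, uses the standard genetic code, accepts only ATG as a start codon, and ignores introns. The function name `find_coding_sequence` is hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_coding_sequence(dna: str) -> Optional[str]:
    """Return the first ATG-initiated stretch that ends at an in-frame stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Step through codons in the frame set by this start codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        # No in-frame stop for this ATG; try the next one.
        start = dna.find("ATG", start + 1)
    return None


print(find_coding_sequence("CCATGAAATTTTAGCC"))  # ATGAAATTTTAG
```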
An expression vector is a DNA molecule, linear or circular, that comprises a segment encoding a polypeptide of interest operably linked to additional segments that provide for its transcription. Such additional segments may include promoter and terminator sequences, and optionally one or more origins of replication, one or more selectable markers, an enhancer, a polyadenylation signal, and the like. Expression vectors are generally derived from plasmid or viral DNA, or may contain elements of both.
Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, terminators, and the like, that provide for the expression of a coding sequence in a host cell. In eukaryotic cells, for example, polyadenylation signals are control sequences.
A “signal sequence” is a DNA sequence that encodes a polypeptide (a “signal peptide”) that, as a component of a larger polypeptide, directs the larger polypeptide through the secretory pathway of the cell in which it is synthesized. The larger polypeptide is commonly cleaved to remove the signal peptide during transit through the secretory pathway.
The term “promoter” is used herein for its art-recognized meaning to denote a portion of a gene containing DNA sequences that provide for the binding of RNA polymerase and initiation of transcription. Promoter sequences are commonly, but not always, found in the 5′ non-coding regions of genes.
A chromosomal gene is rendered “non-functional” if the polypeptide that the gene encodes can no longer be expressed in a functional form. Such non-functionality of a gene can be induced by a wide variety of genetic manipulations or alterations as known in the art.
“Operably linked”, when referring to DNA segments, indicates that the segments are arranged so that they function in concert; for example, transcription begins when RNA polymerase binds to the promoter segment and proceeds through the coding segment until the polymerase encounters a transcription terminator segment and stops.
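One necessary (but not sufficient) condition for the promoter, coding, and terminator segments in the example above to be operably linked is that they lie 5′ to 3′ on the same strand in that order without overlapping. The following Python sketch checks only that ordering; the `Segment` type, the function name, and the coordinates are hypothetical, and real operability also depends on sequence content, spacing, and the host.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str   # e.g. "promoter", "coding", "terminator"
    start: int  # 0-based start on the 5'->3' strand
    end: int    # exclusive end


def order_is_consistent(segments: list[Segment]) -> bool:
    """True if promoter, coding and terminator appear 5'->3' in that order without overlap."""
    required = ["promoter", "coding", "terminator"]
    by_kind = {s.kind: s for s in segments}
    if not all(k in by_kind for k in required):
        return False
    ordered = [by_kind[k] for k in required]
    # Each segment must end before (or exactly where) the next one starts.
    return all(a.end <= b.start for a, b in zip(ordered, ordered[1:]))


cassette = [
    Segment("promoter", 0, 940),
    Segment("coding", 960, 2700),
    Segment("terminator", 2710, 3050),
]
print(order_is_consistent(cassette))  # True
```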
As used herein the term “nucleic acid fragment” is intended to indicate any nucleic acid molecule of cDNA, genomic DNA, synthetic DNA or RNA origin. The term “fragment” is intended to indicate a nucleic acid segment which may be single- or double-stranded, and which may be based on a complete or partial naturally occurring nucleotide sequence encoding a polypeptide of interest. The fragment may optionally contain other nucleic acid segments.
The nucleic acid fragment of the invention encoding the polypeptide of the invention may suitably be of genomic or cDNA origin, for instance obtained by preparing a genomic or cDNA library and screening for DNA sequences coding for all or part of the polypeptide by hybridization using synthetic oligonucleotide probes in accordance with standard techniques.
Furthermore, the nucleic acid fragment may be of mixed synthetic and genomic, mixed synthetic and cDNA or mixed genomic and cDNA origin prepared by ligating fragments of synthetic, genomic or cDNA origin (as appropriate), the fragments corresponding to various parts of the entire nucleic acid fragment, in accordance with standard techniques. The nucleic acid fragment may also be prepared by polymerase chain reaction using specific primers, for instance as described in U.S. Pat. No. 4,683,202.
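PCR with specific primers relies on one primer matching the start of the target region on the top strand and the other annealing to its 3′ end, so the reverse primer is the reverse complement of the last bases of the region. A minimal Python sketch of that relationship; it ignores melting temperature, secondary structure, and added restriction-site tails, and the function names and template are hypothetical.

```python
# Watson-Crick complement for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanking_primers(template: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers matching the two ends of a target region.

    The forward primer equals the first `length` bases of the top strand; the
    reverse primer anneals to the top strand's 3' end, so it is the reverse
    complement of the last `length` bases.
    """
    if len(template) < 2 * length:
        raise ValueError("template is too short for non-overlapping primers of this length")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Hypothetical target region, far shorter than a real gene.
fwd, rev = flanking_primers("ATGCCCGGGTTTAAA", length=5)
print(fwd, rev)  # ATGCC TTTAA
```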
The term nucleic acid fragment may be synonymous with the term “expression cassette” when the nucleic acid fragment contains the control sequences necessary for expression of a coding sequence of the present invention.
The term “control sequences” is defined herein to include all components that are necessary or advantageous for expression of the coding sequence of the nucleic acid sequence. Each control sequence may be native or foreign to the nucleic acid sequence encoding the polypeptide. Such control sequences include, but are not limited to, a leader, a polyadenylation sequence, a propeptide sequence, a promoter, a signal sequence, and a transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals. The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleic acid sequence encoding a polypeptide.
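Linkers that introduce restriction sites can be checked by scanning a construct for the enzymes' recognition sequences. The sketch below uses the published recognition sites of four common enzymes; the selection of enzymes, the function name, and the example construct are illustrative assumptions. Because these sites are palindromic, scanning one strand finds them on both.

```python
# Recognition sequences (5'->3') of a few common restriction enzymes; all are palindromic.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
    "NotI": "GCGGCCGC",
}


def find_sites(dna: str) -> dict[str, list[int]]:
    """Map each enzyme to the 0-based start positions of its recognition site."""
    dna = dna.upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


# Hypothetical construct: EcoRI linker + short coding region + XhoI linker.
construct = "GAATTC" + "ATGGCTGCTTAA" + "CTCGAG"
print(find_sites(construct))  # {'EcoRI': [0], 'XhoI': [18]}
```

A site that also occurs inside the coding region would be cut during cloning, which is why such a scan is typically done before choosing linker sites.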
The control sequence may be an appropriate promoter sequence, a nucleic acid sequence that is recognized by a host cell for expression of the nucleic acid sequence.
The promoter sequence contains transcription and translation control sequences that mediate the expression of the polypeptide. The promoter may be any nucleic acid sequence that shows transcriptional activity in the host cell of choice and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
The control sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription. The terminator sequence is operably linked to the 3′ terminus of the nucleic acid sequence encoding the polypeptide. Any terminator which is functional in the host cell of choice may be used in the present invention.
The control sequence may also be a polyadenylation sequence, a sequence which is operably linked to the 3′ terminus of the nucleic acid sequence and which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence which is functional in the host cell of choice may be used in the present invention.
The control sequence may also be a signal peptide-coding region, which codes for an amino acid sequence linked to the amino terminus of the polypeptide that can direct the expressed polypeptide into the secretory pathway of the host cell. The 5′ end of the coding sequence of the nucleic acid sequence may inherently contain a signal peptide-coding region naturally linked in translation reading frame with the segment of the coding region which encodes the secreted polypeptide. Alternatively, the 5′ end of the coding sequence may contain a signal peptide-coding region which is foreign to that portion of the coding sequence which encodes the secreted polypeptide.
A foreign signal peptide-coding region may be required where the coding sequence does not normally contain a signal peptide-coding region. Alternatively, the foreign signal peptide-coding region may simply replace the natural signal peptide-coding region in order to obtain enhanced secretion of the polypeptide relative to the natural signal peptide-coding region normally associated with the coding sequence. The signal peptide-coding region may be obtained from a glucoamylase or an amylase gene from an Aspergillus species, a lipase or proteinase gene from a Rhizomucor species, the gene for the alpha-factor from Saccharomyces cerevisiae, an amylase or a protease gene from a Bacillus species, or the calf preprochymosin gene. However, any signal peptide-coding region capable of directing the expressed polypeptide into the secretory pathway of a host cell of choice may be used in the present invention.
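Whether a native or a foreign signal peptide-coding region is used, it must be joined in translation reading frame with the region encoding the secreted polypeptide. The following Python sketch checks the basic conditions for such a fusion: an ATG start, segment lengths that are multiples of three, and no premature in-frame stop codon. It assumes the standard genetic code; the function name and the short sequences are hypothetical and do not represent a real signal peptide or protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def fuse_in_frame(signal_cds: str, mature_cds: str) -> str:
    """Join a signal peptide-coding region to the region encoding the secreted polypeptide.

    Raises ValueError if the fusion would not keep a single translation reading frame.
    """
    signal_cds, mature_cds = signal_cds.upper(), mature_cds.upper()
    if not signal_cds.startswith("ATG"):
        raise ValueError("signal peptide-coding region should begin with the ATG start codon")
    if len(signal_cds) % 3 != 0 or len(mature_cds) % 3 != 0:
        raise ValueError("segment length is not a multiple of 3; the fusion would shift the frame")
    fused = signal_cds + mature_cds
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # A stop codon before the last codon would truncate the polypeptide prematurely.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature in-frame stop codon in the fusion")
    return fused


print(fuse_in_frame("ATGAAACTGCTG", "GCTGCTTAA"))  # ATGAAACTGCTGGCTGCTTAA
```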
The control sequence may also be a propeptide-coding region, which codes for an amino acid sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is generally inactive and can be converted to a mature active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide-coding region may be obtained from the Bacillus subtilis alkaline protease gene (aprE), the Bacillus subtilis neutral protease gene (nprT), the Saccharomyces cerevisiae alpha-factor gene, or the Myceliophthora thermophila laccase gene (WO 95/33836).
It may also be desirable to add regulatory sequences which allow the regulation of the expression of the polypeptide relative to the growth of the host cell. Examples of regulatory systems are those which cause the expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound. Regulatory systems in prokaryotic systems would include the lac, tac, and trp operator systems. Other examples of regulatory sequences are those which allow for gene amplification. In eukaryotic systems, these include the dihydrofolate reductase gene which is amplified in the presence of methotrexate, and the metallothionein genes which are amplified with heavy metals. In these cases, the nucleic acid sequence encoding the polypeptide would be placed in tandem with the regulatory sequence.
Examples of suitable promoters for directing the transcription of the nucleic acid fragments of the present invention, especially in a bacterial host cell, are the promoters obtained from the E. coli lac operon, the Streptomyces coelicolor agarase gene (dagA), the Bacillus subtilis levansucrase gene (sacB), the Bacillus subtilis alkaline protease gene, the Bacillus licheniformis alpha-amylase gene (amyL), the Bacillus stearothermophilus maltogenic amylase gene (amyM), the Bacillus amyloliquefaciens alpha-amylase gene (amyQ), the Bacillus amyloliquefaciens ban amylase gene, the Bacillus licheniformis penicillinase gene (penP), the Bacillus subtilis xylA and xylB genes, and the prokaryotic beta-lactamase gene, as well as the tac promoter.
The present invention also relates to expression vectors comprising a nucleic acid sequence of the present invention, a promoter, and transcriptional and translational stop signals. The various nucleic acid and control sequences described above may be joined together to produce a recombinant expression vector which may include one or more convenient restriction sites to allow for insertion or substitution of the nucleic acid sequence encoding the polypeptide at such sites. Alternatively, the nucleic acid sequence of the present invention may be expressed by inserting the nucleic acid sequence or a nucleic acid fragment comprising the sequence into an appropriate vector for expression. In creating the expression vector, the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression, and possibly secretion.
The expression vector may be any vector (e.g., a plasmid or virus) which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the nucleic acid sequence. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vectors may be linear or closed circular plasmids. The vector may be an autonomously replicating vector, i.e., a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector may contain any means for assuring self-replication. Alternatively, the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. The vector system may be a single vector or plasmid or two or more vectors or plasmids which together contain the total DNA to be introduced into the genome of the host cell, or a transposon.
The vectors of the present invention preferably contain one or more “selectable markers” which permit easy selection of transformed cells. A selectable marker is a gene the product of which provides for biocide, antibiotic or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
Antibiotic selectable markers confer resistance to antibiotics such as ampicillin, kanamycin, chloramphenicol, erythromycin, tetracycline, zeocin, neomycin, hygromycin or methotrexate.
The vectors, or smaller parts of the vectors, may be integrated into the host cell genome when introduced into a host cell. For chromosomal integration, the vector may rely on the nucleic acid sequence encoding the polypeptide or any other element of the vector for stable integration of the vector into the genome by homologous or nonhomologous recombination.
Alternatively, the vector may contain additional nucleic acid sequences for directing integration by homologous recombination into the genome of the host cell. The additional nucleic acid sequences enable the vector to be integrated into the host cell genome at a precise location or locations in the chromosome(s). To increase the likelihood of integration at a precise location, the integrational elements should preferably contain a sufficient number of nucleotides, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500 base pairs, that are highly homologous with the corresponding target sequence, to enhance the probability of homologous recombination. The integrational elements may be any sequence that is homologous with the target sequence in the genome of the host cell. Furthermore, the integrational elements may be non-encoding or encoding nucleic acid sequences.
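The length ranges recited above for integrational elements can be expressed as a simple classifier, alongside a rough measure of how homologous an element is to its target. In the Python sketch below, the length tiers come from the text; the ungapped percent-identity function is a simplifying assumption (it performs no alignment and requires equal-length sequences), and both function names are hypothetical.

```python
def homology_arm_tier(length_bp: int) -> str:
    """Classify an integrational element by the length ranges recited in the text."""
    if 800 <= length_bp <= 1500:
        return "most preferred (800-1,500 bp)"
    if 400 <= length_bp <= 1500:
        return "preferred (400-1,500 bp)"
    if 100 <= length_bp <= 1500:
        return "acceptable (100-1,500 bp)"
    return "outside the recited range"


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences (no alignment is performed)."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


print(homology_arm_tier(1000))            # most preferred (800-1,500 bp)
print(percent_identity("ACGT", "ACGA"))   # 75.0
```

Real homology assessment would use a proper sequence alignment; the ungapped comparison only illustrates the idea.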
The copy number of a vector, an expression cassette, an amplification unit, a gene or indeed any defined nucleotide sequence is the number of identical copies that are present in a host cell at any time. A gene or another defined chromosomal nucleotide sequence may be present in one, two, or more copies on the chromosome. An autonomously replicating vector may be present in anywhere from one to several hundred copies per host cell.
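As an illustration of the definition only, the copy number of a defined sequence in a known chromosomal sequence is the number of times it occurs. The Python sketch below counts non-overlapping occurrences on one strand of a hypothetical, greatly shortened "genome" carrying tandem and separate cassette copies. In practice, copy number in transformants is determined experimentally (for example by quantitative PCR or Southern blotting), not by string search; the function name and sequences are hypothetical.

```python
def copy_number(genome: str, cassette: str) -> int:
    """Count non-overlapping copies of a defined sequence on one strand of a genome string."""
    return genome.upper().count(cassette.upper())


cassette = "ATGGCTTAA"  # hypothetical, greatly shortened expression cassette
# Three tandem copies at one locus plus one copy elsewhere.
genome = "CCC" + cassette * 3 + "GGG" + cassette + "TTT"
print(copy_number(genome, cassette))  # 4
```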
The present invention also relates to recombinant host cells, comprising a nucleic acid sequence of the invention, which are advantageously used in the recombinant production of the polypeptides. The term “host cell” encompasses any progeny of a parent cell which is not identical to the parent cell due to mutations that occur during replication.
The cell is preferably transformed with a vector comprising a nucleic acid sequence of the invention followed by integration of the vector into the host chromosome. “Transformation” means introducing a vector comprising a nucleic acid sequence of the present invention into a host cell so that the vector is maintained as a chromosomal integrant or as a self-replicating extra-chromosomal vector. Integration is generally considered to be an advantage as the nucleic acid sequence is more likely to be stably maintained in the cell. Integration of the vector into the host chromosome may occur by homologous or non-homologous recombination as described above.
The choice of a host cell will to a large extent depend upon the gene encoding the polypeptide and its source. The host cell may be a unicellular microorganism, e.g., a prokaryote, or a non-unicellular microorganism, e.g., a eukaryote.
The transformation of a host cell may be effected, for instance, by using competent cells, by electroporation, or by chemical methods such as the lithium chloride method.
The transformed host cells described above are cultured in a suitable nutrient medium under conditions permitting the expression of the desired polypeptide, after which the resulting polypeptide is recovered from the cells, or the culture broth.
The medium used to culture the cells may be any conventional medium suitable for growing the host cells, such as minimal or complex media containing appropriate supplements. Suitable media are available from commercial suppliers or may be prepared according to published recipes (e.g. in catalogues of the American Type Culture Collection). The media are prepared using procedures known in the art.
If the polypeptide is secreted into the nutrient medium, it can be recovered directly from the medium; if it is not secreted, it is recovered from cell lysates. The polypeptide is recovered from the culture medium by conventional procedures, including separating the host cells from the medium by centrifugation or filtration, precipitating the proteinaceous components of the supernatant or filtrate by means of a salt (e.g., ammonium sulphate), and purification by a variety of chromatographic procedures (e.g., ion exchange chromatography, gel filtration chromatography, affinity chromatography, or the like), depending on the type of polypeptide in question.
The polypeptides may be detected using methods known in the art that are specific for the polypeptides. These detection methods may include use of specific antibodies, formation of an enzyme product, or disappearance of an enzyme substrate. For example, an enzyme assay may be used to determine the activity of the polypeptide.
The polypeptides of the present invention may be purified by a variety of procedures known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing (IEF)), differential solubility (e.g., ammonium sulfate precipitation), or extraction.
“GA” represents herein Rhizopus oryzae glucoamylase, one of the most important enzymes in the fermentation industry owing to its practical use in saccharifying starch for alcohol production; see, e.g., P. M. Coutinho et al., Glucoamylase structural, functional, and evolutionary relationships, Proteins 29 (1997) 334-47.
“MSP” represents herein a modified signal peptide. A signal peptide is usually located at the N terminus of a protein and is normally absent from the mature protein. The term normally refers to the sequence (about 20 amino acids) that interacts with the signal recognition particle and directs the ribosome to the endoplasmic reticulum, where co-translational insertion takes place. Signal peptides are highly hydrophobic but contain some positively charged residues. The signal sequence is normally removed from the growing peptide chain by signal peptidase, a specific protease located on the cisternal face of the endoplasmic reticulum; see, e.g., Z. I. Crawford K. et al., Pichia secretory leader for protein expression, U.S. Pat. No. 6,107,057.
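The hydrophobic character and positive charges of a signal peptide can be quantified with a hydropathy scale. The Python sketch below uses the Kyte-Doolittle scale, which is a standard choice but is not specified in the text; it computes the average hydropathy, the highest windowed average (a crude indicator of a hydrophobic core), and the number of Lys/Arg residues. It is not a signal peptide predictor, and the example peptide and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value over the whole peptide (GRAVY score)."""
    values = [KD[aa] for aa in peptide.upper()]
    return sum(values) / len(values)


def max_window_hydropathy(peptide: str, window: int = 7) -> float:
    """Highest mean hydropathy over any contiguous window of residues."""
    values = [KD[aa] for aa in peptide.upper()]
    if len(values) < window:
        raise ValueError("peptide is shorter than the window")
    return max(sum(values[i:i + window]) / window for i in range(len(values) - window + 1))


def positive_residues(peptide: str) -> int:
    """Count lysine and arginine residues (positively charged at neutral pH)."""
    return sum(aa in "KR" for aa in peptide.upper())


peptide = "MKKLLLLAVLAVLG"  # hypothetical, for illustration only
print(round(mean_hydropathy(peptide), 2), positive_residues(peptide))  # 2.04 2
```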
“Copy number” represents herein the number of copies of a given gene present in a cell or nucleus. An increase in gene dosage can result in higher levels of gene product, provided that the gene is not subject to autogenous regulation; see, e.g., A. Vassileva et al., Effect of copy number on the expression levels of hepatitis B surface antigen in the methylotrophic yeast Pichia pastoris, Protein Expr. Purif. 21 (2001) 71-80.
“SEC4” represents herein a GTP-binding protein of the Rab branch of the Ras superfamily that functions as a nucleotide-dependent switch on the surface of secretory vesicles. On secretory vesicles, Sec4 promotes protein-protein interactions among the exocyst components, and assembly of the exocyst complex eventually links the secretory vesicles to specific domains of the plasma membrane marked by another exocyst protein, Sec3; see, e.g., J. H. Toikkanen et al., The beta subunit of the Sec61p endoplasmic reticulum translocon interacts with the exocyst complex in Saccharomyces cerevisiae, J. Biol. Chem. 278 (2003) 20946-53.