The present invention relates to the area of protein production, more particularly to yeast-derived regulatory regions and coding sequences for use in the production and secretion of heterologous proteins using a yeast host expression system.
Yeast host expression systems have successfully been used for production and secretion of heterologous proteins. Expression of a protein of interest can be enhanced by the use of yeast-recognized regulatory regions. Increased yield of a heterologous protein of interest is commonly achieved with the use of yeast-derived signal and secretion leader peptide sequences. The use of native yeast secretion leaders reportedly improves the direction of the protein of interest through the secretory pathway of the yeast host. Modifications to secretion leaders, such as truncation, may further improve yield.
Pichia pastoris has proven to be a desirable yeast host for production and secretion of high levels of some heterologous proteins. Additional yeast-derived regulatory regions and native yeast secretion leaders for use in heterologous protein expression in this and other yeast hosts are needed.
Compositions and methods for expression of proteins, more particularly heterologous proteins, using a yeast host cell as the expression system are provided. Compositions of the invention are the nucleotide sequences for the promoter and terminator regions for a novel Pichia pastoris gene, designated PpSEC10 gene, and the nucleotide sequences and respective amino acid sequences for the secretion leader and the mature Sec10p protein components of the precursor polypeptide encoded by this novel gene.
These compositions are useful in methods for expression and secretion of proteins, particularly heterologous proteins. Vectors having at least one copy of a DNA construct comprising at least one of the PpSEC10-derived regulatory and coding nucleotide sequences in proper reading frame with a nucleotide sequence encoding a protein of interest are constructed. A yeast host cell transformed with such a vector can then be cultured and screened for secretion of the protein of interest.
A mutant Pichia pastoris strain that has a disabled PpSEC10 gene and which does not express the Sec10p protein is also provided for use in the methods of the present invention. The Sec10p protein is normally expressed and secreted into the culture medium at high levels. Use of the mutant yeast strain is advantageous for protein production purposes as purification of the desired protein from the culture medium is simplified.
The Sec10p protein is useful for identifying culture conditions under which the PpSEC10 promoter drives transcription of a coding sequence of interest. To this end, antibodies to the Sec10p protein are provided for detection of this protein in the culture medium. Kits for use in the methods of protein production and detection of Sec10p protein are also provided.
The present invention is directed to compositions and methods for expression and secretion of proteins, more particularly heterologous proteins, using a yeast host cell as the expression system. Compositions of the invention include isolated nucleotide sequences for the regulatory transcription initiation and termination regions of a novel Pichia pastoris gene, hereinafter designated the PpSEC10 gene, and the isolated nucleotide sequences and respective amino acid sequences for the secretion leader and for the mature Sec10p protein components of the precursor polypeptide encoded by this novel PpSEC10 gene. Variants and fragments of these PpSEC10-derived nucleotide and amino acid sequences are also encompassed by the present invention. By “isolated” is intended purified either partially or substantially; the term also encompasses use of the PpSEC10-derived nucleotide or amino acid sequences outside their natural setting, for example in chimeric constructions, expression vectors, or transformation plasmids.
The PpSEC10-derived compositions disclosed herein are useful in methods directed to isolation of homologous nucleotide sequences and to expression and secretion of proteins, particularly heterologous proteins, using a yeast host expression system. These methods and additional uses for these compositions are disclosed in detail below.
The novel PpSEC10 gene of the present invention encodes a precursor polypeptide that comprises a secretion leader and a polypeptide sequence for the mature form of a 10 kDa yeast-secreted protein designated the Sec10p protein. This precursor polypeptide represents the initial translation product of mRNA transcribed from the PpSEC10 gene. The PpSEC10 precursor polypeptide has structural components that are typical of secreted proteins: a secretion leader with a hydrophobic N-terminal sequence that is characteristic of a secretion signal, a mature protein sequence, and two basic amino acids that are positioned at the C-terminus of the secretion leader and directly precede the mature protein sequence. Dibasic residues are a common cleavage recognition sequence for processing proteases such as Kex2. The predicted molecular weight of the mature form of Sec10p based on its amino acid sequence is 10 kDa, while the apparent molecular weight of the secreted protein based on SDS-PAGE mobility is 18 kDa, indicating that Sec10p may be glycosylated.
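The gap between the sequence-predicted mass (about 10 kDa) and the apparent SDS-PAGE mass (about 18 kDa) can be made concrete with a simple calculation: the average mass of an unmodified polypeptide is approximately the sum of its average residue masses plus one water. The sketch below assumes the standard average residue masses (as tabulated, for example, by ExPASy); it ignores glycosylation and other modifications, and SDS-PAGE mobility is itself only an approximate measure of mass.

```python
# Average residue masses in daltons (amino acid minus one H2O).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(protein: str) -> float:
    """Estimated average mass (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def apparent_excess_kda(apparent_kda: float, protein: str) -> float:
    """Apparent mass (e.g. from SDS-PAGE) minus calculated mass, in kDa."""
    return apparent_kda - average_mass(protein) / 1000.0
```

For a hypothetical 100-residue placeholder containing each of the 20 amino acids five times, average_mass returns about 11,907 Da; this placeholder is not the Sec10p sequence of SEQ ID NO: 6.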
Wild-type Pichia pastoris cells secrete high levels of the mature Sec10p protein following proteolytic processing of the precursor polypeptide to remove the secretion leader that directs movement of the mature Sec10p protein through the secretory pathway of the yeast cell. As disclosed below, manipulation of the nucleotide sequence encoding the Sec10p precursor polypeptide results in a mutant strain of Pichia pastoris that has a disabled PpSEC10 gene and which lacks expression of the Sec10p protein. This mutant strain is useful in methods for expression and secretion of heterologous proteins in a yeast host expression system.
The regulatory transcription initiation and termination nucleotide sequences for the PpSEC10 gene, the nucleotide sequences encoding the components of the precursor polypeptide and their respective amino acid sequences, and variants and fragments of these nucleotide and amino acid sequences are of particular interest for the purposes of this invention.
A plasmid designated pKC172 and containing the cloned PpSEC10 gene was deposited with the American Type Culture Collection, Rockville, Md., on Feb. 5, 1997 (accession number 98315, CMCC 4714). A plasmid ppGen2 in E. coli containing the cloned PpSEC10 gene (SEQ ID NO: 17) was deposited on Jun. 6, 1997 (accession number 98450, CMCC 4741). This deposit will be maintained under the terms of the Budapest Treaty. The PpSEC10 regulatory elements and coding sequences can be identified as portions of the plasmid DNA sequence set forth in SEQ ID NO: 17 as follows: the PpSEC10 promoter is set forth as nt 1180-2287; the PpSEC10 secretion leader coding sequence is set forth as nt 2288-2443; the Sec10p mature protein coding sequence is set forth as nt 2444-2746; and the PpSEC10 transcription terminator is set forth as nt 2747-3061. These nucleotide sequences, and any amino acid sequences encoded thereby, are set forth individually in the sequence listing as SEQ ID NOS: 2-7 as identified below.
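The coordinates above can be checked with simple interval arithmetic. The sketch below takes the promoter as ending at nt 2287, immediately upstream of the leader start at nt 2288 (an assumption where the printed coordinate is unclear), and confirms that the four elements abut one another. It also shows that the 156-nt leader coding sequence spans 52 codons, consistent with the Lys51-Arg52 processing site of SEQ ID NO: 4, and that the 303-nt mature-protein region spans 101 codons (whether the last of these is a stop codon is not stated here). The string passed to extract_segment is assumed to be SEQ ID NO: 17 as plain text.

```python
# 1-based, inclusive coordinates within SEQ ID NO: 17, as listed in the text.
# Assumption: the promoter ends at nt 2287, immediately before the leader (nt 2288).
PPSEC10_SEGMENTS = {
    "promoter": (1180, 2287),
    "secretion_leader": (2288, 2443),
    "mature_sec10p": (2444, 2746),
    "terminator": (2747, 3061),
}


def segment_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def segments_are_contiguous(segments: dict) -> bool:
    """True if every segment starts one nucleotide after the previous one ends."""
    spans = sorted(segments.values())
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(spans, spans[1:]))


def extract_segment(seq17: str, name: str) -> str:
    """Return a named element from a SEQ ID NO: 17 sequence string."""
    start, end = PPSEC10_SEGMENTS[name]
    return seq17[start - 1:end]  # 1-based inclusive -> Python half-open slice


for name, (start, end) in PPSEC10_SEGMENTS.items():
    length = segment_length(start, end)
    print(f"{name}: {length} nt ({length / 3:.1f} codons if read in frame)")
```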
The sequence of the polynucleotides contained within the deposited materials, as well as the amino acid sequences of the polypeptides encoded thereby are incorporated herein by reference and are controlling in the event of any conflict with the written description of sequences herein.
Nucleotide sequences for the native transcription initiation region, also referred to as the promoter, and for the native transcription termination region, also referred to as the terminator, for the Pichia pastoris PpSEC10 gene are set forth in SEQ ID NOS: 2 and 3, respectively. By “transcription initiation and termination regions” is intended regulatory regions that flank a nucleotide coding sequence and control transcription of that coding sequence. The PpSEC10 transcription initiation region, or promoter, comprises a TATAA box (nt 1035-1039 of SEQ ID NO: 2) that directs RNA polymerase II to initiate downstream (3′) RNA synthesis at the appropriate transcription initiation site for the PpSEC10 coding sequence. Having identified the nucleotide sequence for the PpSEC10 promoter disclosed herein, it is within the skill in the art to isolate and identify further regulatory elements, such as enhancers and the like, in the 5′ untranslated region positioned upstream from the promoter sequence identified herein.
Amino acid sequences for the components of the PpSEC10 precursor polypeptide and the corresponding nucleotide sequences encoding these components are also disclosed herein. Thus, the amino acid sequence for the native PpSEC10 secretion leader and its corresponding nucleotide sequence are set forth in SEQ ID NOS: 4 and 5, respectively. The amino acid sequence for the native Sec10p mature protein sequence and its corresponding nucleotide sequence are set forth in SEQ ID NOS: 6 and 7, respectively.
The PpSEC10 secretion leader corresponds to the N-terminal sequence of the precursor polypeptide encoded by the PpSEC10 gene. At its N-terminus is a secretion signal, which comprises about 15 to about 30 amino acid residues and is characterized by a hydrophobic core.
The PpSEC10 secretion leader terminates in two basic amino acids (Lys51 and Arg52, SEQ ID NO: 4), a common cleavage recognition site for yeast proteases such as Kex2.
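Processing at the dibasic site can be sketched as a string operation: if residues 51 and 52 of the precursor form a Lys-Arg (or Arg-Arg) pair, the leader is residues 1-52 and the mature protein begins at residue 53. This is a simplification that treats Kex2 cleavage as occurring C-terminal to the dibasic pair at a known position; the demonstration precursor is hypothetical and is not derived from SEQ ID NO: 4 or 6.

```python
# Leader length per SEQ ID NO: 4 (residues 1-52, ending Lys51-Arg52).
LEADER_LENGTH = 52
KEX2_LIKE_SITES = {"KR", "RR"}  # simplification: dibasic pairs only


def split_precursor(precursor: str, leader_length: int = LEADER_LENGTH):
    """Split a precursor into (leader, mature) after the C-terminal dibasic pair."""
    site = precursor[leader_length - 2:leader_length].upper()
    if site not in KEX2_LIKE_SITES:
        raise ValueError(
            f"residues {leader_length - 1}-{leader_length} are {site!r}, "
            "not a dibasic processing site")
    return precursor[:leader_length], precursor[leader_length:]


# Hypothetical precursor: 50-residue leader body + Lys-Arg + short mature part.
demo_precursor = "M" + "L" * 49 + "KR" + "DEFG"
leader, mature = split_precursor(demo_precursor)
```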
Fragments and variants of these native PpSEC10-derived regulatory and coding nucleotide sequences and of the native amino acid sequences for the secretion leader or mature Sec10p protein are also encompassed by the present invention. By “fragment” is intended a portion of the regulatory or coding nucleotide sequence or a portion of the amino acid sequence. Fragments of a regulatory nucleotide sequence, i.e., the promoter or terminator, may retain their regulatory activity. Thus, for example, less than the entire PpSEC10 promoter sequence disclosed herein may be utilized to drive expression of an operably linked nucleotide sequence of interest, such as a nucleotide sequence encoding a heterologous protein. It is within the skill in the art to determine whether such fragments decrease expression levels or alter the nature of expression, i.e., inducible or constitutive expression. Preferably at least about 200 nucleotides of a PpSEC10 promoter sequence will be used to drive expression of a coding sequence. Likewise, less than the entire PpSEC10 terminator may be utilized to terminate transcription of a coding sequence, with functional terminator fragments preferably comprising at least about 300 nucleotides. Fragments of a regulatory sequence that are useful as hybridization probes are preferably at least about 20 nucleotides in length, most preferably about 100 nucleotides in length.
With respect to coding sequences, fragments of a nucleotide sequence may encode polypeptide fragments that retain the biological activity of the native polypeptide, in this case the native PpSEC10 secretion leader or native mature Sec10p protein. Thus, a functional fragment of the PpSEC10 secretion leader directs movement of a mature protein of interest through the secretory pathway of a yeast cell. A functional fragment of the Sec10p protein binds to a Sec10p antibody as disclosed below. Fragments of a coding nucleotide sequence may range from at least about 20 nucleotides, about 24 nucleotides, about 50 nucleotides, about 100 nucleotides, and up to the entire nucleotide sequence encoding the PpSEC10 secretion leader or the mature Sec10p protein of the invention. Fragments of a coding nucleotide sequence that are useful as hybridization probes generally do not encode fragment polypeptides that retain biological activity of the native polypeptide.
Fragments of the invention include antisense nucleotide sequences used to decrease expression of the PpSEC10 gene. By “antisense sequence” is intended a DNA sequence that is in inverse orientation to the 5′ to 3′ normal orientation of that nucleotide sequence. When introduced into a cell, expression of the antisense sequence prevents normal expression of the corresponding nucleotide sequence that is in normal orientation. The antisense nucleotide sequence encodes an RNA transcript that is complementary to and capable of hybridizing to the endogenous mRNA produced by transcription of the DNA nucleotide sequence for the targeted gene. In this manner, production of the native protein encoded by the targeted gene is inhibited. For purposes of the present invention, antisense nucleotide sequences may be used to inhibit production of the Sec10p protein. Such antisense fragments may vary in length ranging from at least about 20 nucleotides, about 50 nucleotides, about 100 nucleotides, up to and including the entire coding sequence for the PpSEC10 gene.
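The relationship between a coding sequence, its inverse (antisense) orientation, and the resulting RNA can be illustrated in a few lines. The sketch below is a generic illustration, not a construct used in this invention: it reverse-complements a DNA sequence, transcribes it, and checks that the antisense RNA is antiparallel and complementary to the sense mRNA. The 9-nt example is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """DNA sequence in inverse (antisense) orientation."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """RNA transcript of a DNA sequence read 5' to 3' (T -> U)."""
    return dna.upper().replace("T", "U")


def antisense_rna(coding_dna: str) -> str:
    """RNA expected when the coding sequence is cloned in inverse orientation."""
    return transcribe(reverse_complement(coding_dna))


def can_pair(rna_a: str, rna_b: str) -> bool:
    """True if two equal-length RNAs are perfect antiparallel Watson-Crick partners."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    return len(rna_a) == len(rna_b) and all(
        (x, y) in pairs for x, y in zip(rna_a, reversed(rna_b)))


sense_mrna = transcribe("ATGAAGCGT")       # hypothetical coding sequence
antisense = antisense_rna("ATGAAGCGT")
```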
By “variants” is intended substantially similar sequences. Thus, for nucleotide sequences, variants include those sequences that encode the PpSEC10 secretion leader or the mature Sec10p protein but that differ conservatively because of the degeneracy of the genetic code. These naturally occurring allelic variants can be identified with the use of well-known molecular biology techniques, such as polymerase chain reaction (PCR) and hybridization techniques as outlined below. Variant nucleotide sequences also include synthetically derived nucleotide sequences that have been generated, for example, by using site-directed mutagenesis but which still encode the PpSEC10 secretion leader and mature Sec10p protein sequences disclosed in the present invention as discussed below. Generally, nucleotide sequence variants of the invention will have at least 70%, preferably at least 80%, more preferably about 90 to 95% or more, and most preferably about 98% or more sequence identity to the native nucleotide sequence.
With respect to the amino acid sequences for the secretion leader and the mature Sec10p protein, variants include those polypeptides that are derived from the native polypeptides by deletion (so-called truncation) or addition of one or more amino acids to the N-terminal and/or C-terminal end of the native polypeptide; deletion or addition of one or more amino acids at one or more sites in the native polypeptide; or substitution of one or more amino acids at one or more sites in the native polypeptide. Such variants may result from, for example, genetic polymorphism or from human manipulation. Methods for such manipulations are generally known in the art.
For example, amino acid sequence variants of the polypeptide can be prepared by mutations in the cloned DNA sequence encoding the native PpSEC10 secretion leader or the mature Sec10p protein. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York); Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods Enzymol. 154:367-382; Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.); U.S. Pat. No. 4,873,192; and the references cited therein; herein incorporated by reference. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the secretion leader or the mature Sec10p protein may be found in the model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by reference. Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferred. Examples of conservative substitutions include, but are not limited to, Gly↔Ala, Val↔Ile↔Leu, Asp↔Glu, Lys↔Arg, Asn↔Gln, and Phe↔Trp↔Tyr.
One such amino acid sequence variant of the PpSEC10 secretion leader is set forth in SEQ ID NO: 8. The corresponding nucleotide coding sequence is set forth in SEQ ID NO: 9. In this variant, the amino acid residue at position 19 is asparagine, as opposed to alanine in the native secretion leader.
In constructing variants of the PpSEC10 secretion leader or mature Sec10p protein, modifications to the nucleotide sequences encoding the variants will be made such that variant polypeptides continue to possess the desired activity. Obviously, any mutations made in the DNA encoding a variant polypeptide must not place the sequence out of reading frame and preferably will not create complementary regions that could produce secondary mRNA structure.
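Because each residue corresponds to one codon, residue n of a polypeptide is encoded by nucleotides 3n-2 to 3n of its coding sequence; for the position-19 Ala to Asn variant described above, that is nt 55-57 of the leader coding sequence. The sketch below makes such a substitution while refusing edits that would shift the reading frame, then translates the result with the standard genetic code to confirm the change. The codon chosen for Asn (AAC) and the all-Ala test sequence are illustrative assumptions, not SEQ ID NO: 5 or 9.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: first base selects a block of 16, second of 4, third of 1.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def substitute_residue(cds: str, position: int, new_codon: str) -> str:
    """Replace the codon for residue `position` (1-based) without changing frame."""
    if len(cds) % 3 or len(new_codon) != 3:
        raise ValueError("edit would break the reading frame")
    start = 3 * (position - 1)  # 0-based index of nt 3n-2
    if not 0 <= start <= len(cds) - 3:
        raise IndexError("position outside coding sequence")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


native = "GCT" * 20                       # hypothetical 20-residue all-Ala leader
variant = substitute_residue(native, 19, "AAC")
```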
Modifications to the native nucleotide sequence encoding the secretion leader or variants thereof will not interfere with the hydrophobic nature of the translated PpSEC10 secretion leader or with the ability of the secretion leader to direct movement of a protein sequence through the yeast secretory pathway and subsequent secretion of the protein from the yeast host cell.
Amino acid sequence variants of the secretion leader include those variants resulting from modification of the C-terminal proteolytic processing site. Thus, the native Lys-Arg processing site may be changed to other yeast-recognized proteolytic sites such as Arg-Arg, Pro-Arg, Ala-Arg, and Thr-Arg.
Other amino acid sequence variants of the secretion leader may be obtained with truncation of the C-terminal end of the leader. In making such truncations, the leader should retain a functional secretion signal, including its hydrophobic core. Thus, a truncated form of the PpSEC10 leader preferably comprises a minimum of about the first 35 contiguous amino acids of the N-terminal end and retains a yeast-recognized processing site at its C-terminal end.
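The two kinds of leader modification described above, swapping the C-terminal processing site and truncating the C-terminal end while keeping a processing site, can be sketched as operations on the leader's one-letter sequence. The sketch assumes that the native site occupies the last two residues (as with Lys51-Arg52) and interprets the truncation limit as keeping at least the first 35 residues before re-appending a site; both the interpretation and the example leader are assumptions for illustration.

```python
# Processing sites named in the text; "KR" is the native Lys-Arg site.
PROCESSING_SITES = {"KR", "RR", "PR", "AR", "TR"}
MIN_RETAINED = 35  # about the first 35 N-terminal residues (assumed interpretation)


def leader_variant(leader: str, site: str = "KR", keep=None) -> str:
    """Return a leader with a (possibly new) C-terminal site, optionally truncated.

    keep: number of N-terminal residues to retain before the site, or None
    to keep the whole leader body.
    """
    site = site.upper()
    if site not in PROCESSING_SITES:
        raise ValueError(f"unsupported processing site {site!r}")
    body = leader[:-2]  # drop the native two-residue processing site
    if keep is not None:
        if keep < MIN_RETAINED:
            raise ValueError("truncation would remove too much of the N-terminus")
        body = body[:keep]
    return body + site


# Hypothetical 52-residue leader ending in Lys-Arg (not SEQ ID NO: 4).
demo_leader = "M" + "L" * 49 + "KR"
rr_variant = leader_variant(demo_leader, "RR")
truncated = leader_variant(demo_leader, keep=35)
```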
In those instances where glycosylation of a secretion leader would facilitate movement of a mature protein through the yeast secretory pathway, glycosylation sites may be added to the PpSEC10 secretion leader. In this manner, amino acid residues that provide glycosylation sites may be substituted in a conservative manner for other amino acids in the secretion leader, such as with replacement of the codons for Gln to encode Asn.
The nucleotide sequences of the invention can be optimized for enhanced expression in the yeast host of interest. That is, these nucleotide sequences can be synthesized using yeast-preferred codons for improved expression. See, for example, U.S. Pat. Nos. 5,219,759 and 5,602,034.
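Codon optimization in its simplest form back-translates a protein using one preferred codon per amino acid and then confirms that the new DNA still encodes the same protein. In the sketch below, the PREFERRED table is a hypothetical placeholder: each codon is valid under the standard genetic code, but the choices are not measured Pichia pastoris codon-usage data, and the methods of the cited patents are not reproduced.

```python
# Hypothetical one-codon-per-residue table, for illustration only.
PREFERRED = {
    "A": "GCT", "R": "AGA", "N": "AAC", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAC", "V": "GTT",
    "*": "TAA",
}

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {  # standard genetic code, used to verify the back-translation
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def back_translate(protein: str) -> str:
    """DNA encoding `protein` using only the PREFERRED codons."""
    dna = "".join(PREFERRED[aa] for aa in protein.upper())
    assert translate(dna) == protein.upper(), "codon table is inconsistent"
    return dna
```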
Thus the nucleotide sequences for the promoter and termination regions and the nucleotide sequences encoding the PpSEC10 secretion leader and the mature Sec10p protein include the native forms as well as fragments and variants thereof. Likewise, the PpSEC10 secretion leader and the mature Sec10p protein include the native forms as well as fragments and variants thereof. The variant nucleotide sequences and variant polypeptides will be substantially homologous and functionally equivalent to the native nucleotide sequences and native polypeptides, respectively. A variant of a native nucleotide sequence or native polypeptide is “substantially homologous” to the native sequence or native polypeptide, respectively, when at least 70%, preferably at least 80%, more preferably about 90% to 95% or more, and most preferably at least 98% of its nucleotide sequence or amino acid sequence, respectively, is identical to the native nucleotide sequence or native amino acid sequence. A variant may differ by as few as 1 to 10 amino acid residues, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.
By “sequence identity” is intended the same nucleotides or amino acid residues are found within the variant sequence and a reference sequence when a specified, contiguous segment of the nucleotide sequence or amino acid sequence of the variant is aligned and compared to the nucleotide sequence or amino acid sequence of the reference sequence. Methods for sequence alignment and for determining identity between sequences are well known in the art. See, for example, Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 19 (Greene Publishing and Wiley-Interscience, New York); and the ALIGN program (Dayhoff (1978) in Atlas of Protein Sequence and Structure 5:Suppl. 3 (National Biomedical Research Foundation, Washington, D.C.)). With respect to optimal alignment of two nucleotide sequences, the contiguous segment of the variant nucleotide sequence may have additional nucleotides or deleted nucleotides with respect to the reference nucleotide sequence. Likewise, for purposes of optimal alignment of two amino acid sequences, the contiguous segment of the variant amino acid sequence may have additional amino acid residues or deleted amino acid residues with respect to the reference amino acid sequence. The contiguous segment used for comparison to the reference nucleotide sequence or reference amino acid sequence will comprise at least 20 contiguous nucleotides or amino acid residues, and may be 30, 40, 50, 100, or more nucleotides or amino acid residues. Corrections for increased sequence identity associated with inclusion of gaps in the variant's nucleotide sequence or amino acid sequence can be made by assigning gap penalties.
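Percent identity depends on how gaps are placed and penalized. The sketch below uses a textbook Needleman-Wunsch global alignment with a linear gap penalty, then counts identical, non-gap columns over the full alignment length. It is not the ALIGN program cited above, and the scores (match +1, mismatch -1, gap -2) are arbitrary illustrative assumptions; other identity conventions (for example, dividing by the shorter sequence length) give different percentages.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' at gap positions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of alignment length."""
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)
```

With these scores, aligning ACGTA against ACGTTA places a single gap and yields 5 identical columns out of 6, or about 83.3% identity.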
When considering percentage of amino acid sequence identity, some amino acid residue positions may differ as a result of conservative amino acid substitutions, which do not affect properties of protein function. In these instances, percent sequence identity may be adjusted upwards to account for the similarity in conservatively substituted amino acids. Such adjustments are well known in the art. See, for example, Meyers and Miller (1988) Computer Applic. Biol. Sci. 4:11-17.
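The adjustment for conservative substitutions can be illustrated by computing a second, looser percentage in which pairs from the same conservative group (Gly/Ala, Val/Ile/Leu, Asp/Glu, Lys/Arg, Asn/Gln, Phe/Trp/Tyr, as listed earlier) also count. This all-or-nothing credit is a simplification assumed here; the partial-credit scoring of Meyers and Miller is not reproduced. The inputs are assumed to be an already-aligned pair of equal length, with '-' marking gaps.

```python
# Conservative groups as listed in the text.
CONSERVATIVE_GROUPS = ["GA", "VIL", "DE", "KR", "NQ", "FWY"]
GROUP_OF = {aa: idx for idx, grp in enumerate(CONSERVATIVE_GROUPS) for aa in grp}


def is_conservative(x: str, y: str) -> bool:
    """True if x and y differ but fall in the same conservative group."""
    return x != y and x in GROUP_OF and GROUP_OF.get(y) == GROUP_OF[x]


def identity_and_similarity(aligned_a: str, aligned_b: str):
    """Return (percent identity, percent similarity) over the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns count against both percentages
        if x == y:
            identical += 1
        elif is_conservative(x, y):
            similar += 1
    n = len(aligned_a)
    return 100.0 * identical / n, 100.0 * (identical + similar) / n
```

For the hypothetical pair GVDKM and AIEKM, identity is 40% (K and M), while similarity is 100% because G/A, V/I, and D/E are all conservative exchanges.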
By “functionally equivalent” is intended that the variant nucleotide sequence defines a regulatory region or encodes an amino acid sequence for a polypeptide that has substantially the same function as the native regulatory region or native polypeptide. Hence, a variant of a nucleotide sequence for a PpSEC10 promoter will drive expression of an operably linked nucleotide sequence, while a variant of a nucleotide sequence for a PpSEC10 terminator will terminate transcription of an operably linked nucleotide sequence. A variant of the nucleotide sequence encoding a PpSEC10 secretion leader will also encode a PpSEC10 secretion leader that directs movement of a mature protein sequence through the yeast secretory pathway. Similarly, a variant of the nucleotide sequence encoding a Sec10p mature protein will also encode that mature protein. If the encoded PpSEC10 secretion leader or mature Sec10p protein is also a variant, it will possess substantially the same biological activity as the native PpSEC10 secretion leader or mature Sec10p protein, respectively. Functionally equivalent sequences of the present invention also encompass those fragments of the PpSEC10-derived regulatory nucleotide sequences, i.e., sequences for the promoter and terminator, and those fragments of the PpSEC10 secretion leader and Sec10p mature protein sequences, and variants thereof, that retain substantially the same function as the respective native sequence.
For example, a functionally equivalent fragment of a PpSEC10 promoter nucleotide sequence will drive expression of an operably linked nucleotide sequence. These fragments will comprise at least about 20 contiguous nucleotides, at least about 24 contiguous nucleotides, preferably at least about 50 contiguous nucleotides, more preferably at least about 75 contiguous nucleotides, even more preferably at least about 100 contiguous nucleotides, still more preferably at least about 200 contiguous nucleotides of the particular promoter nucleotide sequence disclosed herein. The nucleotides of such fragments will usually comprise the TATAA recognition sequence of the particular promoter sequence. Such fragments may be obtained by use of restriction enzymes to cleave the native PpSEC10 promoter nucleotide sequence disclosed herein; by synthesizing a nucleotide sequence from the native nucleotide sequence of the promoter; or may be obtained through the use of PCR technology. See particularly Mullis et al. (1987) Methods Enzymol. 155:335-350, and Erlich, ed. (1989) PCR Technology (Stockton Press, New York). Again, variants of these promoter fragments, such as those resulting from site-directed mutagenesis, are encompassed by the compositions of the present invention.
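SEQ ID NO: 2 places the TATAA box at nt 1035-1039. A promoter fragment that keeps the 3′ end of the promoter (the end adjacent to the coding sequence) retains that box only if it begins at or before nt 1035, so the shortest such fragment is L - 1035 + 1 nucleotides for a promoter of length L. Taking SEQ ID NO: 2 to be the 1,108-nt region nt 1180-2287 of SEQ ID NO: 17 (an assumption), that is 74 nt. The sketch below checks this on a synthetic stand-in sequence, not the real promoter, and considers only 3′-anchored fragments.

```python
TATA_START, TATA_END = 1035, 1039  # 1-based, within SEQ ID NO: 2


def min_3prime_fragment_with_tata(promoter_len: int, tata_start: int = TATA_START) -> int:
    """Shortest 3'-anchored fragment that still begins at or before the TATAA box."""
    return promoter_len - tata_start + 1


def three_prime_fragment(promoter: str, n: int) -> str:
    """The last n nucleotides of the promoter (nearest the coding sequence)."""
    if not 0 < n <= len(promoter):
        raise ValueError("fragment length out of range")
    return promoter[-n:]


def retains_tata(fragment: str, motif: str = "TATAA") -> bool:
    return motif in fragment.upper()


# Synthetic 1,108-nt stand-in with TATAA at nt 1035-1039 and no other TATAA.
synthetic = "G" * (TATA_START - 1) + "TATAA" + "G" * (1108 - TATA_END)
shortest = min_3prime_fragment_with_tata(len(synthetic))
```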
Methods are available in the art for determining functional equivalence. Promoter activity may be measured by Northern blot analysis. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.), herein incorporated by reference. Biological activity can be measured using assays specifically designed for measuring activity of a native polypeptide. Additionally, antibodies raised against the biologically active native Sec10p protein can be tested for their ability to bind to the functionally equivalent variant, where effective binding is indicative of a protein having a conformation similar to that of the native protein.
The PpSEC10-derived regulatory and coding nucleotide sequences of the invention, and fragments and variants thereof, can be used as probes for the isolation of corresponding homologous sequences in other organisms, more particularly other yeasts. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences having substantial sequence identity to the sequences of the invention. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.) and Innis et al. (1990), PCR Protocols: A Guide to Methods and Applications (Academic Press, New York). Coding sequences isolated based on their sequence identity to the entire Pichia pastoris PpSEC10 gene regulatory and coding sequences set forth herein or to fragments and variants thereof are encompassed by the present invention.
In a PCR method, pairs of primers can be used in PCR reactions for amplification of DNA sequences from cDNA or genomic DNA extracted from any organism of interest. In addition, a single specific primer with a sequence corresponding to one of the nucleotide sequences disclosed herein can be paired with a primer having a sequence of the DNA vector in the cDNA or genomic libraries for PCR amplification of the sequences 5′ or 3′ to the nucleotide sequences disclosed herein. Similarly, nested primers may be used instead of a single specific primer for the purposes of the invention. Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York).
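A minimal primer pair for amplifying a known sequence takes the forward primer from the 5′ end of the target and the reverse primer as the reverse complement of its 3′ end. The sketch below does only that, plus two quick checks: GC content and the Wallace-rule melting estimate (2 °C per A or T, 4 °C per G or C), which is a rough approximation suited only to short oligonucleotides. Practical primer design would also screen for self-complementarity and primer dimers; the template here is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def gc_percent(oligo: str) -> float:
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Rough Tm in deg C for short oligos: 2 x (A+T) + 4 x (G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def primer_pair(template: str, length: int = 20):
    """Forward = 5' end of template; reverse = reverse complement of 3' end."""
    if len(template) < 2 * length:
        raise ValueError("template too short for non-overlapping primers")
    return template[:length].upper(), reverse_complement(template[-length:])


fwd, rev = primer_pair("ATGC" * 10)  # hypothetical 40-nt template
print(fwd, rev, gc_percent(fwd), wallace_tm(fwd))
```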
In a hybridization method, all or part of a known nucleotide sequence can be used to screen cDNA or genomic libraries made from other organisms of interest. Methods for construction of such cDNA and genomic libraries are generally known in the art and are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). The so-called hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labeled with a detectable group such as ³²P, or any other detectable marker. Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known nucleotide sequence of interest. Degenerate primers designed on the basis of conserved nucleotides or amino acid residues in the known nucleotide or encoded amino acid sequence can additionally be used. Preparation of probes for hybridization is generally known in the art and is disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.), hereby incorporated by reference.
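A degenerate primer written with IUPAC ambiguity codes stands for a mixture of specific sequences, and the size of that mixture grows multiplicatively with each ambiguous position. The sketch below expands such a primer into its component sequences; for example, GCNGAY (encoding Ala-Asp) represents 4 × 2 = 8 distinct oligonucleotides. The primer shown is illustrative, not one derived from the PpSEC10 sequences.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """All specific sequences represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def degeneracy(primer: str) -> int:
    """Number of distinct sequences in the primer mixture."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count
```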
Using hybridization techniques, all or part of the specific known PpSEC10-derived regulatory or coding nucleotide sequence is used as a probe that selectively hybridizes to other possible PpSEC10 regulatory or coding nucleotide sequences present in a population of cloned genomic DNA fragments or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen organism. To achieve specific hybridization under a variety of conditions, such probes include sequences that are unique and are preferably at least about 20 nucleotides in length, and most preferably at least about 100 nucleotides in length. This technique may be used to isolate other possible PpSEC10 regulatory or coding nucleotide sequences from a desired organism or as a diagnostic assay to determine the presence of a PpSEC10 regulatory or coding nucleotide sequence in an organism. Hybridization techniques include hybridization screening of plated DNA libraries (either plaques or colonies; see, for example, Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York)).
Thus, in addition to the native nucleotide sequences and fragments and variants thereof, the isolated nucleotide sequences of the invention also encompass homologous DNA sequences identified and isolated from other organisms by hybridization with entire or partial sequences obtained from the Pichia pastoris PpSEC10-derived regulatory and coding nucleotide sequences of the invention or variants thereof. Conditions that will permit other DNA sequences to hybridize to the DNA sequences disclosed herein can be determined in accordance with techniques generally known in the art. For example, hybridization of such sequences may be carried out under conditions of reduced stringency, medium stringency, or high stringency (e.g., conditions represented by a wash stringency of 35-40% formamide with 5× Denhardt's solution, 0.5% SDS, and 1× SSPE at 37° C.; conditions represented by a wash stringency of 40-45% formamide with 5× Denhardt's solution, 0.5% SDS, and 1× SSPE at 42° C.; and conditions represented by a wash stringency of 50% formamide with 5× Denhardt's solution, 0.5% SDS, and 1× SSPE at 42° C., respectively). See Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). In general, sequences that are substantially homologous and hybridize to the reference DNA sequences disclosed herein will have at least 70-75% sequence identity, 80-85% sequence identity, and even 90-95% sequence identity to the reference PpSEC10 sequences of the present invention.
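Formamide lowers the melting temperature (Tm) of DNA duplexes, which is consistent with the 50% formamide wash at 42° C. being listed as the highest-stringency condition. One widely used empirical approximation for long DNA-DNA hybrids (Meinkoth and Wahl, 1984) is Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L, where L is the hybrid length in base pairs. The sketch below evaluates it; the sodium concentration, GC content, and probe length are caller-supplied assumptions, and the result is an estimate, not a substitute for empirically optimizing the wash.

```python
import math


def dna_hybrid_tm(na_molar: float, gc_percent: float,
                  formamide_percent: float, length_bp: int) -> float:
    """Approximate Tm (deg C) of a long DNA-DNA hybrid.

    81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/L
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("sodium concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / length_bp)


# Hypothetical probe: 500 bp, 50% GC, with an assumed 0.2 M Na+
# (not a value taken from the text).
for formamide in (35, 40, 45, 50):
    print(formamide, round(dna_hybrid_tm(0.2, 50, formamide, 500), 1))
```

Each additional percent of formamide lowers the estimate by 0.61 °C, so moving from 35% to 50% formamide at a fixed temperature brings the wash about 9 °C closer to the hybrid's Tm.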
The novel PpSEC10 regulatory and coding nucleotide sequences disclosed herein, and variants and fragments thereof, find use in methods directed to production of proteins, more particularly heterologous proteins, in a yeast host cell. The PpSEC10 nucleotide sequences individually or in various combinations may be provided in recombinant DNA constructs for introduction into a yeast host cell. By “recombinant” is intended genetic engineering of DNA fragments, which are assembled into the DNA construct of interest. These DNA constructs comprise all of the elements necessary for expression and secretion of a protein of interest from a yeast host cell. Thus, the DNA constructs of the invention, when introduced into a yeast host cell, can be expressed within that yeast host cell. Each DNA construct is provided with a plurality of restriction sites for insertion of the nucleotide coding sequence of interest that will be under the transcriptional regulation of the regulatory regions of the DNA construct. The DNA construct may additionally contain selectable marker genes, such as the Pichia pastoris histidinol dehydrogenase (HIS4) gene, to facilitate selection of stably transformed cells.
Such a recombinant DNA construct comprises in proper reading frame a nucleotide sequence for a yeast-recognized promoter, a nucleotide coding sequence for a yeast-derived secretion leader fused in frame to a nucleotide coding sequence for a desired protein of interest, and a nucleotide sequence for a yeast-recognized transcription terminator. By “in proper reading frame” is intended that the individual nucleotide sequences are operably linked, and thus expression of the coding sequences is under the regulatory control of the yeast-recognized promoter and terminator sequences.
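The coding-sequence part of the “proper reading frame” requirement can be checked mechanically: the leader's length must be a multiple of three, the fused open reading frame must start with ATG, and the only stop codon must be the last one. The sketch below assumes the standard genetic code and uses hypothetical toy sequences in place of SEQ ID NO: 5. It says nothing about whether the promoter and terminator are operably placed, which cannot be judged from sequence arithmetic alone.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons; '*' marks a stop codon."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def check_in_frame_fusion(leader_cds: str, target_cds: str) -> str:
    """Return the translated leader-target fusion, or raise if it is out of frame.

    Hypothetical helper. Assumes the leader carries the ATG and no stop codon,
    and the target sequence ends with exactly one stop codon.
    """
    if len(leader_cds) % 3:
        raise ValueError("leader length is not a multiple of 3; target would be out of frame")
    fusion = (leader_cds + target_cds).upper()
    if len(fusion) % 3:
        raise ValueError("fusion length is not a multiple of 3")
    if not fusion.startswith("ATG"):
        raise ValueError("fusion does not start with ATG")
    protein = translate(fusion)
    if not protein.endswith("*") or "*" in protein[:-1]:
        raise ValueError("missing final stop codon or premature stop codon")
    return protein


if __name__ == "__main__":
    print(check_in_frame_fusion("ATGAAAAGA", "GGTTGGTAA"))  # MKRGW*
```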
Expression of the coding sequences for the yeast secretion leader and the desired protein produces a hybrid precursor polypeptide, or so-called fusion protein. By “hybrid” precursor polypeptide is intended that the coding sequence for the secretion leader is foreign to the coding sequence for the desired protein, and hence the two coding sequences are not natively expressed as a precursor polypeptide in the yeast host cell.
The hybrid precursor polypeptide comprises the necessary yeast-derived peptide sequences for movement of the desired protein sequence through the secretory pathway of the yeast host cell. Preferably, the nucleotide sequence encoding the yeast secretion leader will terminate in a yeast-recognized processing site, such as a dibasic processing site (e.g., Lys-Arg or Arg-Arg) recognized in vivo by a Kex2 protease, such that the secretion leader is cleaved from the secreted desired protein. One of skill in the art will recognize that the nucleotide sequence encoding the hybrid precursor polypeptide may contain an additional coding sequence for another protein of interest, such that the secreted protein itself is a fusion protein comprising two polypeptides joined by a peptide bond.
The distinguishing feature of the recombinant DNA constructs of the present invention is the inclusion, in proper reading frame, of at least one of the novel PpSEC10-derived nucleotide sequences disclosed herein. Thus, in addition to a nucleotide sequence encoding the protein of interest, a DNA construct of the present invention will further comprise a nucleotide sequence for the PpSEC10 promoter, a nucleotide sequence encoding the PpSEC10 secretion leader, and/or a nucleotide sequence for the PpSEC10 terminator, or a variant or fragment thereof.
By “yeast-recognized” promoter and terminator sequences is intended regulatory regions that are functional in the yeast host cell. In one preferred embodiment of the invention, the recombinant DNA construct contains a PpSEC10 promoter disclosed herein, more particularly the PpSEC10 promoter having the sequence set forth in SEQ ID NO: 2 or a variant or fragment thereof.
Alternatively, when the recombinant DNA construct contains at least one other PpSEC10 nucleotide sequence, another type of yeast-recognized promoter may be used. This promoter may be a constitutive or inducible promoter, and may be native or analogous or foreign or heterologous to the specific yeast host. Additionally, the promoter may be the natural sequence or alternatively a synthetic sequence. By “foreign” is intended that the promoter is not found in the native yeast of interest into which the DNA construct comprising the promoter is introduced.
Other suitable native yeast promoters include, but are not limited to, the wild-type α-factor promoter and promoters for the glycolytic enzymes phosphoglucoisomerase, phosphofructokinase, phosphotrioseisomerase, phosphoglucomutase, enolase, pyruvate kinase (PyK), glyceraldehyde-3-phosphate dehydrogenase (GAP or GAPDH), and alcohol dehydrogenase (ADH) (EPO Publication No. 284,044). See, for example, EPO Publication Nos. 120,551 and 164,556.
Synthetic hybrid promoters consisting of the upstream activator sequence of one yeast promoter, which allows for inducible expression, and the transcription activation region of another yeast promoter also serve as functional promoters in a yeast host. Examples of hybrid promoters include ADH/GAP, where the inducible region of the ADH promoter is combined with the activation region of the GAP promoter (U.S. Pat. Nos. 4,876,197 and 4,880,734). Other hybrid promoters using upstream activator sequences of either the ADH2, GAL4, GAL10, or PHO5 genes combined with the transcriptional activation region of a glycolytic enzyme such as GAP or PyK are available in the art (EPO Publication No. 164,556); herein incorporated by reference.
Yeast-recognized promoters also include naturally occurring non-yeast promoters that bind yeast RNA polymerase and initiate transcription of the coding sequence. Such promoters are available in the art. See, for example, Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078; Mercereau-Puijalon et al. (1980) Gene 11:163; Panthier et al. (1980) Curr. Genet. 2:109; Henikoff et al. (1981) Nature 283:835; and Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119; herein incorporated by reference.
The terminator of the recombinant DNA construct may be native with the promoter, or may be derived from another source, provided that it is recognized by the yeast host. Thus, in one preferred embodiment, the terminator is a PpSEC10 terminator, more particularly the PpSEC10 terminator having the sequence set forth in SEQ ID NO: 3 or a variant or fragment thereof. In this embodiment, the promoter may be the PpSEC10 promoter of the invention, or the promoter may be one of the other promoters identified above. Alternatively, when at least one other PpSEC10 nucleotide sequence is present in the DNA construct, the terminator may be another yeast-recognized terminator, such as those for the α-factor protein (U.S. Pat. No. 4,870,008) and the glycolytic enzymes mentioned above.
The DNA construct further comprises a nucleotide sequence encoding a yeast-derived secretion leader that serves to direct the polypeptide sequence for the protein of interest through the secretory pathway of the yeast host cell. Thus, in one preferred embodiment of the invention, this secretion leader is a PpSEC10 secretion leader, more particularly the PpSEC10 secretion leader set forth in SEQ ID NO: 4 or a variant or fragment thereof. Thus the DNA construct comprises a nucleotide sequence encoding this secretion leader, more particularly the nucleotide sequence set forth in SEQ ID NO: 5 or a sequence encoding a variant or fragment of the peptide sequence set forth in SEQ ID NO: 4. This particular DNA construct may further comprise a regulatory nucleotide sequence for a PpSEC10 promoter and/or terminator of the present invention.
Alternatively, if the DNA construct comprises at least one other PpSEC10 nucleotide sequence of the invention, a yeast secretion leader derived from another yeast-secreted protein may be used to direct the polypeptide sequence for the protein of interest through the secretory pathway of the yeast host cell. Such a yeast-derived secretion leader may be a naturally occurring secretion leader comprising its native secretion signal, or the secretion leader may be a synthetic hybrid comprising a secretion signal derived from a different yeast-secreted protein. The yeast-secreted protein that serves as a source for the secretion leader may be foreign or native to the yeast host cell.
The secretion leader as defined herein comprises a functional secretion signal that is essential to bring about extracellular secretion of a protein from a yeast cell. In those instances where the secretion leader is a hybrid comprising a secretion signal other than the native signal, a number of secretion signals are well known in the art. Examples of secretion signals appropriate for the present invention include, but are not limited to, the secretion signal for α-factor (see, for example, U.S. Pat. No. 5,602,034; Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646); invertase (WO 84/01153); PHO5 (DK 3614/83); YAP3 (yeast aspartic protease 3; PCT Publication No. 95/02059); and BAR1 (PCT Publication No. 87/02670). Alternatively, the secretion signal may be determined from genomic or cDNA libraries using hybridization probe techniques available in the art (see Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.)), or may even be synthetically derived (see, for example, WO 92/11378).
During entry into the ER, the secretion signal is cleaved off the precursor polypeptide at a processing site. The processing site can comprise any peptide sequence that is recognized in vivo by a yeast proteolytic enzyme. This processing site may be the naturally occurring processing site for the secretion signal. More preferably, the naturally occurring processing site will be modified, or the processing site will be synthetically derived, so as to be a preferred processing site. By “preferred processing site” is intended a processing site that is cleaved in vivo by a yeast proteolytic enzyme more efficiently than is the naturally occurring site. Examples of preferred processing sites include, but are not limited to, dibasic peptides, particularly any combination of the two basic residues Lys and Arg, that is, Lys-Lys, Lys-Arg, Arg-Lys, or Arg-Arg, most preferably Lys-Arg. These sites are cleaved by the protease encoded by the KEX2 gene of Saccharomyces cerevisiae (see Fuller et al. Microbiology 1986:273-278) or the equivalent protease of other yeast species (see Julius et al. (1983) Cell 32:839-852). In the event that the Kex2 protease would cleave a site within the polypeptide sequence for the protein of interest, other preferred processing sites could be utilized such that the peptide sequence of interest remains intact (see, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.)).
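Because Kex2-type proteases cleave after dibasic pairs, it is worth scanning the mature protein of interest for internal Lys/Arg pairs before relying on a dibasic processing site at the end of the leader, as the paragraph above cautions. The sketch below is a crude screen under a stated simplification: it flags every KK, KR, RK, and RR pair, whereas actual Kex2 cleavage efficiency also depends on the surrounding residues and on protein conformation. `find_dibasic_sites` is a hypothetical name.

```python
import re

# Lookahead so that overlapping pairs (e.g., the KR and RR in "KRR") are all reported.
DIBASIC = re.compile(r"(?=([KR][KR]))")


def find_dibasic_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of the first residue, dipeptide) for each Lys/Arg pair.

    Simplification: every pair is treated as a potential Kex2 site; real
    cleavage also depends on sequence context and accessibility.
    """
    return [(m.start() + 1, m.group(1)) for m in DIBASIC.finditer(protein.upper())]


if __name__ == "__main__":
    print(find_dibasic_sites("GAKRLRRK"))  # [(3, 'KR'), (6, 'RR'), (7, 'RK')]
```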
For purposes of the present invention, the secretion leader preferably comprises its native secretion signal, as in the case of the PpSEC10 leader. The α-factor protein is another yeast-secreted protein that may serve as an alternative source of a secretion leader comprising its native secretion signal. A number of genes encoding precursor α-factor proteins have been cloned and their secretion leader peptide sequences identified. See, for example, Singh et al. (1983) Nucleic Acids Res. 11:4049-4063; Kurjan et al., U.S. Pat. No. 4,546,082; U.S. Pat. No. 5,010,182; herein incorporated by reference. Such α-factor secretion leaders comprising their native secretion signals have been used to express heterologous proteins in yeast. See, for example, Elliott et al. (1983) Proc. Natl. Acad. Sci. USA 80:7080-7084; Bitter et al. (1984) Proc. Natl. Acad. Sci. USA 81:5330-5334; Smith et al. (1985) Science 229:1219-1229; and U.S. Pat. Nos. 4,849,407 and 5,219,759; herein incorporated by reference.
The recombinant DNA constructs of the invention comprising at least one PpSEC10 nucleotide sequence may contain at least one additional nucleotide sequence of interest to be cotransformed into the yeast host. Alternatively, the additional nucleotide sequences of interest can be provided on a recombinant DNA construct other than the one comprising the PpSEC10 sequence. Where appropriate, the nucleotide sequence encoding the hybrid precursor polypeptide and any additional nucleotide sequences of interest may be optimized for increased expression in the transformed yeast, as previously noted.
Additional sequence modifications are known to enhance expression of nucleotide coding sequences in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the nucleotide coding sequence is modified to avoid predicted hairpin secondary mRNA structures.
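The G-C adjustment described above amounts to comparing the coding sequence, globally and in local windows, against a reference value for the host. A minimal sketch follows. The window size and the acceptable range are illustrative assumptions rather than values from this document; in practice the bounds would be derived from known genes expressed in the chosen host.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def windows_outside_range(seq: str, window: int, low: float, high: float) -> list[int]:
    """0-based start positions of windows whose GC fraction falls outside [low, high].

    `low` and `high` are assumed host-specific bounds supplied by the caller.
    """
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    return [
        i
        for i in range(len(seq) - window + 1)
        if not low <= gc_fraction(seq[i:i + window]) <= high
    ]


if __name__ == "__main__":
    print(gc_fraction("GGCCAATT"))                           # 0.5
    print(windows_outside_range("GGGGAAAA", 4, 0.25, 0.75))  # [0, 4]
```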
In preparing the recombinant DNA construct, the various nucleotide sequence fragments may be manipulated so as to provide for the sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the nucleotide fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous nucleotides, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved. See particularly Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
The recombinant DNA construct's restriction site for inserting the coding sequence for the protein of interest is a nucleotide sequence that is not present within the particular promoter or transcription terminator selected. The protein coding sequence may be inserted into the DNA construct using standard recombinant DNA methods. The protein may be identical to a naturally occurring protein or may contain modifications to alter its physicochemical properties, such as stability, activity, affinity for a particular ligand or receptor, antigenicity, therapeutic utility, or ability to be secreted from the host cell. Thus the nucleotide sequence encoding the mature protein of interest may be a variant or fragment as previously defined above.
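The requirement that the cloning site be absent from the chosen promoter and terminator is straightforward to check on both strands. The sketch below assumes plain ACGT sequences without IUPAC ambiguity codes; the XhoI (CTCGAG) and BsaI (GGTCTC) recognition sequences serve only as examples, the promoter and terminator strings are toy sequences, and `site_is_usable` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def site_is_usable(site: str, *regulatory_regions: str) -> bool:
    """True if `site` occurs on neither strand of any given regulatory region.

    Assumes unambiguous A/C/G/T sequences (no IUPAC codes).
    """
    site = site.upper()
    rc = reverse_complement(site)
    return not any(
        site in region.upper() or rc in region.upper()
        for region in regulatory_regions
    )


if __name__ == "__main__":
    promoter, terminator = "AAACTCGAGTT", "GGGCCCAAA"  # toy sequences
    print(site_is_usable("CTCGAG", promoter, terminator))  # False: XhoI site in promoter
    print(site_is_usable("GGTCTC", promoter, terminator))  # True
```

Checking the reverse complement matters for non-palindromic sites such as BsaI, whose recognition sequence may appear only on the opposite strand.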
The protein of interest may be encoded by an endogenous gene in the yeast host cell or may be a protein not normally found in the host cell. It may be the precursor polypeptide form of the protein, and hence contain the native secretion signal and/or secretion leader, or it may be the mature form of the protein. In those instances where the protein is the precursor polypeptide form, modification of the native secretion leader to terminate in a yeast-recognized processing site may facilitate secretion of the mature form of the protein of interest in a biologically active, properly folded conformation. See the copending application entitled “Method for Expression of Heterologous Proteins in Yeast,” U.S. patent application Ser. No. 08/989,251, filed Dec. 12, 1997.
The protein of interest may also be a fusion protein consisting of two or more protein fragments fused together by means of a peptide bond. In this manner, the first protein segment may comprise at least 6, 8, 10, 12, or 15 contiguous amino acids from the Sec10p amino acid sequence shown in SEQ ID NO: 6, or may comprise up to the full-length amino acid sequence for the mature Sec10p protein. Techniques for making fusion proteins, either recombinantly or by covalently linking two protein segments, are well known in the art. Thus the nucleotide sequence encoding the protein of interest may comprise a coding sequence for a Sec10p protein, more particularly the sequence set forth in SEQ ID NO: 7, in proper reading frame with a nucleotide sequence encoding the second protein segment. The second protein segment may be a full-length protein or a protein fragment. The second protein or protein fragment may be labeled with a detectable marker, such as an antibody tag, or may be an enzyme that will generate a detectable product. Enzymes suitable for this purpose, such as β-galactosidase, are well known in the art.
The protein of interest may be, for example, any protein of therapeutic or industrial use, including, but not limited to, a structural protein, an enzyme, a growth factor, a receptor for a ligand, an antibody, a hormone, a transport protein, a storage protein, a contractile protein, a cell differentiation factor, a repressor, a transcription factor, a cytokine, a haematopoietic factor, or a novel engineered protein. Illustrative proteins of interest include, but are not limited to, hormones and factors, such as insulin-like growth factor (IGF-I, IGF-II), platelet-derived growth factor (PDGF), growth hormone, somatomedins, epidermal growth factor (EGF), keratinocyte growth factor (KGF), fibroblast growth factor (FGF), nerve growth factor (NGF), TGF-beta, vascular endothelial cell growth factor (VEGF), luteinizing hormone, thyroid-stimulating hormone, epithelin precursor, epithelin 1, epithelin 2, oxytocin, insulin, vasopressin, renin, calcitonin, follicle-stimulating hormone, prolactin, erythropoietin (EPO), colony-stimulating factor (CSF), lymphokines such as interleukin-2, globins, immunoglobulins, interferons, enzymes, β-endorphin, enkephalin, dynorphin, etc.
In a preferred embodiment, the protein of interest is insulin-like growth factor I (IGF-I). IGF-I, a member of the somatomedin family, has 70 amino acid residues and a molecular mass of approximately 7.5 kDa. See Rinderknecht (1978) J. Biol. Chem. 253:2769 and FEBS Lett. 89:283. For a review of IGF-I, see Humbel (1990) Eur. J. Biochem. 190:445-462. The nucleotide sequence encoding IGF-I that is assembled as part of the DNA construct may be genomic, cDNA, or synthetic DNA. The genes encoding the native forms of IGF-I have been sequenced, and several variants are well known in the art.
Suitable variants can be IGF-I fragments, analogues, and derivatives. By “IGF-I fragment” is intended a protein consisting of only a part of the intact IGF-I sequence and structure, and can be a C-terminal deletion or N-terminal deletion of IGF-I. By “analogues” is intended analogues of either IGF-I or an IGF-I fragment that comprise a native IGF-I sequence and structure having one or more amino acid substitutions, insertions, or deletions. Peptides having one or more peptoids (peptide mimics) are also encompassed by the term analogue (see International Publication No. WO 91/04282). By “derivatives” is intended any suitable modification of IGF-I, IGF-I fragments, or their respective analogues, such as glycosylation, phosphorylation, or other addition of foreign moieties, so long as the IGF-I activity is retained. Methods for making IGF-I fragments, analogues, and derivatives are available in the art. See generally U.S. Pat. Nos. 4,738,921, 5,158,875, and 5,077,276; International Publication Nos. WO 85/00831, WO 92/04363, WO 87/01038, and WO 89/05822; and European Patent Nos. EP 135094, EP 123228, and EP 128733; herein incorporated by reference. IGF-I variants will generally have at least 70%, preferably at least 80%, more preferably about 90% to 95% or more, and most preferably about 98% or more amino acid sequence identity to the amino acid sequence of the reference IGF-I molecule. A variant may differ by as few as 10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.
The art provides substantial guidance regarding the preparation and use of such IGF-I variants, as discussed further below. A fragment of IGF-I will generally include at least 10 contiguous amino acid residues of the full-length molecule, preferably at least 15 contiguous amino acid residues of the full-length molecule, and most preferably 25 or more contiguous amino acid residues of full-length IGF-I. In preparing the IGF-I variants, one of skill in the art can readily determine which modifications to the native protein nucleotide or amino acid sequence will result in a variant that retains the activity of the native IGF-I protein. These will generally be conservative amino acid substitutions that preserve the charge of the substituted residue (e.g., aspartic acid for glutamic acid).
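The fragment-length and charge-preserving substitution criteria can be expressed as simple checks. The sketch below compares equal-length sequences only (substitutions, not insertions or deletions) and, as an assumption, treats Asp/Glu as negative and Lys/Arg as positive while treating His as neutral. The example uses the N-terminal decapeptide of mature human IGF-I (GPETLCGAEL); the helper names `is_fragment` and `substitutions` are hypothetical.

```python
# Side-chain charge at neutral pH; His treated as uncharged (simplifying assumption).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def is_fragment(candidate: str, full_length: str, min_length: int = 10) -> bool:
    """True if `candidate` is a contiguous stretch of `full_length` of at least `min_length` residues."""
    return len(candidate) >= min_length and candidate.upper() in full_length.upper()


def substitutions(reference: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, reference residue, variant residue, charge preserved?)."""
    if len(reference) != len(variant):
        raise ValueError("only equal-length (substitution-only) variants are handled")
    result = []
    for pos, (r, v) in enumerate(zip(reference.upper(), variant.upper()), start=1):
        if r != v:
            result.append((pos, r, v, CHARGE.get(r, 0) == CHARGE.get(v, 0)))
    return result


if __name__ == "__main__":
    igf1_n_term = "GPETLCGAEL"
    print(is_fragment("PETLCGAEL", igf1_n_term))               # False: only 9 residues
    print(is_fragment("ETLCGAEL", igf1_n_term, min_length=8))  # True
    print(substitutions(igf1_n_term, "GPDTLCGAKL"))
    # [(3, 'E', 'D', True), (9, 'E', 'K', False)]
```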
Several IGF-I variants are known in the art and include those described in, for example, Proc. Natl. Acad. Sci. USA 83 (1986) 4904-4907; Biochem. Biophys. Res. Commun. 149 (1987) 398-404; J. Biol. Chem. 263 (1988) 6233-6239; Biochem. Biophys. Res. Commun. 165 (1989) 766-771; Forsberg et al. (1990) Biochem. J. 271:357-363; U.S. Pat. Nos. 4,876,242 and 5,077,276; and International Publication Nos. WO 87/01038 and WO 89/05822. Representative variants include one with a deletion of Glu-3 of the mature molecule, a variant with up to 5 amino acids truncated from the N-terminus, a variant with a truncation of the first 3 N-terminal amino acids (referred to as des(1-3)-IGF-I, des-IGF-I, tIGF-I, or brain IGF), and a variant including the first 17 amino acids of the B chain of human insulin in place of the first 16 amino acids of human IGF-I.
Nucleotide sequences encoding IGF-I are known in the art. The IGF-I coding sequence may be chemically synthesized, such as with the phosphoramidite procedure as described by Urdea et al. (1983) Proc. Natl. Acad. Sci. USA 80:7461, and according to the Dayhoff amino acid sequences. The human gene for IGF-I has been chemically synthesized as disclosed in Niwa et al. (1986) Annals New York Acad. Sci. 469:31-52 or Buell et al. (1985) Nucleic Acids Res. 13:1923-1938; herein incorporated by reference. Nucleotide sequences encoding IGF-I may also be obtained by reverse transcription of messenger RNA corresponding to IGF-I into its complementary DNA and conversion of the latter into double-stranded cDNA. Alternatively, the nucleotide sequence encoding IGF-I may be directly obtained from a known vector comprising an IGF-I gene by using restriction enzyme digestion to remove the gene for subsequent insertion into the recombinant DNA construct of the present invention. Such vectors are known in the art, as, for example, the vectors disclosed in Niwa et al. (1986) Annals New York Acad. Sci. 469:31-52 and Buell et al. (1985) Nucleic Acids Res. 13:1923-1938. See also International Publication No. WO 97/12044, herein incorporated by reference.
For any given protein of interest, the protein coding sequence is located in the construct adjacent to the nucleotide sequence encoding the PpSEC10 secretion leader. Transcription and translation of the nucleotides encoding the secretion leader and protein coding sequence thus result in a fusion protein. After proteolytic processing, the mature protein is secreted into the culture medium. Preferably, a pair of basic amino acids lies at the junction of the two encoded sequences, so that the secretion leader may be cleaved from the desired protein by a protease such as Kex2. The PpSEC10 secretion leader of the present invention (SEQ ID NO: 4 and variants or fragments thereof) terminates in this type of dibasic processing site.
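The processing step just described can be illustrated as an idealized string operation: verify that the leader ends in a dibasic pair and cut immediately after it. This is a sketch only. The leader sequence below is a hypothetical placeholder rather than SEQ ID NO: 4, and the model ignores everything about in vivo Kex2 processing beyond the position of the cut.

```python
DIBASIC_PAIRS = {"KK", "KR", "RK", "RR"}


def process_precursor(precursor: str, leader_length: int) -> tuple[str, str]:
    """Split a hybrid precursor into (leader, mature protein).

    Idealized model: cleavage occurs on the C-terminal side of the dibasic
    pair that ends the leader.
    """
    if not 2 <= leader_length < len(precursor):
        raise ValueError("leader_length must leave at least one mature residue")
    leader, mature = precursor[:leader_length], precursor[leader_length:]
    if leader[-2:].upper() not in DIBASIC_PAIRS:
        raise ValueError(f"leader ends in {leader[-2:]!r}, not a dibasic processing site")
    return leader, mature


if __name__ == "__main__":
    hypothetical_leader = "MKFSLAKR"  # placeholder, not the PpSEC10 leader
    print(process_precursor(hypothetical_leader + "GPETLCGAEL", len(hypothetical_leader)))
    # ('MKFSLAKR', 'GPETLCGAEL')
```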
The DNA construct of the present invention can be ligated into a replicon (e.g., plasmid, cosmid, virus, mini-chromosome), thus forming an expression vector that is capable of autonomous DNA replication in vivo. Such autonomously replicating vectors include vectors based on yeast autonomous replication sequences and 2μ-based vectors. Preferably the replicon will be a plasmid. Such a plasmid expression vector will contain one or more replication systems, preferably two replication systems, one that allows for stable maintenance within a yeast host cell for expression purposes, and one that provides for stable propagation within a prokaryotic host for cloning purposes. Examples of such yeast-bacteria shuttle vectors include YEp24 (Botstein et al. (1979) Gene 8:17-24), pC1/1 (Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646), and YRp17 (Stinchcomb et al. (1982) J. Mol. Biol. 158:157). For cloning purposes, the plasmid vector comprising a recombinant DNA construct assembled with PpSEC10 nucleotide sequences of the present invention may be introduced into suitable host cells using a variety of techniques which are available in the art. These techniques include, but are not limited to, transferrin-polycation-mediated DNA transfer, transfection with naked or encapsulated nucleic acids, liposome-mediated cellular fusion, intracellular transportation of DNA-coated latex beads, protoplast fusion, viral infection, electroporation, calcium phosphate-mediated transfection, and lithium salt-mediated transformation.
Additionally, a plasmid expression vector may be a high or low copy number plasmid, the copy number generally ranging from about 1 to about 200. With high copy number yeast vectors, there will generally be at least 10, preferably at least 20, and usually not exceeding about 250 copies in a single host. Either a high or low copy number vector may be desirable, depending upon the effect of the vector and of expression of the protein of interest on the host. See, for example, Brake et al. (1984) Proc. Natl. Acad. Sci. USA 81:4642-4646.
More preferably the recombinant DNA construct is ligated into a plasmid vector that allows for integration of the construct into the yeast genome. Examples of such integrating vectors are known in the art. See, for example, Botstein et al. (1979) Gene 8:17-24. Use of integrating vectors maximizes the stability of foreign protein production in a yeast host cell (Romanos et al. (1992) Yeast 8:423-488). Such a vector further comprises two segments of yeast host DNA sequences. For example, the DNA construct may be flanked with homologous regions of a yeast gene, such as the Pichia pastoris HIS4 gene, so that the construct can be integrated into the yeast genome by means of homologous recombination. The vector is linearized with a restriction enzyme, and the linearized DNA stimulates single crossover-type integration with the yeast host cell DNA.
Yeast host cells harboring multiple integrated copies of a recombinant DNA construct of the present invention may be generated by methods well known in the art. At least two such approaches have been developed. The first relies upon identifying multicopy strains that arise naturally as a low percentage of transformed cell populations. In this manner, large numbers of transformants are screened for production levels of the protein of interest by SDS-polyacrylamide gel electrophoresis or immunoblotting, or are screened for multiple copies of the foreign gene using colony dot-blot hybridization. Alternatively, multiple copies of the recombinant DNA construct are assembled within a single vector prior to transformation of the yeast host cells. See, for example, Cregg et al. (1993) Bio/Technology 11:905-910, for a review of these methods. When a single vector is constructed with multiple copies of a DNA construct of the present invention, it may contain about 3 copies, preferably about 6 copies, more preferably about 8 copies of a particular DNA construct. It is within the skill of the art to determine the optimal number of DNA constructs comprising the PpSEC10 nucleotide sequences and coding sequence for a given protein of interest and for a given strain of yeast.
The yeast cell to be transformed with an expression vector comprising at least one copy of a recombinant DNA construct that includes at least one PpSEC10 nucleotide sequence and a coding sequence for a protein of interest can be any yeast cell. By “yeast” is intended ascosporogenous yeasts (Endomycetales), basidiosporogenous yeasts, and yeast belonging to the Fungi Imperfecti (Blastomycetes). The ascosporogenous yeasts are divided into two families, Spermophthoraceae and Saccharomycetaceae. The latter comprises four subfamilies, Schizosaccharomycoideae (e.g., genus Schizosaccharomyces), Nadsonioideae, Lipomycoideae, and Saccharomycoideae (e.g., genera Pichia, Kluyveromyces, and Saccharomyces). The basidiosporogenous yeasts include the genera Leucosporidium, Rhodosporidium, Sporidiobolus, Filobasidium, and Filobasidiella. Yeast belonging to the Fungi Imperfecti are divided into two families, Sporobolomycetaceae (e.g., genera Sporobolomyces, Bullera) and Cryptococcaceae (e.g., genus Candida). Since the classification of yeast may change in the future, for the purposes of this invention, yeast shall be defined as described in Skinner et al., eds. (1980) Biology and Activities of Yeast (Soc. App. Bacteriol. Symp. Series No. 9). In addition to the foregoing, those of ordinary skill in the art are presumably familiar with the biology of yeast and the manipulation of yeast genetics. See, for example, Bacila et al., eds. (1978) Biochemistry and Genetics of Yeast; Rose and Harrison, eds. (1987) The Yeasts (2nd ed.); Strathern et al., eds. (1981) The Molecular Biology of the Yeast Saccharomyces; herein incorporated by reference.
The selection of suitable yeast for the practice of the present invention is within the skill of the art. When selecting yeast hosts for expression, suitable hosts may include those shown to have, for example, good secretion capacity and low proteolytic activity. Yeast are generally available from a variety of sources, including the Yeast Genetic Stock Center, Department of Biophysics and Medical Physics, University of California, Berkeley, Calif.; and the American Type Culture Collection, Rockville, Md.
Of particular interest to the present invention are species within the genera Pichia, Kluyveromyces, Saccharomyces, Schizosaccharomyces, and Candida. Species of particular interest include Pichia pastoris, Kluyveromyces lactis, and the Saccharomyces species S. cerevisiae, S. carlsbergensis, S. diastaticus, S. douglasii, S. kluyveri, S. norbensis, and S. oviformis. 
In one embodiment of the invention, the yeast host undergoing transformation to produce the protein of interest is a mutant Pichia pastoris strain that has a disabled PpSEC10 gene in its genome. By “disabled” is intended the wild-type gene has been genetically manipulated by man such that it does not express the wild-type PpSEC10 protein or expresses this protein at much reduced levels or in a form that is not capable of being secreted from the yeast cell. Absence of a secreted Sec10p protein or decreased production of this protein simplifies purification of a secreted protein of interest from the culture medium.
The mutant Pichia pastoris strain may be generated by a number of methods well known in the art. For example, the wild-type PpSEC10 gene sequence may be disabled by using site-directed mutagenesis methods so that the gene is not transcribed, or, if transcribed, is not translated into a secretable Sec10p protein.
Alternatively, various portions of the PpSEC10 coding sequence can be deleted from the wild-type gene. It is within the skill of the art to determine the size of deletion necessary to result in a disabled PpSEC10 gene. Thus, a disabled gene may result from deletion of a single nucleotide if such a deletion shifts the remaining coding sequence out of reading frame. Larger deletions can result in complete lack of expression of product. Alternatively, additional sequences can be inserted into the coding sequence to disrupt the reading frame of the gene of interest, causing a dramatically altered product to be expressed or resulting in the lack of expression of the product.
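The contrast between a frame-shifting single-nucleotide deletion and an in-frame deletion can be seen by translating a toy open reading frame before and after each edit. The sequence below is hypothetical (not PpSEC10), the standard genetic code is assumed, and translation stops at the first stop codon; `translate_until_stop` and `delete` are illustrative helpers.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_until_stop(cds: str) -> str:
    """Translate from the first base until the first stop codon (or the end)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete(seq: str, start: int, length: int = 1) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


if __name__ == "__main__":
    orf = "ATGCTGAACCCCTAA"                         # toy ORF: M L N P stop
    print(translate_until_stop(orf))                # MLNP
    print(translate_until_stop(delete(orf, 3)))     # M   (frameshift exposes TGA)
    print(translate_until_stop(delete(orf, 3, 3)))  # MNP (in-frame loss of Leu)
```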
In one embodiment, a disabled gene may be prepared by inserting an auxotrophic marker gene into the PpSEC10 gene, thereby disrupting the PpSEC10 gene. Such auxotrophic marker genes can be selected from the Pichia or Saccharomyces HIS4 gene, the Pichia or Saccharomyces ARG4 genes, the Pichia or Saccharomyces URA3 genes, and the like.
In another embodiment of the invention, the PpSEC10 gene is disabled by replacement of the wild-type PpSEC10 gene with a disabled PpSEC10 gene. Gene replacement is carried out, for example, by introducing the disabled PpSEC10 gene under transformation conditions suitable for the site-directed integration of the disabled gene into the genome of the yeast host at the specific locus of the wild-type PpSEC10 gene. Integration will replace or alter the host's endogenous gene. One means of introducing the disabled gene into the target PpSEC10 locus of a yeast host is to transform the yeast host with a linear DNA fragment comprising the disabled gene and having ends homologous to the 5′ and 3′ ends of the target wild-type PpSEC10 gene. Upon transformation, these homologous ends direct homologous recombination to the specific locus of the PpSEC10 gene.
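A linear fragment of the kind described above, with a disrupting insert flanked by the target gene's own 5′ and 3′ ends, can be assembled as in the following sketch. The arm length, the toy gene, and the marker placeholder are assumptions for illustration; suitable homology-arm lengths for Pichia pastoris would have to be established experimentally, and `disrupted_gene_fragment` is a hypothetical helper.

```python
def disrupted_gene_fragment(gene: str, marker: str, arm: int) -> str:
    """Build a linear replacement fragment: 5' gene end + marker + 3' gene end.

    The interior of `gene` between the two homology arms is omitted, so a
    double crossover at the locus would replace it with `marker`.
    """
    if arm <= 0:
        raise ValueError("arm length must be positive")
    if 2 * arm > len(gene):
        raise ValueError("homology arms would overlap")
    return gene[:arm] + marker + gene[-arm:]


if __name__ == "__main__":
    toy_gene = "AAAACCCCGGGGTTTT"  # hypothetical stand-in for the PpSEC10 locus
    print(disrupted_gene_fragment(toy_gene, "GAGAGA", 4))  # AAAAGAGAGATTTT
```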
Those of skill in the art recognize that host Pichia strains for transformation with the above-described modified gene can be wild-type Pichia cells, which, upon transformation with the disabled PpSEC10 gene, can be screened for reduced expression of the PpSEC10 gene product. The host strains employed can have one or more defects therein to assist in the identification and selection of desired transformants.
Thus, mutant strains comprising disabled PpSEC10 genes may be obtained, for example, as described above, by transformation with DNA constructs comprising a disabled PpSEC10 gene. Alternatively, a Pichia pastoris cell may be transformed with an expression vector comprising a DNA construct with an antisense nucleotide sequence for the native PpSEC10 gene. Provided with the PpSEC10 coding sequence disclosed herein, one skilled in the art can readily prepare such DNA constructs using standard recombinant DNA techniques.
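One common way to build an antisense construct is to place the coding sequence in reverse orientation relative to the promoter, so that the transcribed RNA is the reverse complement of the sense mRNA. The sketch below, with a hypothetical helper name and a toy fragment rather than SEQ ID NO:7, computes that reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense: str) -> str:
    """Return the reverse complement (written 5'->3') of a DNA sense strand."""
    if set(sense.upper()) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return sense.translate(_COMPLEMENT)[::-1]


fragment = "ATGGCTAAA"      # hypothetical 5' fragment of a coding sequence
print(antisense(fragment))  # TTTAGCCAT
```

Applying the function twice returns the original sense sequence, which is a quick consistency check on any construct design.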
Methods of introducing exogenous DNA into yeast hosts are well known in the art. For example, spheroplast transformation is taught by Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1919-1933 and Stinchcomb et al., EPO Publication No. 45,573; herein incorporated by reference. Transformants are grown in an appropriate nutrient medium and, where appropriate, maintained under selective pressure to ensure retention of the introduced (exogenous) DNA. Where expression is inducible, the yeast host can first be grown to a high cell density, after which expression is induced.
Methods of culturing yeast cells in both small- and large-volume cultures are well known in the art. For example, the yeast Pichia pastoris may be cultured at cell densities greater than 100 g/l dry weight. At least 0.3 g/l of a desired protein may be produced. Preferably, 0.5, 1.0, 2.5, 8.0, or 12 g/l of the desired protein is produced. Small-scale cultures of yeast cells comprising a recombinant DNA construct of the present invention may be screened for those cells that produce larger amounts of the protein of interest. Such screening is routine in the art. Components of the culture medium, such as the carbon or nitrogen sources, may be varied to increase the amount of desired protein secreted. When the PpSEC10 promoter is used to regulate expression of a protein of interest, the carbon source in the medium may be, for example, glucose, glycerol, or methanol. Secretion of Sec10p protein is enhanced by the addition of casein amino acids to the medium. Preferably, the medium contains 2× yeast nitrogen base.
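The small-scale screen described above amounts to converting each culture's measured yield to a volumetric titer and ranking the clones. The sketch below uses the 0.3 g/l minimum stated above as a default cutoff; the function names, clone names, culture volume, and measurements are hypothetical.

```python
def titer_g_per_l(protein_mg: float, culture_ml: float) -> float:
    """Volumetric titer; mg per mL is numerically equal to g per L."""
    return protein_mg / culture_ml


def rank_clones(titers: dict, minimum_g_per_l: float = 0.3) -> list:
    """Return (clone, titer) pairs meeting the minimum titer, highest first."""
    hits = [(clone, t) for clone, t in titers.items() if t >= minimum_g_per_l]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical small-scale screen: mg of protein of interest measured in 5 mL cultures.
measured_mg = {"clone_A": 2.0, "clone_B": 0.5, "clone_C": 6.0}
titers = {clone: titer_g_per_l(mg, 5.0) for clone, mg in measured_mg.items()}
print(rank_clones(titers))  # [('clone_C', 1.2), ('clone_A', 0.4)]
```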
The secreted protein of interest can be harvested by any conventional means and purified from media components by chromatography, electrophoresis, dialysis, solvent-solvent extraction, and the like. For example, the protein can be purified by diluting the cell-free medium with sodium acetate and contacting the diluted medium with a cation exchange resin, followed by hydrophobic interaction chromatography. Using this method, the desired protein is typically greater than 95% pure. Further purification may be undertaken using methods well known in the art.
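The text does not state how purity is measured. One common estimate, assumed here, is gel densitometry: the target band's intensity as a percentage of the total band intensity in its lane. The function name, band names, and intensities are hypothetical, and the estimate assumes that stain response is proportional to protein amount.

```python
def percent_purity(band_intensities: dict, target: str) -> float:
    """Target band intensity as a percentage of the total intensity in the lane."""
    total = sum(band_intensities.values())
    if total <= 0:
        raise ValueError("lane has no signal")
    return 100.0 * band_intensities[target] / total


# Hypothetical densitometry readings from one lane of a stained gel.
lane = {"protein_of_interest": 9600.0, "contaminant_1": 250.0, "contaminant_2": 150.0}
purity = percent_purity(lane, "protein_of_interest")
print(purity, purity > 95.0)  # 96.0 True
```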
A kit is provided for expressing a protein of interest in a yeast host cell. The kit provides a yeast cell and an expression vector comprising a recombinant DNA construct of the present invention. The yeast cell may be any of the yeast cells listed above; preferably, however, the yeast cell is a Pichia pastoris cell. The DNA construct comprises at least one of the PpSEC10 nucleotide sequences of the present invention in addition to the coding sequence for a mature protein of interest. When the vector is introduced into the yeast cell, the protein of interest is expressed.
The invention further provides a method of identifying a culture condition under which a desired protein can be expressed under the control of the PpSEC10 promoter. The method comprises culturing a Pichia pastoris cell and detecting Sec10p protein in the culture medium. A culture condition under which Sec10p protein is secreted into the medium is a condition under which a desired protein can be expressed under the control of the PpSEC10 promoter. Components of the medium that may be varied include the identity and/or concentration of salts, trace elements, carbon source, and amino acids. Biotin concentration may also be varied.
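Varying the medium components listed above can be organized as a full factorial grid of conditions, keeping those under which Sec10p is detected in the medium. In the sketch below the function names, factors, and levels are hypothetical, and `fake_assay` is only a placeholder for the real detection result (for example, an immunoassay of the culture medium); it is not experimental data.

```python
from itertools import product
from typing import Callable, Dict, List


def condition_grid(factors: Dict[str, list]) -> List[dict]:
    """Every combination of the levels given for each medium component (full factorial)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


def permissive_conditions(conditions: List[dict],
                          sec10p_detected: Callable[[dict], bool]) -> List[dict]:
    """Keep the conditions under which Sec10p was found in the culture medium."""
    return [c for c in conditions if sec10p_detected(c)]


# Hypothetical factors and levels; a real screen could also vary salts and trace elements.
factors = {
    "carbon_source": ["glucose", "glycerol", "methanol"],
    "casein_amino_acids": [False, True],
    "biotin": ["low", "high"],
}
grid = condition_grid(factors)
print(len(grid))  # 12


def fake_assay(condition: dict) -> bool:
    """Placeholder standing in for the real Sec10p detection result."""
    return condition["casein_amino_acids"]


hits = permissive_conditions(grid, fake_assay)
print(len(hits))  # 6
```

A full factorial grid grows multiplicatively with each added factor, so in practice a fractional design may be preferred once many components are varied.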
The novel Sec10p protein in the culture medium may be detected, for example, by radioimmunoassay using radiolabeled anti-Sec10p antibodies. A preparation of antibodies that specifically binds to Sec10p may be obtained using an amino acid sequence for the Sec10p protein of the present invention, more particularly the sequence set forth in SEQ ID NO:6 or any variant or fragment thereof. This Sec10p protein is encoded by a PpSEC10 nucleotide sequence, more particularly the nucleotide sequence set forth in SEQ ID NO:7 or a sequence encoding a variant or fragment of the polypeptide sequence set forth in SEQ ID NO:6. The antibodies may be polyclonal or monoclonal. Techniques for raising polyclonal and monoclonal antibodies are well known in the art. The antibodies bind specifically to Sec10p epitopes, preferably epitopes not present on other Pichia pastoris proteins. Typically, the minimum number of contiguous amino acids needed to form an epitope is 6, 8, or 10. However, more may be used, for example, at least 15, 25, or 50, especially to form epitopes that involve noncontiguous residues. Specifically binding antibodies either do not detect other proteins on Western blots of Pichia pastoris proteins or in immunocytochemical assays, or provide a signal at least ten-fold lower than the signal obtained with Sec10p. Antibodies that bind specifically to Sec10p proteins include those that bind to the mature Sec10p protein, variants or fragments thereof, Sec10p degradation products, Sec10p fusion proteins, or alternatively spliced forms of Sec10p protein. In a preferred embodiment of the invention, the antibodies immunoprecipitate Sec10p protein from solution and react with Sec10p protein on Western blots of polyacrylamide gels.
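Two of the criteria above can be made concrete: finding contiguous stretches of 6, 8, or 10 residues that do not occur in other Pichia pastoris proteins, and the ten-fold signal rule for specific binding. The sketch below uses hypothetical function names and toy sequences, not SEQ ID NO:6 or real Pichia pastoris proteins. A window absent from other proteins is only a candidate linear epitope; the code says nothing about immunogenicity or about epitopes formed by noncontiguous residues.

```python
def kmers(seq: str, k: int) -> set:
    """All contiguous windows of length k in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_windows(target: str, others: list, k: int = 8) -> list:
    """Windows of `target` that occur in none of the other proteins, sorted."""
    background = set()
    for other in others:
        background |= kmers(other, k)
    return sorted(kmers(target, k) - background)


def is_specific(sec10p_signal: float, off_target_signal: float, fold: float = 10.0) -> bool:
    """True if the off-target signal is at least `fold`-fold lower than the Sec10p signal."""
    return off_target_signal * fold <= sec10p_signal


# Hypothetical toy sequences.
target = "MKVLAAGSTWQ"
others = ["MKVLAAGSPPP"]
print(unique_windows(target, others, k=6))                    # ['AAGSTW', 'AGSTWQ', 'LAAGST']
print(is_specific(1000.0, 50.0), is_specific(1000.0, 200.0))  # True False
```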
Techniques for purifying Sec10p antibodies are available in the art. In a preferred embodiment, antibodies are affinity purified by passing antiserum over a column to which a Sec10p protein, fusion protein, or polypeptide is bound. The bound antibody is then eluted, for example, using a buffer with a high salt concentration. Any such technique may be chosen to achieve the preparation of the invention. Anti-Sec10p antibodies may also be used to detect Sec10p protein in Western blots of polyacrylamide gels containing proteins from the culture medium.