This invention relates to the production of hydroxylated triple helical proteins such as natural and synthetic collagens, natural and synthetic collagen fragments, and natural and synthetic collagen-like proteins, by recombinant DNA technology. In particular, the invention relates to a method for producing hydroxylated triple helical proteins in yeast host cells by introducing to a suitable yeast host cell, DNA sequences encoding the triple helical protein as well as prolyl 4-hydroxylase (P4H), in a manner wherein the introduced DNA sequences are stably retained and segregated by the yeast host cells.
The collagen family of proteins represents the most abundant protein in mammals, forming the major fibrous component of, for example, skin, bone, tendon, cartilage and blood vessels. Each collagen protein consists of three polypeptide chains (alpha chains) characterised by a (Gly-X-Y)n repeating sequence, which are folded into a triple helical protein conformation. Type I collagen (typically found in skin, tendon, bone and cornea) consists of two types of polypeptide chain termed α1(I) and α2(I) [i.e. α1(I)2α2(I)], while other collagen types such as Type II [α1(II)3] and Type III [α1(III)3] have three identical polypeptide chains. These collagen proteins spontaneously aggregate to form fibrils which are incorporated into the extracellular matrix where, in mature tissue, they have a structural role and, in developing tissue, they have a directive role. The collagen fibrils, after cross-linking, are highly insoluble and have great tensile strength.
The ability of collagens to form insoluble fibrils makes them attractive for numerous medical applications including bioimplant production, soft tissue augmentation and wound/burn dressings. To date, most collagens approved for these applications have been obtained from animal sources, primarily bovine. While such animal-sourced collagens have been successful, there is some concern that their use risks serious immunogenicity problems and transmission of infective diseases and spongiform encephalopathies (e.g. bovine spongiform encephalopathy (BSE)). Accordingly, there is significant interest in the development of methods of production of collagens or collagen fragments by recombinant DNA technology. Further, the use of recombinant DNA technology is desirable in that it allows for the potential production of synthetic collagens and collagen fragments which may include, for example, exogenous biologically active domains (i.e. to provide additional protein function) and other useful characteristics (e.g. improved biocompatibility and stability).
The in vivo biosynthesis of collagen proteins is a complex process involving many post-translational events. A key event is the hydroxylation by the enzyme prolyl 4-hydroxylase (P4H) of prolyl residues in the Y-position of the repeating (Gly-X-Y)n sequences to 4-hydroxyproline. This hydroxylation has been found to be beneficial for nucleation of folding of triple helical proteins. For collagens, it is essential for stability at body temperature. Accordingly, the development of a commercially viable method for the production of recombinant collagen requires co-expression of P4H with the alpha chains. For mammalian host cells, co-expression of P4H will occur autonomously since these cells should naturally express P4H. However, for yeast host cells, which for reasons of cost, ease and efficiency are more attractive for expression of recombinant eukaryotic proteins, transformation with DNA sequences encoding P4H will also be required. Since P4H consists of α and β subunits of about 60 kDa and 60 kDa, yeast host cells for expression of recombinant collagen will require co-transformation with at least three exogenous DNA sequences (i.e. encoding an alpha chain, the P4H α subunit and the P4H β subunit), and stability problems would therefore be expected if these were cloned on three separate vectors or, alternatively, all on a single episomal type vector. Indeed, even under continuous selection pressure, many episomal type vectors suffer stability problems if they are large or are present at relatively low copy number. An object of the present invention is therefore to provide a method for expressing recombinant collagen and other triple helical proteins from yeast host cells wherein the introduced DNA sequences are stably retained and segregated independent of continuous selection pressure.
Thus, in a first aspect, the present invention provides a method of producing a hydroxylated triple helical protein in yeast comprising the steps of:
introducing to a suitable yeast host cell a first nucleotide sequence encoding P4H α subunit, a second nucleotide sequence encoding P4H β subunit and one or more product-encoding nucleotide sequences which encode(s) a polypeptide(s) or peptide(s) which, when hydroxylated, form the said hydroxylated triple helical protein, each of said first, second and product-encoding nucleotide sequences being operably linked to promoter sequences, and
culturing said yeast host cell under conditions suitable to achieve expression of said first, second and product-encoding nucleotide sequences to thereby produce said hydroxylated triple helical protein; wherein said method is characterised in that the step of introducing the first, second and product-encoding nucleotide sequences results in the said first, second and product-encoding nucleotide sequences, together with their respective operably linked promoter sequences, being borne on one or more replicable DNA molecules that are stably retained and segregated by said yeast host cell during said step of culturing.
In a second aspect, the present invention provides a yeast host cell capable of producing a hydroxylated triple helical protein, said yeast host cell including a first nucleotide sequence encoding P4H α subunit, a second nucleotide sequence encoding P4H β subunit and one or more product-encoding nucleotide sequences which encode(s) a polypeptide(s) or peptide(s) which, when hydroxylated, form the said hydroxylated triple helical protein, each of said first, second and product-encoding nucleotide sequences being operably linked to promoter sequences, and wherein said first, second and product-encoding nucleotide sequences, together with their respective operably linked promoter sequences, are borne on one or more replicable DNA molecules that are stably retained and segregated by said yeast host cell.
In a third aspect, the present invention provides a triple helical protein produced in accordance with the method of the first aspect.
In a fourth aspect, the present invention provides a biomaterial or therapeutic product comprising a triple helical protein produced in accordance with the method of the first aspect.
The method according to the invention requires that the first and second nucleotide sequences encoding the P4H α and β subunits and the product-encoding nucleotide sequences be introduced to a suitable yeast host cell in a manner such that they are borne on one or more DNA molecules that are stably retained and segregated by the yeast host cell during culturing. In this way, all daughter cells will include the first, second and product-encoding nucleotide sequences and thus stable and efficient expression of a hydroxylated triple helical protein product can be ensured throughout the culturing step and without the use of continuous selection pressure.
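The importance of stable retention and segregation when several DNA molecules must be co-maintained can be illustrated with a minimal sketch. The geometric model below is an assumption made only for illustration and is not taken from this specification: each DNA molecule is assumed to be lost independently from a fixed fraction of the cells that still carry it in every generation, with no selection pressure and no growth-rate difference between cells. The function names and the loss values used in the example are hypothetical.

```python
from math import prod
from typing import Iterable


def fraction_retaining(loss_per_generation: float, generations: int) -> float:
    """Fraction of cells still carrying one DNA molecule after a number of generations.

    Assumed model (illustrative only): a constant fraction of the cells that
    still carry the molecule lose it in each generation.
    """
    if not 0.0 <= loss_per_generation <= 1.0:
        raise ValueError("loss_per_generation must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - loss_per_generation) ** generations


def fraction_retaining_all(losses: Iterable[float], generations: int) -> float:
    """Fraction of cells carrying every molecule, assuming independent loss."""
    return prod(fraction_retaining(loss, generations) for loss in losses)


if __name__ == "__main__":
    # Hypothetical loss rates: three separate episomal vectors versus a single
    # molecule that is lost far more rarely.
    print(fraction_retaining_all([0.02, 0.02, 0.02], generations=30))
    print(fraction_retaining_all([0.0001], generations=30))
```

Under these assumptions the fraction of cells carrying all of the sequences falls as the product of the individual retention fractions, which is why bearing the sequences on few, stably segregated molecules is attractive.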
The method according to the invention can be achieved by: (i) integrating (e.g. by homologous recombination) one or more of the exogenous nucleotide sequences (i.e. one or more of the first, second and product-encoding nucleotide sequences) into one or more chromosome(s) of the yeast host cell, or (ii) including one or more of the exogenous nucleotide sequences within one or more vector(s) including a centromere (CEN) sequence(s). Alternatively, a combination of these techniques may be used or one or both of these techniques may be used in combination with the use of one or two high copy number plasmid(s) which include the remainder of the exogenous nucleotide sequences. For example, the first and second nucleotide sequences encoding the P4H α and β subunits may be integrated into a host chromosome while the product-encoding sequences may be included on vector(s) including a CEN sequence or on a high copy number vector(s).
Preferably, the method of the invention is achieved by including the exogenous nucleotide sequences within a vector(s) including a CEN sequence. Particularly preferred are the CEN sequence-including YAC (yeast artificial chromosome) vectors (Cohen et al., 1993) and pYEUra3 vectors (Clontech, Cat. No 6195-1). Other vectors including a CEN sequence may be generated by cloning a CEN sequence into any suitable expression vector.
Where one or more of the exogenous nucleotide sequences are included in a high copy number vector(s), it is preferred that the high copy number vector(s) is/are selected from those that may be present at 20 to 500 (preferably, 400 to 500) copies per host cell. Particularly preferred high copy number vectors are the YEp vectors.
The method according to the invention enables the production of hydroxylated triple helical proteins. The term “triple helical protein” is to be understood as referring to a homo- or heterotrimeric protein consisting of a polypeptide(s) or peptide(s) which include at least a region having the general peptide formula: (Gly X Y)n in which Gly is glycine, X and Y represent the same or different amino acids (the identities of which may vary from Gly X Y triplet to Gly X Y triplet) but wherein X and Y are frequently proline which in the case of Y becomes, after modification, hydroxyproline (Hyp), and n is in the range of 2 to 1500 (preferably 10 to 350), which region forms, together with the same or similar regions of two other polypeptides or peptides, a triple helical protein conformation. The term therefore encompasses natural and synthetic collagens, natural and synthetic collagen fragments, and natural and synthetic collagen-like proteins (e.g. macrophage scavenger receptor and lung-surfactant proteins) and as such includes any procollagen and collagen (e.g. Types I-XIX) with or without propeptides, globular domains and/or intervening non-collagenous sequences and, further, with or without native or variant amino acid sequences from human or other species. Synthetic collagen and fragments encompassed by the term “triple helical protein” may also include non-collagenous, non-triple helical domains at the amino and/or carboxy terminal ends or elsewhere.
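The (Gly X Y)n definition above can be checked mechanically on a candidate amino acid sequence. The following is a minimal sketch, assuming sequences are written in one-letter code (G for glycine, P for proline); the function names are hypothetical and the check looks only at primary sequence, so it says nothing about whether a region actually folds into a triple helix.

```python
import re

# Lookahead pattern so that a run is tested from every start position;
# group 1 captures the uninterrupted stretch of Gly-X-Y triplets found there.
_GXY_RUN = re.compile(r"(?=((?:G[A-Z]{2})+))")


def longest_gxy_run(sequence: str) -> int:
    """Return n for the longest uninterrupted (Gly-X-Y)n stretch in the sequence."""
    seq = sequence.upper()
    return max((len(m.group(1)) // 3 for m in _GXY_RUN.finditer(seq)), default=0)


def has_gxy_region(sequence: str, n_min: int = 2, n_max: int = 1500) -> bool:
    """True if the longest (Gly-X-Y)n stretch has n within the stated range."""
    return n_min <= longest_gxy_run(sequence) <= n_max


def y_position_prolines(gxy_run: str) -> int:
    """Count prolines in the Y position of a run that starts on a glycine.

    These are the residues that P4H could convert to 4-hydroxyproline.
    """
    return sum(1 for residue in gxy_run.upper()[2::3] if residue == "P")


if __name__ == "__main__":
    example = "MKTGPPGAPGPAGPPAA"  # made-up sequence, for illustration only
    print(longest_gxy_run(example), has_gxy_region(example))
```

The default bounds mirror the range of 2 to 1500 given for n; the preferred narrower range can be passed through `n_min` and `n_max`.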
Accordingly, product-encoding nucleotide sequence(s) suitable for use in the method according to the invention may be of great diversity. It is, however, preferred that the product-encoding nucleotide sequence(s) be selected from nucleotide sequences encoding natural collagens and fragments thereof, such as COL1A1 (D'Alessio et al., 1988; Westerhausen et al., 1991), COL1A2 (de Wet et al., 1987), COL2A1 (Cheah et al., 1985) and COL3A1 (Ala-Kokko et al., 1989) and fragments and combinations of these, and synthetic collagens and fragments thereof.
Product-encoding nucleotide sequence(s) which encode natural collagens or collagen fragments may encode fragments which include or exclude the N-propeptide region, the N-telopeptide, the C-telopeptide or the C-propeptide or various combinations of these.
Product-encoding nucleotide sequences which encode synthetic collagens and fragments thereof preferably encode a polypeptide(s) or peptide(s) of the general formula: (A)l—(B)m—(Gly X Y)n—(C)o—(D)p, in which Gly is glycine, X and Y represent the same or different amino acids, the identities of which may vary from Gly X Y triplet to Gly X Y triplet but wherein Y must be ≥ one proline, A and D are polypeptide or peptide domains which may or may not include triple helical forming (Gly X Y)n repeating sequences, B and C are intervening sequences which do not contain triple helical forming (Gly X Y)n repeating sequences, n is in the range of 2 to 1500 (preferably, 10 to 300) and l, m, o and p are each independently selected from 0 and 1.
The product-encoding nucleotide sequence(s) may include a sequence(s) encoding a secretion signal so that the polypeptide(s) or peptide(s) expressed from the product-encoding nucleotide sequence(s) are secreted.
Expression of the product-encoding nucleotide sequence(s) may be driven by constitutive yeast promoter sequences (e.g. ADH1 (Hitzeman et al., 1981; Pihlajaniemi et al., 1987), HIS3 (Mahadevan and Struhl, 1990), 786 (no author given, 1996 Innovations 5, 15) and PGK1 (Tuite et al., 1982)), but more preferably, by inducible yeast promoter sequences such as GAL1-10 (Goff et al., 1984), GAL7 (St. John and Davis, 1981), ADH2 (Thukral et al., 1991) and CUP1 (Macreadie et al., 1989).
The first and second nucleotide sequences encoding the P4H α and β subunits can be of any animal origin although they are preferably of avian or mammalian, particularly human, origin (Helaakoski et al., 1989). It is also envisaged that the first and second nucleotide sequences may originate from different species. In addition, the second nucleotide sequence encoding the P4H β subunit may include a sequence encoding an endoplasmic reticulum (ER) retention signal (e.g. HDEL (SEQ ID NO:13), KDEL (SEQ ID NO:42) or KEEL (SEQ ID NO:43)) with or without other target signals so as to allow expression of the P4H in the ER, cytoplasm or a target organelle or, alternatively, so as to be secreted.
Expression of the first and second nucleotide sequences may be driven by constitutive or inducible yeast promoter sequences such as those mentioned above. It is believed, however, that it is advantageous to achieve expression of the α and β subunits in a co-ordinated manner using the same or different promoter sequences with the same induction characteristics, but preferably by the use of a bidirectional promoter sequence. Accordingly, it is preferred that the first and second nucleotide sequences be expressed by the yeast GAL1-10 bidirectional promoter sequence, although other bidirectional promoter sequences would also be suitable.
Multiple copies of the first, second and/or product-encoding nucleotide sequences may be introduced to the yeast host cell (e.g. present on a YAC vector or integrated into a host chromosome). It may be particularly advantageous to provide the product-encoding nucleotide sequence(s) in multicopy and, accordingly, it may be preferred to introduce the product-encoding nucleotide sequence(s) on a high copy number plasmid (e.g. a YEp plasmid).
The introduced first, second and product-encoding nucleotide sequences may be borne on one or more stably retained and segregated DNA molecules. Where borne on more than one DNA molecule, the DNA molecules may be a combination of host chromosome(s) and/or CEN sequence-including vector(s) in combination with high copy number vector(s). Some specific examples of yeast host cells suitable for use in the method according to the invention, are transformed with the following DNA molecules:
1. YEp-P3 + pYEUra3-αβ,
2. YEp-P3 + pYAC-αβ,
3. YEpCEN-P3 + pYEUra3-αβ,
4. YEpCEN-P3 + pYAC-αβ,
5. pYAC-P3 + pYAC-αβ,
6. pYAC-P3 + pYEUra3-αβ,
7. pYAC-αβ-P3;
wherein P3 represents a product-encoding nucleotide sequence(s), α and β represent, respectively, nucleotide sequences encoding the P4H α subunit and P4H β subunit, and CEN represents an introduced centromere sequence. The pYEUra3 and pYAC vectors include CEN sequences.
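The seven combinations listed above can be written down as data and checked for completeness. The sketch below is illustrative only: the class and function names are hypothetical, the labels "alpha" and "beta" stand for the sequences encoding the P4H α and β subunits, and the classification of backbones as CEN-including follows the statement above that pYEUra3 and pYAC include CEN sequences and that YEpCEN carries an introduced CEN sequence.

```python
from dataclasses import dataclass

REQUIRED = frozenset({"P3", "alpha", "beta"})
# Backbones described in the text as including a CEN sequence.
CEN_BACKBONES = frozenset({"pYEUra3", "pYAC", "YEpCEN"})


@dataclass(frozen=True)
class Molecule:
    backbone: str
    cargo: frozenset


def mol(backbone: str, *cargo: str) -> Molecule:
    """Convenience constructor for one DNA molecule and the sequences it bears."""
    return Molecule(backbone, frozenset(cargo))


# Combinations 1 to 7 as listed in the text.
COMBINATIONS = {
    1: (mol("YEp", "P3"), mol("pYEUra3", "alpha", "beta")),
    2: (mol("YEp", "P3"), mol("pYAC", "alpha", "beta")),
    3: (mol("YEpCEN", "P3"), mol("pYEUra3", "alpha", "beta")),
    4: (mol("YEpCEN", "P3"), mol("pYAC", "alpha", "beta")),
    5: (mol("pYAC", "P3"), mol("pYAC", "alpha", "beta")),
    6: (mol("pYAC", "P3"), mol("pYEUra3", "alpha", "beta")),
    7: (mol("pYAC", "alpha", "beta", "P3"),),
}


def carries_all(molecules) -> bool:
    """True if the molecules together bear P3 and both P4H subunit sequences."""
    carried = frozenset().union(*(m.cargo for m in molecules))
    return REQUIRED <= carried


def backbones_without_cen(molecules) -> list:
    """Backbones in a combination that rely on high copy number rather than CEN."""
    return [m.backbone for m in molecules if m.backbone not in CEN_BACKBONES]


if __name__ == "__main__":
    for number, molecules in COMBINATIONS.items():
        print(number, carries_all(molecules), backbones_without_cen(molecules))
```

In this representation only combinations 1 and 2 contain a molecule without a CEN sequence (the YEp plasmid bearing P3), and in every combination the P4H subunit sequences sit on a CEN-including vector.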
Triple helical protein products produced in accordance with the method of the invention may be purified from the yeast host cell culture by techniques including standard chromatographic and precipitation techniques (Miller and Rhodes, 1982). For collagens, pepsin treatment and NaCl precipitation at acid and neutral pH may be used (Trelstad, 1982). Immunoaffinity chromatography can be used for constructs that contain appropriate recognition sequences, such as the Flag sequence which is recognised by an M1 or M2 monoclonal antibody, or a triple helical epitope, such as that recognised by the antibody 2G8/B1 (Glattauer et al., 1997).
Yeast host cells suitable for use in the method according to the invention may be selected from genera including, but not limited to, Saccharomyces, Kluyveromyces, Schizosaccharomyces, Yarrowia and Pichia. Particularly preferred yeast host cells may be selected from S. cerevisiae, K. lactis, S. pombe, Y. lipolytica and P. pastoris.
As indicated above, it is particularly preferred that the first, second and product-encoding nucleotide sequences be introduced to the yeast host cell by transformation with one or more YAC vectors. YAC vectors are linear DNA vectors which include yeast CEN sequences, at least one autonomous replication signal (e.g. ars) usually derived from yeast, and telomere ends (again, usually derived from yeast). They also generally include a yeast selectable marker such as URA3, TRP1, LEU2, or HIS3, and in some cases, an ochre suppressor (e.g. sup4-o) which allows for red/white selection in adenine requiring strains (i.e. the mutation of the adenine gene being due to a premature ochre stop codon). More commonly, two yeast selectable markers are included, one on each arm of the artificial chromosome (each arm separated by the CEN). This allows selection of only those transformed hosts containing YACs with introduced sequences of interest within the desired restriction cloning site. That is, correct insertion of the sequences of interest (e.g. an expression cassette) rejoins the two arms of the restricted YAC, thus rendering transformants prototrophic for both markers. YACs have been designed to allow for the introduction of large exogenous nucleotide sequences (i.e. of the order of 100 kb or more) into yeast host cells. The present inventors have hereinafter shown that such YACs may be used for the stable expression of multiple exogenous nucleotide sequences (e.g. nucleotide sequences encoding a natural collagen and both the α and β subunits of P4H).
In some embodiments of the invention, it may be preferred that one or more (but not all) of the first, second and product-encoding nucleotide sequences be introduced to the yeast host cell by transformation with one or two YEp vectors. YEp vectors carry all or part of the yeast 2μ plasmid with at least the origin of replication (ori). They also include a yeast selectable marker such as HIS3, LEU2, TRP1, URA3, CUP1 or G418 resistance, and often also contain a separate ori, generally ColE1, and markers, such as ampicillin resistance, for manipulation in E. coli. They show high copy number, for example 20-400 per cell, and are generally efficiently segregated. Stability during cell division is dependent on the vector also containing the REP2/STB locus from the 2μ plasmid. However, stability is not as good as that of the endogenous 2μ plasmid of the host, particularly when heterologous genes are induced for expression. Stability also declines with increasing plasmid size (Wiseman, 1991).
The terms “comprise”, “comprises” and “comprising” as used throughout the specification are intended to refer to the inclusion of a stated component or feature or group of components or features with or without the inclusion of a further component or feature or group of components or features.
The invention will now be described by way of reference to the following non-limiting examples and accompanying figures.