N-linked glycans, specific oligosaccharide structures attached to asparagine residues of glycoproteins, can contribute significantly to the properties of the protein and, in turn, to the properties of the organism. Plant proteins can carry N-linked glycans, but in marked contrast to mammals, only a few biological processes are known to which they contribute.
Biogenesis of N-linked glycans begins with the synthesis of a lipid linked oligosaccharide moiety (Glc3Man9GlcNAc2-) which is transferred en bloc to the nascent polypeptide chain in the endoplasmic reticulum (ER). Through a series of trimming reactions by exoglycosidases in the ER and cis-Golgi compartments, the so-called “high mannose” (Man9GlcNAc2 to Man5GlcNAc2) glycans are formed. Subsequently, the formation of complex type glycans starts with the transfer of the first GlcNAc onto Man5GlcNAc2 by GnTI and further trimming by mannosidase II (ManII) to form GlcNAcMan3GlcNAc2. Complex glycan biosynthesis continues while the glycoprotein is progressing through the secretory pathway with the transfer in the Golgi apparatus of the second GlcNAc residue by GnTII as well as other monosaccharide residues onto the GlcNAcMan3GlcNAc2 under the action of several other glycosyl transferases.
Plants and mammals differ with respect to the formation of complex glycans (see FIG. 1, which compares the glycosylation pathway of glycoproteins in plants and mammals). In plants, complex glycans are characterized by the presence of β(1,2)-xylose residues linked to the Man-3 and/or an α(1,3)-fucose residue linked to GlcNAc-1, instead of an α(1,6)-fucose residue linked to the GlcNAc-1. Genes encoding the corresponding xylosyl (XylT) and fucosyl (FucT) transferases have been isolated [Strasser et al., “Molecular cloning and functional expression of beta1,2-xylosyltransferase cDNA from Arabidopsis thaliana,” FEBS Lett. 472:105 (2000); Leiter et al., “Purification, cDNA cloning, and expression of GDP-L-Fuc:Asn-linked GlcNAc alpha 1,3-fucosyltransferase from mung beans,” J. Biol. Chem. 274:21830 (1999)]. Plants possess neither β(1,4)-galactosyltransferases nor α(2,6)-sialyltransferases, and consequently plant glycans lack the β(1,4)-galactose and terminal α(2,6)-NeuAc residues often found on mammalian glycans.
The final glycan structures are determined not only by the mere presence of the enzymes involved in their biosynthesis and transport but, to a large extent, by the specific sequence of the various enzymatic reactions. The latter is controlled by the discrete sequestering and relative position of these enzymes throughout the ER and Golgi, which is mediated by the interaction between determinants of the transferase and specific characteristics of the sub-Golgi compartment for which the transferase is destined. A number of studies using hybrid molecules have shown that the transmembrane domains of several glycosyltransferases, including that of β(1,4)-galactosyltransferases, play a central role in their sub-Golgi sorting [Grabenhorst et al., J. Biol. Chem. 274:36107 (1999); Colley, K., Glycobiology 7:1 (1997); Munro, S., Trends Cell Biol. 8:11 (1998); Gleeson, P. A., Histochem. Cell Biol. 109:517 (1998)].
Although plants and mammals diverged a relatively long time ago, N-linked glycosylation seems at least partly conserved. This is evidenced by the similar though not identical glycan structures and by the observation that a mammalian GlcNAcTI gene complements an Arabidopsis mutant that is deficient in GlcNAcTI activity, and vice versa. The differences in glycan structures can have important consequences. For example, xylose and α(1,3)-fucose epitopes are known to be highly immunogenic and possibly allergenic in some circumstances, which may pose a problem when plants are used for the production of therapeutic glycoproteins. Moreover, the blood serum of many allergy patients contains IgE directed against these epitopes, and 50% of non-allergic blood donors also have antibodies specific for core xylose in their sera, whereas 25% have antibodies specific for core α(1,3)-fucose (Bardor et al., 2002, in press, Glycobiology) (Advance Access published Dec. 17, 2002), which puts these individuals at risk from treatments with recombinant proteins produced in plants and containing fucose and/or xylose. In addition, this carbohydrate-directed IgE in sera might cause false positive reactions in in vitro tests using plant extracts, since there is evidence that these carbohydrate-specific IgEs are not relevant for the allergenic reaction. In sum, a therapeutic failure with a glycoprotein produced in plants might be the result of accelerated clearance of the recombinant glycoprotein having xylose and/or fucose.
Accordingly, there is a need to better control glycosylation in plants, and particularly, glycosylation of glycoproteins intended for therapeutic use.
Definitions
To facilitate understanding of the invention, a number of terms as used in this specification are defined below.
The term “vector” refers to any genetic element, such as a plasmid, phage, transposon, cosmid, chromosome, retrovirus, virion, or similar genetic element, which is capable of replication when associated with the proper control elements and which can transfer gene sequences into cells and/or between cells. Thus, this term includes cloning and expression vehicles, as well as viral vectors.
The term “expression vector” as used herein refers to a recombinant DNA molecule containing a desired coding sequence (or coding sequences)—such as the coding sequence(s) for the hybrid enzyme(s) described in more detail below—and appropriate nucleic acid sequences necessary for the expression of the operably linked coding sequence in a particular host cell or organism. Nucleic acid sequences necessary for expression in prokaryotes usually include a promoter, an operator (optional), and a ribosome binding site, often along with other sequences. Eukaryotic cells are known to utilize promoters, enhancers, and termination and polyadenylation signals. It is not intended that the present invention be limited to particular expression vectors or expression vectors with particular elements.
The term “transgenic” when used in reference to a cell refers to a cell which contains a transgene, or whose genome has been altered by the introduction of a transgene. The term “transgenic” when used in reference to a cell, tissue or to a plant refers to a cell, tissue or plant, respectively, which comprises a transgene, where one or more cells of the tissue contain a transgene (such as a gene encoding the hybrid enzyme(s) of the present invention), or a plant whose genome has been altered by the introduction of a transgene. Transgenic cells, tissues and plants may be produced by several methods including the introduction of a “transgene” comprising nucleic acid (usually DNA) into a target cell or integration of the transgene into a chromosome of a target cell by way of human intervention, such as by the methods described herein.
The term “transgene” as used herein refers to any nucleic acid sequence which is introduced into the genome of a cell by experimental manipulations. A transgene may be an “endogenous DNA sequence,” or a “heterologous DNA sequence” (i.e., “foreign DNA”). The term “endogenous DNA sequence” refers to a nucleotide sequence which is naturally found in the cell into which it is introduced so long as it does not contain some modification (e.g., a point mutation, the presence of a selectable marker gene, or other like modifications) relative to the naturally-occurring sequence. The term “heterologous DNA sequence” refers to a nucleotide sequence which is ligated to, or is manipulated to become ligated to, a nucleic acid sequence to which it is not ligated in nature, or to which it is ligated at a different location in nature. Heterologous DNA is not endogenous to the cell into which it is introduced, but has been obtained from another cell. Heterologous DNA also includes an endogenous DNA sequence which contains some modification. Generally, although not necessarily, heterologous DNA encodes RNA and proteins that are not normally produced by the cell in which it is expressed. Examples of heterologous DNA include reporter genes, transcriptional and translational regulatory sequences, sequences encoding selectable marker proteins (e.g., proteins which confer drug resistance), or other similar elements.
The term “foreign gene” refers to any nucleic acid (e.g., gene sequence) which is introduced into the genome of a cell by experimental manipulations and may include gene sequences found in that cell so long as the introduced gene contains some modification (e.g., a point mutation, the presence of a selectable marker gene, or other like modifications) relative to the naturally-occurring gene.
The term “fusion protein” refers to a protein wherein at least one part or portion is from a first protein and another part or portion is from a second protein. The term “hybrid enzyme” refers to a fusion protein which is a functional enzyme, wherein at least one part or portion is from a first species and another part or portion is from a second species. Preferred hybrid enzymes of the present invention are functional glycosyltransferases (or portions thereof) wherein at least one part or portion is from a plant and another part or portion is from a mammal (such as human).
The term “introduction into a cell” or “introduction into a host cell” in the context of nucleic acid (e.g., vectors) is intended to include what the art calls “transformation” or “transfection” or “transduction.” Transformation of a cell may be stable or transient, and the present invention contemplates introduction of vectors under conditions where, on the one hand, there is stable expression, and on the other hand, where there is only transient expression. The term “transient transformation” or “transiently transformed” refers to the introduction of one or more transgenes into a cell in the absence of integration of the transgene into the host cell's genome. Transient transformation may be detected by, for example, enzyme-linked immunosorbent assay (ELISA) which detects the presence of a polypeptide encoded by one or more of the transgenes. Alternatively, transient transformation may be detected by detecting the activity of the protein (e.g., antigen binding of an antibody) encoded by the transgene (e.g., the antibody gene). The term “transient transformant” refers to a cell which has transiently incorporated one or more transgenes. In contrast, the term “stable transformation” or “stably transformed” refers to the introduction and integration of one or more transgenes into the genome of a cell. Stable transformation of a cell may be detected by Southern blot hybridization of genomic DNA of the cell with nucleic acid sequences which are capable of binding to one or more of the transgenes. Alternatively, stable transformation of a cell may also be detected by the polymerase chain reaction (PCR) of genomic DNA of the cell to amplify transgene sequences. The term “stable transformant” refers to a cell which has stably integrated one or more transgenes into the genomic DNA.
Thus, a stable transformant is distinguished from a transient transformant in that, whereas genomic DNA from the stable transformant contains one or more transgenes, genomic DNA from the transient transformant does not contain a transgene.
The term “host cell” includes both mammalian (e.g., human B cell clones, Chinese hamster ovary cells, hepatocytes) and non-mammalian cells (e.g., insect cells, bacterial cells, plant cells). In one embodiment, the host cells are mammalian cells and the introduction of a vector expressing a hybrid protein of the present invention (e.g., TmGnTII-GalT) inhibits (or at least reduces) fucosylation in said mammalian cells.
The term “nucleotide sequence of interest” refers to any nucleotide sequence, the manipulation of which may be deemed desirable for any reason (e.g., confer improved qualities, use for production of therapeutic proteins), by one of ordinary skill in the art. Such nucleotide sequences include, but are not limited to, coding sequences of structural genes (e.g., reporter genes, selection marker genes, oncogenes, antibody genes, drug resistance genes, growth factors, and other like genes), and non-coding regulatory sequences which do not encode an mRNA or protein product, (e.g., promoter sequence, polyadenylation sequence, termination sequence, enhancer sequence, and other like sequences). The present invention contemplates host cells expressing a heterologous protein encoded by a nucleotide sequence of interest along with one or more hybrid enzymes.
The term “isolated” when used in relation to a nucleic acid, as in “an isolated nucleic acid sequence” refers to a nucleic acid sequence that is identified and separated from one or more other components (e.g., separated from a cell containing the nucleic acid, or separated from at least one contaminant nucleic acid, or separated from one or more proteins, one or more lipids) with which it is ordinarily associated in its natural source. Isolated nucleic acid is nucleic acid present in a form or setting that is different from that in which it is found in nature. In contrast, non-isolated nucleic acids are nucleic acids such as DNA and RNA which are found in the state they exist in nature. For example, a given DNA sequence (e.g., a gene) is found on the host cell chromosome in proximity to neighboring genes; RNA sequences, such as a specific mRNA sequence encoding a specific protein, are found in the cell as a mixture with numerous other mRNAs which encode a multitude of proteins. However, an isolated nucleic acid sequence comprising SEQ ID NO:1 includes, by way of example, such nucleic acid sequences in cells which ordinarily contain SEQ ID NO:1 where the nucleic acid sequence is in a chromosomal or extrachromosomal location different from that of natural cells, or is otherwise flanked by a different nucleic acid sequence than that found in nature. The isolated nucleic acid sequence may be present in single-stranded or double-stranded form. When an isolated nucleic acid sequence is to be utilized to express a protein, the nucleic acid sequence will contain at a minimum at least a portion of the sense or coding strand (i.e., the nucleic acid sequence may be single-stranded). Alternatively, it may contain both the sense and anti-sense strands (i.e., the nucleic acid sequence may be double-stranded).
As used herein, the term “purified” refers to molecules, either nucleic or amino acid sequences, that are removed from their natural environment, isolated or separated. An “isolated nucleic acid sequence” is therefore a purified nucleic acid sequence. “Substantially purified” molecules are at least 60% free, preferably at least 75% free, and more preferably at least 90% free, from other components with which they are naturally associated. The present invention contemplates both purified (including substantially purified) and unpurified hybrid enzyme(s) (which are described in more detail below).
As used herein, the terms “complementary” or “complementarity” are used in reference to nucleotide sequences related by the base-pairing rules. For example, the sequence 5′-AGT-3′ is complementary to the sequence 5′-ACT-3′. Complementarity can be “partial” or “total.” “Partial” complementarity is where one or more nucleic acid bases is not matched according to the base pairing rules. “Total” or “complete” complementarity between nucleic acids is where each and every nucleic acid base is matched with another base under the base pairing rules. The degree of complementarity between nucleic acid strands has significant effects on the efficiency and strength of hybridization between nucleic acid strands.
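The base-pairing rules behind this definition can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `reverse_complement` and `matched_fraction` are hypothetical, and the code assumes plain DNA sequences of equal length aligned antiparallel, with no gaps or ambiguity codes.

```python
# Watson-Crick pairing rules for DNA (A pairs with T, G pairs with C).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the totally complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def matched_fraction(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when two equal-length strands,
    both written 5' to 3', are aligned antiparallel."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    partner = reversed(strand_b.upper())
    hits = sum(PAIRS[a] == b for a, b in zip(strand_a.upper(), partner))
    return hits / len(strand_a)


print(reverse_complement("AGT"))       # the example pair from the text
print(matched_fraction("AGT", "ACT"))  # total complementarity
print(matched_fraction("AGT", "ACA"))  # partial complementarity
```

A matched fraction of 1.0 corresponds to “total” complementarity as defined above; any lower value corresponds to “partial” complementarity.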
A “complement” of a nucleic acid sequence as used herein refers to a nucleotide sequence whose nucleic acids show total complementarity to the nucleic acids of the nucleic acid sequence. For example, the present invention contemplates the complements of SEQ ID NOS: 1, 3, 5, 9, 27, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 40, 41 and 43.
The term “homology” when used in relation to nucleic acids refers to a degree of complementarity. There may be partial homology (i.e., partial identity) or complete homology (i.e., complete identity). A partially complementary sequence is one that at least partially inhibits a completely complementary sequence from hybridizing to a target nucleic acid and is referred to using the functional term “substantially homologous.” The inhibition of hybridization of the completely complementary sequence to the target sequence may be examined using a hybridization assay (Southern or Northern blot, solution hybridization and the like) under conditions of low stringency. A substantially homologous sequence or probe (i.e., an oligonucleotide which is capable of hybridizing to another oligonucleotide of interest) will compete for and inhibit the binding (i.e., the hybridization) of a completely homologous sequence to a target under conditions of low stringency. This is not to say that conditions of low stringency are such that non-specific binding is permitted; low stringency conditions require that the binding of two sequences to one another be a specific (i.e., selective) interaction. The absence of non-specific binding may be tested by the use of a second target which lacks even a partial degree of complementarity (e.g., less than about 30% identity): in the absence of non-specific binding the probe will not hybridize to the second non-complementary target.
When used in reference to a double-stranded nucleic acid sequence such as a cDNA or genomic clone, the term “substantially homologous” refers to any probe which can hybridize to either or both strands of the double-stranded nucleic acid sequence under conditions of low stringency as described infra.
When used in reference to a single-stranded nucleic acid sequence, the term “substantially homologous” refers to any probe which can hybridize to the single-stranded nucleic acid sequence under conditions of low stringency as described infra.
The term “hybridization” as used herein includes “any process by which a strand of nucleic acid joins with a complementary strand through base pairing.” [Coombs J (1994) Dictionary of Biotechnology, Stockton Press, New York N.Y.]. Hybridization and the strength of hybridization (i.e., the strength of the association between the nucleic acids) are affected by such factors as the degree of complementarity between the nucleic acids, stringency of the conditions involved, the Tm of the formed hybrid, and the G:C ratio within the nucleic acids.
As used herein, the term “Tm” is used in reference to the “melting temperature.” The melting temperature is the temperature at which a population of double-stranded nucleic acid molecules becomes half dissociated into single strands. The equation for calculating the Tm of nucleic acids is well known in the art. As indicated by standard references, a simple estimate of the Tm value may be calculated by the equation: Tm=81.5+0.41(% G+C), when a nucleic acid is in aqueous solution at 1 M NaCl [see e.g., Anderson and Young, Quantitative Filter Hybridization, in: Nucleic Acid Hybridization (1985)]. Other references include more sophisticated computations which take structural as well as sequence characteristics into account for the calculation of Tm.
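As a worked illustration, the simple estimate quoted above can be computed directly. The sketch below is hypothetical in its naming (`gc_percent`, `estimate_tm`) and assumes a plain DNA sequence in aqueous solution at 1 M NaCl, with the result read as degrees Celsius; it deliberately ignores the length, salt, and structural corrections that the more sophisticated computations take into account.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def estimate_tm(seq: str) -> float:
    """Simple estimate: Tm = 81.5 + 0.41 * (% G+C), at 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


probe = "ATGC" * 25  # made-up 100-nucleotide sequence with 50% G+C
print(f"%GC = {gc_percent(probe):.1f}, estimated Tm = {estimate_tm(probe):.1f}")
```

Because the estimate depends only on base composition, two sequences with the same G+C percentage receive the same value regardless of length or base order, which is one reason it is described as a simple estimate.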
Low stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5× SSPE (Saline, Sodium Phosphate, EDTA) (43.8 g/l NaCl, 6.9 g/l NaH2PO4·H2O and 1.85 g/l EDTA (Ethylenediaminetetraacetic Acid), pH adjusted to 7.4 with NaOH), 0.1% SDS (Sodium dodecyl sulfate), 5× Denhardt's reagent [50× Denhardt's contains the following per 500 ml: 5 g Ficoll (Type 400, Pharmacia), 5 g BSA (Bovine Serum Albumin) (Fraction V; Sigma)] and 100 μg/ml denatured salmon sperm DNA, followed by washing in a solution comprising between 0.2× and 2.0× SSPE, and 0.1% SDS at room temperature when a DNA probe of about 100 to about 1000 nucleotides in length is employed.
High stringency conditions when used in reference to nucleic acid hybridization comprise conditions equivalent to binding or hybridization at 68° C. in a solution consisting of 5× SSPE, 1% SDS, 5× Denhardt's reagent and 100 μg/ml denatured salmon sperm DNA followed by washing in a solution comprising 0.1× SSPE, and 0.1% SDS at 68° C. when a probe of about 100 to about 1000 nucleotides in length is employed.
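The SSPE strengths named in the low and high stringency definitions can be related to the 5× recipe by simple proportional scaling. The following sketch assumes that each component scales linearly with the stated strength and that pH is adjusted separately; the helper name `sspe_g_per_l` is hypothetical.

```python
# Grams per litre of each component in 5x SSPE, as listed in the text.
SSPE_5X_G_PER_L = {"NaCl": 43.8, "NaH2PO4.H2O": 6.9, "EDTA": 1.85}


def sspe_g_per_l(strength: float) -> dict[str, float]:
    """Scale the 5x recipe linearly to another strength (e.g. 0.1 for 0.1x)."""
    if strength < 0:
        raise ValueError("strength must be non-negative")
    factor = strength / 5.0
    return {name: grams * factor for name, grams in SSPE_5X_G_PER_L.items()}


washes = [
    ("low stringency wash, lower bound", 0.2),
    ("low stringency wash, upper bound", 2.0),
    ("high stringency wash", 0.1),
]
for label, strength in washes:
    recipe = sspe_g_per_l(strength)
    amounts = ", ".join(f"{name} {grams:.3f} g/l" for name, grams in recipe.items())
    print(f"{strength}x SSPE ({label}): {amounts}")
```

In practice such washes are usually prepared by diluting a concentrated stock rather than by weighing out each component, so the figures serve only to make the relative salt concentrations of the two wash regimes concrete.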
The term “equivalent” when made in reference to a hybridization condition as it relates to a hybridization condition of interest means that the hybridization condition and the hybridization condition of interest result in hybridization of nucleic acid sequences which have the same range of percent (%) homology. For example, if a hybridization condition of interest results in hybridization of a first nucleic acid sequence with other nucleic acid sequences that have from 50% to 70% homology to the first nucleic acid sequence, then another hybridization condition is said to be equivalent to the hybridization condition of interest if this other hybridization condition also results in hybridization of the first nucleic acid sequence with the other nucleic acid sequences that have from 50% to 70% homology to the first nucleic acid sequence.
When used in reference to nucleic acid hybridization the art knows well that numerous equivalent conditions may be employed to comprise either low or high stringency conditions; factors such as the length and nature (DNA, RNA, base composition) of the probe and nature of the target (DNA, RNA, base composition, present in solution or immobilized) and the concentration of the salts and other components (e.g., the presence or absence of formamide, dextran sulfate, polyethylene glycol) are considered and the hybridization solution may be varied to generate conditions of either low or high stringency hybridization different from, but equivalent to, the above-listed conditions.
The term “promoter,” “promoter element,” or “promoter sequence” as used herein, refers to a DNA sequence which when ligated to a nucleotide sequence of interest is capable of controlling the transcription of the nucleotide sequence of interest into mRNA. A promoter is typically, though not necessarily, located 5′ (i.e., upstream) of a nucleotide sequence of interest whose transcription into mRNA it controls, and provides a site for specific binding by RNA polymerase and other transcription factors for initiation of transcription.
Promoters may be tissue specific or cell specific. The term “tissue specific” as it applies to a promoter refers to a promoter that is capable of directing selective expression of a nucleotide sequence of interest to a specific type of tissue (e.g., petals) in the relative absence of expression of the same nucleotide sequence of interest in a different type of tissue (e.g., roots). Tissue specificity of a promoter may be evaluated by, for example, operably linking a reporter gene to the promoter sequence to generate a reporter construct, introducing the reporter construct into the genome of a plant such that the reporter construct is integrated into every tissue of the resulting transgenic plant, and detecting the expression of the reporter gene (e.g., detecting mRNA, protein, or the activity of a protein encoded by the reporter gene) in different tissues of the transgenic plant. The detection of a greater level of expression of the reporter gene in one or more tissues relative to the level of expression of the reporter gene in other tissues shows that the promoter is specific for the tissues in which greater levels of expression are detected. The term “cell type specific” as applied to a promoter refers to a promoter which is capable of directing selective expression of a nucleotide sequence of interest in a specific type of cell in the relative absence of expression of the same nucleotide sequence of interest in a different type of cell within the same tissue. The term “cell type specific” when applied to a promoter also means a promoter capable of promoting selective expression of a nucleotide sequence of interest in a region within a single tissue. Cell type specificity of a promoter may be assessed using methods well known in the art, e.g., immuno-histochemical staining. 
Briefly, tissue sections are embedded in paraffin, and paraffin sections are reacted with a primary antibody which is specific for the polypeptide product encoded by the nucleotide sequence of interest whose expression is controlled by the promoter. A labeled (e.g., peroxidase conjugated) secondary antibody which is specific for the primary antibody is allowed to bind to the sectioned tissue and specific binding detected (e.g., with avidin/biotin) by microscopy.
Promoters may be constitutive or regulatable. The term “constitutive” when made in reference to a promoter means that the promoter is capable of directing transcription of an operably linked nucleic acid sequence in the absence of a stimulus (e.g., heat shock, chemicals, light, or similar stimuli). Typically, constitutive promoters are capable of directing expression of a transgene in substantially any cell and any tissue. In contrast, a “regulatable” promoter is one which is capable of directing a level of transcription of an operably linked nucleic acid sequence in the presence of a stimulus (e.g., heat shock, chemicals, light, or similar stimuli) which is different from the level of transcription of the operably linked nucleic acid sequence in the absence of the stimulus.
The terms “infecting” and “infection” with a bacterium refer to co-incubation of a target biological sample, (e.g., cell, tissue, plant part) with the bacterium under conditions such that nucleic acid sequences contained within the bacterium are introduced into one or more cells of the target biological sample.
The term “Agrobacterium” refers to a soil-borne, Gram-negative, rod-shaped phytopathogenic bacterium which causes crown gall. The term “Agrobacterium” includes, but is not limited to, the species Agrobacterium tumefaciens (which typically causes crown gall in infected plants) and Agrobacterium rhizogenes (which causes hairy root disease in infected host plants). Infection of a plant cell with Agrobacterium generally results in the production of opines (e.g., nopaline, agropine, octopine) by the infected cell. Thus, Agrobacterium strains which cause production of nopaline (e.g., strain LBA4301, C58, A208) are referred to as “nopaline-type” Agrobacteria; Agrobacterium strains which cause production of octopine (e.g., strain LBA4404, Ach5, B6) are referred to as “octopine-type” Agrobacteria; and Agrobacterium strains which cause production of agropine (e.g., strain EHA105, EHA101, A281) are referred to as “agropine-type” Agrobacteria.
The terms “bombarding,” “bombardment,” and “biolistic bombardment” refer to the process of accelerating particles towards a target biological sample (e.g., cell, tissue, plant part such as a leaf, or intact plant) to effect wounding of the cell membrane of a cell in the target biological sample and/or entry of the particles into the target biological sample. Methods for biolistic bombardment are known in the art (e.g., U.S. Pat. Nos. 5,584,807 and 5,141,131, the contents of both of which are herein incorporated by reference), and devices for it are commercially available (e.g., the helium gas-driven microprojectile accelerator (PDS-1000/He) (BioRad)).
The term “microwounding” when made in reference to plant tissue refers to the introduction of microscopic wounds in that tissue. Microwounding may be achieved by, for example, particle bombardment as described herein. The present invention specifically contemplates schemes for introducing nucleic acid which employ microwounding.
The term “organism” as used herein refers to all organisms and in particular organisms containing glycoproteins with N-linked glycans.
The term “plant” as used herein refers to a plurality of plant cells which are largely differentiated into a structure that is present at any stage of a plant's development. Such structures include, but are not limited to, a fruit, shoot, stem, root, leaf, seed, flower petal, or similar structure. The term “plant tissue” includes differentiated and undifferentiated tissues of plants including, but not limited to, roots, shoots, leaves, pollen, seeds, tumor tissue and various types of cells in culture (e.g., single cells, protoplasts, embryos, callus, protocorm-like bodies, and other types of cells). Plant tissue may be in planta, in organ culture, tissue culture, or cell culture. Similarly, “plant cells” may be cells in culture or may be part of a plant.
Glycosyltransferases are enzymes that catalyze the processing reactions that determine the structures of cellular oligosaccharides, including the oligosaccharides on glycoproteins. As used herein, “glycosyltransferase” is meant to include mannosidases, even though these enzymes trim glycans and do not “transfer” a monosaccharide. Glycosyltransferases share the feature of a type II membrane orientation. Each glycosyltransferase is composed of an amino-terminal cytoplasmic tail (shown for illustration purposes below as made up of a string of amino acids arbitrarily labeled “X”, without intending to suggest the actual size of the region), a signal anchor domain that spans the membrane and is referred to herein as a “transmembrane domain” (shown below as made up of a string of amino acids labeled “H” for hydrophobic, without intending to suggest the actual size of the domain and without intending to suggest that the domain is only made up of hydrophobic amino acids), followed by a luminal stem or stalk region (shown below as made up of a string of amino acids arbitrarily labeled “S”, without intending to suggest the actual size of the region), and a carboxy-terminal catalytic domain (shown below as made up of a string of amino acids arbitrarily labeled “C”, without intending to suggest the actual size of the domain):
NH2-XXXXXXXHHHHHHHHSSSSSSSSCCCCCCCC
Collectively, the Cytoplasmic Tail-Transmembrane-Stem Region or “CTS” (corresponding to the X, H, and S portions of the above schematic) can be used (or portions thereof) in embodiments contemplated by the present invention wherein the catalytic domain is exchanged or “swapped” with a corresponding catalytic domain from another molecule (or portions of such regions/domains) to create a hybrid protein.
For example, in a preferred embodiment, the present invention contemplates nucleic acid encoding a hybrid enzyme (as well as vectors containing such nucleic acid, host cells containing such vectors, and the hybrid enzyme itself), said hybrid enzyme comprising at least a portion of a CTS region [e.g., the cytoplasmic tail (“C”), the transmembrane domain (“T”), the cytoplasmic tail together with the transmembrane domain (“CT”), the transmembrane domain together with the stem (“TS”), or the complete CTS region] of a first glycosyltransferase (e.g. plant glycosyltransferase) and at least a portion of a catalytic region of a second glycosyltransferase (e.g. mammalian glycosyltransferase). To create such an embodiment, the coding sequence for the entire CTS region (or portion thereof) may be deleted from nucleic acid coding for the mammalian glycosyltransferase and replaced with the coding sequence for the entire CTS region (or portion thereof) of a plant glycosyltransferase. On the other hand, a different approach might be taken to create this embodiment; for example, the coding sequence for the entire catalytic domain (or portion thereof) may be deleted from the coding sequence for the plant glycosyltransferase and replaced with the coding sequence for the entire catalytic domain (or portion thereof) of the mammalian glycosyltransferase. In such a case, the resulting hybrid enzyme would have the amino-terminal cytoplasmic tail of the plant glycosyltransferase linked to the plant glycosyltransferase transmembrane domain linked to the stem region of the plant glycosyltransferase in the normal manner of the wild-type plant enzyme—but the stem region would be linked to the catalytic domain of the mammalian glycosyltransferase (or portion thereof).
It is not intended that the present invention be limited only to the two approaches outlined above. Other variations in the approach are contemplated. For example, to create nucleic acid encoding a hybrid enzyme, said hybrid enzyme comprising at least a portion of a transmembrane region of a plant glycosyltransferase and at least a portion of a catalytic region of a mammalian glycosyltransferase, one might use less than the entire coding sequence for the CTS region (e.g., only the transmembrane domain of the plant glycosyltransferase, or the complete cytoplasmic tail together with all or a portion of the transmembrane domain, or the complete cytoplasmic tail together with all of the transmembrane domain together with a portion of the stem region). One might delete the mammalian coding sequence for the entire cytoplasmic tail together with the coding sequence for the transmembrane domain (or portion thereof), followed by replacement with the corresponding coding sequence for the cytoplasmic tail and transmembrane domain (or portion thereof) of the plant glycosyltransferase. In such a case, the resulting hybrid enzyme would have the stem region of the mammalian glycosyltransferase linked to the plant glycosyltransferase transmembrane domain (or portion thereof), which in turn would be linked to the amino-terminal cytoplasmic tail of the plant glycosyltransferase, with the stem region being linked to the catalytic domain of the mammalian glycosyltransferase (i.e., two of the four regions/domains would be of plant origin and two would be of mammalian origin).
In other embodiments, the present invention contemplates nucleic acid encoding a hybrid enzyme (along with vectors, host cells containing the vectors, and plants, or plant parts, containing the host cells), said hybrid enzyme comprising at least a portion of an amino-terminal cytoplasmic tail of a plant glycosyltransferase and at least a portion of a catalytic region of a mammalian glycosyltransferase. In this embodiment, the hybrid enzyme encoded by the nucleic acid might or might not contain other plant sequences (e.g., the transmembrane domain or portion thereof, the stem region or portion thereof). For example, to create such an embodiment, the coding sequence for the entire cytoplasmic tail (or portion thereof) may be deleted from nucleic acid coding for the mammalian glycosyltransferase and replaced with the coding sequence for the entire cytoplasmic tail (or portion thereof) of a plant glycosyltransferase. In such a case, the resulting hybrid enzyme would have the amino-terminal cytoplasmic tail (or portion thereof) of the plant glycosyltransferase linked to the mammalian glycosyltransferase transmembrane domain, which in turn is linked to the stem region of the mammalian glycosyltransferase, the stem region being linked to the catalytic domain of the mammalian glycosyltransferase. On the other hand, a different approach might be taken to create this embodiment; for example, the coding sequence for the entire catalytic domain (or portion thereof) may be deleted from the coding sequence for the plant glycosyltransferase and replaced with the coding sequence for the entire catalytic domain (or portion thereof) of the mammalian glycosyltransferase. In such a case, the resulting hybrid enzyme would have the amino-terminal cytoplasmic tail of the plant glycosyltransferase linked to the plant glycosyltransferase transmembrane domain linked to the stem region of the plant glycosyltransferase in the normal manner of the wild-type plant enzyme, but the stem region would be linked to the catalytic domain of the mammalian glycosyltransferase (or portion thereof).
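The domain combinations described in the preceding paragraphs can be made concrete with a toy model. The following Python sketch is purely illustrative and rests on stated assumptions: the names Region, make_enzyme and make_hybrid are hypothetical, the residue strings are placeholder letters rather than real sequences, and only whole regions are exchanged (the “or portion thereof” cases are not modeled).

```python
from dataclasses import dataclass

# Regions in amino- to carboxy-terminal order (type II membrane orientation).
REGION_ORDER = ("tail", "tm", "stem", "catalytic")

@dataclass(frozen=True)
class Region:
    name: str      # one of REGION_ORDER
    origin: str    # e.g. "plant" or "mammal"
    residues: str  # placeholder letters, not a real sequence

def make_enzyme(origin, tail, tm, stem, catalytic):
    """Build a toy enzyme as a dict of its four regions, keyed by region name."""
    sequences = dict(zip(REGION_ORDER, (tail, tm, stem, catalytic)))
    return {name: Region(name, origin, sequences[name]) for name in REGION_ORDER}

def make_hybrid(first, second, regions_from_first):
    """Take the named regions from `first` and every other region from `second`."""
    unknown = set(regions_from_first) - set(REGION_ORDER)
    if unknown:
        raise ValueError(f"unknown region(s): {sorted(unknown)}")
    return [first[name] if name in regions_from_first else second[name]
            for name in REGION_ORDER]

def origins(hybrid):
    """List the origin of each region, amino- to carboxy-terminus."""
    return [region.origin for region in hybrid]

# Lower case marks plant placeholders, upper case mammalian placeholders.
plant = make_enzyme("plant", "xxxxxxx", "hhhhhhhh", "ssssssss", "cccccccc")
mammal = make_enzyme("mammal", "XXXXXXX", "HHHHHHHH", "SSSSSSSS", "CCCCCCCC")

cts_swap = make_hybrid(plant, mammal, {"tail", "tm", "stem"})  # complete plant CTS
ct_swap = make_hybrid(plant, mammal, {"tail", "tm"})           # plant tail + TM only
tail_swap = make_hybrid(plant, mammal, {"tail"})               # plant tail only
```

In this model, deleting the mammalian CTS and inserting the plant CTS, or deleting the plant catalytic domain and inserting the mammalian one, both yield the same cts_swap arrangement, which mirrors the two approaches described in the text.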
In the above discussion, the phrase “or portion thereof” was used to expressly indicate that less than the entire region/domain might be employed in the particular case (e.g., a fragment might be used). For example, the cytoplasmic tail of glycosyltransferases ranges from approximately 5 to 50 amino acids in length, and more typically 15 to 30 amino acids, depending on the particular transferase. A “portion” of the cytoplasmic tail region is herein defined as no fewer than four amino acids and no more than the full length of the region/domain less one amino acid. It is desired that the portion function in a manner analogous to the full-length region/domain, although it need not function to the same degree. For example, to the extent the full-length cytoplasmic tail functions as a Golgi retention region or ER retention signal, it is desired that the portion employed in the above-named embodiments also function as a Golgi or ER retention region, albeit perhaps not as efficiently as the full-length region.
Similarly, the transmembrane domain is typically 15-25 amino acids in length and made up primarily of hydrophobic amino acids. A “portion” of the transmembrane domain is herein defined as no fewer than ten amino acids and no more than the full length of the region/domain (for the particular type of transferase) less one amino acid. It is desired that the portion function in a manner analogous to the full-length region/domain, although it need not function to the same degree. For example, to the extent the full-length transmembrane domain functions as the primary Golgi retention region or ER retention signal, it is desired that the portion employed in the above-named embodiments also function as a Golgi or ER retention region, albeit perhaps not as efficiently as the full-length region. The present invention specifically contemplates conservative substitutions to create variants of the wild-type transmembrane domain or portions thereof. For example, the present invention contemplates replacing one or more hydrophobic amino acids (shown as “H” in the schematic above) of the wild-type sequence with one or more different amino acids, preferably also hydrophobic amino acids.
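The length limits that define a “portion” of the cytoplasmic tail and of the transmembrane domain can be expressed as a simple check. The Python sketch below is illustrative only; the names MIN_PORTION_LENGTH and is_valid_portion are hypothetical, and the check covers length alone, not whether the fragment retains any retention function.

```python
# Minimum portion lengths stated in the text, in amino acids.
MIN_PORTION_LENGTH = {"tail": 4, "tm": 10}

def is_valid_portion(region, full_length, portion_length):
    """True if portion_length lies between the stated minimum for the region
    and the full length of that region less one amino acid (inclusive)."""
    minimum = MIN_PORTION_LENGTH[region]
    return minimum <= portion_length <= full_length - 1

# Example with an assumed 20-residue cytoplasmic tail and transmembrane domain.
tail_ok = is_valid_portion("tail", 20, 4)        # at the minimum
tail_too_short = is_valid_portion("tail", 20, 3)
tail_full = is_valid_portion("tail", 20, 20)     # full length is not a "portion"
tm_ok = is_valid_portion("tm", 20, 19)           # full length less one
tm_too_short = is_valid_portion("tm", 20, 9)
```

Note that a region shorter than or equal to its minimum portion length (for example, a five-residue tail allows only a four-residue portion) leaves little or no room for a valid portion under this definition.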
A portion of the catalytic domain can be as large as the full length of the domain less one amino acid. Where the catalytic domain is from a beta1,4-galactosyltransferase, it is preferred that the portion include at a minimum residues 345-365, which are believed to be involved in the conformation conferring an oligosaccharide acceptor binding site (it is preferred that the portion include this region at a minimum, together with five to ten amino acids on either side to permit the proper conformation).
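The preferred minimum window for a beta1,4-galactosyltransferase catalytic-domain portion can be computed as follows. This Python sketch is illustrative and assumes 1-based, inclusive residue numbering on whatever reference sequence the numbering 345-365 refers to; the names preferred_window and extract, and the 400-residue placeholder sequence, are hypothetical.

```python
CORE_START, CORE_END = 345, 365   # residues named in the text (1-based, inclusive)
MIN_FLANK, MAX_FLANK = 5, 10      # flanking amino acids on either side

def preferred_window(flank):
    """Return (start, end) of residues 345-365 extended by `flank` on each side."""
    if not MIN_FLANK <= flank <= MAX_FLANK:
        raise ValueError(f"flank must be between {MIN_FLANK} and {MAX_FLANK}")
    return CORE_START - flank, CORE_END + flank

def extract(sequence, start, end):
    """Slice a sequence using 1-based, inclusive residue coordinates."""
    if start < 1 or end > len(sequence):
        raise ValueError("window falls outside the sequence")
    return sequence[start - 1:end]

narrow = preferred_window(5)      # smallest preferred window
wide = preferred_window(10)       # largest preferred window

placeholder = "A" * 400           # stand-in sequence, not a real enzyme
fragment = extract(placeholder, *narrow)
```

With a flank of five the window spans 31 residues, and with a flank of ten it spans 41 residues.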
The present invention also includes synthetic CTS regions and portions thereof. A “portion” of a CTS region must include at least one (and may include more than one) entire domain (e.g., the entire transmembrane domain) but less than the entire CTS region.
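The rule that a “portion” of a CTS region contains at least one entire domain but less than the entire CTS region can be sketched as a check on how many residues of each domain are included. The Python below is illustrative only; the names CTS_DOMAINS and is_cts_portion and the domain lengths in the example are assumptions, and the check does not verify that the included residues are contiguous.

```python
CTS_DOMAINS = ("tail", "tm", "stem")

def is_cts_portion(included, full):
    """included/full map each CTS domain name to a residue count.
    A valid portion has at least one domain present in full,
    but not all three domains present in full."""
    for domain in CTS_DOMAINS:
        if not 0 <= included.get(domain, 0) <= full[domain]:
            raise ValueError(f"invalid residue count for {domain}")
    complete = [d for d in CTS_DOMAINS if included.get(d, 0) == full[d]]
    return 1 <= len(complete) < len(CTS_DOMAINS)

# Assumed domain lengths, for illustration only.
full = {"tail": 20, "tm": 20, "stem": 40}

tm_only = is_cts_portion({"tm": 20}, full)
ct_plus_partial_stem = is_cts_portion({"tail": 20, "tm": 20, "stem": 10}, full)
entire_cts = is_cts_portion({"tail": 20, "tm": 20, "stem": 40}, full)
fragments_only = is_cts_portion({"tail": 10, "tm": 15}, full)
```

Under these assumptions, the first two cases qualify as portions of a CTS region, the entire CTS does not (it is the complete region rather than a portion), and a set of partial domains alone does not.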
Importantly, by using the term “CTS region” or “transmembrane domain” it is not intended that only wild-type sequences be encompassed. Indeed, this invention is not limited to natural glycosyltransferases and enzymes involved in glycosylation, but also includes the use of synthetic enzymes that exhibit the same or similar function. In one embodiment, wild-type domains are changed (e.g., by deletion, insertion, replacement and the like).
Finally, by using the indicator “Tm” when referring to a particular hybrid (e.g., “TmXyl-”), entire transmembrane/CTS domains (with or without changes to the wild-type sequence) as well as portions (with or without changes to the wild-type sequence) are intended to be encompassed.