Saponins isolated from Panax ginseng and its congeners (including Panax notoginseng and Panax quinquefolium) are collectively called ginsenosides. Ginsenosides are triterpene saponins and are the main active ingredients of Panax plants. To date, at least 60 ginsenosides have been isolated from Panax, some of which have been shown to have broad physiological functions and pharmaceutical value, including anti-cancer, immunoregulatory, anti-fatigue, cardioprotective, and hepatoprotective effects.
Structurally, ginsenosides are small molecules with biological activity formed by the glycosylation of sapogenins. The types of ginsenoside sapogenins are limited, mainly including dammarane-type protopanaxadiol (PPD), protopanaxatriol (PPT), and oleanolic acid. Recently, two new sapogenins, 25-OH-PPD and 25-OCH3-PPD, were isolated from P. notoginseng. Both of these new sapogenins present excellent anti-tumor activities.
Upon glycosylation, the water solubility of sapogenins is enhanced and different biological activities are exhibited. The carbohydrate chain of PPD saponins usually binds to the C3 and/or C20 hydroxyl(s) of the sapogenin. Compared with PPD saponins, PPT saponins have one more hydroxyl at position C6. According to present findings, glycosylation of PPT saponins occurs only at the C6 and/or C20 hydroxyl(s); glycosylation at C-3 of PPT saponins has not yet been reported. The glycosyl can be glucose, rhamnose, xylose or arabinose.
The physiological functions and pharmaceutical value of ginsenosides can vary dramatically with the glycosyl binding sites and with the composition and length of the carbohydrate chains. For example, ginsenosides Rb1, Rd and Rc are all saponins with PPD as their sapogenin; they differ only in glycosyl modification, but their physiological functions differ considerably. Rb1 stabilizes the central nervous system, while Rc inhibits it. Rb1 presents broad physiological functions, while the functions of Rd are quite limited.
The structural diversity of ginsenoside sapogenins and saponins is also embodied in their stereo structures. Although the tetracyclic triterpenoid skeleton has many chiral carbon atoms, C20 is the dominant site for forming stereoisomers. C20 epimers exist for almost every kind of ginsenoside and sapogenin. The content of ginsenosides and sapogenins with the S-configuration at C20 in ginseng is far above that of the R-configuration. Thus, in most cases, ginsenosides and sapogenins refer to the C20 S-configuration ginsenosides and sapogenins. However, the physiological activities of the C20 epimers of ginsenosides and sapogenins are distinctly different. For example, the S-type ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-20(S)-protopanaxadiol) can significantly inhibit prostate cancer cells, while the inhibitory effect of the R-type ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-20(R)-protopanaxadiol) is quite poor. The R-type ginsenoside Rh2 can selectively inhibit the generation of osteoclasts without any cytotoxicity, while the S-type ginsenoside Rh2 poorly inhibits osteoclast generation and is strongly cytotoxic to osteoclasts. In addition, the regulatory effects of the S-type and R-type ginsenoside Rh2 on P-glycoprotein are substantially different.
The function of glycosyltransferases is to transfer glycosyl(s) from glycosyl donor(s) (nucleoside diphosphate sugars, such as UDP-glucose) to different glycosyl acceptor(s). At present, glycosyltransferases have been classified into 94 families based on their amino acid sequences. More than one hundred different glycosyltransferases have been identified so far among the sequenced plant genomes. Glycosyl acceptors for these glycosyltransferases include saccharides, lipids, proteins, nucleic acids, antibiotics, and other small molecules. The glycosyltransferases involved in saponin glycosylation in ginseng transfer glycosyls from glycosyl donors to hydroxyls at position C-3, C-6, or C-20 of sapogenins or aglycones, thereby forming saponins with various pharmaceutical values.
At present, by analyzing the transcriptomes of P. ginseng, P. quinquefolium and P. notoginseng, researchers have identified large numbers of glycosyltransferase genes. However, which of them are involved in ginsenoside synthesis remains unclear. Studies on the isolation and purification of glycosyltransferases are making slow progress because ginseng contains numerous kinds of glycosyltransferases, each at extremely low content.
Rare ginsenosides are saponins present at extremely low content in P. ginseng. Ginsenoside CK (20-O-β-(D-glucopyranosyl)-20(S)-protopanaxadiol) is a PPD-type saponin with a glucosyl group attached to the C-20 hydroxyl of the sapogenin. The content of ginsenoside CK in P. ginseng is extremely low, and it is the main metabolite produced by microbial hydrolysis of PPD-type saponins in the human intestinal tract. Research indicates that most PPD-type saponins can be absorbed by the human body only after being metabolized into CK. Thus, ginsenoside CK is the actual entity that is directly absorbed by the human body and takes effect, while other saponins are only prodrugs. Ginsenoside CK has excellent anti-tumor activity. It can induce tumor cell apoptosis and inhibit tumor cell metastasis. In assays combining it with radiotherapy or chemotherapy, it enhanced the effect of the radiotherapy or chemotherapy. In addition, ginsenoside CK has anti-allergic, anti-inflammatory, neuroprotective, anti-diabetic, and anti-skin-aging activities. The pharmacological activities of ginsenoside CK are characterized by multiple targets, high activity, and low toxicity.
Ginsenoside F1 (20-O-β-D-glucopyranosyl-20(S)-protopanaxatriol) belonging to PPT saponins also has a very low content in P. ginseng and is one of the rare ginsenosides as well. Ginsenoside F1 is quite similar to CK in structure, also having a glucosyl group attached to C-20 hydroxyl of sapogenin. Ginsenoside F1 also possesses unique pharmaceutical values. It has the function of anti-aging and anti-oxidization.
Ginsenoside Rh1 (6-O-β-D-glucopyranosyl-20(S)-protopanaxatriol) belonging to PPT saponins also has a very low content in P. ginseng and is one of the rare ginsenosides as well. Ginsenoside Rh1 is quite similar to F1 in structure, but its glycosylation site is the hydroxyl at the C-6 position. Ginsenoside Rh1 also possesses unique physiological functions, such as anti-allergy and anti-inflammation.
Ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-20(S)-protopanaxadiol), with an extremely low content in P. ginseng of about 0.01% of ginseng dry weight, is one of the rare ginsenosides as well. However, ginsenoside Rh2 has excellent anti-tumor activity, which makes it one of the primary anti-tumor active ingredients in ginseng. It can inhibit tumor cell growth, induce tumor cell apoptosis, and inhibit tumor cell metastasis. Research has shown that ginsenoside Rh2 can inhibit the proliferation of lung cancer cells 3LL (mice), Morris liver cancer cells (rats), B-16 melanoma cells (mice), and HeLa cells (human). Clinically, combining ginsenoside Rh2 with radiotherapy or chemotherapy can improve the effects of these therapies. Moreover, ginsenoside Rh2 also has anti-allergic activity, improves body immunity, and inhibits the inflammation produced by NO and PEG.
Ginsenoside Rg3, which has a low content in ginseng, has a significant anti-tumor effect that is complementary to that of ginsenoside Rh2. Clinical use has demonstrated that combining Rg3 and Rh2 can further enhance their synergistic effect in tumor treatment.
Because of the extremely low content of the rare ginsenosides CK, F1, Rh1, Rh2 and Rg3 in P. ginseng, the present preparation method starts from the abundant saponins in P. ginseng, converts them by selectively hydrolyzing glycosyls, and then extracts and purifies the products. Total saponins or protopanaxadiol-type saponins of Panax plants are used as raw materials for converting, isolating, and extracting 20(S)-protoginsenoside-Rh2. The advantage of this preparation method is that the abundant diol-type saponins are utilized. However, the reaction must be conducted under high temperature and high pressure (Changchun SONG et al., Preparation Method of 20(S)-ginsenosides-Rh2, Pharmaceutical Composition and Use Thereof, CN patent No. 1225366, 1999). Two methods of preparing 20(R&S)-ginsenosides-Rh2 from ginseng ingredients are disclosed by Korea Ginseng and Tobacco Institution, wherein the PPD saponin ingredients are obtained first and then subjected to acidic hydrolysis to give 20(R&S)-ginsenosides-Rg3, and the ginsenoside Rg3 is then treated to obtain ginsenoside Rh2. The major defect of the above methods is that they need a set of PPD-type saponin monomers as the starting materials, which results in complicated reaction steps, great loss of raw materials and complicated operations, thereby increasing costs and making it difficult to improve the yield. Since the glycosyls at C-20 of CK and F1 are easily destroyed during hydrolysis, chemical methods are unsuitable for CK and F1 production. The yield of Rh1 obtained by hydrolyzing saponins through acid or alkaline methods is very low, and many by-products are produced as well.
Enzymatic conversion is characterized by mild conditions, high specificity, and easy isolation and purification of products, and hence it is currently the major method for producing CK, F1 and Rh1. The enzymes used for preparing ginsenosides CK, F1, Rh1 and Rh2 mainly include naringinase, pectinase, cellulase, lactase and the like. Ginsenoside CK can also be obtained by microbial conversion, which mainly utilizes anaerobes originating from the intestinal tract. Although great progress has been made in preparing rare ginsenosides CK, F1, Rh1 and Rh2 by biological conversion (enzymatic and microbial methods), the cost of preparing CK, F1, Rh1 and Rh2 is still high and the yield is quite limited because these methods use ginsenosides as the raw material (CN patent: CN1105781C; Dongshi JIN, Journal of Dalian Light Industry Academy, 2001).
In view of the important biological activities and tremendous economic value of ginsenoside Rh2, continuous efforts have been made for decades to produce this ginsenoside through chemical synthesis, the basic principle of which is the condensation reaction of PPD and the corresponding glycosyls, namely semi-synthesis (JP patent: JP8-208688, 1996). This method uses PPD as the raw material for semi-synthesizing 20(S)-protoginsenoside-Rh2. The synthesis comprises six steps, and an equivalent of silver carbonate is used as the catalyst in the glycosylation reaction. The high price of the catalyst results in a high cost, and at the same time the poor stereoselectivity of the catalyst results in a low product yield. In an alternative method, PPD with its C-12 hydroxyl substituted by aromatic acyl or alkyl is used, and a glucosyl group donor with an activated C1 hydroxyl is added under the protection of organic solvents and inert gas for a condensation reaction catalyzed by a Lewis acid in the presence of molecular sieves. The resultant product is purified by column chromatography or recrystallization, and then the protecting groups are removed, thereby obtaining 20(S)-ginsenosides-Rh2 (Yongzheng HUI, A Method for Preparing 20(S)-ginsenosides-Rh2, CN patent: CN 1587273A, 2005).
At present, there is no method to effectively prepare rare ginsenosides CK, F1, Rh1, Rh2 and Rg3 in this field. Therefore, there is an urgent need to develop various glycosyltransferases with high specificity and efficiency.
Content of the Invention
The object of the present invention is to provide a group of glycosyltransferases and use thereof.
The first aspect of the present invention is to provide a method for in vitro glycosylation, comprising the steps of:
in the presence of a glycosyltransferase, transferring a glycosyl from a glycosyl donor to the following site on tetracyclic triterpenoid compounds:
positions C-20, C-6, C-3 or the first glycosyl at position C-3;
thereby forming glycosylated tetracyclic triterpenoid compounds;
wherein, said glycosyltransferase is selected from the group consisting of:
a glycosyltransferase as set forth by SEQ ID NOs.: 2, 16, 18, 20, 22, 24, 26, 28, 43, 55, 57, 59 or 61.
The second aspect of the present invention is to provide an isolated polypeptide; said polypeptide is selected from the group consisting of:
(a) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 2, 16, 18, 20, 26, 28, 43, 55, 57, 59 or 61;
(b) a derivative polypeptide, which is derived from a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 2, 16, 18, 20, 26, 28, 43, 55, 57, 59 or 61 by substitution, deletion, or addition of one or more amino acid residues, or by addition of a signal peptide sequence, and has the activity of glycosyltransferase;
(c) a derivative polypeptide, which has the polypeptide sequence of (a) or (b) in its sequence;
(d) a derivative polypeptide, which has ≥85% or ≥90% (preferably ≥95%) sequence homology with the amino acid sequence as set forth by any one of SEQ ID NOs: 2, 16, 18, 20, 26, 28, 43, 55, 57, 59 or 61 and has the activity of glycosyltransferase.
In another preferred embodiment, said sequence (c) is a fusion protein derived from (a) or (b) by addition of a tag sequence, signal sequence, or secretory signal sequence.
In another preferred embodiment, said polypeptide is set forth by SEQ ID NOs: 2, 16, 18, 20, 26, 28, 43, 55, 57, 59 or 61.
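The percentage thresholds in item (d) can be illustrated with a minimal sketch that computes percent identity between two sequences already aligned to equal length. The function names, the gap character, and the toy sequences are assumptions made for illustration only; the text above does not specify how homology is calculated, and a real comparison would rely on a dedicated alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns whose residues are identical.

    Assumes both inputs were aligned beforehand; columns that are gaps in
    both sequences are ignored, and a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def homology_tier(identity: float) -> str:
    """Map a percent identity onto the thresholds named in item (d)."""
    if identity >= 95.0:
        return ">=95% (preferred)"
    if identity >= 90.0:
        return ">=90%"
    if identity >= 85.0:
        return ">=85%"
    return "below 85%"


# Hypothetical 20-residue toy sequences, not taken from any SEQ ID NO.
reference = "MKSELIFLPAPAIGHLVSMV"
variant = "MKSELIFLPAPAIGHLVSMA"  # one substitution at the last position
print(homology_tier(percent_identity(reference, variant)))
```

Note that such a check addresses only the sequence criterion; item (d) additionally requires that the derivative polypeptide retain glycosyltransferase activity, which has to be established experimentally.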
The third aspect of the present invention is to provide an isolated polypeptide; said polypeptide is selected from the group consisting of:
(a1) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 22, 24 and 41;
(b1) a polypeptide having the polypeptide sequence of (a1) in its sequence; and/or
said polypeptide is selected from the group consisting of:
(a2) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 4 and 6;
(b2) a derivative polypeptide, which is derived from a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 4 and 6 by substitution, deletion, or addition of one or more amino acid residues, or by addition of a signal peptide sequence, and has the activity of glycosyltransferase;
(c2) a derivative polypeptide, which has the polypeptide sequence of (b2) in its sequence;
(d2) a derivative polypeptide, which has ≥85% or ≥90% (preferably ≥95%) sequence homology with the amino acid sequence as set forth by any one of SEQ ID NOs: 4 and 6 and has the activity of glycosyltransferase.
In another preferred embodiment, sequence (c2) is a fusion protein derived from (a2) or (b2) by addition of a tag sequence, signal sequence, or secretory signal sequence.
The fourth aspect of the present invention is to provide an isolated polynucleotide; said polynucleotide is selected from the group consisting of:
(A) a nucleotide sequence encoding the polypeptide of the second or the third aspect;
(B) a nucleotide sequence encoding the polypeptide as set forth by SEQ ID NOs.: 2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59 or 61;
(C) a nucleotide sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60;
(D) a nucleotide sequence, which has ≥95% (preferably ≥98%) homology with the sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60;
(E) a nucleotide sequence derived from the nucleotide sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60 by deletion or addition of 1-60 (preferably 1-30, more preferably 1-10) nucleotides at its 5′ end and/or 3′ end;
(F) a nucleotide sequence complementary to (preferably completely complementary to) any one of the nucleotide sequence of (A)-(E).
In another preferred embodiment, said nucleotide sequence is as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60.
In another preferred embodiment, the polynucleotide with a sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60 encodes the polypeptide with an amino acid sequence as set forth by SEQ ID NOs.: 2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59 or 61, respectively.
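The "respectively" pairing of nucleotide and polypeptide sequences can be written out explicitly. The sketch below simply zips the two lists given above in order; the variable names are illustrative assumptions.

```python
# SEQ ID NOs as listed in the text, in the order given.
NUCLEOTIDE_SEQ_IDS = [1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58, 60]
POLYPEPTIDE_SEQ_IDS = [2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59, 61]

# "Respectively" means position i of the first list pairs with position i of the second.
assert len(NUCLEOTIDE_SEQ_IDS) == len(POLYPEPTIDE_SEQ_IDS)
ENCODES = dict(zip(NUCLEOTIDE_SEQ_IDS, POLYPEPTIDE_SEQ_IDS))

for nucleotide_id, polypeptide_id in ENCODES.items():
    print(f"SEQ ID NO: {nucleotide_id} encodes SEQ ID NO: {polypeptide_id}")
```

In this listing each polypeptide SEQ ID NO is the nucleotide SEQ ID NO plus one, which gives a quick consistency check when reading the other sequence lists.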
The fifth aspect of the present invention is to provide a vector; said vector contains the polynucleotide in the fourth aspect of the present invention. Preferably, said vector includes expression vector, shuttle vector, or integration vector.
The fifth aspect of the present invention is to provide use of said isolated polypeptide in the second or the third aspect for catalyzing one or more of the following reactions, or for preparing a catalyst preparation used in the catalysis of one or more of the following reactions: transferring glycosyl(s) from glycosyl donor(s) to hydroxyl(s) at position(s) C-20 and/or C-6 and/or C-3 of tetracyclic triterpenoid compound(s) so as to substitute H in said hydroxyl, and transferring glycosyl(s) from glycosyl donor(s) to the first glycosyl at position C-3 of tetracyclic triterpenoid compound(s) so as to extend the carbohydrate chain.
In another preferred embodiment, said glycosyl donor(s) includes a nucleoside diphosphate sugar selected from the group consisting of: UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-acetyl glucose, ADP-acetyl glucose, TDP-acetyl glucose, CDP-acetyl glucose, GDP-acetyl glucose, UDP-xylose, ADP-xylose, TDP-xylose, CDP-xylose, GDP-xylose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, or other nucleoside diphosphate hexose or nucleoside diphosphate pentose, or the combination thereof.
In another preferred embodiment, said glycosyl donor(s) includes uridine diphosphate (UDP) sugars selected from the group consisting of: UDP-glucose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, UDP-rhamnose, or other uridine diphosphate hexose or uridine diphosphate pentose, or the combination thereof.
In another preferred embodiment, said isolated polypeptide is used for catalyzing one or more of the following reactions or for preparing a catalyst preparation used in the catalyzation of one or more of the following reactions:
(A)

wherein, R1 is H, monosaccharide glycosyl or polysaccharides glycosyl; R2 or R3 is H or OH; R4 is glycosyl; said polypeptide is selected from SEQ ID NOs.: 2, 16 or 18 or a derivative polypeptide thereof.
In another preferred embodiment, said monosaccharide includes glucose (Glc), rhamnose (Rha), acetyl glucose (Glc (6) Ac), arabinofuranose (Araf), arabopyranose (Arap), and xylose (Xyl), etc.
In another preferred embodiment, said polysaccharide includes polysaccharides composed of 2-4 monosaccharides, such as Glc(2-1)Glc, Glc(6-1)Glc, Glc(6)Ac, Glc(2-1)Rha, Glc(6-1)Arap, Glc(6-1)Xyl, Glc(6-1)Araf, Glc(3-1)Glc(3-1), Glc(2-1) Glu(6)Ac, Glc(6-1)Arap(4-1)Xyl, Glc(6-1)Arap(2-1)Xyl, or Glc(6-1)Arap(3-1)Xyl, etc.
Compounds with R1-R4 substituted are shown in the following table:
substrate   R1            R2   R3   R4         product
PPD         H             H    OH   glycosyl   CK
Rh2         1 glycosyl    H    OH   glycosyl   F2
Rg3         2 glycosyls   H    OH   glycosyl   Rd
PPT         H             OH   OH   glycosyl   F1
DM          H             H    H    glycosyl   20-G-DM
That is, when both of said R1 and R2 are H, and R3 is OH, said compound of formula (I) is protopanaxadiol (PPD);
when R1 is a glucosyl, R2 is H, and R3 is OH, said compound of formula (I) is ginsenoside Rh2;
when R1 is two glucosyls, R2 is H, and R3 is OH, said compound of formula (I) is ginsenoside Rg3;
when R1 is H, R2 is OH, and R3 is OH, said compound of formula (I) is protopanaxatriol (PPT);
when R1 is H, R2 is H, and R3 is H, said compound of formula (I) is dammarenediol II (DM).
(B)

wherein, R1 is H or a glycosyl, R2 is a glycosyl, R3 is a glycosyl, said polypeptide is selected from SEQ ID NOs.: 2, 16, 18, or 20 or a derivative polypeptide thereof;
or, R1 is H or a glycosyl; R2 is H; R3 is a glycosyl, said polypeptide is selected from SEQ ID NO.: 20 or a derivative polypeptide thereof.
Compounds with R1-R3 substituted are shown in the following table:
substrate   R1   R2         R3         product
F1          H    glycosyl   glycosyl   Rg1
PPT         H    H          glycosyl   Rh1
When both of said R1 and R2 are H, said compound of formula (III) is protopanaxatriol (PPT).
When R1 is H, R2 is a glucosyl, said compound of formula (III) is ginsenoside F1.
(C)

wherein, R1 is H or OH; R2 is H or OH; R3 is H or a glycosyl; R4 is a glycosyl, said polypeptide is selected from SEQ ID NOs.: 22, 24, 41 or 43 or a derivative polypeptide thereof.
Compounds with R1-R4 substituted are shown in the following table:
substrate   R1   R2   R3         R4         product
PPD         H    OH   H          glycosyl   Rh2
CK          H    OH   glycosyl   glycosyl   F2
PPT         OH   OH   H          glycosyl   3-G-PPT
F1          OH   OH   glycosyl   glycosyl   3-G-F1
DM          H    H    H          glycosyl   3-G-DM
When both of R1 and R3 are H, R2 is OH, said compound of formula (V) is PPD;
R1 is H, R2 is OH, R3 is a glucosyl, said compound of formula (V) is ginsenoside CK;
R1 is OH, R2 is OH, R3 is H, said compound of formula (V) is PPT;
R1 is OH, R2 is OH, R3 is a glucosyl, said compound of formula (V) is ginsenoside F1;
R1 is H, R2 is H, R3 is H, said compound of formula (V) is dammarenediol II (DM).
When the substrate is PPD, said polypeptide is selected from SEQ ID NOs.: 22, 24, 41 or 43 or a derivative polypeptide thereof; when the substrate is CK, said polypeptide is selected from SEQ ID NOs.: 22, 24 or 43 or a derivative polypeptide thereof; when the substrate is PPT, said polypeptide is selected from SEQ ID NOs.: 22, 24 or 41 or a derivative polypeptide thereof; when the substrate is F1 or DM, said polypeptide is selected from SEQ ID NOs.: 22 or 24 or a derivative polypeptide thereof.
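The substrate-specific enzyme choices for reaction (C) can be summarized as a lookup table. The following is a minimal sketch that encodes only the substrates, products, and SEQ ID NOs stated above; the dictionary layout and function names are illustrative assumptions, and derivative polypeptides are not modeled.

```python
# Reaction (C): glycosylation at the C-3 hydroxyl.
# For each substrate: the product and the polypeptide SEQ ID NOs named in the text.
REACTION_C = {
    "PPD": {"product": "Rh2", "seq_ids": (22, 24, 41, 43)},
    "CK": {"product": "F2", "seq_ids": (22, 24, 43)},
    "PPT": {"product": "3-G-PPT", "seq_ids": (22, 24, 41)},
    "F1": {"product": "3-G-F1", "seq_ids": (22, 24)},
    "DM": {"product": "3-G-DM", "seq_ids": (22, 24)},
}


def enzymes_for(substrate: str) -> tuple:
    """Return the SEQ ID NOs listed for a given reaction (C) substrate."""
    return REACTION_C[substrate]["seq_ids"]


def substrates_for(seq_id: int) -> list:
    """Return the reaction (C) substrates for which a given SEQ ID NO is listed."""
    return [name for name, entry in REACTION_C.items() if seq_id in entry["seq_ids"]]


print(enzymes_for("CK"))
print(substrates_for(41))
```

Read this way, SEQ ID NOs.: 22 and 24 are listed for all five substrates, whereas SEQ ID NOs.: 41 and 43 are each listed for two of them.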
(D)

wherein, R1 is OH or OCH3; R2 is glycosyl, said polypeptide is selected from SEQ ID NOs.: 22, 24, 41 or 43 or a derivative polypeptide thereof.
Compounds with R1-R2 substituted are shown in the following table:
substrate     R1     R2         product
25-OH-PPD     OH     glycosyl   3-G-25-OH-PPD
25-OCH3-PPD   OCH3   glycosyl   3-G-25-OCH3-PPD
When R1 is OH, said compound of formula (VII) is 25-OH-PPD;
When R1 is OCH3, said compound of formula (VII) is 25-OCH3-PPD.
(E)

wherein, R1 is glycosyl; R2 or R3 is OH or H; R4 is glycosyl or H; R5 is glycosyl, R5-R1-O is a glycosyl derived from the first glycosyl at C-3, said polypeptide is selected from SEQ ID NOs.: 26, 28, 55, 57, 59 or 61 or a derivative polypeptide thereof.
Compounds with R1-R4 substituted are shown in the following table:
substrate   R1         R2   R3   R4         product
Rh2         glycosyl   H    OH   H          Rg3
F2          glycosyl   H    OH   glycosyl   Rd
When R1 is a glucosyl; R2 is H, R3 is OH, R4 is H, compound of formula (IX) is Rh2.
When R1 is a glucosyl; R2 is H, R3 is OH, R4 is a glucosyl, compound of formula (IX) is F2.
(F)

said polypeptide is selected from SEQ ID NO: 22 or SEQ ID NO: 24 or a derivative polypeptide thereof. The compound of formula (XI) is lanosterol, and the compound of formula (XII) is 3-O-β-(D-glucopyranosyl)-lanosterol.
In another preferred embodiment, said glycosyl is selected from glucosyl, galacturonic acid radical, galactosyl, arabinosyl, rhamnosyl, and other hexosyls or pentosyls.
In another preferred embodiment, said compounds of formulas (I), (III), (V), (VII), (IX) or (XI) include but are not limited to S- or R-dammarane-type tetracyclic triterpene compounds, lanostane-type tetracyclic triterpene compounds, tirucallane-type tetracyclic triterpene compounds, cycloartane-type tetracyclic triterpene compounds, cucurbitane-type tetracyclic triterpene compounds, or meliacane-type tetracyclic triterpene compounds.
In another preferred embodiment, said polypeptide is selected from the group consisting of:
(a) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 2, 16, 18, 20, 26, 28, 41, 43, 55, 57, 59 or 61;
(b) a derivative polypeptide, which is derived from a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 2, 16, 18, 20, 26, 28, 41, 43, 55, 57, 59 or 61 by substitution, deletion, or addition of one or more amino acid residues, or by addition of a signal peptide sequence, and has the activity of glycosyltransferase;
(c) a derivative polypeptide, which has the polypeptide sequence of (a) or (b) in its sequence;
(d) a derivative polypeptide, which has ≥85% or ≥90% (preferably ≥95%) sequence homology with the amino acid sequence as set forth by any one of SEQ ID NOs: 2, 16, 18, 20, 26, 28, 41, 43, 55, 57, 59 or 61 and has the activity of glycosyltransferase.
In another preferred embodiment, said polypeptide is selected from the group consisting of:
(a1) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 22 and 24;
(b1) a polypeptide having the polypeptide sequence of (a1) in its sequence; and/or
said polypeptide is selected from the group consisting of:
(a2) a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 4 and 6;
(b2) a derivative polypeptide, which is derived from a polypeptide having the amino acid sequence as set forth by any one of SEQ ID NOs.: 4 and 6 by substitution, deletion, or addition of one or more amino acid residues, or by addition of a signal peptide sequence, and has the activity of glycosyltransferase;
(c2) a derivative polypeptide, which has the polypeptide sequence of (b2) in its sequence;
(d2) a derivative polypeptide, which has ≥85% or ≥90% (preferably ≥95%) sequence homology with the amino acid sequence as set forth by any one of SEQ ID NOs: 4 and 6 and has the activity of glycosyltransferase.
In another embodiment, the polynucleotide encoding said polypeptide is selected from the group consisting of:
(A) a nucleotide sequence encoding the polypeptide of the second or the third aspect;
(B) a nucleotide sequence encoding the polypeptide as set forth by SEQ ID NOs.: 2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59 or 61;
(C) a nucleotide sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60;
(D) a nucleotide sequence, which has ≥95% (preferably ≥98%) homology with the sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60;
(E) a nucleotide sequence derived from the nucleotide sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60 by deletion or addition of 1-60 (preferably 1-30, more preferably 1-10) nucleotides at its 5′ end and/or 3′ end;
(F) a nucleotide sequence complementary to (preferably completely complementary to) any one of the nucleotide sequence of (A)-(E).
In another preferred embodiment, said nucleotide sequence is as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60.
In another preferred embodiment, the polynucleotide with a sequence as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60 encodes the polypeptide with an amino acid sequence as set forth by SEQ ID NOs.: 2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59 or 61, respectively.
The sixth aspect of the present invention is to provide a method for conducting catalytic glycosylation, comprising the following steps: in the presence of a polypeptide and a derivative polypeptide according to the second and third aspects of the present invention, conducting the catalytic glycosylation.
In another preferred embodiment, said method further comprises the step of:
in the presence of a glycosyl donor and a polypeptide or a derivative polypeptide according to the second or third aspect of the present invention, transforming said compound of formula (I) into said compound of formula (II), or transforming said compound of formula (III) into said compound of formula (IV), or transforming said compound of formula (V) into said compound of formula (VI), or transforming said compound of formula (VII) into said compound of formula (VIII), or transforming said compound of formula (IX) into said compound of formula (X), or transforming said compound of formula (XI) into said compound of formula (XII).
In another preferred embodiment, said method further comprises: adding said polypeptide or a derivative polypeptide thereof into the catalytic reaction, respectively; and/or
adding said polypeptide or a derivative polypeptide thereof into the catalytic reaction simultaneously.
In another preferred embodiment, said method further comprises: in the co-presence of a glycosyl donor and at least two of the polypeptide or the derivative polypeptide according to the second and third aspects of the present invention, transforming the compound of formula (I) into the compound of formula (IV), (VI), (VIII), (X), or transforming the compound of formula (III) into the compound of formula (II), (VI), (VIII), (X), or transforming the compound of formula (V) into the compound of formula (II), (IV), (VIII), (X), or transforming the compound of formula (VII) into the compound of formula (II), (IV), (VI), (X), or transforming the compound of formula (IX) into the compound of formula (II), (IV), (VI), (VIII).
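When two or more of the glycosyltransferases are present together, single-step reactions can chain into multi-step routes. The sketch below treats the substrate/product pairs listed in the tables for reactions (A), (B), (C) and (E) as edges of a graph and searches for a shortest route by breadth-first search. The data structure and the function name are illustrative assumptions; reactions (D) and (F) are omitted because their products are not listed as substrates of another reaction. The sketch says nothing about yields or whether a given enzyme pair is compatible in practice.

```python
from collections import deque

# (substrate, product, reaction label) taken from the tables in the text.
REACTIONS = [
    ("PPD", "CK", "A"), ("Rh2", "F2", "A"), ("Rg3", "Rd", "A"),
    ("PPT", "F1", "A"), ("DM", "20-G-DM", "A"),
    ("F1", "Rg1", "B"), ("PPT", "Rh1", "B"),
    ("PPD", "Rh2", "C"), ("CK", "F2", "C"), ("PPT", "3-G-PPT", "C"),
    ("F1", "3-G-F1", "C"), ("DM", "3-G-DM", "C"),
    ("Rh2", "Rg3", "E"), ("F2", "Rd", "E"),
]


def find_route(start: str, target: str):
    """Breadth-first search for a shortest chain of listed reactions.

    Returns a list of (reaction label, substrate, product) steps,
    an empty list if start == target, or None if no route is listed.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            return path
        for substrate, product, label in REACTIONS:
            if substrate == compound and product not in seen:
                seen.add(product)
                queue.append((product, path + [(label, substrate, product)]))
    return None


for step in find_route("PPD", "Rg3"):
    print(step)
```

For example, the search finds that PPD can reach Rg3 in two steps, via reaction (C) to Rh2 followed by reaction (E), while no listed route leads from PPT to Rd.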
In another preferred embodiment, said method further comprises: co-expressing the nucleotide sequence encoding the glycosyltransferase and the key gene(s) in the anabolism pathway of dammarenediol II and/or protopanaxadiol and/or protopanaxatriol in a host cell, thereby obtaining said compound of formula (II), (IV), (VI), (VIII), (X) or (XII).
In another preferred embodiment, said host cell is saccharomycetes or E. coli. 
In another preferred embodiment, said polypeptide is a polypeptide having the amino acid sequence as set forth by SEQ ID NOs.: 2, 4, 6, 16, 18, 20, 22, 24, 26, 28, 41, 43, 55, 57, 59 or 61 and a derivative polypeptide thereof.
In another preferred embodiment, the nucleotide sequence encoding said polypeptide is as set forth by SEQ ID NOs.: 1, 3, 5, 15, 17, 19, 21, 23, 25, 27, 40, 42, 54, 56, 58 or 60.
In another preferred embodiment, said method further comprises: providing additive(s) for modulating enzyme activity to the reaction system.
In another preferred embodiment, said additive(s) for modulating enzyme activity is: additive(s) enhancing enzyme activity or inhibiting enzyme activity.
In another preferred embodiment, said additive(s) for modulating enzyme activity is selected from the group consisting of Ca2+, Co2+, Mn2+, Ba2+, Al3+, Ni2+, Zn2+, and Fe2+.
In another preferred embodiment, said additive(s) for modulating enzyme activity is a material(s) capable of producing Ca2+, Co2+, Mn2+, Ba2+, Al3+, Ni2+, Zn2+, or Fe2+.
In another preferred embodiment, said glycosyl donor(s) is nucleoside diphosphate sugar(s) selected from the group consisting of: UDP-glucose, ADP-glucose, TDP-glucose, CDP-glucose, GDP-glucose, UDP-acetyl glucose, ADP-acetyl glucose, TDP-acetyl glucose, CDP-acetyl glucose, GDP-acetyl glucose, UDP-xylose, ADP-xylose, TDP-xylose, CDP-xylose, GDP-xylose, UDP-galacturonic acid, ADP-galacturonic acid, TDP-galacturonic acid, CDP-galacturonic acid, GDP-galacturonic acid, UDP-galactose, ADP-galactose, TDP-galactose, CDP-galactose, GDP-galactose, UDP-arabinose, ADP-arabinose, TDP-arabinose, CDP-arabinose, GDP-arabinose, UDP-rhamnose, ADP-rhamnose, TDP-rhamnose, CDP-rhamnose, GDP-rhamnose, or other nucleoside diphosphate hexose or nucleoside diphosphate pentose, or the combination thereof.
In another preferred embodiment, said glycosyl donor(s) is uridine diphosphate (UDP) sugars selected from the group consisting of: UDP-glucose, UDP-galacturonic acid, UDP-galactose, UDP-arabinose, UDP-rhamnose, or other uridine diphosphate hexose or uridine diphosphate pentose, or the combination thereof.
In another preferred embodiment, the pH of the reaction system is: pH 4.0-10.0, preferably pH 5.5-9.0.
In another preferred embodiment, the temperature of the reaction system is: 10° C.-105° C., preferably 20° C.-50° C.
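The pH and temperature windows stated above can be expressed as a simple range check. The following is a minimal sketch only: the function and constant names are hypothetical and are introduced purely for illustration; the numeric bounds are the ones given in the two preceding embodiments.

```python
# Bounds taken from the text; all names here are illustrative assumptions.
PH_BROAD = (4.0, 10.0)
PH_PREFERRED = (5.5, 9.0)
TEMP_BROAD_C = (10.0, 105.0)
TEMP_PREFERRED_C = (20.0, 50.0)


def _within(value, bounds):
    """Return True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def classify_conditions(ph, temp_c):
    """Classify a reaction system as 'preferred', 'allowed' or 'outside'.

    'preferred' means both pH and temperature fall in the preferred ranges;
    'allowed' means both fall in the broad ranges but not both in the
    preferred ones; 'outside' means at least one value is out of range.
    """
    if not (_within(ph, PH_BROAD) and _within(temp_c, TEMP_BROAD_C)):
        return "outside"
    if _within(ph, PH_PREFERRED) and _within(temp_c, TEMP_PREFERRED_C):
        return "preferred"
    return "allowed"
```

For example, a system at pH 7.5 and 30° C. would fall in the preferred window, whereas pH 4.5 at the same temperature would only fall in the broad window.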
In another preferred embodiment, the key gene(s) in the anabolism pathway of dammarenediol II includes but is not limited to: dammarenediol synthase gene.
In another preferred embodiment, the key gene(s) in the anabolism pathway of PPD includes but is not limited to: dammarenediol synthase gene, cytochrome P450 CYP716A47 gene, and P450 CYP716A47 reductase gene, or the combination thereof.
In another preferred embodiment, the key gene(s) in the anabolism pathway of PPT includes but is not limited to: dammarenediol synthase gene, cytochrome P450 CYP716A47 gene, P450 CYP716A47 reductase gene, cytochrome P450 CYP716A53V2 gene and the reductase gene thereof, or the combination thereof.
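The three gene lists above are cumulative: each sapogenin pathway adds genes to the previous one. A minimal sketch of that relationship follows. It assumes, for illustration only, that every listed gene is needed for the corresponding pathway (the text itself says "includes but is not limited to"); the identifiers and the helper function are hypothetical.

```python
# Gene names as listed in the text; variable names are illustrative.
DS = "dammarenediol synthase"
CYP716A47 = "cytochrome P450 CYP716A47"
CYP716A47_RED = "P450 CYP716A47 reductase"
CYP716A53V2 = "cytochrome P450 CYP716A53V2"
CYP716A53V2_RED = "cytochrome P450 CYP716A53V2 reductase"

# Key genes listed for each sapogenin (DM = dammarenediol II).
PATHWAY_GENES = {
    "DM": frozenset({DS}),
    "PPD": frozenset({DS, CYP716A47, CYP716A47_RED}),
    "PPT": frozenset(
        {DS, CYP716A47, CYP716A47_RED, CYP716A53V2, CYP716A53V2_RED}
    ),
}


def missing_genes(target, host_genes):
    """Return the sorted listed key genes for `target` absent from the host."""
    return sorted(PATHWAY_GENES[target] - set(host_genes))
```

Under this assumption, a host carrying only the dammarenediol synthase gene would still lack the CYP716A47 gene and its reductase gene for the PPD pathway.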
In another preferred embodiment, the substrate of the catalytic glycosylation is the compound of formula (I), (III), (V), (VII), (IX) or (XI), and said product is the compound of formula (II), (IV), (VI), (VIII), (X) or (XII).
In another preferred embodiment, said compound of formula (I) is PPD (Protopanaxadiol), and the compound of formula (II) is ginsenoside CK (20-O-β-(D-glucopyranosyl)-protopanaxadiol);
or, said compound of formula (I) is ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-protopanaxadiol), and the compound of formula (II) is ginsenoside F2 (3-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxadiol);
or, said compound of formula (I) is ginsenoside Rg3, and the compound of formula (II) is ginsenoside Rd;
or, said compound of formula (I) is PPT (Protopanaxatriol), and the compound of formula (II) is ginsenoside F1 (20-O-β-(D-glucopyranosyl)-protopanaxatriol);
or, said compound of formula (I) is DM (Dammarenediol II), and the compound of formula (II) is ginsenoside 20-O-β-(D-glucopyranosyl)-Dammarenediol II;
or, said compound of formula (III) is PPT, and the compound of formula (IV) is ginsenoside Rh1 (6-O-β-(D-glucopyranosyl)-protopanaxatriol);
or, said compound of formula (III) is ginsenoside F1, and the compound of formula (IV) is ginsenoside Rg1 (6-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxatriol);
or, said compound of formula (V) is PPD, and the compound of formula (VI) is ginsenoside Rh2 (3-O-β-(D-glucopyranosyl)-protopanaxadiol);
or, said compound of formula (V) is CK, and the compound of formula (VI) is ginsenoside F2 (3-O-β-(D-glucopyranosyl)-20-O-β-(D-glucopyranosyl)-protopanaxadiol);
or, said compound of formula (V) is PPT, and the compound of formula (VI) is ginsenoside 3-O-β-(D-glucopyranosyl)-protopanaxatriol;
or, said compound of formula (V) is ginsenoside F1, and the compound of formula (VI) is ginsenoside 3-O-β-(D-glucopyranosyl)-F1;
or, said compound of formula (V) is DM, and the compound of formula (VI) is ginsenoside 3-O-β-(D-glucopyranosyl)-Dammarenediol II;
or, said compound of formula (VII) is 25-OH-PPD (25-OH-protopanaxadiol), and the compound of formula (VIII) is ginsenoside 3-O-β-(D-glucopyranosyl)-25-OH-protopanaxadiol;
or, said compound of formula (VII) is 25-OCH3-PPD (25-OCH3-protopanaxadiol), and the compound of formula (VIII) is ginsenoside 3-O-β-(D-glucopyranosyl)-25-OCH3-protopanaxadiol;
or, said compound of formula (IX) is ginsenoside Rh2, and the compound of formula (X) is ginsenoside Rg3;
or, said compound of formula (IX) is ginsenoside F2, and the compound of formula (X) is ginsenoside Rd;
or, said compound of formula (XI) is lanosterol, and the compound of formula (XII) is 3-O-β-(D-glucopyranosyl)-lanosterol.
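The substrate and product pairs enumerated above form a small conversion network, in which the product of one glycosylation can serve as the substrate of another. The sketch below records each pair together with the substrate formula group it is listed under and enumerates the routes between two compounds. It is an illustration only: the data structure, the function names and the shortened product labels such as "20-O-Glc-DM" are assumptions introduced here and are not part of the original disclosure.

```python
from collections import defaultdict

# (substrate formula group, substrate, product), as listed in the text.
# "Glc" abbreviates β-(D-glucopyranosyl); DM is dammarenediol II.
CONVERSIONS = [
    ("I", "PPD", "CK"),
    ("I", "Rh2", "F2"),
    ("I", "Rg3", "Rd"),
    ("I", "PPT", "F1"),
    ("I", "DM", "20-O-Glc-DM"),
    ("III", "PPT", "Rh1"),
    ("III", "F1", "Rg1"),
    ("V", "PPD", "Rh2"),
    ("V", "CK", "F2"),
    ("V", "PPT", "3-O-Glc-PPT"),
    ("V", "F1", "3-O-Glc-F1"),
    ("V", "DM", "3-O-Glc-DM"),
    ("VII", "25-OH-PPD", "3-O-Glc-25-OH-PPD"),
    ("VII", "25-OCH3-PPD", "3-O-Glc-25-OCH3-PPD"),
    ("IX", "Rh2", "Rg3"),
    ("IX", "F2", "Rd"),
    ("XI", "lanosterol", "3-O-Glc-lanosterol"),
]


def build_graph(conversions):
    """Map each substrate to the list of products it can be converted into."""
    graph = defaultdict(list)
    for _formula, substrate, product in conversions:
        graph[substrate].append(product)
    return graph


def routes(graph, start, target, path=None):
    """Enumerate every conversion route from start to target (depth-first)."""
    path = (path or []) + [start]
    if start == target:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # guard against revisiting a compound
            found.extend(routes(graph, nxt, target, path))
    return found
```

Calling `routes(build_graph(CONVERSIONS), "PPD", "Rd")` lists the stepwise glycosylation routes from PPD to ginsenoside Rd that follow from the pairs above, for instance via CK and F2, or via Rh2 and Rg3.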
The seventh aspect of the present invention is to provide a genetically engineered host cell; said host cell contains the vector according to the fifth aspect of the present invention, or has a polynucleotide according to the fourth aspect of the present invention integrated in its genome.
In another preferred embodiment, said glycosyltransferase is the polypeptide or the derivative polypeptide according to the second or third aspect of the present invention.
In another preferred embodiment, the nucleotide sequence encoding said glycosyltransferase is as described in the fourth aspect of the present invention.
In another preferred embodiment, said cell is a prokaryocyte or a eukaryocyte.
In another preferred embodiment, said host cell is a eukaryocyte, such as a yeast cell or a plant cell.
In another preferred embodiment, said host cell is a Saccharomyces cerevisiae cell.
In another preferred embodiment, said host cell is a prokaryocyte, such as E. coli. 
In another preferred embodiment, said host cell is a ginseng cell.
In another preferred embodiment, said host cell is not a cell naturally producing the compound of formula (II), (IV), (VI), (VIII), (X) or (XII).
In another preferred embodiment, said host cell is not a cell naturally producing rare ginsenoside CK and/or rare ginsenoside F1 and/or rare ginsenoside Rh2 and/or Rg3 and/or Rh1, and/or novel ginsenoside 20-O-β-(D-glucopyranosyl)-dammarenediol II, 3-O-β-(D-glucopyranosyl)-PPT, 3-O-β-(D-glucopyranosyl)-F1, 3-O-β-(D-glucopyranosyl)-DM, 3-O-β-(D-glucopyranosyl)-25-OH-PPD, 3-O-β-(D-glucopyranosyl)-25-OCH3-PPD, and/or F2, Rd and Rg1, etc.
In another preferred embodiment, said key gene(s) in the anabolism pathway of dammarenediol II includes but is not limited to: dammarenediol synthase gene.
In another preferred embodiment, the key gene(s) in the anabolism pathway of PPD contained in said host cell includes but is not limited to dammarenediol synthase gene, cytochrome P450 CYP716A47 gene, and P450 CYP716A47 reductase gene, or the combination thereof.
In another preferred embodiment, the key gene(s) in the anabolism pathway of PPT contained in said host cell includes but is not limited to dammarenediol synthase gene, cytochrome P450 CYP716A47 gene, P450 CYP716A47 reductase gene, and cytochrome P450 CYP716A53V2 gene, or the combination thereof.
The eighth aspect of the present invention is to provide use of the host cell according to the seventh aspect, for preparing an enzymatic catalyzation preparation, or for producing a glycosyltransferase, or as a catalytic cell, or for producing the compound of formula (II), (IV), (VI), (VIII), (X) or (XII).
In another preferred embodiment, said host cell is used for producing new saponins 20-O-β-(D-glucopyranosyl)-dammarenediol II and/or 3-O-β-(D-glucopyranosyl)-dammarenediol II, 3-O-β-(D-glucopyranosyl)-protopanaxatriol, 3-O-β-(D-glucopyranosyl)-F1 and/or rare ginsenoside CK and/or rare ginsenoside F1 and/or rare ginsenoside Rh1 and/or ginsenoside Rh2 and/or rare ginsenoside Rg3 through glycosylation of dammarenediol II (DM) and/or protopanaxadiol (PPD) and/or protopanaxatriol (PPT).
The ninth aspect of the present invention is to provide a method for producing a transgenic plant, comprising the following step: regenerating said genetically engineered host cell according to the seventh aspect of the present invention into a plant, wherein said genetically engineered host cell is a plant cell.
In another preferred embodiment, said genetically engineered host cell is a ginseng cell.
It should be understood that in the present invention, the technical features specifically described above and below (such as in the Examples) can be combined with each other, thereby constituting new or preferred technical solutions, which need not be described one by one.