The present invention relates to glycosyl hydrolase genes for the biotechnological production of oligosaccharides, especially sulfated oligo-carrageenans and more particularly oligo-iota-carrageenans and oligo-kappa-carrageenans, by the biodegradation of carrageenans.
The sulfated galactans, such as agars and carrageenans, are the major polysaccharides of the Rhodophyceae and are very widely used as gelling agents or thickeners in various fields, especially the agri-food industry. About 6000 tonnes of agars and 22,000 tonnes of carrageenans are extracted annually from red seaweeds for this purpose. Agars are produced commercially from red seaweeds of the genera Gelidium and Gracilaria. Carrageenans, on the other hand, are widely extracted from the genera Chondrus, Gigartina and Eucheuma.
Carrageenans consist of repeat D-galactose units alternately bonded by β1→4 and α1→3 linkages. Depending on the number and position of sulfate ester groups on the repeat disaccharide of the molecule, carrageenans are divided into several different types, namely: kappa-carrageenans, which possess one sulfate ester group, iota-carrageenans, which possess two sulfate ester groups, and lambda-carrageenans, which possess three sulfate ester groups.
The physicochemical properties and the uses of these polysaccharides as gelling agents are based on their capacity to undergo coil-helix conformational transitions as a function of the thermal and ionic environment [Kloareg et al., Oceanography and Marine Biology: An Annual Review 26: 259-315 (1988)].
Furthermore, carrageenans are structural analogs of the sulfated polysaccharides of the animal extracellular matrix (heparin, chondroitin, keratan, dermatan) and they exhibit biological activities which are related to certain functions of these glycosaminoglycans.
In particular, carrageenans are known:
(i) for their action on the immune system, causing the secretion of interleukin or prostaglandins,
(ii) for their antiviral action on the AIDS virus HIV-1, the herpes virus HSV-1 and the hepatitis A virus,
(iii) as antagonists of the binding of growth factors to human cells,
(iv) and also for their action on the proliferation of keratinocytes and their action on the contractility of fibroblasts.
Furthermore, oligocarrageenans act on the adherence, the division and the protein synthesis of human cell cultures, probably as structural analogs of the glycosylated part of the proteins of the extracellular matrix. In plants, oligocarrageenans very significantly elicit enzymatic activities which are markers of growth (amylase) or of the phenolic defense metabolism (laminarinase, phenylalanine ammonia-lyase).
Carrageenans are extracted from red seaweeds by conventional processes such as hot aqueous extraction, and oligocarrageenans are obtained from carrageenans by chemical hydrolysis or, preferably, by enzymatic hydrolysis.
The production of oligocarrageenans by enzymatic hydrolysis generally comprises the following steps:
1) production of a glycosyl hydrolase by the culture of a marine bacterium;
2) enzymatic hydrolysis of the carrageenan with the glycosyl hydrolase thus obtained; and
3) fractionation and purification of the oligocarrageenans obtained.
Microorganisms which produce enzymes capable of hydrolyzing iota- and kappa-carrageenans were isolated by Bellion et al. in 1982 [Can. J. Microbiol. 28: 874-80 (1982)]. Some are specific for κ- or ι-carrageenan and others are capable of hydrolyzing both substrates. Another group of bacteria capable of degrading carrageenans was characterized by Sarwar et al. in 1983 [J. Gen. Appl. Microbiol. 29: 145-55 (1983)]. These yellow-orange bacteria are assigned to the Cytophaga group of bacteria and some of these bacteria have the property of hydrolyzing both agar and carrageenans.
The purification and characterization of several ι-carrageenases and κ-carrageenases, such as the ι-carrageenase and κ-carrageenase of Cytophaga drobachiensis, the ι-carrageenase of Alteromonas fortis and the κ-carrageenase of Alteromonas carrageenovora, were described in the thesis of P. Potin ["Recherche, production, purification et caractérisation de galactane-hydrolases pour la préparation des parois d'algues rouges" (February 1992)]. A detailed study of the κ-carrageenase of Alteromonas carrageenovora was described by Potin et al. [Eur. J. Biochem. 228, 971-975 (1995)].
The availability of specific enzymes and tools for obtaining oligocarrageenans by genetic engineering could markedly improve their production.
The Applicant has now found novel glycosyl hydrolase genes which make it possible specifically to obtain either oligo-iota-carrageenans or oligo-kappa-carrageenans.
Thus the present invention relates to novel genes which code for glycosyl hydrolases having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65%, preferably greater than or equal to 70% and advantageously greater than or equal to 75% over the domain extending between amino acids 164 and 311 of the sequence [SEQ ID No. 2] of the iota-carrageenase of Alteromonas fortis. 
The present invention relates more particularly to the nucleic acid sequence [SEQ ID No. 1] which codes for an iota-carrageenase as defined above, the amino acid sequence of which is the sequence [SEQ ID No. 2].
The present invention further relates to the genes which code for glycosyl hydrolases having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75%, preferably greater than 80% and advantageously greater than 85% over the domain extending between amino acids 117 and 262 of the sequence [SEQ ID No. 6] of the kappa-carrageenase of Alteromonas carrageenovora. 
In particular, the invention relates to the nucleic acid sequence [SEQ ID No. 7] which codes for a kappa-carrageenase having a score as defined above, the amino acid sequence of which is the sequence [SEQ ID No. 8].
The glycosyl hydrolase genes of the invention are obtained by a process which consists in selecting proteins having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65%, preferably greater than or equal to 70% and advantageously greater than or equal to 75% over the domain extending between amino acids 164 and 311 of the sequence [SEQ ID No. 2] of the iota-carrageenase of Alteromonas fortis, and in sequencing the resulting genes by the conventional techniques well known to those skilled in the art.
The glycosyl hydrolase genes of the invention can also be obtained by a process which consists in selecting proteins having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75%, preferably greater than 80% and advantageously greater than 85% over the domain extending between amino acids 117 and 262 of the sequence [SEQ ID No. 6] of the kappa-carrageenase of Alteromonas carrageenovora, and in sequencing the resulting genes by the conventional techniques well known to those skilled in the art.
Finally, the present invention relates to the use of the above glycosyl hydrolase genes for obtaining, by genetic engineering, glycosyl hydrolases which are useful for the biotechnological production of oligocarrageenans.
The glycosyl hydrolases according to the invention are therefore characterized by the HCA score which they possess with a particular domain of the amino acid sequence of the iota-carrageenase of Alteromonas fortis or the kappa-carrageenase of Alteromonas carrageenovora. 
The HCA or "Hydrophobic Cluster Analysis" method is a method of analyzing the sequences of proteins represented as a two-dimensional structure, which has been described by Gaboriaud et al. [FEBS Letters 224, 149-155 (1987)].
It is known that the three-dimensional structure of a protein governs its biological properties, the production of an active protein demanding correct folding.
It is also known that the primary structure of proteins varies much more substantially than the higher-order structures and that proteins can be grouped into families which show similar secondary and tertiary structures but sometimes have such divergent primary sequences that the mutual relationship between such proteins is not obvious. The code which relates primary structure and secondary structure therefore appears to be highly degenerate since very different primary structures can ultimately lead to similar secondary and tertiary structures [Structure 3, 853-859 (1995) and Proc. Natl. Acad. Sci. USA 92 (1995)].
The use of the HCA method has shown that the distribution, size and shape of the hydrophobic clusters along the amino acid sequences are representative of the 3D folding of the proteins studied.
Also, Woodcock et al. [Protein Eng. 5, 629-635 (1992)] have shown that the hydrophobic clusters defined by the α-helical 2D diagram are statistically centered on the regular secondary structures (α-helices, β-strands), that the 2D diagram based on the α-helix carries the greatest amount of structural information and that the correspondence between hydrophobic clusters and elements of secondary structure is of the same quality for any type of folding (all α, all β, α/β and α+β), thus demonstrating that the HCA method can be used irrespective of the type of protein.
L. Lemesle-Varloot et al. [Biochimie 72, 555-574 (1990)] have shown that when two proteins have a similar distribution of hydrophobic clusters over a domain of at least 50 residues, their three-dimensional structures in this domain are considered to be superimposable and their functions to be analogous.
Thus, for example, Barbeyron et al. [Gene 139, 105-109 (1994)] used this HCA method to compare the similarities in the shape, distribution and size of several hydrophobic clusters of the κ-carrageenase of Alteromonas carrageenovora with respect to enzymes from family 16 of glycosyl hydrolases.
The two-dimensional representation used in the HCA method is an α-helix in which the amino acids are arranged by computer processing to give 3.6 residues per turn. To obtain an easily readable plane image, the helix is cut in the longitudinal direction. Finally, to obtain the whole of the hydrophobic clusters situated at the edges of the image, the diagram is duplicated. The method uses a code which recognizes only two states: the hydrophobic state and the hydrophilic state.
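The geometry just described can be sketched in a few lines of Python. This is only an illustration of the 3.6-residues-per-turn arrangement and of the duplication of the diagram; the function name `helical_net` and the tuple layout are hypothetical and are not taken from the cited HCA publications.

```python
# Illustrative sketch only: places residues on an unrolled alpha-helical net.
# The function name and the returned tuple layout are assumptions.

DEGREES_PER_RESIDUE = 100  # 360 degrees divided by 3.6 residues per turn


def helical_net(sequence):
    """Return one (residue, turns, angle, duplicated_angle) tuple per residue."""
    points = []
    for i, residue in enumerate(sequence):
        angle = (i * DEGREES_PER_RESIDUE) % 360  # position around the helix axis
        turns = i / 3.6                          # progress along the helix axis
        # Cutting the helix lengthwise unrolls it onto a plane; duplicating the
        # diagram repeats every point one full turn (360 degrees) further on,
        # so clusters that straddle the cut edge appear whole.
        points.append((residue, turns, angle, angle + 360))
    return points
```

With this layout, residues i and i+18 fall at the same angle, since 18 residues correspond to exactly five turns.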
The amino acids recognized as being hydrophobic are identified and grouped into characteristic geometric figures. Using these two states makes it possible to become independent of the tolerance shown by the two- and three-dimensional structures towards the variability of the primary sequences. Furthermore, this representation affords rapid observation of interactions over a short or medium distance since the first and second neighbors of a given residue are located within a segment of 17 amino acids. Finally, in contrast to the analytical methods based on the primary or secondary structures of proteins, no "window" of predefined length is used.
The fundamental characteristic of the α-helix representation is that, for a given globular protein or only a domain of this protein, the distribution of the hydrophobic residues on the diagram is not random. The hydrophobic residues (VILFWMY) form clusters of varying geometry and size. On the diagram, the hydrophilic and hydrophobic faces of the amphiphilic helices are very recognizable. Thus a horizontal diamond cluster corresponds to the hydrophobic face of an α-helix, the internal helices appear as large horizontal hydrophobic clusters and the β-strands appear as rather short, vertical hydrophobic clusters. The method makes it possible to identify the hydrophobic residues forming the core of the globular proteins and to locate the elements of secondary structure, namely the α-helices and the β-strands, independently of any knowledge of the secondary structure of the protein studied.
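As a minimal sketch of the two-state code, the following Python function marks each residue of a one-letter sequence as hydrophobic (1) or not (0), using the residue set VILFWMY given above. The function name `two_state_code` is hypothetical, and the sketch deliberately stops short of delineating clusters, which in the HCA method is done on the two-dimensional diagram.

```python
# Illustrative sketch only: two-state (hydrophobic / hydrophilic) encoding.
# The function name is an assumption; the residue set comes from the text.

HYDROPHOBIC = set("VILFWMY")


def two_state_code(sequence):
    """Encode a one-letter protein sequence as a string of 1s and 0s."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in sequence.upper())
```

For example, `two_state_code("MKVLAP")` gives `"101100"`: M, V and L are coded as hydrophobic, while K, A and P are not.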
The HCA score between two proteins is calculated as follows:
For each cluster:
HCA score = 2CR/(RC1+RC2) × 100%
where
RC1 and RC2 are the numbers of hydrophobic residues in the cluster of protein 1 (cluster 1) and the cluster of protein 2 (cluster 2), respectively.
CR is the number of hydrophobic residues in cluster 1 which correspond to hydrophobic residues in cluster 2.
The mean value obtained for all the clusters along the protein sequences compared gives the final HCA score.
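The calculation above can be sketched as follows in Python. The sketch assumes that the clusters of the two proteins have already been paired and counted by hand or by another tool; the function names `cluster_score` and `hca_score` and the (RC1, RC2, CR) tuple layout are hypothetical.

```python
# Illustrative sketch only: per-cluster score and mean HCA score.
# Input is assumed to be a list of (RC1, RC2, CR) tuples, one per
# pair of corresponding clusters; how clusters are paired is not shown.


def cluster_score(rc1, rc2, cr):
    """Score for one cluster pair: 2*CR / (RC1 + RC2) * 100, in percent."""
    if rc1 + rc2 == 0:
        raise ValueError("a cluster pair must contain hydrophobic residues")
    return 100.0 * 2 * cr / (rc1 + rc2)


def hca_score(clusters):
    """Mean of the per-cluster scores over all compared clusters."""
    scores = [cluster_score(rc1, rc2, cr) for rc1, rc2, cr in clusters]
    if not scores:
        raise ValueError("at least one cluster pair is required")
    return sum(scores) / len(scores)
```

For instance, two cluster pairs with (RC1, RC2, CR) equal to (4, 4, 4) and (4, 6, 3) score 100% and 60% respectively, so `hca_score([(4, 4, 4), (4, 6, 3)])` gives a final score of 80%.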
On the HCA profiles, the amino acids are represented by their standard code of a single letter, with the exception of proline (P), glycine (G), serine (S) and threonine (T).
In fact, because of their particular properties, these residues are represented by the special symbols indicated below so as to facilitate their visual identification on the HCA diagrams (cf. list of abbreviations).
Proline introduces high constraints into the polypeptide chain and is considered systematically as an interruption in the clusters. In fact, proline residues stop or deform the helices and the sheets. Glycine possesses a very substantial conformational flexibility because of the absence of a side chain in this amino acid. Serine and threonine are normally hydrophilic, but they can also be found in hydrophobic environments, such as α-helices, in which their hydroxyl group loses its hydrophilic character because of the hydrogen bond formed with the carbonyl group of the main chain. Within the hydrophobic β-sheets, threonine is sometimes capable of replacing hydrophobic residues by virtue of the methyl group on its side chain.
Amino acids can be divided into four groups according to their hydrophobicity:
(i) strongly hydrophobic residues: V, I, L and F;
(ii) moderately hydrophobic residues: W, M and Y
→ W appears at surface sites more frequently than F,
→ M is encountered at various sites, internal or otherwise,
→ Y can adapt to internal hydrophobic environments and is frequently found in loops;
(iii) weakly hydrophobic residues: A and C are virtually insensitive to the hydrophobic character of their environment; and
(iv) hydrophilic residues: D, E, N, Q, H, K and R.
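The four groups listed above can be expressed as a simple lookup, sketched here in Python. The names `HYDROPHOBICITY_GROUPS` and `hydrophobicity_group` are hypothetical; proline, glycine, serine and threonine, which the text treats separately with special symbols, are reported under a label of their own that is likewise an assumption of this sketch.

```python
# Illustrative sketch only: lookup of the four hydrophobicity groups.
# Dictionary and function names, and the label used for P, G, S and T,
# are assumptions made for this example.

HYDROPHOBICITY_GROUPS = {
    "strongly hydrophobic": "VILF",
    "moderately hydrophobic": "WMY",
    "weakly hydrophobic": "AC",
    "hydrophilic": "DENQHKR",
}
SPECIAL_SYMBOL_RESIDUES = "PGST"  # shown with special symbols on HCA diagrams


def hydrophobicity_group(residue):
    """Return the group name for a one-letter amino acid code."""
    aa = residue.upper()
    for group, members in HYDROPHOBICITY_GROUPS.items():
        if aa in members:
            return group
    if aa in SPECIAL_SYMBOL_RESIDUES:
        return "special symbol"
    raise ValueError(f"unknown amino acid code: {residue!r}")
```

Together the four groups and the four specially represented residues cover the twenty standard amino acids.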
Using this HCA method, the Applicant has found that proteins having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65% over the domain extending between amino acids 164 and 311 of said iota-carrageenase are enzymes of the glycosyl hydrolase type and more particularly iota-carrageenases appropriate for the production of oligo-iota-carrageenans from carrageenans.
The proteins having an HCA score which is greater than or equal to 70%, preferably greater than or equal to 75%, with the above domain 164-311 are particularly preferred for the purposes of the invention.
One particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 2], extracted from Alteromonas fortis. 
Another particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 4], extracted from Cytophaga drobachiensis. 
Likewise, the Applicant has found that proteins having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75% over the domain extending between amino acids 117 and 262 of said kappa-carrageenase are enzymes of the glycosyl hydrolase type and more particularly kappa-carrageenases appropriate for the production of oligo-kappa-carrageenans from carrageenans.
The proteins having an HCA score which is greater than or equal to 80%, preferably greater than or equal to 85%, with the above domain 117-262 are particularly preferred for the purposes of the invention.
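By way of illustration, the thresholds stated above for the two reference domains can be applied to a computed HCA score as in the following Python sketch. The names `THRESHOLDS` and `preference_tier` and the tier labels are hypothetical, and the sketch uses the "greater than or equal to" comparisons of the four preceding paragraphs.

```python
# Illustrative sketch only: compares an HCA score (in percent) with the
# thresholds stated for the iota (domain 164-311) and kappa (domain 117-262)
# reference enzymes. Names and tier labels are assumptions.

THRESHOLDS = {
    "iota": (65.0, 70.0, 75.0),   # minimum, preferred, more preferred
    "kappa": (75.0, 80.0, 85.0),  # minimum, preferred, more preferred
}


def preference_tier(score, enzyme):
    """Return a label describing where the score falls for the given enzyme."""
    minimum, preferred, more_preferred = THRESHOLDS[enzyme]
    if score >= more_preferred:
        return "more preferred"
    if score >= preferred:
        return "preferred"
    if score >= minimum:
        return "meets minimum"
    return "below minimum"
```

A score of 72%, for example, would be labeled "preferred" against the iota-carrageenase domain but "below minimum" against the kappa-carrageenase domain.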
The above proteins are advantageously extracted from marine bacteria.
One particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 6], extracted from Alteromonas carrageenovora. 
Another particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 8], extracted from Cytophaga drobachiensis. 
As indicated previously, the genes according to the invention, coding for glycosyl hydrolases, can be obtained by sequencing the genome of bacteria which produce glycosyl hydrolases, as defined above, by the conventional methods well known to those skilled in the art.
The invention further relates to the expression vectors which carry the nucleic acid sequences according to the invention, with the means for their expression.
These expression vectors can be used to transform prokaryotic microorganisms, particularly Escherichia coli, or eukaryotic cells such as yeasts or fungi.
The invention will now be described in greater detail by means of the illustrative and non-limiting Examples below.
The methods used in these Examples are methods well known to those skilled in the art, which are described in detail in the work by Sambrook, Fritsch and Maniatis entitled "Molecular Cloning: A Laboratory Manual", published in 1989 by Cold Spring Harbor Laboratory Press, New York (2nd edition).