Oligosaccharides and polysaccharides are polymers consisting of monosaccharide (sugar) units connected to each other via glycosidic bonds. In addition to the linear (two-dimensional) structure given by the sequence of monosaccharide units, these polymers have a three-dimensional structure. Furthermore, recent studies have shown that these carbohydrate polymers also carry biological information. For example, certain proteins and glycoproteins (proteins that also feature carbohydrate moieties) have been shown to bind specifically to certain types of carbohydrate polymers. These proteins and glycoproteins are called “lectins”, and their role in various biological and pathological processes is only beginning to be elucidated. Originally, the term “lectins” referred to saccharide-binding proteins isolated from plants. For the purpose of this application, the term “lectin” also encompasses saccharide-binding proteins from animal species (e.g. “mammalian lectins”). Thus carbohydrate polymers, like DNA and proteins, clearly have important biological functions that merit more detailed study.
The saccharide chain of a carbohydrate polymer, like a DNA or protein chain, has two dissimilar ends. In the case of saccharide chains, these are the reducing end (corresponding to the aldehyde group of the linear sugar molecule) and the non-reducing end. Unlike proteins and DNA, however, saccharides may also be branched, with essentially any sugar unit in the saccharide serving as a potential branching point. A branched chain therefore has a single reducing end but several non-reducing ends. Because branching makes even the two-dimensional structure of the saccharide chain complex, its three-dimensional structure, and hence the relationship between that structure and its biological function, is highly complex.
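A branched saccharide chain can be modeled as a rooted tree. The root is the single reducing end, and every leaf is a non-reducing end, so a chain with one branch point already has two non-reducing ends. The minimal Python sketch below is built on this tree model. The class `SugarUnit`, the helper `non_reducing_ends`, and the placeholder unit labels are hypothetical. They do not describe any particular saccharide or linkage chemistry.

```python
from dataclasses import dataclass, field


@dataclass
class SugarUnit:
    """One monosaccharide unit; `children` lie toward the non-reducing end(s)."""
    name: str
    children: list["SugarUnit"] = field(default_factory=list)

    def attach(self, child: "SugarUnit") -> "SugarUnit":
        """Link `child` to this unit and return it, so chains can be extended."""
        self.children.append(child)
        return child


def non_reducing_ends(unit: SugarUnit) -> list[str]:
    """Collect the labels of all terminal units (leaves) below `unit`."""
    if not unit.children:
        return [unit.name]
    ends: list[str] = []
    for child in unit.children:
        ends.extend(non_reducing_ends(child))
    return ends


# Hypothetical chain with placeholder labels: A is the reducing end (root),
# and unit B is a branch point carrying two further units.
root = SugarUnit("A")
branch_point = root.attach(SugarUnit("B"))
branch_point.attach(SugarUnit("C"))
branch_point.attach(SugarUnit("D"))

print(non_reducing_ends(root))  # ['C', 'D']
```

A linear DNA or protein chain would be the degenerate case of this tree, in which every unit has at most one child.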
As described above, a number of proteins bind to saccharides, the lectins being one example. Many of these proteins bind specifically to a certain short oligosaccharide sequence. Antibodies are proteins that specifically recognize certain molecular structures and, like lectins, may also recognize saccharide structures. Glycosidases are enzymes that cleave glycosidic bonds within the saccharide chain, and they too may recognize certain oligosaccharide sequences specifically. Glycosyltransferases are enzymes that form glycosidic bonds by transferring a sugar unit from a donor molecule onto a saccharide chain.
The art of determining the structure of polysaccharides has not developed as rapidly as the arts of protein and DNA analysis. Likewise, progress in analyzing the saccharides (glycans) attached to most mammalian proteins, a very important part of these proteins, has generally lagged behind advances in DNA and protein analysis technology.
Advanced analysis methods were introduced in the fields of protein and DNA sequencing a number of years ago. The development of such methods and techniques was aided by the fact that the components of DNA and of proteins are connected to each other by only one kind of linkage. In DNA this is the phosphodiester bond joining the 3′ and 5′ positions of adjacent nucleotides, and in proteins it is the peptide bond. DNA contains only four different components (the nucleotides), while proteins contain about 20 different components (the amino acids). Although modified amino acids exist, a protein must first be synthesized according to the genetic code, from a template that is ultimately encoded in DNA. Therefore, the number and kind of amino acids in a newly synthesized protein are restricted to the limited repertoire of amino acids represented in the genetic code. This code is nearly universal across all life forms, with only minor differences.
For the above structural reasons, the structural analysis of proteins and of DNA is today a simple, rapid, and relatively inexpensive procedure that does not require highly skilled personnel.
In contrast, a multitude of methods for analyzing saccharide structures have been developed, each with its own shortcomings. It is currently not possible, regardless of the sophistication of the method used, to determine the entire sequence of a polysaccharide, or even of an oligosaccharide, with a single technique. There are several reasons for this difficulty. First, saccharides are synthesized without a template. In the absence of structural information, the researcher must therefore assume that the building units may be any of the saccharide units known today. In addition, these units may have been modified during synthesis, e.g. by the addition of sulfate groups.
Second, the linkages between saccharide units can take many forms. If the sugar unit to which a saccharide is connected is a hexose, the connection may be to any of its C1, C2, C3, C4, or C6 atoms. Moreover, the glycosidic bond at the anomeric C1 atom may be in either the α or the β configuration.
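The simplified model described above allows five attachment positions on a hexose, each in the α or β configuration. Under that model, two given hexose units can already be joined in ten different ways. The Python sketch below enumerates these linkage types as an illustration of the arithmetic only. It deliberately ignores complications such as the second anomeric centre involved in a 1→1 linkage, and the string notation it produces is an illustrative assumption.

```python
from itertools import product

# Simplified model from the text: a glycosidic bond may attach to C1, C2, C3,
# C4 or C6 of a hexose, in alpha or beta configuration. It ignores, e.g., the
# second anomeric centre that takes part in a 1->1 linkage.
ANOMERIC_CONFIGS = ("alpha", "beta")
HEXOSE_POSITIONS = (1, 2, 3, 4, 6)


def linkage_types() -> list[str]:
    """Return every (configuration, position) combination as a label."""
    return [f"{cfg}(1->{pos})" for cfg, pos in product(ANOMERIC_CONFIGS, HEXOSE_POSITIONS)]


links = linkage_types()
print(len(links))  # 10
print(links[:3])   # ['alpha(1->1)', 'alpha(1->2)', 'alpha(1->3)']
```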
Third, saccharides may be branched. Branching both complicates their structure and increases the number of possible structures having the same number and kind of sugar units.
A fourth difficulty is that many sugars differ only minutely in structure. A sugar unit may differ from another merely in the orientation of a hydroxyl group (epimers). Glucose and galactose, for example, differ only in their configuration at C4.
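The difficulties above can be roughly quantified by counting unbranched chains. Consider a linear chain of n units drawn from k unit types, joined by n − 1 bonds, each of which is one of L linkage types. Such a chain has k^n · L^(n−1) possible sequences. DNA (k = 4) and proteins (k = 20) each have a single bond type (L = 1). The saccharide values in the Python sketch below are assumptions chosen for illustration: 10 monosaccharide types and the 10 linkage types of the simplified α/β × five-position model. The sketch ignores branching and modifications such as sulfation, both of which would only enlarge the count.

```python
def linear_sequence_count(n_units: int, n_unit_types: int, n_linkage_types: int = 1) -> int:
    """Number of distinct unbranched chains of `n_units` units.

    Each position holds one of `n_unit_types` units, and each of the
    n_units - 1 bonds is one of `n_linkage_types` linkage types.
    """
    return n_unit_types ** n_units * n_linkage_types ** (n_units - 1)


n = 6
dna = linear_sequence_count(n, 4)       # one bond type (phosphodiester)
protein = linear_sequence_count(n, 20)  # one bond type (peptide)
# Assumed for illustration: 10 monosaccharide types, 10 linkage types.
glycan = linear_sequence_count(n, 10, 10)
print(dna, protein, glycan)  # 4096 64000000 100000000000
```

Even for a hexasaccharide, the assumed saccharide space is more than a thousand times larger than the protein space. That gap widens with every additional unit.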
A method for characterizing carbohydrate polymers is disclosed in PCT Application No. PCT/IL00/00256, which is hereby incorporated by reference as if fully set forth herein. According to this method, one or more saccharide-binding agents are attached to a surface. These agents may be, for example, lectins, antibodies, other types of saccharide-binding proteins, or polysaccharide-cleaving or polysaccharide-modifying enzymes. Next, the carbohydrate polymer of interest is incubated with the saccharide-binding agents on the surface. The carbohydrate polymer may be any molecule with a polysaccharide component, for example a polysaccharide itself, a glycoprotein, or a glycolipid. If the carbohydrate polymer binds to a saccharide-binding agent, a complex is formed. This complex may then be detected with a second saccharide-binding agent, which may optionally carry a label for detection. Examples of such labels include, but are not limited to, chromogenic labels, radiolabels, fluorescent labels, and biotinylated labels.
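The readout described above can be pictured as one detection signal per saccharide-binding agent. The Python sketch below turns such signals into a binary fingerprint by applying a detection threshold. The agent names, signal values, threshold, and the thresholding step itself are hypothetical assumptions. They are not details of the method of PCT/IL00/00256, which may equally use quantitative signals.

```python
def binding_fingerprint(signals: dict[str, float], threshold: float) -> dict[str, int]:
    """Convert per-agent detection signals into a binary binding fingerprint.

    1 means a complex was detected at or above `threshold`; 0 means it was not.
    Agents are listed in alphabetical order so fingerprints line up.
    """
    return {agent: int(value >= threshold) for agent, value in sorted(signals.items())}


# Hypothetical readout: agent names and signal values are placeholders.
signals = {"lectin_A": 0.82, "lectin_B": 0.05, "antibody_C": 0.47, "enzyme_D": 0.91}
print(binding_fingerprint(signals, threshold=0.4))
# {'antibody_C': 1, 'enzyme_D': 1, 'lectin_A': 1, 'lectin_B': 0}
```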
The use of a plurality of such saccharide-binding agents, whether fixed to the substrate and/or employed as the second (soluble) saccharide-binding agent, characterizes the carbohydrate polymer of interest by providing a “fingerprint” of the saccharide. Such a fingerprint can then be analyzed to obtain more information about the carbohydrate polymer. Unfortunately, characterizing carbohydrate polymers and interpreting their fingerprint data is far more complex than for other biological polymers, such as DNA. The pattern of DNA probes bound to a DNA sample is closely tied to its sequence. A carbohydrate polymer fingerprint, by comparison, is not necessarily a direct indication of the components of the carbohydrate polymer itself. DNA probe binding provides relatively direct information about the sequence of the DNA sample, since under the proper conditions, recognition and binding of a probe to DNA is a fairly straightforward process. Thus, a DNA “fingerprint” obtained from probe binding can yield direct information about the actual DNA sequence in the sample.
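One simple way to compare two binary fingerprints is the Jaccard similarity of the sets of agents that bind each sample: shared binders divided by all binders. The Python sketch below is a hypothetical illustration, not the analysis method of this application. The agent names and fingerprint values are placeholders. As noted above, even a high similarity between two fingerprints does not by itself reveal the sequence of either carbohydrate polymer.

```python
def jaccard_similarity(fp_a: dict[str, int], fp_b: dict[str, int]) -> float:
    """Jaccard similarity of the sets of agents that bind each sample."""
    bound_a = {agent for agent, hit in fp_a.items() if hit}
    bound_b = {agent for agent, hit in fp_b.items() if hit}
    union = bound_a | bound_b
    if not union:
        return 1.0  # neither sample binds any agent: treat the fingerprints as identical
    return len(bound_a & bound_b) / len(union)


# Hypothetical fingerprints over the same panel of placeholder agents.
sample = {"lectin_A": 1, "lectin_B": 0, "antibody_C": 1, "enzyme_D": 1}
reference = {"lectin_A": 1, "lectin_B": 1, "antibody_C": 0, "enzyme_D": 1}
print(jaccard_similarity(sample, reference))  # 0.5
```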
By contrast, the binding of agents to carbohydrate polymers is not nearly so straightforward. As described above, even the two-dimensional structure (sequence) of carbohydrate polymers is more complex than that of DNA, since carbohydrate polymers can be branched. These branches affect the three-dimensional structure of the polymer, and hence the structure of the recognition site for the binding agent. In addition, a binding agent's recognition of an epitope on a carbohydrate polymer may be affected by the epitope's “neighborhood”, i.e. the portion of the molecule surrounding it. Analyzing such fingerprint data for the binding of agents to a carbohydrate polymer of interest is therefore considerably more difficult than analyzing DNA probe binding.
The analysis is further complicated by the possibility of a combinatorial explosion, which can occur when searching through a combinatorial space. A combinatorial space consists of the many possible combinations of a set of basic elements. These combinations may differ in the types and values of the elements they contain, in the structure formed by combining them, or in both. Combinatorial spaces often occur in biology, as many elementary biological materials are themselves produced by combining relatively simple building blocks, yet are highly complex in their resultant structure and/or function. The analysis of fingerprint data from carbohydrate polymers, as described above, is one example of a problem involving a combinatorial space. A search through such a space may also be termed a “combinatorial search”.
Searching through a combinatorial space is difficult because, despite the apparent simplicity of the building blocks, the huge number and complexity of the resulting combinations make an exhaustive search of the space difficult, if not impossible. Searching through these types of combinatorial spaces, particularly for biological problems, has therefore typically proved resistant to modeling and prediction by computational algorithms.