Genes impart function through surrogate structures. Understanding the products of gene expression and their interrelationships will help explain normal and abnormal cellular function and open the door to strategies for improving human health and treating disease.
The function of a gene is manifested through its primary products, proteins. The full complement of proteins within a given cell or tissue has been defined as its proteome. As the structural details of biological activity are investigated further, numerous secondary gene products (the results of post-translational processes) are also being found to provide function. Foremost among these processes is glycosylation, which greatly extends the molecular specificity of cell function while adding considerable analytical difficulty to structural characterization.
Thus, with completion of the human genome, research is now focusing on the next level of biological function, the proteins. Many proteins, however, are post-translationally modified by glycosylation, and in numerous cases these glycosylated entities are specifically responsible for function. Moreover, recent estimates indicate that more than half of all human proteins are glycosylated, which strongly suggests that extensive new efforts must be devoted to the analysis of molecular glycosylation. A mutation in a gene involved in glycosylation affects numerous glycoproteins, and as a consequence a multiplicity of phenotypes arises. This is exactly the case in carbohydrate-deficient glycoprotein syndrome (CDG, also termed congenital disorders of glycosylation), where patients suffer multiple maladies as a consequence of a single gene mutation. In contrast, a mutation in a gene encoding a single protein influences only that specific protein and produces a single phenotype. Therefore, the study of molecular glycosylation is integral to the study of proteins and their structures and functions.
While it is relatively easy to isolate and structurally characterize linear biopolymers (e.g., DNA, RNA, proteins), the techniques for analyzing variably linked and multiply branched carbohydrate structures remain in their infancy. When carbohydrate residues are conjugated to other natural structures (lipids, proteins), the carbohydrate moieties are referred to as glycans. In glycoproteins, the glycans are attached either through the hydroxyl group of a hydroxy amino acid (serine or threonine) or through the amide group of asparagine, to produce O- and N-linked structures, respectively. Adding to the analytical challenge, glycans frequently exhibit heterogeneity (glycoforms) at their non-reducing terminus. Thus, a single protein gene product becomes a heterogeneous glycoprotein mixture with altered biochemical, chemical and physical properties (pleiotropism). Characterization of the carbohydrates on glycoproteins requires release and purification, followed by mass spectral determination of their variable sequence, inter-residue linkage, and branching structures.
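The effect of glycoform heterogeneity on a mass spectral analysis can be illustrated with a minimal sketch. The glycoform names and compositions below are hypothetical examples chosen for illustration, not data from any particular glycoprotein; the residue masses are standard monoisotopic values for the anhydro monosaccharide classes, and the glycans are assumed to be free (released, underivatized) species.

```python
# Monoisotopic masses (Da) of anhydro monosaccharide residues.
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.0106  # one water is added for the free reducing-end glycan


def glycan_mass(composition):
    """Return the monoisotopic mass of a free glycan from its composition."""
    residues = sum(RESIDUE_MASS[name] * count for name, count in composition.items())
    return residues + WATER


# Hypothetical glycoforms that could occupy one N-glycosylation site.
glycoforms = {
    "high-mannose (Man5)": {"Hex": 5, "HexNAc": 2},
    "biantennary, no sialic acid": {"Hex": 5, "HexNAc": 4},
    "biantennary, one sialic acid": {"Hex": 5, "HexNAc": 4, "NeuAc": 1},
    "biantennary, two sialic acids": {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
}

for name, composition in glycoforms.items():
    print(f"{name}: {glycan_mass(composition):.4f} Da")
```

A composition fixes only the mass. Isomers that differ in sequence, linkage, or branching share the same value, which is why the further mass spectral steps named above are needed.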
At present there are three major methods for releasing intact glycans from proteins: biochemical release with endoglycosidases, chemical release with hydrazine, and chemical release with strong base. The enzymatic method releases only N-linked glycans, while hydrazine treatment releases both N- and O-linked moieties. O-linked glycans can also be released specifically with base by a classical E2 (second-order) elimination.
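The specificities of the three release methods can be summarized as a small lookup, sketched below. The dictionary keys and the helper name `methods_for` are hypothetical labels introduced here for illustration; the sketch encodes only the specificities stated above.

```python
# Release methods and the glycan linkage classes each one liberates,
# as described in the text ("N" = N-linked, "O" = O-linked).
RELEASE_METHODS = {
    "endoglycosidase": {"kind": "enzymatic", "releases": {"N"}},
    "hydrazine": {"kind": "chemical", "releases": {"N", "O"}},
    "strong base": {"kind": "chemical", "releases": {"O"}},
}


def methods_for(linkage):
    """Return the names of the methods that release the given linkage class."""
    return sorted(
        name for name, info in RELEASE_METHODS.items() if linkage in info["releases"]
    )


print(methods_for("N"))
print(methods_for("O"))
```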
Importantly, both enzymatic and chemical procedures are quantitative, and all released structures expose an identical hemiacetal reducing terminus that provides a single chemical approach for covalent capture. Such covalent capture includes all glycoforms irrespective of variant non-reducing termini. Lectins, by contrast, are naturally occurring protein receptors (natural traps) for carbohydrate structures, with binding specificity for non-reducing epitopes within glycan structures. Unfortunately, because of this exacting specificity, any single lectin would be ineffective in capturing all glycoforms existing on glycoproteins. Columns with combinations of lectins have been proposed, but such strategies have obvious constraints and presuppose that one knows the sample structure before analysis. Furthermore, lectins are relatively difficult to isolate and expensive to purchase.
Robotic high-throughput techniques for characterizing proteins use 2D gels to separate the total proteome. This important electrophoretic separation provides discrete spots for proteins, but for glycoproteins the resulting “spots” spread considerably because of glycan heterogeneity, which diminishes the sensitivity of 2D gels for glycoproteins. Moreover, since additional analytical steps are required to fully sequence the glycan, larger amounts of material are required than for proteins. Equally important to the specific trapping of glycans is their facile and quantitative release for subsequent structural characterization. Thus, there is a great need to develop sample handling and trapping strategies that facilitate the study of a cell's glycome at the sensitivities currently attainable for the proteome.