Various notational systems have been used to encode classes of chemical units by assigning a unique code to each chemical unit in the class. For example, a conventional notational system for encoding amino acids assigns a single letter of the alphabet to each known amino acid. A polymer of chemical units may be represented using such a notational system using a set of codes corresponding to the chemical units. Such notational systems have been used to encode polymers, such as proteins, in a computer-readable format. A polymer that has been represented in such a computer-readable format according to a notational system may be stored and processed by a computer.
Conventional notational schemes for representing chemical units have represented the chemical units as characters (e.g., A, T, G, and C for nucleic acids), and have represented polymers of chemical units as sequences or sets of characters. Various operations may be performed on such a notational representation of a chemical unit or a polymer comprised of chemical units. For example, a user may search a database of chemical units for a query sequence of chemical units. In such a case, the user typically provides a character-based notational representation of the sequence in the form of a sequence of characters, which is compared against the character-based notational representations of sequences of chemical units stored in the database. Character-based searching algorithms, however, are typically slow because such algorithms search by comparing individual characters in the query sequence against individual characters in the sequences of chemical units stored in the database. The speed of such algorithms is therefore related to the length of the query sequence, resulting in particularly poor performance for long query sequences.
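The dependence of character-based searching on query length can be illustrated with a minimal sketch. The function name and the sequences below are hypothetical, and the naive scan shown is only one possible character-by-character search, not any particular database's algorithm. It counts each individual character comparison so that the cost is visible.

```python
from typing import List, Tuple


def naive_search(query: str, target: str) -> Tuple[List[int], int]:
    """Return the match positions and the number of character comparisons."""
    positions = []
    comparisons = 0
    for start in range(len(target) - len(query) + 1):
        matched = True
        for offset, ch in enumerate(query):
            comparisons += 1  # one query character against one stored character
            if target[start + offset] != ch:
                matched = False
                break
        if matched:
            positions.append(start)
    return positions, comparisons


# Invented example sequences.
hits, cost = naive_search("GATT", "ACGATTGATTAC")
```

In the worst case every alignment compares all characters of the query, so the work grows with the product of the query length and the stored sequence length.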
The study of molecular and cellular biology is focused on the macroscopic structure of cells. We now know that cells have a complex microstructure that determines the functionality of the cell. Much of the diversity associated with cellular structure and function is due to the ability of a cell to assemble various building blocks into diverse chemical compounds. The cell accomplishes this task by assembling polymers from a limited set of building blocks referred to as monomers. The key to the diverse functionality of polymers lies in the primary sequence of the monomers within the polymer, and that sequence is integral to understanding the basis for cellular function, such as why a cell differentiates in a particular manner or how a cell will respond to treatment with a particular drug.
The ability to identify the structure of polymers by identifying their sequence of monomers is integral to the understanding of each active component and the role that component plays within a cell. By determining the sequences of polymers it is possible to generate expression maps, to determine what proteins are expressed, to understand where mutations occur in a disease state, and to determine whether a polysaccharide has better function or loses function when a particular monomer is absent or mutated.
Polymers may be characterized by identifying properties of the polymers and comparing those properties to reference polymers, a process referred to herein as property encoded nomenclature (PEN). In one embodiment, the properties are encoded using a binary notation system, and the comparison is accomplished by comparing the binary representations of polymers. For instance, in one aspect a sample polymer is subjected to an experimental constraint to modify the polymer, and the modified polymer is compared to a reference database of polymers to identify a population of polymers having a property that is the same as or similar to a property of the sample polymer. The method may be repeated until the population of polymers in the reference database is reduced to one and the identity of the sample polymer is known.
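As a rough illustration of comparing binary representations, the sketch below assigns one bit to each of three unit properties and filters a reference set with a bit mask. The bit layout, the names, and the reference entries are all assumptions made for this example and are not taken from the specification.

```python
# Hypothetical bit layout (an assumption for illustration only).
IDURONIC = 0b100  # set = iduronic acid, clear = glucuronic acid
SULF_2O = 0b010   # 2-O sulfation present
SULF_N = 0b001    # N-sulfation present


def count_units_with(polymer, mask):
    """Count units whose code has every bit in `mask` set."""
    return sum(1 for code in polymer if code & mask == mask)


def matching_polymers(reference, mask, observed_count):
    """Return names of reference polymers showing the observed property count."""
    return [
        name
        for name, polymer in reference.items()
        if count_units_with(polymer, mask) == observed_count
    ]


# Invented reference entries; each polymer is a list of unit codes.
reference = {
    "ref_1": [0b110, 0b001, 0b111],
    "ref_2": [0b100, 0b001, 0b011],
    "ref_3": [0b010, 0b010, 0b000],
}

# Suppose an experiment indicates two 2-O sulfated units in the sample.
candidates = matching_polymers(reference, SULF_2O, 2)
```

Because each property test is a single bitwise AND on an integer, the comparison does not need to inspect a textual name for each unit.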
In a system including a database of properties of polymers of chemical units, a method for determining the composition of a sample polymer of chemical units having a known molecular weight and length is provided according to one aspect of the invention. The method includes the steps of:
(A) selecting, from the database, candidate polymers of chemical units having the same length as the sample polymer of chemical units and having molecular weights similar to the molecular weight of the sample polymer of chemical units;
(B) performing an experiment on the sample polymer of chemical units;
(C) measuring properties of the sample polymer of chemical units resulting from the experiment; and
(D) eliminating, from the candidate polymers of chemical units, polymers of chemical units having properties that do not correspond to the experimental results.
In some embodiments the method also includes the step of:
(E) repeatedly performing the steps (B) through (D) until the number of candidate polymers of chemical units falls below a predetermined threshold.
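The selection and elimination steps listed above can be sketched as a simple filtering loop. Everything in this sketch is hypothetical: the `Candidate` record, the property names, the one-unit mass tolerance, and the database entries are invented for illustration, and each (property, value) pair merely stands in for the outcome of an experiment and its measurement.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    sequence: str
    mass: float
    properties: dict = field(default_factory=dict)


def select_candidates(database, length, mass, tolerance=1.0):
    """Step (A): keep entries of the same length and similar mass."""
    return [
        c for c in database
        if len(c.sequence) == length and abs(c.mass - mass) <= tolerance
    ]


def eliminate(candidates, prop, measured):
    """Step (D): drop candidates whose stored property disagrees."""
    return [c for c in candidates if c.properties.get(prop) == measured]


def narrow(database, length, mass, measurements, threshold=2):
    """Apply measurements in turn until fewer than `threshold` remain."""
    candidates = select_candidates(database, length, mass)
    for prop, measured in measurements:  # each pair stands in for (B) and (C)
        if len(candidates) < threshold:
            break
        candidates = eliminate(candidates, prop, measured)
    return candidates


# Invented database entries and measurements.
database = [
    Candidate("ABAB", 400.0, {"fragments_after_digest": 2, "binds_probe": True}),
    Candidate("AABB", 400.2, {"fragments_after_digest": 2, "binds_probe": False}),
    Candidate("BBAA", 400.4, {"fragments_after_digest": 3, "binds_probe": True}),
    Candidate("ABABA", 400.1, {"fragments_after_digest": 2, "binds_probe": True}),
    Candidate("BABA", 450.0, {"fragments_after_digest": 2, "binds_probe": True}),
]
measurements = [("fragments_after_digest", 2), ("binds_probe", True)]
remaining = narrow(database, length=4, mass=400.0, measurements=measurements)
```

With a threshold of two, the loop stops as soon as a single candidate is left, or earlier if the measurements run out.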
In other aspects the invention is a method for identifying a population of polymers of chemical units having the same property as a sample polymer of chemical units. The method includes the steps of determining a property of a sample polymer of chemical units, and comparing the property of the sample polymer to a reference database of polymers of known sequence and known properties to identify a population of polymers of chemical units having the same property as a sample polymer of chemical units, wherein the reference database of polymers includes identifiers corresponding to the chemical units of the polymers, each of the identifiers including a field storing a value corresponding to the property.
In one embodiment the step of determining a property of the sample polymer involves the use of mass spectrometry, such as, for example, matrix assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD), to determine the molecular weight of the polymer. MALDI-MS, for instance, may be used to determine the molecular weight of the polymer with an accuracy of approximately one Dalton.
The step of identifying a property of the polymer in other embodiments may involve the reduction in size of the polymer into pieces of several units in length that may be detected by strong ion exchange chromatography. The fragments of the polymer may be compared to the reference database polymers.
According to other aspects, the invention is a method for identifying a subpopulation of polymers having a property in common with a sample polymer of chemical units. The method involves the steps of applying an experimental constraint to the polymer to modify the polymer, detecting a property of the modified polymer, identifying a population of polymers of chemical units having the same molecular length as the sample polymer, and identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer. The steps may be repeated on the modified polymer to identify a second subpopulation within the subpopulation of polymers having a second property in common with the twice modified polymer. Each of the steps may then be repeated until the number of polymers within the subpopulation falls below a predetermined threshold. The method may be performed to identify the sequence of the polymer. In this case the predetermined threshold of polymers within the subpopulation is two polymers.
In yet another aspect, the invention is a method for identifying a subpopulation of polymers having a property in common with a sample polymer of chemical units. The method involves the steps of applying an experimental constraint to the polymer to modify the polymer, detecting a first property of the modified polymer, identifying a population of polymers of chemical units having a second property in common with the sample polymer, and identifying a subpopulation of the identified population of polymers having the same first property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer.
In one embodiment the experimental constraints applied to the polymer are different for each repetition. The experimental constraint may be any manipulation which alters the polymer in such a manner that it will be possible to derive structural information about the polymer or a unit of the polymer. In some embodiments the experimental constraint applied to the polymer may be any one or more of the following constraints: enzymatic digestion, e.g., with an exoenzyme, an endoenzyme, or a restriction endonuclease; chemical digestion; chemical modification; interaction with a binding compound; chemical peeling (i.e., removal of a monosaccharide unit); and enzymatic modification, for instance sulfation at a particular position with a heparan sulfate sulfotransferase.
The property of the polymer that is detected by the method of the invention may be any structural property of a polymer or unit. For instance, the property of the polymer may be the molecular weight or length of the polymer. In other embodiments the property may be the compositional ratios of substituents or units, type of basic building block of a polysaccharide, hydrophobicity, enzymatic sensitivity, hydrophilicity, secondary structure and conformation (i.e., position of helices), spatial distribution of substituents, ratio of one set of modifications to another set of modifications (i.e., relative amounts of 2-O sulfation to N-sulfation or ratio of iduronic acid to glucuronic acid), and binding sites for proteins.
The properties of the modified polymer may be detected in any suitable manner, which depends on the property and polymer being analyzed. In one embodiment the step of detection involves mass spectrometry such as matrix assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD). Alternatively, the step of detection involves strong ion exchange chromatography, for example, if the polymer has been digested into several smaller fragments composed of several units each.
The method is based on a comparison of the sample polymer with a population of polymers of the same length or having at least one property in common. In some embodiments the population of polymers of chemical units includes every polymer sequence having the molecular weight of the sample polymer. In other embodiments the population of polymers of chemical units includes less than every polymer sequence having the molecular weight of the sample polymer. According to some embodiments the step of identifying includes selecting the population of polymers of chemical units from a database including molecular weights of polymers of chemical units. Preferably the database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
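One way to picture a population that includes every polymer sequence having a given molecular weight is to enumerate all sequences of the required length and keep those whose summed unit masses match. This is only a sketch under stated assumptions: the three unit symbols and their integer masses are invented, and the mass of a polymer is taken to be the plain sum of its unit masses, ignoring any mass change on linkage.

```python
from itertools import product

# Invented integer unit masses, chosen only to keep the arithmetic easy.
UNIT_MASS = {"A": 100, "B": 120, "C": 140}


def polymer_mass(sequence):
    """Sum of unit masses (simplifying assumption: no linkage correction)."""
    return sum(UNIT_MASS[unit] for unit in sequence)


def population(length, target_mass, tolerance=0):
    """Enumerate every sequence of `length` units whose mass matches."""
    sequences = (
        "".join(units) for units in product(sorted(UNIT_MASS), repeat=length)
    )
    return [
        s for s in sequences
        if abs(polymer_mass(s) - target_mass) <= tolerance
    ]


pool = population(length=3, target_mass=360)
```

The number of sequences enumerated grows exponentially with length, which is one reason a population containing less than every such sequence, drawn from a database, may be preferable in practice.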
According to another aspect of the invention a method for compositional analysis of a sample polymer is provided. The method includes the steps of applying an experimental constraint to the sample polymer to modify the sample polymer, detecting a property of the modified sample polymer, and comparing the modified sample polymer to a reference database of polymers of the same size as the sample polymer, wherein the polymers of the reference database have also been subjected to the same experimental constraint as the sample polymer, and wherein the comparison provides a compositional analysis of the sample polymer.
In some embodiments the compositional analysis reveals the number and type of units within the polymer. In other embodiments the compositional analysis reveals the identity of a sequence of chemical units of the polymer.
Similarly to the aspects of the invention described above, the properties of the polymer may be detected in any suitable manner, which will depend on the particular property and polymer being analyzed. In one embodiment the step of detection involves mass spectrometry such as matrix assisted laser desorption ionization mass spectrometry (MALDI-MS), electrospray MS, fast atom bombardment mass spectrometry (FAB-MS) and collision-activated dissociation mass spectrometry (CAD). Preferably the experimental constraint applied to the polymer is an enzymatic or chemical reaction which involves incomplete enzymatic digestion of the polymer, and the steps of the method are repeated until the number of polymers within the reference database falls below a predetermined threshold. Alternatively, the step of detection involves capillary electrophoresis, particularly when the experimental constraint applied to the polymer involves complete degradation of the polymer into individual chemical units.
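When a polymer is completely degraded into individual chemical units, the detected units give the number and type of units but not their order. The following minimal sketch, using invented unit symbols and reference sequences, tallies the detected units and keeps the reference polymers with the same tally. The function names are hypothetical.

```python
from collections import Counter


def composition(units):
    """Tally how many of each chemical unit are present."""
    return Counter(units)


def same_composition(reference_sequences, detected_units):
    """Return reference sequences whose unit counts match the detected units."""
    target = composition(detected_units)
    return [s for s in reference_sequences if composition(s) == target]


# Invented data: units reported after complete degradation, in no particular order.
detected = ["B", "A", "A", "C"]
reference = ["AABC", "ABCA", "ABBC", "CCAA"]
matches = same_composition(reference, detected)
```

Several reference polymers can share one composition, so a further experimental constraint would be needed to distinguish among the matches.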
In one embodiment the reference database includes identifiers corresponding to chemical units of a plurality of polymers, each of the identifiers including a field storing a value corresponding to a property of the corresponding chemical unit.
According to yet another aspect of the invention a method for sequencing a polymer is provided. The method includes the steps of applying an experimental constraint to the polymer to modify the polymer, detecting a property of the modified polymer, identifying a population of polymers having the same molecular length as the sample polymer and having molecular weights similar to the molecular weight of the sample polymer, identifying a subpopulation of the identified population of polymers having the same property as the modified polymer by eliminating, from the identified population of polymers, polymers having properties that do not correspond to the modified polymer, and repeating the steps applying an experimental constraint, detecting a property and identifying a subpopulation by applying additional experimental constraints to the polymer and identifying additional subpopulations of polymers until the number of polymers within the subpopulation is one and the sequence of the polymer may be identified.
In another aspect the invention relates to a method for identifying a polysaccharide-protein interaction by contacting a protein-coated MALDI surface with a polysaccharide-containing sample to produce a polysaccharide-protein-coated MALDI surface, removing unbound polysaccharide from the polysaccharide-protein-coated MALDI surface, and performing MALDI mass spectrometry to identify the polysaccharide that specifically interacts with the protein coated on the MALDI surface.
In one embodiment a MALDI matrix is added to the polysaccharide-protein-coated MALDI surface. In other embodiments an experimental constraint may be applied to the polysaccharide bound on the polysaccharide-protein-coated MALDI surface before performing the MALDI mass spectrometry analysis. The experimental constraint applied to the polymer in some embodiments is digestion with an exoenzyme or digestion with an endoenzyme. In other embodiments the experimental constraint applied to the polymer is selected from the group consisting of restriction endonuclease digestion; chemical digestion; chemical modification; and enzymatic modification.
Each of the limitations of the invention can encompass various embodiments of the invention. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements may be included in each aspect of the invention.