The present invention relates to methods and products for analyzing polymers. In particular, the methods are based on generation of information from a data set of polymer dependent impulses arising from polymers which have been labeled according to an ordered strategy. The information generated relates to many aspects of the polymer such as the length of the polymer, the composition of units within the polymer, the order of units in the polymer, and the sequence or partial sequence of units in the polymer. The invention also relates to methods for intensity based analysis.
Polymers are involved in diverse and essential functions in living systems. The ability to decipher the function of polymers in these systems is integral to the understanding of the role that the polymer plays within a cell. Often the function of a polymer in a living system is determined by analyzing the structure and determining the relation between the structure and the function of the polymer. By determining the primary sequence in a polymer such as a nucleic acid it is possible to generate expression maps, to determine what proteins are expressed, and to understand where mutations occur in a disease state. Because of the wealth of knowledge that may be obtained from sequencing of polymers, many methods have been developed to achieve more rapid and more accurate sequencing.
In general, DNA sequencing is currently performed using one of two methods. The first and more popular method is the dideoxy chain termination method described by Sanger et al. (1977). This method involves the enzymatic synthesis of DNA molecules terminating in dideoxynucleotides. By using the four ddNTPs, a population of molecules terminating at each position of the target DNA can be synthesized. Subsequent analysis yields information on the length of the DNA molecules and the base at which each molecule terminates (either A, C, G, or T). With this information, the DNA sequence can be determined. The second method is Maxam and Gilbert sequencing (Maxam and Gilbert, 1977), which uses chemical degradation to generate a population of molecules degraded at certain positions of the target DNA. With knowledge of the cleavage specificities of the chemical reactions and the lengths of the fragments, the DNA sequence is generated. Both methods rely on polyacrylamide gel electrophoresis and photographic visualization of the radioactive DNA fragments. Each process takes about 1-3 days. The Sanger sequencing reactions can only generate 300-800 bases in one run.
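By way of illustration only, the manner in which fragment length and terminal base together determine a sequence can be sketched as follows. The sketch assumes an idealized, error-free set of fragments with exactly one terminal base observed per length; the function name and the (length, base) representation are hypothetical and are not taken from the cited methods.

```python
def sequence_from_fragments(fragments):
    """Rebuild the synthesized strand from (length, terminal_base) pairs.

    Each pair stands for one chain-terminated fragment: its length in bases
    and the dideoxynucleotide (A, C, G or T) at which it terminates.
    Assumption: the data are idealized, with one base per fragment length.
    """
    by_length = {}
    for length, base in fragments:
        if base not in {"A", "C", "G", "T"}:
            raise ValueError(f"unexpected terminal base: {base!r}")
        # The same length must always report the same terminal base.
        if by_length.setdefault(length, base) != base:
            raise ValueError(f"conflicting bases at length {length}")
    if not by_length:
        return ""
    longest = max(by_length)
    missing = [n for n in range(1, longest + 1) if n not in by_length]
    if missing:
        raise ValueError(f"no fragment observed for lengths {missing}")
    # Reading terminal bases in order of increasing length gives the sequence.
    return "".join(by_length[n] for n in range(1, longest + 1))
```

Sorting by length plays the role that electrophoretic separation plays in the laboratory procedure; the sketch fails loudly on a missing length rather than guessing the base.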
Methods to improve the output of sequence information using the Sanger method also have been proposed. These Sanger-based methods include multiplex sequencing, capillary gel electrophoresis, and automated gel electrophoresis. Recently, there has also been increasing interest in developing Sanger independent methods. Sanger independent methods use a completely different methodology to realize the base information. This category includes scanning tunneling microscopy (STM), mass spectrometry, enzymatic luminometric inorganic pyrophosphate detection assay (ELIDA) sequencing, exonuclease sequencing, and sequencing by hybridization.
Further, several new methods have been described for carboxy terminal sequencing of polypeptides. See Inglis, A. S., Anal. Biochem. 195:183-96 (1991). Carboxy terminal sequencing methods mimic Edman degradation but involve sequential degradation from the opposite end of the polymer. Like Edman degradation, the carboxy-terminal sequencing methods involve chemically induced sequential removal and identification of the terminal amino acid residue.
More recently, polypeptide sequencing has been described by preparing a nested set (sequence defining set) of polymer fragments followed by mass analysis. See Chait, B. T. et al., Science 257:1885-94 (1992). Sequence is determined by comparing the relative mass difference between fragments with the known masses of the amino acid residues. Though formation of a nested (sequence defining) set of polymer fragments is a requirement of DNA sequencing, this method differs substantially from the conventional protein sequencing method consisting of sequential removal and identification of each residue. Although this method has potential, in practice it has encountered several problems and has not been demonstrated to be an effective method.
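By way of illustration only, the comparison of mass differences with known residue masses can be sketched as follows. The sketch assumes a complete nested set in which adjacent fragments differ by exactly one residue; the residue table is a deliberately small subset of approximate monoisotopic residue masses in daltons, and the function name and tolerance are hypothetical.

```python
# Approximate monoisotopic residue masses (Da) for a few amino acids only.
RESIDUE_MASS = {"G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068}


def read_ladder(fragment_masses, tolerance=0.02):
    """Assign a residue to each step between adjacent fragment masses.

    Returns one letter per step, read from the lightest fragment to the
    heaviest; '?' marks a step that matches no residue, or more than one,
    within the tolerance.
    """
    masses = sorted(fragment_masses)
    residues = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        matches = [r for r, m in RESIDUE_MASS.items() if abs(delta - m) <= tolerance]
        residues.append(matches[0] if len(matches) == 1 else "?")
    return "".join(residues)
```

Whether the resulting string reads toward the amino or the carboxy terminus depends on the end from which the nested set was generated, which the sketch does not model.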
The present invention relates in some aspects to methods and products for analyzing polymers. In particular the invention in one aspect is a method for identifying information about a polymer such as its sequence, length, order of bases etc., by obtaining polymer dependent impulses from a population of polymers and comparing the polymer dependent impulses to determine unit specific information about the polymers.
Recently, methods for analyzing polymers based on unit specific information about the polymer have been developed. Such methods are described in co-pending PCT patent application No. PCT/US98/03024 and U.S. Ser. No. 09/134,411 filed Aug. 13, 1998, the entire contents of which are hereby incorporated by reference. The method for analyzing polymers described in PCT/US98/03024 and 09/134,411 is based on the ability to examine each unit or unit specific marker of a polymer individually. By examining each unit or unit specific marker individually the type of units and the position of the units on the backbone of the polymer can be identified. This can be accomplished by positioning a labeled unit or unit specific marker at a station and examining a change which occurs when that labeled unit or unit specific marker is proximate to the station. The change can arise as a result of an interaction that occurs between the unit or unit specific marker and the station or a partner and is specific for the particular unit or unit specific marker. For instance if the polymer is a nucleic acid molecule and a T is positioned in proximity to a station a change which is specific for a T could occur. If on the other hand, a G is positioned in proximity to a station then a change which is specific for a G could occur. The specific change which occurs, for example, depends on the station used, the type of polymer being studied and/or the label used. For instance the change may be an electromagnetic signal which arises as a result of the interaction.
Methods for analyzing polymers based on unit specific information about the polymer involve the detection of polymer dependent impulses from a plurality of polymers to produce a data set of information. The data set can be compared to provide specific information about the polymer such as the composition of units in the polymer, the length of the polymer, the presence of specific sequences in the polymer, and even the entire sequence of the units in the polymer.
In one aspect the invention is a method for generating unit specific information about a polymer. The method includes the steps of obtaining polymer dependent impulses for a plurality of labeled polymers, comparing the polymer dependent impulses obtained from each of the plurality of labeled polymers, and determining unit specific information about the polymers based upon comparing the polymer dependent impulses. Preferably the polymer dependent impulses arise from unit specific markers of less than all units of the polymers. In an embodiment the polymer dependent impulses arise from at least two unit specific markers of the polymers.
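By way of illustration only, one way in which impulses from a plurality of partially labeled polymers might be compared is sketched below. The sketch assumes that each polymer has already been reduced to a set of positions, on a common coordinate, at which a marker for one unit type was detected; that representation, the function name and the support threshold are hypothetical and are not prescribed by the method.

```python
from collections import Counter


def pool_labeled_positions(observations, min_support=2):
    """Combine per-polymer label positions into a population-level call.

    observations: iterable of position collections, one per labeled polymer.
    A position is reported when at least `min_support` polymers show it,
    so that an impulse seen on only one polymer is not taken at face value.
    """
    support = Counter()
    for positions in observations:
        # set() ensures a polymer votes at most once for each position.
        support.update(set(positions))
    return sorted(p for p, n in support.items() if n >= min_support)
```

Because each polymer carries markers on less than all of its units, no single polymer reveals every position; pooling across the population is what recovers them in this sketch.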
The plurality of polymers may be any type of polymer but preferably is a nucleic acid. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the plurality of polymers is a heterogenous population. The polymers can be labeled randomly or non-randomly. Different labels can be used to label different linked units to produce different polymer dependent impulses.
The polymer dependent impulses provide many different types of structural information about the polymer. For instance the obtained polymer dependent impulses may include an order of polymer dependent impulses or the obtained polymer dependent impulses may include the time of separation between specific signals or the number of specific polymer dependent impulses. The obtained polymer dependent impulses may indicate the sequence of units of the polymer.
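By way of illustration only, the three kinds of information named above (order, time of separation, and number of impulses) can be extracted from a single record as sketched below. The sketch assumes that each impulse has been recorded as a (time, label) pair; that representation and the function name are hypothetical.

```python
from collections import Counter


def summarize_impulses(impulses):
    """Summarize one polymer's impulses, given as (time, label) pairs.

    Returns the order of labels, the time separating consecutive impulses,
    and the number of impulses observed for each label.
    """
    ordered = sorted(impulses)  # sort by detection time
    order = [label for _, label in ordered]
    separations = [t2 - t1 for (t1, _), (t2, _) in zip(ordered, ordered[1:])]
    counts = Counter(order)
    return order, separations, counts
```

For example, `summarize_impulses([(3.0, "G"), (1.0, "A"), (2.5, "A")])` yields the order A, A, G, separations of 1.5 and 0.5 time units, and counts of two A impulses and one G impulse.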
In one important embodiment the polymer dependent impulses are obtained by moving the plurality of polymers linearly past a signal generation station.
According to another embodiment the unit specific markers are nucleic acid probes. In another embodiment the unit specific markers are peptide nucleic acid probes.
The unit specific markers may identify a single unit of a polymer or multiple units of a polymer. When the polymer is a nucleic acid the unit specific marker may be a nucleic acid probe. In one embodiment the unit specific marker is a nucleic acid probe having at least two base pairs. In another embodiment the unit specific marker is a nucleic acid probe having at least three base pairs.
According to another aspect of the invention a method for sequencing a polymer of linked units is provided. The method includes the steps of obtaining polymer dependent impulses from a plurality of overlapping polymers, at least a portion of each of the polymers having a sequence of linked units identical to the other of the polymers, and comparing the polymer dependent impulses from an overlapping portion of each of the plurality of polymers to obtain a sequence of linked units which is identical in the plurality of polymers.
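By way of illustration only, the comparison of an overlapping portion of two polymers can be sketched as follows. The sketch assumes that the impulses from each polymer have already been reduced to a string of unit labels and that the end of one string overlaps the start of the other without error; the function names and the minimum overlap length are hypothetical.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_overlapping(left, right, min_len=3):
    """Join two label strings across their shared, identical portion."""
    k = longest_overlap(left, right, min_len)
    if k == 0:
        raise ValueError("no overlap of the required length was found")
    # Keep the shared portion once and append the remainder of `right`.
    return left + right[k:]
```

Requiring a minimum overlap length guards against joining two polymers on a short chance match; a practical comparison would also have to tolerate measurement error, which this sketch does not attempt.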
The polymer dependent impulses may be detected by many means. A preferred method of detection is optical detection.
The plurality of polymers may be any type of polymer but preferably is a nucleic acid. Preferably the nucleic acids are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the plurality of polymers is a heterogenous population. The polymers can be labeled randomly or non-randomly. Different labels can be used to label different linked units to produce different polymer dependent impulses.
The polymer dependent impulses provide many different types of structural information about the polymer. For instance the obtained polymer dependent impulses may include an order of polymer dependent impulses or the obtained polymer dependent impulses may include the time of separation between specific signals or the number of specific polymer dependent impulses. The obtained polymer dependent impulses may indicate the sequence of units of the polymer.
In one important embodiment the polymer dependent impulses are obtained by moving the plurality of polymers linearly past a signal generation station.
According to another embodiment the unit specific marker is a nucleic acid probe. In another embodiment the unit specific marker is a peptide nucleic acid probe. In another embodiment, the unit specific marker is a peptide.
The unit specific markers may identify a single unit of a polymer or multiple units of a polymer. When the polymer is a nucleic acid the unit specific marker may be a nucleic acid probe. In one embodiment the unit specific marker is a nucleic acid probe having at least three base pairs. In another embodiment the unit specific markers are three base pair nucleic acid probes.
The invention in another aspect is a kit for labeling polymers. The kit includes a container housing a series of distinct nucleic acid probes, wherein the series of nucleic acid probes is a set of multiple base pair probes. Preferably the multiple base pair probes are selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes.
In one embodiment the container is a single container having a plurality of compartments, each housing a specific labeled probe. In another embodiment the container is a plurality of containers.
The kit in one embodiment also includes instructions for labeling the nucleic acid probes.
The distinct nucleic acid probes are labeled in one embodiment. Preferably the nucleic acid probes are labeled with an agent selected from the group consisting of an electromagnetic radiation source, a quenching source and a fluorescence excitation source. In one embodiment the plurality of polymers is a homogenous population. In another embodiment the distinct nucleic acid probes are three base pair probes. In another embodiment the distinct nucleic acid probes are four base pair probes. In yet another embodiment the distinct nucleic acid probes are five base pair probes.
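By way of illustration only, a series of distinct probes of a given length can be enumerated as sketched below. The sketch assumes that the series contains every possible probe of that length over the four bases A, C, G and T; the kit described above is not stated to be limited to, or to require, such a complete series, and the function name is hypothetical.

```python
from itertools import product


def probe_series(k, alphabet="ACGT"):
    """List every distinct probe of length k over the given bases."""
    if k < 1:
        raise ValueError("probe length must be at least 1")
    return ["".join(bases) for bases in product(alphabet, repeat=k)]
```

Under this assumption the number of distinct probes is four raised to the probe length: 16 two base probes, 64 three base probes, 256 four base probes and 1024 five base probes, which indicates how many compartments a complete series would occupy.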
The invention in other aspects relates to methods and products for linear analysis of polymers using an intensity based method for identifying information about the polymer such as its sequence, length, order of bases etc. The methods can be accomplished using intensity based measurements combined with the ordered labeling strategy discussed above.
One aspect of linear analysis involves the movement of the polymer past a fixed station in such a manner as to cause a signal that provides information about the polymer to arise. According to an aspect of the invention it was discovered that information about the polymer can be determined by quantitatively measuring intensity of the signal arising at the station. The signal arises from the polymer as a result of the units of the polymer passing the fixed station. In some cases all of the units may cause the generation of a signal and in other cases less than all of the units produce the signal. The total intensity of the signal is proportional to the number of units or unit specific markers which generate a signal as they pass the fixed station. If the signal arises from every unit of the polymer then the intensity of the signal is proportional to the number of units in the polymer. If the signal arises from less than all of the units or unit specific markers of the polymer then the intensity of the signal is proportional to that number of units or unit specific markers causing generation of the signal. The number of units or unit specific markers indicated by the intensity can be used to determine information about the polymer such as the composition of units in the polymer, the length of the polymer, the presence of specific sequences in the polymer, and even the entire sequence of the units in the polymer.
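By way of illustration only, the proportionality between total intensity and the number of signal generating units can be sketched as follows. The sketch assumes that every marker contributes the same, separately calibrated intensity and that a constant background can be subtracted from each sample; the function name and both calibration parameters are hypothetical.

```python
def estimate_label_count(samples, intensity_per_label, background=0.0):
    """Estimate how many markers produced a signal from its total intensity.

    samples: intensity readings taken while the polymer passes the station.
    intensity_per_label: calibrated total intensity from a single marker.
    background: constant offset present in every reading.
    """
    if intensity_per_label <= 0:
        raise ValueError("intensity_per_label must be positive")
    total = sum(reading - background for reading in samples)
    # Total intensity is taken to be proportional to the number of markers.
    return max(0, round(total / intensity_per_label))
```

If every unit of the polymer carries a marker, the returned count estimates the length of the polymer; if only some units are labeled, it estimates the number of labeled units.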
The invention in another aspect is a method for analyzing a polymer by linearly moving a labeled polymer with respect to a fixed station, obtaining a signal from the labeled polymer as the labeled polymer passes the fixed station, wherein the signal is an electromagnetic radiation signal arising from an interaction between at least two distinct labeled unit specific markers, and determining a quantitative measure of intensity of the signal to analyze the polymer.
The intensity of the signal provides various types of structural information about a polymer, depending on how the polymer is labeled. In one embodiment each unit of the labeled polymer is labeled with a unit specific marker and the quantitative measure of intensity of the signal indicates the length of the polymer. In another embodiment less than all units of the polymer are labeled with at least one unit specific marker and the quantitative measure of intensity of the signal indicates the number of labeled unit specific markers present in the polymer.
The fixed station which gives rise to the signal when the labeled polymer interacts with the station in one embodiment is an electromagnetic radiation source. In another embodiment the fixed station is a radiation source.
More than one polymer may be analyzed to generate a data set representative of a population of polymers. Thus in one embodiment a plurality of polymers are analyzed simultaneously to produce a plurality of signals, one signal for each polymer, and the method further comprises the step of comparing the intensities of the signals to analyze the polymers.
The labeled polymer may be labeled with a unit specific marker. In one embodiment the unit specific marker is a peptide nucleic acid probe. In another embodiment the unit specific marker is a series of distinct nucleic acid probes selected from the group consisting of two base pair probes, three base pair probes, four base pair probes, and five base pair probes. In yet another embodiment the unit specific marker is a fluorescent probe.
According to another embodiment the labeled polymer is labeled with a plurality of unit specific markers, wherein at least one unit specific marker includes a fluorophore which emits light at a first wavelength and at least one unit specific marker includes a fluorophore which emits light at a second wavelength. In another embodiment the at least one unit specific marker which includes the fluorophore which emits light at the first wavelength is attached to end units of the polymer and the at least one unit specific marker which includes the fluorophore which emits light at the second wavelength is attached to an internal unit of the polymer.
Each of the limitations of the invention can encompass various embodiments of the invention. It is, therefore, anticipated that each of the limitations of the invention involving any one element or combinations of elements can be included in each aspect of the invention.