1. Introduction
This invention relates to the rapid sequencing of long linear polymers. More particularly, the invention relates to the formation of a rotaxane comprising the polymer to be sequenced and a judiciously selected cyclic molecule which is slid along the polymer by a probe that detects the change in signal as the cycle passes from one monomer or unit of the polymer to the next. In a preferred embodiment the invention relates to the sequencing of DNA using a scanning probe microscope.
2. Description of the Prior Art
Since the sequence of monomers in a polymer chain determines its properties, such information is of great interest to the chemical industry. This interest is nowhere more intense than in the sequence of deoxyribonucleic acid (DNA), the polymer that determines the physiological properties and function of virtually every living organism. Just to obtain a working draft of the human genome sequence, for example, the United States Government spent $300 million from 1990 to date, with an additional $200 million estimated to complete the task by the end of 2003. This work is expected to result in improvements in forensic analysis; diagnosis of genetic disease or predisposition thereto; bioterrorism and biowarfare countermeasures; pharmaceutical research and development, including a cure for cancer; and genetic engineering for agricultural, chemical, waste-remediation, and other products.
Current sequencing methods are very slow. As mentioned above, the human genome sequence has required thirteen years to complete. Even using accelerated technology and relying on accrued databases, Celera Genomics Inc., Rockville, Md., spent nine months on a similar program. In addition, existing sequencing methods suffer from an error rate that makes tedious error-checking necessary.
The most widely used DNA sequencing technology is described by H. G. Griffin, A. M. Griffin, eds., DNA Sequencing Protocols, in Methods Mol. Biol. (Humana Press, Totowa, N.J.), vol. 23, 1993. It is based on that reported by Sanger et al, Proc. Natl. Acad. Sci. USA, 1977, 74, 5463 and augmented by the polymerase chain reaction (PCR), reviewed by I. S. Bevan, R. Rapley, M. R. Walker, PCR Methods Appl., 1992, 1, 222. Each of the three publications above is incorporated herein by reference.
The method described in the above reference employs four steps: first, the DNA is enzymatically cleaved into fragments of manageable size, about 500 bases long; second, each fragment is replicated via PCR, from a mixture of normal nucleotides and some bearing 3′-dideoxy sugars. When one of the latter is incorporated in the replication, it terminates the fragment, since the 3′-OH group from which the chain would be extended is absent. The fraction of dideoxy nucleotides is adjusted to ensure that their incorporation will result statistically in a population of chains that includes all lengths from 2 to 500.
Third, the populations are chromatographed using a gel that separates them by chain length; thus each chain passing through contains one more nucleotide than that eluting before it. And fourth, each of the terminating nucleotides having been labeled specifically with one of four different dyes, the sequence of the ~500-base fragment from which all the chains were made can be read by identifying the dye fluorescing in each fraction.
The Maxam-Gilbert method is similar, labeling the 5′ end of the 500-base fragment, and then cleaving chemically rather than enzymatically. In addition, each of the chemical agents cleaves specifically at one of the four nucleotides. The four mixtures are then separated in four lanes on a gel plate by length. After labeling the plate abscissa with A, C, G, and T, and the ordinate with all of the possible chain lengths, the sequence can be read.
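The read-out logic of the third and fourth steps can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not a laboratory protocol: the function names are hypothetical, the copied strand is given directly as a string, strand orientation is ignored, and one chain of every length is assumed to be present.

```python
import random

def terminated_chains(copied_strand, seed=0):
    """Model the mixed population of terminated chains (hypothetical helper).

    Each chain is represented as (length, terminal base), where the terminal
    base stands for the dye-labeled dideoxy nucleotide that ended the chain.
    The population is shuffled because the chains are mixed before separation.
    """
    chains = [(n, copied_strand[n - 1]) for n in range(1, len(copied_strand) + 1)]
    random.Random(seed).shuffle(chains)
    return chains

def read_gel(chains):
    """Sort the chains by length, as the gel does, and read the terminal dyes."""
    return "".join(dye for _, dye in sorted(chains))

if __name__ == "__main__":
    print(read_gel(terminated_chains("GATTACA")))
```

Sorting by chain length is the only operation needed to recover the order of the bases, which is why the separation step must resolve chains that differ by a single nucleotide.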
Each of these methods requires steps of replication, cleavage, labeling, and reading, a tedious process prone to errors. To address these problems, machines have been developed, chiefly by Applied Biosystems, presently a division of Applera Corporation, that not only carry out the process automatically, but can sequence many samples simultaneously.
The deficiencies in the current technology have been addressed by other methods. A group at Affymetrix Inc., Santa Clara, Calif. has developed a chip sequencer, disclosed by Fodor et al in PCT Int. Appl. WO 95 00,530. All possible combinations of an octanucleotide are deposited photolithographically onto a silicon chip, in the first step divided into, e.g., quadrants, each covered by one of the four nucleotides containing protective groups. The area is divided into eight sections and the protective groups are selectively photolyzed and reacted with another layer of nucleotides, these steps being repeated until an entire octanucleotide monolayer has been deposited in an array of 4^8, or 65,536, octanucleotides. To detect which sequence is interacting with the target octanucleotide, the chip bases or the target molecule are modified with a fluorescent dye. Since the target molecule may not bind to the chip with 100% specificity, more than one spot will fluoresce; the brightest one is considered to be the matching sequence. In order to increase the number of nucleotides per test spot (currently up to 25), an algorithm is used to eliminate those sequences least likely to be a match. Array technology has been reviewed by Li et al, Microcirculation, 2002, 9, 13, incorporated herein by reference.
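The size of the array and the "brightest spot" rule can be sketched as follows. This is an illustration only: the function names and the intensity values are hypothetical and are not taken from the cited disclosure.

```python
from itertools import product

BASES = "ACGT"

def all_probes(length=8):
    """Enumerate every possible probe of the given length (4**length in all)."""
    return ["".join(p) for p in product(BASES, repeat=length)]

def brightest_spot(intensities):
    """Return the probe whose spot fluoresces most strongly.

    `intensities` maps probe sequence -> measured fluorescence (assumed data).
    """
    return max(intensities, key=intensities.get)

if __name__ == "__main__":
    print(len(all_probes()))  # number of octanucleotide spots on the chip
    # Hypothetical readings: imperfect specificity lights more than one spot.
    readings = {"ACGTACGT": 0.9, "ACGTACGA": 0.4, "TCGTACGT": 0.2}
    print(brightest_spot(readings))
```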
A mass spectrometric (MS) sequencing method based on Sanger sequencing has been disclosed by Fu et al in U.S. Pat. No. 6,436,635; MS sequencing has been reviewed by Huber and Oberacher, Mass Spectrometry Reviews 2001, 2002, 20, 310, incorporated herein by reference.
Pyrosequencing proceeds in four steps: (1) synthesis of the DNA strand complementary to the unknown; (2) release of one pyrophosphate molecule (PPi) per nucleotide incorporated; (3) conversion of PPi by ATP sulfurylase to adenosine triphosphate (ATP); (4) ATP-powered oxidation of luciferin by luciferase, resulting in light emission. Only the matching base will cause the system to light, allowing determination of the sequence. This technique has been reviewed by Fakhrai-Rad et al, Human Mutation, 2002, 19, 479, incorporated herein by reference.
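The way the light signal reveals the sequence can be sketched in a few lines. This is a simplified model under stated assumptions: the four nucleotides are dispensed one at a time in a fixed cycle, the light emitted is taken to be proportional to the number of nucleotides incorporated in one dispensation, strand orientation is ignored, and the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def pyrosequence(template, dispense_order="ACGT"):
    """Infer the complementary strand from simulated light flashes.

    A dispensed base produces light only if it pairs with the next unread
    template base; a run of identical bases gives a proportionally larger flash.
    """
    read = []
    pos = 0
    while pos < len(template):
        for base in dispense_order:
            run = 0  # nucleotides incorporated (flash size) for this dispensation
            while pos < len(template) and COMPLEMENT[template[pos]] == base:
                run += 1
                pos += 1
            if run:
                read.append(base * run)
    return "".join(read)

if __name__ == "__main__":
    print(pyrosequence("TTGCA"))
```

Dispensations that produce no light are simply skipped, so the order of the flashes, together with their relative sizes, gives the sequence of the strand being synthesized.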
A single-molecule procedure developed by Keller at Los Alamos National Laboratory is discussed by Ambrose et al, Ber. Bunsen-Ges. Phys. Chem., 1993, 97, 1535, incorporated herein by reference. A DNA molecule is replicated from a pool of nucleotides, all of which are fluorescently labeled and suspended in a flowing stream. The nucleotides are cleaved sequentially with an exonuclease, and the individual fluorescently labeled bases identified as they are carried downstream past a laser-induced fluorescence detector.
However, the replicating enzyme is often confused by a labeled base, resulting in incorporation of a base different from that in the DNA to be sequenced, and leading to an error. Since in a real analysis the original sequence would be unknown, no basis for comparison would exist, and the error would not be detected. Also, it is difficult to control the exonuclease rate or processivity, especially critical in a flowing stream, where the enzyme will be washed away if it falls off.
A number of approaches based on scanning probe microscopy have been published. Atomic force microscopy (AFM) has been disclosed by Bensimon et al, in PCT Int. Appl. WO 94 23,065 to measure the energy required to separate each pair of bases in a double-stranded (ds) DNA molecule or the energy obtained from pairing a single-stranded (ss) DNA with a standard. The identity of the base, and hence the sequence, can be obtained from the energy value. It is clear, however, that this value must change as the point of separation recedes from the AFM tip, or that the tip must be repositioned over each base pair to be separated. Moreover, the energy required to break the hydrogen bonds between complementary bases is low enough to disappear into ambient thermal noise.
Sequencing by chemical force microscopy (CFM), i.e. AFM with a chemically modified tip that interacts differently with each base, has been discussed by G. U. Lee et al, Isr. J. Chem., 1996, 36, 81, incorporated herein by reference. A substrate with pathways to align labeled DNA molecules for sequencing by scanning tunneling microscopy (STM) has also been disclosed, by Sargent et al in PCT Int. Appl. WO 96 24,689.
Cherkasky has disclosed a method in German Patent No. 19,937,512 for purifying chromosomal DNA, immobilizing it on a long glass plate, and stretching it linearly.
Methods involving the threading of DNA through pores have been reviewed by Deamer and Branton, Acc. Chem. Res., 2002, 35, 817, incorporated herein by reference. A nanopore is formed by inserting α-hemolysin into a lipid membrane, which is plated on both surfaces. The membrane is immersed in an electrolyte solution and the current measured. When a single-stranded oligonucleotide is shot through the pore by an electric field, it excludes electrolyte from the pore and interrupts the current. When the junction in a block oligonucleotide passes through the portal, i.e., when the base changes, a momentary peak appears signaling the event.
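The events that such a measurement reports for a block oligonucleotide can be illustrated with a minimal sketch. It does not model the ionic current itself; it only locates the junctions at which the base changes, which are the points where the text above says a momentary peak appears. The function name and the example sequence are hypothetical.

```python
def junction_positions(oligo):
    """Return the indices at which the base differs from the preceding base."""
    return [i for i in range(1, len(oligo)) if oligo[i] != oligo[i - 1]]

if __name__ == "__main__":
    # A block oligonucleotide with two junctions (illustrative sequence).
    print(junction_positions("AAAACCCCAAAA"))
```

Within a homopolymer block no junction is reported, so a read-out of this kind signals where the base changes rather than identifying every base individually.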
Chan discloses the use of a molecular motor, a particular class of enzyme such as a DNA polymerase, in U.S. Pat. No. 6,210,896. The molecular motor, labeled with a fluorescing function, either moves along, or causes to pass through itself, a DNA labeled with other fluorescing functions. Electromagnetic radiation is continuously supplied, so that when the molecular motor passes over one of the labels, the energy transfer between the two fluorescing functions can be detected. The molecular motor is held by electrostatic force in channels fabricated in the apparatus near the detector.
Although not acknowledged in the specification, the enzyme possesses a pore in the form of the so-called “sliding clamp.” Certain DNA polymerases “achieve high processivity by the attachment of their catalytic subunits to a ‘sliding DNA clamp’ . . . which are bound to DNA by virtue of their topology and have to be assembled on DNA by other proteins . . . ” (Krishna et al; J. Mol. Biol., 1994, 241, 265, incorporated herein by reference). The sliding clamp, which prevents the polymerase from falling off the DNA strand, has been reviewed by Jeruzalmi et al, Current Opinion in Structural Biology, 2002, 12, 217, incorporated herein by reference.
Allen has disclosed a method incorporating both a pore and AFM in U.S. Pat. No. 6,280,939. However, although the patent describes the need for a label as a disadvantage of previous techniques, it requires “flagging” the various nucleotides by introducing base-dependent conditions, such as time of incorporation.
Manalis has disclosed a sequencing method in U.S. patent application Ser. No. 2002 86,428 also using a polymerase, but with a so-called “single electron transistor” that measures the electric charge configuration around the polymerase, which is said to change depending upon the base passing through the polymerase.