DNA sequencing is an important analytical technique critical to generating genetic information from biological organisms. The increasing availability of rapid and accurate DNA sequencing methods has made possible the determination of the DNA sequences of entire genomes, including the human genome. DNA sequencing has revolutionized the field of molecular biological research. In addition, DNA sequencing has become an important diagnostic tool in the clinic, where the rapid detection of a single DNA base change or a few base changes can be used to detect, for example, a genetic disease or cancer.
Most current methods of DNA sequencing are based on the method of Sanger (Proc. Natl. Acad. Sci. U.S.A., 74, 5463 (1977)). This method relies on gel electrophoresis of single stranded nucleic acid fragments that are generated when a polymerization extension reaction of a primer is terminated by incorporation of a radioactively labeled dideoxynucleotide triphosphate. Short strands of DNA are synthesized under conditions that produce DNA fragments of variable length using a DNA polymerase and deoxynucleotide triphosphates (dNTPs). A small amount of dideoxynucleotide triphosphates (ddNTPs) is introduced into the DNA synthesis mixture so that chain terminating ddNTPs are sometimes integrated into a growing strand. Typically, four different extension reactions are performed side by side, each including a small amount of one ddNTP. Each extension reaction produces a mixture of DNA fragments of different lengths terminated by a known ddNTP. The ratio of ddNTPs to dNTPs is chosen so that the population of DNA fragments in any given extension reaction includes fragments of all possible lengths (up to some maximum) terminating with the relevant ddNTP. The nucleic acid fragments are separated by length in the gel, typically utilizing a different lane in a polyacrylamide gel for each of the four terminating nucleotide bases being detected. However, such gel-based size separation is generally a low resolution method limited to reading short sequences.
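The chain termination logic described above can be illustrated with a minimal Python sketch. It is an idealized model rather than a description of any laboratory protocol: it assumes every possible fragment length is present in each lane, writes the template 3' to 5' so that the new strand is its position-wise complement, and uses hypothetical function names (`sanger_fragments`, `read_gel`) chosen only for this illustration.

```python
# Idealized model of four-lane Sanger sequencing (illustrative assumption:
# every possible termination length is populated in each reaction).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragments(template, primer_len):
    """Return {terminating base: set of fragment lengths} for the four ddNTP lanes.

    `template` is written 3'->5', so the primer covers its first `primer_len`
    positions and the synthesized strand is the position-wise complement.
    """
    lanes = {base: set() for base in "ACGT"}
    for i in range(primer_len, len(template)):
        terminating_base = COMPLEMENT[template[i]]
        # A fragment terminated at position i has length i + 1 (primer included).
        lanes[terminating_base].add(i + 1)
    return lanes


def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = sanger_fragments("TTTTACGGTCA", primer_len=4)
print(read_gel(lanes))  # TGCCAGT
```

Each lane holds only the lengths at which its ddNTP terminated the chain, so reading all four lanes together from the shortest fragment to the longest recovers the synthesized strand one base at a time.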
A variation of this method utilizes dyes rather than radioactivity to label the ddNTPs. Different dyes are used to uniquely label each of the different ddNTPs (i.e., a different dye may be associated with each of A, G, C, and T termination) (Smith et al. and Prober et al. Science 238:336-341, 1987). In the method of Smith, fluorescent dyes are attached to the 3′ end of the dNTP, converting it into a ddNTP. The use of four different dye labels allows the entire sequencing reaction to be conducted in a single reaction vessel and results in a more uniform signal response for the different DNA fragments. The dye-terminated fragments can also be electrophoresed in a single lane. The advent of capillary electrophoresis further increased the separation efficiency of this method, allowing shorter run times, longer reads, and higher sensitivity.
Despite these advances, DNA sequencing methods that rely on electrophoresis to resolve DNA fragments according to their size are limited by the rate of the electrophoresis and the number of bases that are detectable on the gel. In addition, real time imaging of the gel is not possible. Accordingly, in order to increase the speed and reliability of the sequencing reaction, great effort has been made to automate these steps. Automated DNA sequencing machines are now available that are capable of high throughput sequencing for both genomic sequencing and routine clinical applications. However, these newer techniques remain cumbersome, requiring specialized chemicals and the intensive labor of skilled technicians.
One newer method of DNA sequencing, “pyrosequencing” or “sequencing-by-synthesis,” disclosed in WO 98/13523, is based on the concept of detecting inorganic pyrophosphate (PPi), which is released during a polymerase reaction. As in the Sanger method, a sequencing primer is hybridized to a single stranded DNA template and incubated with a DNA polymerase. In addition to the polymerase, the enzymes ATP sulfurylase, luciferase, and apyrase, and the substrates adenosine 5′ phosphosulfate (APS) and luciferin, are added to the reaction. Subsequently, individual nucleotides are added. When the added nucleotide is complementary to the next available base in the template strand, it is incorporated into the extension product. Such incorporation of a complementary base is accompanied by release of pyrophosphate (PPi), which is converted to ATP by ATP sulfurylase, in the presence of APS, in a quantity equimolar to the amount of incorporated nucleotide. The ATP generated by the ATP sulfurylase reaction then drives the luciferase mediated conversion of luciferin to oxyluciferin, generating visible light in amounts that are proportional to the amount of ATP and thus to the number of nucleotides incorporated into the growing DNA strand. The light produced by the luciferase-catalyzed reaction is detected by a charge coupled device (CCD) camera and recorded as a peak in a Pyrogram™.
In a pyrosequencing reaction, if the first nucleotide added to the reaction is not complementary to the next available nucleotide on the growing DNA strand there is no light generated. If no light is generated by the addition of the first nucleotide, a second of four dNTPs is added sequentially to the reaction to test whether it is the complementary nucleotide. This process is continued until a complementary nucleotide is added and detected by a positive light read-out. Whether or not a positive light reaction is generated, apyrase, a nucleotide-degrading enzyme, continuously degrades unincorporated dNTPs and excess ATP in the reaction mixture. When degradation is complete, another dNTP is added.
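The dispensation cycle described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the light signal is represented as an integer count of incorporated nucleotides (the text states only that light is proportional to that number), apyrase degradation between additions is assumed to be complete, the template is written in the order the polymerase copies it, and the names `pyrosequence` and `call_bases` are hypothetical.

```python
# Simplified model of pyrosequencing: one dNTP is dispensed at a time and the
# signal is the number of bases incorporated (0 means no light was generated).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pyrosequence(template, dispensation_order="ACGT", max_cycles=50):
    """Return a list of (dispensed nucleotide, signal) pairs.

    `template` lists the template bases in the order the polymerase reads them,
    starting with the first base after the primer.
    """
    position = 0
    pyrogram = []
    for _ in range(max_cycles):
        for nucleotide in dispensation_order:
            if position >= len(template):
                return pyrogram
            signal = 0
            # A run of identical template bases incorporates several
            # nucleotides in one dispensation, giving a proportionally
            # larger signal.
            while (position < len(template)
                   and COMPLEMENT[template[position]] == nucleotide):
                signal += 1
                position += 1
            pyrogram.append((nucleotide, signal))
    return pyrogram


def call_bases(pyrogram):
    """Convert the signals back into the sequence of the synthesized strand."""
    return "".join(nucleotide * signal for nucleotide, signal in pyrogram)


trace = pyrosequence("TTGCA")
print(trace)              # [('A', 2), ('C', 1), ('G', 1), ('T', 1)]
print(call_bases(trace))  # AACGT
```

In this model a dispensation with a signal of 0 corresponds to a non-complementary nucleotide that is degraded before the next addition, which is why up to four dispensations may be needed to read a single base.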
Although pyrosequencing is capable of generating high quality data in a relatively simple fashion, this method has several drawbacks. First, the productivity of the method is not high, reading only about 1 base per 100 seconds. The rate of the reaction is limited by the need to add new enzymes with each addition of the dNTPs and by the need to test each of the four dNTPs separately. In addition, it has been found that the dATP used in the chain extension reaction interferes with subsequent luciferase-based detection reactions by acting as a substrate for the luciferase enzyme. Finally, these reactions are expensive to run.
While pyrosequencing improves the ease and speed with which DNA sequencing is achieved, there exists the need for improved sequencing methods that allow more rapid detection. Preferred techniques would be amenable to automation and allow the sequence information to be revealed simultaneously with or shortly after the chain extension reaction.
The present invention provides a novel system for sequencing nucleic acid molecules. In particular, the invention utilizes dNTPs that are 3′ end labeled with a cleavable tag that distinguishes the dNTP from other dNTPs (e.g., the tag may be unique to the dNTP). The cleavable tags are functional groups that can be later removed by any appropriate means, including but not limited to, exposure to chemical cleavage conditions or light. dNTPs labeled with the cleavable tags function as terminated dNTPs (cdNTPs), in that their incorporation into a single stranded nucleic acid molecule via a primer extension reaction blocks further extension. However, removal of the tag converts the cdNTP back into an extendible nucleotide.
According to the present methods, a sequencing primer is hybridized to a nucleic acid template, e.g., a single stranded DNA template, and incubated with an enzyme (DNA polymerase) and four cdNTPs (tag terminated dATP (cdATP), dCTP (cdCTP), dGTP (cdGTP), and dTTP (cdTTP)). The DNA polymerase then extends the primer by adding to it whichever cdNTP is complementary to the next available base on the template strand. Only a single cdNTP is incorporated, because the cdNTP cannot be further extended.
After completion of a single base addition, unreacted (excess) cdNTPs are removed from the reaction mixture, which includes the extended primer, the DNA polymerase, and the single stranded DNA template. The step of removing can be accomplished by any of a variety of means that would be apparent to one skilled in the art. For example, if the reaction mixture is contained in a chamber that has an attached membrane (e.g., an ultrafiltration membrane that allows small molecules such as water, salts, and cdNTPs to pass through, but does not allow passage of large molecules such as single stranded DNA), the excess cdNTP can be washed through the membrane. Alternatively, if the single stranded DNA is attached to a solid support, the excess cdNTPs can be washed away from the single stranded DNA without dislodging the hybridized, extended primer.
Once the step of removing is complete, the tag is cleaved from the cdNTP that was incorporated into the primer hybridized to the single stranded DNA template. In certain embodiments, the cleavage occurs by photo-cleavage of the tag from the extended primer by exposure to light. Alternatively, in other preferred embodiments, the cleavage occurs by exposure of the extended primer to a chemical cleaving agent, e.g., an acid or a base. Whichever cleavage method is employed, the result is liberation of the 3′ end of the extension product for further extension.
The cleaved tag is then washed through the membrane into a detector for identification, thereby identifying the complementary base in the single stranded DNA template and determining the DNA sequence. The detector used to identify the tag is chosen based on the type of cleavable tag employed. Any of a variety of tags may be employed in the present invention, as would be recognized by the skilled artisan, and such tags are described herein. Once the tag is cleaved, the four cdNTPs are added back to the primer extension reaction mixture and the cycle of extension, tag cleavage, and identification is repeated.
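The cycle of extension, removal of excess cdNTPs, tag cleavage, and tag identification can be summarized in a short Python sketch. It is a conceptual model only: the tag names in `TAG_FOR_BASE` are hypothetical placeholders rather than tags disclosed herein, the wash step has no state to model, and the template is written in the order in which the polymerase copies it.

```python
# Conceptual model of sequencing with cleavable-tag terminated dNTPs (cdNTPs).
# The tag identifiers below are hypothetical placeholders for illustration.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
TAG_FOR_BASE = {"A": "tag-A", "C": "tag-C", "G": "tag-G", "T": "tag-T"}
BASE_FOR_TAG = {tag: base for base, tag in TAG_FOR_BASE.items()}


def run_cycle(template, extended):
    """Run one cycle and return the tag released by cleavage.

    `extended` is the list of bases already added to the primer; it grows by
    exactly one base per cycle because the tag blocks the 3' end.
    """
    next_base = COMPLEMENT[template[len(extended)]]
    extended.append(next_base)  # extension: a single cdNTP is incorporated
    # Wash step: unincorporated cdNTPs are removed (nothing to model here).
    # Cleavage: the tag is released and the 3' end becomes extendible again.
    return TAG_FOR_BASE[next_base]


def sequence_template(template):
    """Repeat the cycle along the template and decode the detected tags."""
    extended = []
    detected_tags = []
    while len(extended) < len(template):
        detected_tags.append(run_cycle(template, extended))
    read = "".join(BASE_FOR_TAG[tag] for tag in detected_tags)
    return read, detected_tags


read, tags = sequence_template("TTGCA")
print(read)  # AACGT
print(tags)  # ['tag-A', 'tag-A', 'tag-C', 'tag-G', 'tag-T']
```

Because all four cdNTPs are present in every cycle and exactly one is incorporated, each cycle in this model yields one base call, including within a run of identical bases.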
In other preferred embodiments, short oligonucleotides are employed in a ligation reaction to determine the sequence of a particular DNA sample. The sequence of a DNA sample is determined by incorporating “X” complementary bases (e.g., 2-mers, 3-mers, or more) at a time onto the single stranded DNA template adjacent to a primer using a DNA ligase instead of a DNA polymerase. Each oligonucleotide is labeled with a cleavable tag so that the position of each base in the sequence of the oligonucleotide can be identified. The tag further prevents ligation of the oligonucleotides to one another.
According to this aspect of the invention, a template DNA is exposed to the oligonucleotides, the oligonucleotides are allowed to hybridize to the template DNA, and a ligation reaction is allowed to take place on the DNA template such that one complementary oligonucleotide is incorporated onto the DNA template adjacent to the annealed primer. Following ligation, the unincorporated oligonucleotides are washed away from the DNA sample and the tags are cleaved and analyzed to determine the nucleic acid sequence.
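The ligation-based variant can be sketched in the same style. The following Python model rests on several assumptions made only for illustration: the oligonucleotide library contains every possible X-mer, each X-mer carries its own hypothetical tag identifier, only a perfectly complementary oligonucleotide is ligated, and the template is written so that the ligated strand is its position-wise complement.

```python
# Conceptual model of sequencing by ligation of tagged X-mers.
# Tag identifiers and the exhaustive library are illustrative assumptions.
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def make_oligo_library(x):
    """Return {hypothetical tag id: X-mer sequence} for every possible X-mer."""
    return {
        f"tag-{i}": "".join(bases)
        for i, bases in enumerate(product("ACGT", repeat=x))
    }


def ligation_sequence(template, x=2):
    """Read the template X bases at a time; trailing bases shorter than X are skipped."""
    library = make_oligo_library(x)
    read = []
    for start in range(0, len(template) - x + 1, x):
        window = template[start:start + x]
        target = "".join(COMPLEMENT[base] for base in window)
        # Hybridization and ligation: one complementary oligo is joined
        # adjacent to the primer (or to the previously ligated oligo).
        tag = next(t for t, oligo in library.items() if oligo == target)
        # Wash, cleave, detect: the tag identifies every base of the oligo.
        read.append(library[tag])
    return "".join(read)


print(ligation_sequence("TTGCAA", x=2))  # AACGTT
print(ligation_sequence("TTGCAA", x=3))  # AACGTT
```

The library size grows as 4 to the power X (16 two-mers, 64 three-mers), so in this model a longer oligonucleotide reads more bases per cycle at the cost of requiring more distinguishable tags.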