Nucleic acid sequencing has become a vitally important technique in modern biology and biotechnology, providing information relevant to fields ranging from basic biological research to drug discovery to clinical medicine. Because of the large volume of DNA sequence data to be collected, automated techniques have been developed to increase the throughput and decrease the cost of nucleic acid sequencing methods, e.g., U.S. Pat. No. 5,171,534; Connell et al., Biotechniques, 5(4): 342-348 (1987); and Trainor, Anal. Chem., 62: 418-426 (1990).
A preferred automated nucleic acid sequencing method is based on the enzymatic replication technique developed by Sanger, et al., Proc. Natl. Acad. Sci., 74: 5463-5467 (1977). In Sanger's technique, the sequence of a single-stranded template nucleic acid is determined using a nucleic acid polymerase to synthesize a set of polynucleotide fragments wherein the fragments (i) have a sequence complementary to the nucleic acid sequence, (ii) differ in length by a single nucleotide, and (iii) have a 3'-end terminating in a known nucleotide, e.g., A, C, G, or T. In the method, an oligonucleotide primer is hybridized to a 3'-end of the template nucleic acid to be sequenced, the 3'-end of the primer serving as an initiation site for polymerase-mediated polymerization of a complementary polynucleotide fragment. The enzymatic polymerization step, or primer extension reaction, is carried out by combining the template-primer hybrid with the four extendible nucleotides, e.g., deoxynucleotides ("dNTPs"), a nucleic acid polymerase enzyme, and a nucleotide "terminator", e.g., 2',3'-dideoxynucleotide triphosphate ("ddNTP"). The incorporation of the terminator forms a primer extension product which lacks a hydroxy group at the 3'-terminus and thus cannot be further extended by the polymerase, i.e., the extension product is "terminated". The competition between the dNTP and its corresponding terminator for incorporation results in a distribution of different-sized extension products, each extension product terminating with the particular terminator used in the reaction. To discover the complete sequence of the template nucleic acid, four parallel reactions are run, each reaction using a different terminator. To determine the size distribution of the extension products, the extension products are separated by electrophoresis such that products differing in size by a single nucleotide are resolved.
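The chain-termination logic described above can be illustrated with a minimal Python sketch. It is an idealized model, not part of the cited methods: it assumes every possible termination position is represented in each reaction, measures product length from the 3'-end of the primer, and ignores incorporation kinetics entirely. The function names (extension_strand, terminated_lengths, read_ladder) and the example template are hypothetical and chosen only for illustration.

```python
# Idealized model of the four-reaction Sanger ladder (illustrative assumption:
# every possible termination site is populated; no enzyme kinetics are modeled).

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_strand(template_3_to_5):
    """Return the strand the polymerase synthesizes 5'->3' while reading
    the template 3'->5', starting just past the primer."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def terminated_lengths(template_3_to_5, terminator_base):
    """Return the lengths (in nucleotides past the primer) of all extension
    products that can end in the given terminator (ddA, ddC, ddG, or ddT)."""
    new_strand = extension_strand(template_3_to_5)
    return [i + 1 for i, base in enumerate(new_strand) if base == terminator_base]


def read_ladder(template_3_to_5):
    """Run the four single-terminator reactions, then read the products in
    order of increasing length, as an electrophoretic separation would."""
    ladder = {}
    for terminator_base in "ACGT":
        for length in terminated_lengths(template_3_to_5, terminator_base):
            ladder[length] = terminator_base  # one 3'-terminal base per length
    return "".join(ladder[length] for length in sorted(ladder))


if __name__ == "__main__":
    template = "TACGGT"  # hypothetical template, written 3'->5'
    for terminator_base in "ACGT":
        print(terminator_base, terminated_lengths(template, terminator_base))
    print(read_ladder(template))
```

Because each product length maps to exactly one 3'-terminal base, ordering the products by size recovers the sequence of the synthesized strand, which is the complement of the template.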
In a modern variant of the classical Sanger technique, each nucleotide terminator is labeled with a fluorescent dye, e.g., Prober et al., Science, 238: 336-341 (1987); and U.S. Pat. No. 5,151,507, and a thermostable DNA polymerase enzyme is used, e.g., Murray, Nucleic Acids Research, 17(21): 8889 (1989). Several advantages are realized by utilizing dye-labeled terminators, e.g., (i) problems associated with the storage, use and disposal of radioactive isotopes are eliminated, (ii) the requirement to synthesize dye-labeled primers is eliminated, and (iii) when using a different dye label for each A, G, C, or T terminator, all four primer extension reactions can be performed simultaneously in a single tube. Using a thermostable polymerase enzyme provides several additional advantages, e.g., (i) the polymerization reaction can be run at elevated temperature, thereby disrupting any secondary structure of the template and resulting in fewer sequence-dependent artifacts, and (ii) the sequencing reaction can be thermocycled, thereby linearly amplifying the amount of extension product produced and thus reducing the amount of template nucleic acid required to obtain a reliable sequence.
While these modern variants on Sanger sequencing methods have proven effective, several problems remain with respect to optimizing their performance and economy. One problem encountered when using presently available dye-labeled terminators in combination with thermostable polymerase enzymes in a Sanger-type nucleic acid sequencing process, particularly in the case of fluorescein-type dye labels, is that a large excess of dye-labeled terminator over the unlabeled extendible nucleotides is required, e.g., up to a ratio of 50:1. This large excess of labeled terminator makes it necessary to purify the sequencing reaction products prior to performing the electrophoretic separation step in order to avoid interference caused by the comigration of unincorporated labeled terminator species and bona fide labeled sequencing fragments. A typical clean-up method includes an ethanol precipitation or a chromatographic separation as described in ABI PRISM™ Dye Terminator Cycle Sequencing Core Kit Protocol, PE Applied Biosystems, Revision A, p/n 402116 (August 1995). Such a clean-up step greatly complicates the task of developing totally automated sequencing systems wherein the sequencing reaction products are transferred directly into an electrophoretic separation process.
A second problem encountered when using presently available dye-labeled terminators in combination with a thermostable polymerase in a Sanger-type nucleic acid sequencing process is that the extent of incorporation of labeled terminators into a primer extension product is variable and therefore results in an uneven distribution of peak heights when the primer extension products are separated by electrophoresis and detected using fluorescence detection. Such uneven peak heights are disadvantageous because they make automated sequence determination and heterozygote detection substantially less reliable.
Thus, there remains a continuing need for labeled nucleotide terminator compounds which do not require a large excess over unlabeled extendible nucleotides in a primer extension reaction and which produce an even peak height distribution in a Sanger-type sequencing reaction.