DNA sequencing has become a vitally important technique in modern biology and biotechnology, providing information relevant to fields ranging from basic biological research to drug discovery to clinical medicine. Because of the large volume of DNA sequence data to be collected, automated techniques have been developed to increase the throughput and decrease the cost of DNA sequencing methods (Smith; Connell; Trainor).
A preferred automated DNA sequencing method is based on the enzymatic chain-termination technique developed by Sanger (Sanger). In Sanger's chain-termination technique, the DNA sequence of a single-stranded template DNA is determined by using a DNA polymerase to synthesize a set of polynucleotide fragments. These fragments (i) have a sequence complementary to the template sequence, (ii) vary in length by a single nucleotide, and (iii) have a 3'-end terminating in a known nucleotide, e.g., A, C, G, or T. In the method, an oligonucleotide primer is annealed to the 3'-end of the template DNA to be sequenced, and the 3'-end of the primer serves as the initiation site for polymerase-mediated synthesis of a complementary polynucleotide fragment. The enzymatic polymerization step is carried out by combining the template-primer hybrid with each of the four 2'-deoxynucleotide-5'-triphosphates, A, G, C, and T ("dNTPs"), a DNA polymerase enzyme, and a 2',3'-dideoxynucleotide triphosphate ("ddNTP") terminator. Incorporation of the terminator yields a fragment that lacks a hydroxyl group at its 3'-terminus and therefore cannot be further extended by the polymerase, i.e., the fragment is "terminated". Competition between the ddNTP and its corresponding dNTP for incorporation results in a distribution of fragments of different sizes, each terminating with the particular terminator used in the reaction. To determine the complete DNA sequence of the template, four parallel reactions are run, each using a different ddNTP terminator. To determine the size distribution of the fragments, the fragments are separated by electrophoresis with enough resolution that fragments differing in size by a single nucleotide are distinguished.
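The competition between a ddNTP and its corresponding dNTP can be illustrated with a small calculation. The sketch below is a simplified model, not a description of any particular protocol: it assumes that at every position where the growing strand calls for the terminator's base, a fixed fraction `p_term` of the still-extendable chains incorporates the ddNTP and stops. The function names, the parameter `p_term`, and the convention that the primer occupies the last `primer_len` bases at the template's 3'-end are all hypothetical choices made for illustration.

```python
# Simplified model of one Sanger chain-termination reaction.
# Assumption (hypothetical): at each position calling for the terminator's
# base, a constant fraction p_term of surviving chains is terminated.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extension_product(template: str, primer_len: int) -> str:
    """Return the strand synthesized after the primer, written 5'->3'.

    The template is written 5'->3' and the primer is assumed to anneal to
    its last `primer_len` bases (the template's 3'-end), so synthesis
    copies the remaining template bases toward the template's 5'-end.
    """
    upstream = template[: len(template) - primer_len]
    return "".join(COMPLEMENT[base] for base in reversed(upstream))


def termination_profile(
    template: str, primer_len: int, ddntp: str, p_term: float
) -> dict[int, float]:
    """Expected fraction of primers terminated at each total fragment length."""
    profile: dict[int, float] = {}
    surviving = 1.0  # fraction of chains that can still be extended
    product = extension_product(template, primer_len)
    for i, base in enumerate(product, start=1):
        if base == ddntp:
            terminated = surviving * p_term  # ddNTP wins the competition here
            profile[primer_len + i] = terminated
            surviving -= terminated  # the rest incorporated the dNTP instead
    return profile


if __name__ == "__main__":
    template = "AATGCCCC"  # hypothetical template; primer anneals to CCCC
    for dd in "ACGT":  # one parallel reaction per ddNTP terminator
        print(dd, termination_profile(template, 4, dd, p_term=0.5))
```

For this toy template the synthesized strand is `CATT`, so the T reaction yields two fragment sizes (lengths 7 and 8, with fractions 0.5 and 0.25), while the G reaction yields none. The geometric fall-off across successive T positions is a property of the constant-`p_term` assumption, not a measured result.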
In a modern variant of the classical Sanger chain-termination technique, the nucleotide terminators or the oligonucleotide primers are labeled with fluorescent dyes (Prober; Hobbs; Smith). Using dye-labeled terminators offers several advantages, in particular: (i) the problems associated with the storage, use, and disposal of radioactive isotopes are eliminated; (ii) the requirement to synthesize dye-labeled primers is eliminated; and (iii) when a different dye label is used for each of the A, G, C, and T terminators, all four reactions can be performed simultaneously in a single tube.
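The single-tube advantage rests on the dye, rather than the reaction vessel, identifying each fragment's terminator. The following sketch shows that idea under stated assumptions: each fragment is represented as a hypothetical `(length, dye)` pair, the dye names in `DYE_TO_BASE` are placeholders rather than real dye chemistries, and electrophoretic separation is modeled simply as sorting by length.

```python
# Reading a sequence from one mixed reaction labeled with four dyes.
# Dye names are hypothetical placeholders, not real dye sets.

DYE_TO_BASE = {"dye_A": "A", "dye_C": "C", "dye_G": "G", "dye_T": "T"}


def read_single_tube(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size and translate each dye into its terminator base.

    Assumes one fragment size per nucleotide position, i.e. that sizes
    differing by a single nucleotide have been resolved.
    """
    ordered = sorted(fragments)  # models size separation by electrophoresis
    lengths = [length for length, _ in ordered]
    for shorter, longer in zip(lengths, lengths[1:]):
        if longer - shorter != 1:
            raise ValueError(f"gap or duplicate between sizes {shorter} and {longer}")
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


if __name__ == "__main__":
    tube = [(7, "dye_T"), (5, "dye_C"), (8, "dye_T"), (6, "dye_A")]
    print(read_single_tube(tube))  # CATT
```

Because every fragment carries its own color, one sorted list replaces the four lanes of the classical method; the consecutive-length check mirrors the requirement that neighboring sizes be resolved.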
While the Sanger chain-termination sequencing method has proven very effective, several problems remain with respect to optimizing its performance. One such problem, particularly when using dye-labeled terminators, is the sequence-dependent variability in the incorporation of labeled terminators into the primer extension products, especially in the case of T-terminated fragments. This variability in incorporation leads to variable peak heights in the resulting electropherogram, which in turn causes several problems. First, it decreases the sensitivity of the method, which is limited by the ability to detect the weakest peaks. Second, it makes it difficult to determine whether a weak peak is a true signal resulting from incorporation of a chain-terminating agent or an artifact resulting from a pause site in the DNA where the polymerase has dissociated. Third, it decreases the accuracy with which closely spaced bands are identified, because the strong signal of one band may mask the weak signal of its neighbor. Each of these problems becomes particularly acute when automated base-calling algorithms are applied to the data.
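The effect of uneven peak heights on automated base calling can be sketched with a toy threshold caller. This is not how production base callers work; it only assumes that a peak below a fixed detection `threshold` is reported as an ambiguous `N`. The peak heights below are invented numbers for illustration, not experimental data, and the function names are hypothetical.

```python
# Toy illustration of how uneven peak heights affect threshold-based calling.
# The caller and the peak heights are hypothetical, for illustration only.


def call_bases(peaks: list[tuple[str, float]], threshold: float) -> str:
    """Call each peak's base, or 'N' when its height falls below threshold."""
    return "".join(base if height >= threshold else "N" for base, height in peaks)


def peak_height_ratio(peaks: list[tuple[str, float]]) -> float:
    """Ratio of strongest to weakest peak; larger means more uneven signals."""
    heights = [height for _, height in peaks]
    return max(heights) / min(heights)


if __name__ == "__main__":
    # Invented heights (arbitrary units): one weakly incorporated T terminator.
    peaks = [("C", 900.0), ("A", 850.0), ("T", 120.0), ("T", 700.0), ("G", 880.0)]
    print(call_bases(peaks, threshold=200.0))  # CANTG
    print(peak_height_ratio(peaks))  # 7.5
```

Lowering the threshold would recover the weak T here, but in real data it would also admit artifact peaks such as those from polymerase pause sites, which is the trade-off the text describes.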