The development of dependable methods for sequence analysis of DNA (deoxyribonucleic acid) has led to the success of recombinant DNA methodologies and the field of genetic engineering. DNA sequencing is generally accomplished by the method of Sanger, et al. (Proc. Natl. Acad. Sci. USA 74:5463-5467, 1977) and involves the in vitro enzymatic synthesis of single-stranded DNA starting from either a single- or double-stranded DNA template. In the original embodiment of the protocol, a primer, usually a synthetic oligonucleotide 15 to 30 bases in length, is first annealed to its complementary sequence on the single-stranded DNA template to be sequenced. The 3'-end of this primer is extended by the Klenow fragment of E. coli DNA polymerase I in the presence of 2'-deoxynucleoside 5'-triphosphates (dNTPs), one of which contains a radiolabel.
Four separate sequencing reactions are performed, each buffered reaction containing all four dNTPs (2'-deoxyadenosine 5'-triphosphate (dATP), 2'-deoxycytidine 5'-triphosphate (dCTP), 2'-deoxyguanosine 5'-triphosphate (dGTP), and 2'-deoxythymidine 5'-triphosphate (dTTP)) and a small amount of one specific 2',3'-dideoxynucleoside 5'-triphosphate chain-terminating agent (ddATP, ddCTP, ddGTP, or ddTTP; in general, ddNTP).
By varying the ratio of the specific chain-terminating ddNTP to its dNTP analog in a particular reaction, the polymerase generates a population of fragments in which a specific ddNTP is substituted at every possible position along the DNA template where the corresponding dNTP would have been incorporated. Once the one-step labelling and termination step has been completed, an excess of all four dNTPs is added to each reaction to "chase" all fragments not terminated by a specific ddNTP into higher molecular weight DNA.
The products of the four separate reactions are then fractionated and visualized in adjacent lanes on a high resolution denaturing polyacrylamide gel system.
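The four-lane logic of the dideoxy method can be illustrated with a minimal Python sketch. It is an idealized model only, not part of the cited protocols: it assumes that every possible termination product is present in its lane, that fragment length is counted from the first base added to the primer, and that the gel resolves every length. The function names and the example template are hypothetical.

```python
# Idealized model of four-lane dideoxy sequencing (illustrative assumptions only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3to5):
    """Return the newly synthesized strand (5'->3') copied from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_lanes(new_strand):
    """For each ddNTP reaction, list the lengths of the fragments that end in that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the four lanes from the shortest fragment to the longest to recover the sequence."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = termination_lanes(synthesized_strand("TACGGATC"))
    print(lanes)
    print(read_gel(lanes))
```

For the hypothetical template TACGGATC (written 3' to 5'), the ddATP lane holds fragments of lengths 1 and 7, the ddCTP lane lengths 4 and 5, the ddGTP lane lengths 3 and 8, and the ddTTP lane lengths 2 and 6; reading the lanes from shortest to longest fragment gives ATGCCTAG.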
In 1987 Tabor and Richardson (Tabor, S. and C. C. Richardson, Proc. Natl. Acad. Sci. USA 84:4767-4771, 1987) described a modification of the basic Sanger protocol for use with T7 DNA polymerase that separated the labelling step from the termination step, yielding a two-step sequencing protocol. T7 DNA polymerase and a limiting amount of all four dNTPs, one of which was radiolabeled, were added to an annealed template and primer. During a short incubation step at a suboptimal polymerization temperature (e.g., room temperature) the polymerase added one to several hundred dNTPs to the 3'-end of the primer, while also incorporating the radiolabeled dNTP in all of the extended fragments. At the end of the labelling step, the mixture was allocated equally into four separate termination reactions. Each termination reaction contained nonlimiting concentrations of all four dNTPs and one specific ddNTP.
Following a second short incubation step at the optimal polymerization temperature for the DNA polymerase (DNAP) (e.g., 37° C.), detection of the DNA fragments was as outlined for the Sanger protocol. The final process in both of the radiolabeled sequencing protocols described above included reading the autoradiogram to generate an ordered DNA sequence and then manual entry of this sequence into a database for subsequent manipulations.
In 1989 Murray (Murray, V., Nucl. Acids Res. 17:8889, 1989) described a novel method for sequence generation from DNA templates using ddNTP termination of the DNA fragments. Murray applied a variation of the polymerase chain reaction (Mullis, K. B., et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-273, 1986; Saiki, R. K., et al., Science 230:1350-1354, 1985), which has become known as "cycle sequencing". Cycle sequencing has the advantage of using smaller amounts of template DNA than the sequencing methods described previously.
Since the mid-1980s, commercially available DNA sequencing instruments have automated the gel electrophoresis, data collection, sequence generation and data entry steps involved with the radiolabeled methods described above. In addition, particular automated instruments have taken advantage of certain dyes that emit photon energy when excited with a laser, eliminating the need to use radioactivity to detect the separated DNA fragments. All of the instruments incorporate a high resolution polyacrylamide gel system for separation of the labelled DNA fragments. Each instrument also contains some form of detection system at a fixed point across the length of the gel near its bottom to detect the fluorescent-labelled fragments as they migrate during electrophoresis.
There are at present commercially available automated instruments based upon the detection technologies of: (1) single fluorescent-labelled primers or dNTPs with the sequencing reactions run and detected in separate lanes of a gel (Ansorge, W., et al., Nucl. Acids Res. 15:4593-4602, 1987), (2) primers labelled with four separate fluors (Smith, L., et al., Nucl. Acids Res. 13:2399-2412, 1985; Smith, L., et al., Nature 321:674-679, 1986) allowing all four reactions to be run and detected in one lane on a gel, or (3) the same strategy as in (2), above, except with the substitution of four different fluorescent-labelled ddNTPs for the labelled primers (Prober, J., et al., Science 238:336-341, 1987).
One problem encountered by all sequencing methodologies is sequence compression caused by DNA secondary structure during electrophoresis. The relatively short DNA fragments produced in the sequencing reaction fold back upon themselves, forming tight intrastrand loops and hairpin turns. Some of these structures have sufficient strength that they are not completely denatured by heating or electrophoresis through 7-8M urea. The incompletely denatured fragments, which may result from either A:T or G:C base pairs, migrate faster through the gel matrix than surrounding fragments of similar length, causing fragments that should differ by one to a few nucleotides to comigrate and appear as overlapping peaks (see FIG. 1, arrow). Unambiguous sequence determination is impossible in areas where compression artifacts occur. One typical way to resolve the sequence in an area of a compression is to sequence the opposite DNA strand (Davies, R. W., Gel Electrophoresis of Nucleic Acids, A Practical Approach, IRL Press, pp. 148-149, 1985).
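As a rough illustration of how a fragment can fold back upon itself, the following Python sketch scans a sequence for short inverted repeats that could pair into a hairpin stem. It is a simplified string search under stated assumptions, not a predictor of compressions: the stem length, the loop-size limits, the function names, and the example sequence are all hypothetical, and no thermodynamic stability is computed.

```python
# Simplified inverted-repeat scan (illustrative assumptions only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (left_start, right_start, stem_sequence, loop_length) for each exact stem match."""
    hits = []
    n = len(seq)
    for i in range(n - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        partner = reverse_complement(left)  # sequence the right arm must have to pair
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right arm
            if j + stem > n:
                break
            if seq[j:j + stem] == partner:
                hits.append((i, j, left, loop))
    return hits


if __name__ == "__main__":
    print(find_hairpins("TTGCGCAAAAGCGCTT"))
```

In the hypothetical sequence TTGCGCAAAAGCGCTT, the scan reports one candidate: the GCGC arm starting at index 2 can pair with the GCGC arm starting at index 10 across a four-base loop. A fragment carrying such a stem is the kind that may resist denaturation and migrate anomalously.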
Due to the cost involved with opposite strand sequencing, various other strategies have evolved to combat compressions. These alternate strategies include running the sequencing gels at elevated temperatures and/or substituting modified dNTPs into the sequencing mixes. Two of these modified dNTPs usually are dATP and dGTP where the nitrogen atom at position 7 on the base moiety has been changed to a carbon atom (c⁷dATP and c⁷dGTP, respectively). Both c⁷dATP and c⁷dGTP decrease by one the number of hydrogen bonds formed during base pairing of these molecules with their respective dNTP counterparts. The overall effect is to lower the melting temperature between DNA strands where the c⁷dNTPs are incorporated, allowing more efficient strand denaturation and thus decreasing the number of compressions affecting the sequence data. One drawback with this method is the high cost of these modified molecules.
Another tactic used in an attempt to lower the melting temperature between DNA strands is the use of 2'-deoxyinosine-5'-triphosphate (dITP) in place of dGTP. However, dITP is not an ideal substrate for DNA polymerases, and the result can be strong stops throughout the data, i.e., where one peak should appear from a fragment of specific size, three or more may be found co-migrating together. (See Barr, P. J., et al., BioTechniques 4(5):428-32, 1986.)
Needed in the art of DNA sequence analysis is an improved method that helps avoid DNA secondary structure artifacts during electrophoresis.