The dideoxy chain termination method of sequencing DNA is the basis for most of the DNA sequencing methods employed today, and has widespread use in all automated PCR cycle sequencing methods, instruments and systems (Sanger et al., 1977, Proc. Natl. Acad. Sci. U.S.A., 74: 5463). This method relies on gel electrophoresis of a population of variable-length single-stranded nucleic acid fragments that are generated when oligonucleotide primers hybridized to the target nucleic acid template are extended by the polymerase-driven incorporation of deoxynucleotide triphosphates (dNTPs), and variably terminated by the incorporation of labeled dideoxynucleotide triphosphates (ddNTPs). The incorporation of the chain-terminating ddNTPs ideally terminates the extension reaction at all possible base positions, thereby resulting in DNA fragments of all possible lengths, which can then be analyzed electrophoretically to generate a contiguous sequence of bases corresponding to the template.
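By way of illustration only, the principle of reading a sequence from a nested set of terminated fragments may be sketched in Python as follows. This is an idealized model, not a description of any particular instrument: it assumes exactly one fragment terminating at every base position, and the function names `termination_fragments` and `read_by_size` are hypothetical names chosen for this example.

```python
def termination_fragments(extension: str) -> list[tuple[int, str]]:
    """Idealized chain termination: one fragment ends at every base position.

    `extension` is the newly synthesized strand (primer excluded).
    Each fragment is represented as (length, terminal_base), where the
    terminal base is the ddNTP that stopped the extension.
    """
    return [(i + 1, base) for i, base in enumerate(extension)]


def read_by_size(fragments: list[tuple[int, str]]) -> str:
    """Mimic electrophoresis: order fragments by length, read terminal bases."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


# The fragment population is unordered in the reaction tube; reversing the
# list stands in for that here so the example stays deterministic.
population = list(reversed(termination_fragments("GATTACA")))
print(read_by_size(population))
```

Because every fragment length occurs exactly once in this idealized model, sorting by length alone is sufficient to recover the order of the terminal bases.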
The chain termination method has been modified in several ways, and serves as the basis for currently available automated DNA sequencing methods. See, e.g., Sanger et al., J. Mol. Biol., 143:161–78 (1980); Schreier et al., J. Mol. Biol., 129:169–72 (1979); Smith et al., Nucleic Acids Research, 13:2399–2412 (1985); Smith et al., Nature, 321:674–79 (1986); U.S. Pat. No. 5,171,534; Prober et al., Science, 238:336–41 (1987); Section II, Meth. Enzymol., 155:51–334 (1987); Church et al., Science, 240:185–88 (1988); Swerdlow et al., Nucleic Acids Research, 18: 1415–19 (1989); Ruiz-Martinez et al., Anal. Chem., 2851–58 (1993); Studier, PNAS, 86:6917–21 (1989); Kieleczawa et al., Science, 258:1787–91; and Connell et al., Biotechniques, 5:342–348 (1987).
Although the Sanger method was originally performed using radiolabeled fragments which were detected by autoradiography after separation, modern automated DNA sequencers generally are designed for fluorescently labeled fragments, which are detected in real time as they migrate past a detector. Additionally, although the Sanger method was initially conducted with four separate polymerase extension reactions, automated DNA sequencing systems either run these four reactions together or pool separate reactions prior to electrophoresis.
As an example, U.S. Pat. No. 5,171,534 describes a variation of this basic sequencing procedure in which four different fluorescent labels are employed, one for each sequencing reaction. The fragments developed in the A, G, C and T sequencing reactions are then recombined and introduced together onto a separation matrix. A system of optical filters is used to individually detect the fluorophores as they pass the detector. This allows the throughput of a sequencing apparatus to be increased by a factor of four, since the four sequencing reactions that were previously run in four separate lanes or capillaries can now be run in one.
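The pooling of four separately labeled reactions into a single lane may be sketched as follows. This is a minimal illustration under stated assumptions: the dye names in `DYE_FOR_REACTION` are placeholders rather than actual fluorophores, each reaction is represented simply as a list of fragment lengths, and the functions `pool_reactions` and `call_bases` are hypothetical.

```python
# Placeholder dye names (assumption): one distinct label per base-specific reaction.
DYE_FOR_REACTION = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}


def pool_reactions(reactions: dict[str, list[int]]) -> list[tuple[int, str]]:
    """Combine four base-specific reactions into one lane.

    Returns (fragment_length, dye) pairs in order of increasing length,
    i.e. the order in which fragments would pass the detector.
    """
    pooled = [
        (length, DYE_FOR_REACTION[base])
        for base, lengths in reactions.items()
        for length in lengths
    ]
    return sorted(pooled, key=lambda event: event[0])


def call_bases(pooled: list[tuple[int, str]]) -> str:
    """Translate the detected dye of each fragment back into a base."""
    base_for_dye = {dye: base for base, dye in DYE_FOR_REACTION.items()}
    return "".join(base_for_dye[dye] for _, dye in pooled)


# Fragment lengths that would be produced for the extension product GATTACA.
reactions = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(call_bases(pool_reactions(reactions)))
```

Since the dye alone identifies the terminating base, the four reactions no longer need to be kept in separate lanes, which is the source of the fourfold gain in throughput described above.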
Automated fluorescent DNA sequencing systems utilize either a “dye-primer” method (a variation of the Maxam-Gilbert method (Maxam et al., 1977, Proc. Natl. Acad. Sci. USA, 74:560–564)) or a “dye-terminator” method (a variation of the basic Sanger method). The dye-primer method involves the use of a fluorescently-labeled primer in combination with unlabeled ddNTPs. The procedure requires four synthesis reactions and up to four lanes on a gel for each template sequenced (i.e., one lane for each of the base-specific termination products). Following extension of the fluorescently-labeled primer, the sequencing reaction mixtures containing ddNTP termination products are separated electrophoretically. The size-separated, fluorescently-labeled products are automatically scanned with a laser at the bottom of the electrophoretic gel or capillary, and fluorescence is detected with an appropriate monitor (Smith et al., 1986, Nature 321:674–679). In a modification of this method, the primer added to each of the four reactions is labeled with a different fluorescent marker. After the four separate sequencing reactions are completed, the reactions are combined and the mixture is subjected to analysis in a single gel lane or capillary. The different fluorescent labels (one corresponding to each of the four different base-specific termination products) are then individually detected.
The dye-terminator sequencing method utilizes a DNA polymerase to incorporate dNTPs onto the growing end of an unlabeled DNA primer until the enzyme incorporates a chain-terminating, fluorescently-labeled ddNTP (Lee et al., 1992, Nucleic Acids Research 20:2471). The dye-terminator method offers the advantage of not having to synthesize dye-labeled primers. Additionally, each different ddNTP is typically labeled with a different fluorescent marker, permitting all four reactions to be performed simultaneously in a single reaction vessel. This method, for example, is the basis of the various dye-terminator cycle sequencing kits marketed by Applied Biosystems Inc. (Foster City, Calif.).
Automated DNA sequencing methods utilize either dye-primer or dye-terminator methods in combination with thermostable polymerases and PCR cycling (see, e.g., U.S. Pat. No. 5,075,216). Cycle sequencing is a PCR-based system involving repeated cycles of heating and cooling, wherein numerous extension products are generated from template DNA by a thermostable polymerase, such as Taq polymerase (Murray, 1989, Nucleic Acids Research 17:8889).
One of the advantages of cycle sequencing is that the high extension temperature discourages the formation of secondary structures on the template. However, certain templates, such as GC-rich sequences, may nevertheless form secondary structures through which DNA polymerases cannot read. In dye-terminator sequencing, extension products are labeled only when a dye-labeled dideoxynucleotide terminator is incorporated. If the polymerase falls off the template strand because it has encountered an impassable secondary structure and no dye-labeled terminator is incorporated, the extension fragment created cannot be detected. Similarly, in dye-primer sequencing, if the polymerase dissociates from a partially extended fragment without incorporating a dideoxy terminator, a false stop is generated.
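The different consequences of premature polymerase dissociation in the two labeling schemes may be summarized with the following sketch. It is a simplified decision table only; the function name `observed_outcome` and the outcome strings are hypothetical and chosen for this illustration.

```python
def observed_outcome(method: str, terminated_by_ddntp: bool) -> str:
    """Classify what the detector sees for a single extension fragment.

    method: "dye-terminator" (label is on the ddNTP) or
            "dye-primer" (label is on the primer).
    terminated_by_ddntp: True if extension ended by ddNTP incorporation,
            False if the polymerase dissociated prematurely.
    """
    if method == "dye-terminator":
        # The label arrives only with the terminator, so a fragment that
        # ends without one carries no fluorophore at all.
        return "labeled, true stop" if terminated_by_ddntp else "unlabeled, not detected"
    if method == "dye-primer":
        # Every fragment carries the labeled primer, so a premature stop
        # is indistinguishable from a genuine termination event.
        return "labeled, true stop" if terminated_by_ddntp else "labeled, false stop"
    raise ValueError(f"unknown method: {method!r}")


for method in ("dye-terminator", "dye-primer"):
    print(method, "->", observed_outcome(method, terminated_by_ddntp=False))
```

In both cases sequence information is lost at an impassable secondary structure: as a missing signal with dye terminators, and as a spurious signal with dye primers.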
Throughout the scientific literature relating to the sequencing of the human and other genomes, reference is made to extraordinarily difficult and challenging regions for which reliable sequence information could not be obtained. The existence of these regions has impeded the closure of gaps and the final finishing of sequencing projects worldwide, and has fueled the development of a number of improvements in sequencing chemistries, software, and methods aimed at solving the problems presented by these difficult regions. Researchers faced with resolving these difficult regions have applied a variety of techniques, including resequencing, multiplexed PCR, searching for ESTs which overlap contig ends for designing new primers, shatter cloning, and transposon insertion or “bombing” methods.
However, notwithstanding the availability and implementation of these various techniques, the difficulties associated with sequencing certain types of DNA sequences persist. This appears to be especially true for “GC-rich” sequences, for which no universally reliable sequencing solution has emerged. Similarly, certain repeat structures, such as “CCT” repeats, continue to confound the available DNA sequencing chemistries. Indeed, the ability to generate sequence data from GC-rich and CCT repeat regions has been an almost insurmountable problem faced by scientists working on the Human Genome Project for years. These GC-rich and CCT repeat regions are also believed to contain coding information crucial to the transcription of genes. Thus, in order to produce accurate and fully finished sequences, new sequencing methods and chemistries are needed to deal with regions that are refractory to standard sequencing methods.
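As an illustration of how such regions might be flagged in a known or draft sequence, the following Python sketch computes GC content over a sliding window and locates tandem CCT repeats. The window size, the GC threshold, and the minimum repeat count are arbitrary assumptions made for this example; the text above does not define numerical criteria for a “GC-rich” or “CCT repeat” region.

```python
import re


def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_rich_windows(seq: str, window: int = 20, threshold: float = 0.7) -> list[int]:
    """Start indices of windows whose GC fraction meets the (assumed) threshold."""
    return [
        start
        for start in range(len(seq) - window + 1)
        if gc_fraction(seq[start:start + window]) >= threshold
    ]


def cct_repeat_runs(seq: str, min_units: int = 3) -> list[tuple[int, int]]:
    """(start, end) spans of at least `min_units` consecutive CCT units."""
    pattern = re.compile(r"(?:CCT){%d,}" % min_units)
    return [(match.start(), match.end()) for match in pattern.finditer(seq.upper())]


print(gc_rich_windows("GCGCGCATAT", window=4, threshold=1.0))
print(cct_repeat_runs("AACCTCCTCCTGG"))
```

A scan of this kind only identifies candidate difficult regions from sequence composition; it does not predict whether a given chemistry will in fact read through them.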
A number of commercially available sequencing chemistries are in widespread use, with those provided by Applied Biosystems Inc. (ABI) being among the most popular. ABI has recently introduced refined DNA sequencing chemistries, such as BigDye® Terminator v. 1.1 and 3.1. To resolve particularly refractory sequence regions, ABI offers a dGTP-based sequencing chemistry for use with difficult templates, particularly for templates with high GC content, as well as for templates with certain sequences or patterns. A further enhancement of the dGTP sequencing chemistry utilizes 7-deaza-dGTP. The use of 7-deaza-dGTP is intended to overcome compression problems typically encountered in sequencing GC-rich regions. While these enhanced chemistries represent an improvement over previous systems, they have not been able to produce long, high-quality sequence reads in all cases, particularly where GC-rich sequences are involved.
Approaches recommended by automated cycle sequencing kit and instrumentation providers (e.g., Applied Biosystems Inc.) for sequencing GC-rich templates include increasing the DNA denaturing temperature to 98° C.; adding DMSO to the reaction mixture at a concentration of 5%; incubating the reaction mixture at 96° C. for 10 minutes before cycling; adding betaine to a concentration of 1M; doubling reaction components and incubating at 98° C. for 10 minutes before cycling; adding 5–10% formamide or 5–10% glycerol to the reaction mixture; linearizing plasmids before sequencing; shearing the DNA insert into smaller fragments and subcloning; and PCR amplifying the template DNA with the substitution of 7-deaza-dGTP for 75% of the dGTP used in the PCR reaction and then sequencing the PCR product (see, for example, Burgett et al., 1994, In: Automated DNA Sequencing and Analysis, ed. Adams et al., Academic Press, San Diego, Calif., pp. 211–215; Landre et al., 1995, In: PCR Strategies, ed. Innis et al., Academic Press, San Diego, Calif., pp. 3–16; Henke et al., 1997, Nucleic Acids Res. 25:3957–3958; Baskaran et al., 1996, Genome Res. 6: 633–638; Innis, 1990, In: PCR Protocols: A Guide to Methods and Applications, ed. Innis et al., Academic Press, San Diego, Calif., pp. 54–59; Fernandez-Rachubinski et al., 1990, DNA Seq. 1: 137–140).
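The additive concentrations recited above translate into pipetting volumes by the usual dilution relation C1·V1 = C2·V2. The following sketch works through that arithmetic. The 20 µL reaction volume, the stock concentrations, and the 200 µM total dGTP figure are assumptions chosen for illustration and are not taken from the cited protocols; `additive_volume_ul` and `split_dgtp` are hypothetical helper names.

```python
def additive_volume_ul(final_conc: float, stock_conc: float, reaction_volume_ul: float) -> float:
    """Volume of stock needed so the additive reaches `final_conc` in the reaction.

    Uses C1 * V1 = C2 * V2; `final_conc` and `stock_conc` must share units.
    """
    return final_conc * reaction_volume_ul / stock_conc


def split_dgtp(total_um: float, deaza_fraction: float = 0.75) -> tuple[float, float]:
    """Split a total dGTP concentration into (7-deaza-dGTP, dGTP) portions."""
    deaza = total_um * deaza_fraction
    return deaza, total_um - deaza


reaction_ul = 20.0  # assumed reaction volume

# 5% DMSO from an assumed 100% stock
print("DMSO (uL):", additive_volume_ul(5.0, 100.0, reaction_ul))

# 1 M betaine from an assumed 5 M stock
print("Betaine (uL):", additive_volume_ul(1.0, 5.0, reaction_ul))

# 75% substitution of an assumed 200 uM total dGTP
print("7-deaza-dGTP, dGTP (uM):", split_dgtp(200.0))
```

Under these assumed values, the DMSO and betaine additions together occupy a quarter of the reaction volume, which illustrates why combining several additives can constrain the volume left for template and sequencing mix.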
Different dye-terminator chemistries are also offered for difficult sequences, including GC-rich sequences, and include chemistries which utilize dRhodamine terminators (e.g., dGTP Big Dye kits, Applied Biosystems Inc., Foster City, Calif.). See also “Automated DNA Sequencing, Chemistry Guide” (Applied Biosystems Inc., 2000).
Additionally, a number of thermostable polymerases and mutated thermostable polymerases having better GC-rich template read-through properties have been described. Generally, these polymerases are variants of the well-known Taq polymerase. An example of such a polymerase is the HotStarTaq DNA polymerase marketed by Qiagen (Valencia, Calif.).
However, the above methods are frequently not successful, and may also introduce additional problems. For example, where DMSO is added to the reaction mix, too much can impair the performance of the polymerase.
Notwithstanding the development of various sequencing chemistries and systems, there remains a strong need for new sequencing methodologies which are capable of generating reliable sequence data from templates having high GC content, CCT repeat elements, and the like. It would be most desirable for such new sequencing methods to be readily applicable to the now widely used automated cycle sequencing systems.