Nucleic acid sequencing techniques are of major importance in a wide variety of fields ranging from basic research to clinical diagnosis. The results available from such technologies can include information of varying degrees of specificity. For example, useful information can include whether a particular polynucleotide differs in sequence from a reference polynucleotide, confirmation of the presence of a particular polynucleotide sequence in a sample, partial sequence information such as the identity of one or more nucleotides within a polynucleotide, or the identity and order of nucleotides within a polynucleotide.
DNA strands are typically polymers composed of four types of subunits, namely deoxyribonucleotides containing the bases adenine (A), cytosine (C), guanine (G), and thymine (T). These subunits are attached to one another by covalent phosphodiester bonds that link the 5′ carbon of one deoxyribose group to the 3′ carbon of the following group. Most naturally occurring DNA consists of two such strands, which are aligned in an antiparallel orientation and are held together by hydrogen bonds formed between complementary bases, i.e., between A and T and between G and C.
DNA sequencing first became possible on a large scale with the development of the chain termination or dideoxynucleotide method (Sanger, et al., Proc. Natl. Acad. Sci. 74:5463-5467, 1977) and the chemical degradation method (Maxam & Gilbert, Proc. Natl. Acad. Sci. 74:560-564, 1977), of which the former has been most extensively employed, improved upon, and automated. In particular, the use of fluorescently labeled chain terminators was of key importance in the development of automatic DNA sequencers. Common to both of the above approaches is the production of one or more collections of labeled DNA fragments of differing sizes, which must then be separated on the basis of length to determine the identity of the nucleotide at the 3′ end of the fragment (in the chain termination method) or the identity of the nucleotide that was most recently removed from the fragment (in the case of the chemical degradation method).
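The length-based readout described above for the chain termination method can be illustrated with a minimal sketch. It assumes an idealized separation in which every fragment length is resolved exactly and each fragment carries a label identifying its 3′ terminal nucleotide; the function name `read_chain_termination` and the fragment data are hypothetical and serve only as illustration.

```python
from typing import Iterable, List, Tuple


def read_chain_termination(fragments: Iterable[Tuple[int, str]]) -> str:
    """Reconstruct the newly synthesized strand (5'->3') from labeled fragments.

    Each fragment is a (length, terminal_base) pair, where terminal_base is the
    labeled chain terminator found at the fragment's 3' end. Ordering the
    fragments by length orders the terminal bases along the strand.
    """
    ordered: List[Tuple[int, str]] = sorted(fragments, key=lambda f: f[0])
    lengths = [length for length, _ in ordered]
    if len(set(lengths)) != len(lengths):
        # Two fragments of the same length cannot be placed in order.
        raise ValueError("two fragments share a length; bases cannot be ordered")
    return "".join(base for _, base in ordered)


# Hypothetical fragments, e.g. a 20-nucleotide primer extended by 1 to 4 bases.
fragments = [(23, "G"), (21, "A"), (24, "T"), (22, "C")]
print(read_chain_termination(fragments))  # ACGT
```

The sketch makes the dependence on the separation step explicit: if two fragment lengths cannot be told apart, the order of the corresponding bases is lost.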
Although currently available sequencing technologies have enabled major landmarks such as the sequencing of several complete genomes, these techniques have significant disadvantages, and considerable need for improvement remains. Separation of labeled DNA fragments has typically been achieved using polyacrylamide gel electrophoresis. However, this step has proven to be a major bottleneck limiting both the speed and accuracy of sequencing in many contexts. While capillary array electrophoresis (CAE) proved to be the breakthrough that allowed the completion of the Human Genome Project (Venter, et al., Science, 291:1304-1351, 2001; Lander, et al., Nature, 409:860-921, 2001), significant shortcomings remain. For example, CAE still requires a time-consuming separation step and still involves discrimination based on size, which can be inaccurate.
A variety of alternatives to the chain termination method have been proposed. In one approach, often referred to as “sequencing by synthesis”, an oligonucleotide primer is first hybridized to a target template. The primer is then extended by successive cycles of polymerase-catalyzed addition of differently labeled nucleotides, whose incorporation into the growing strand is detected. The identity of the label serves to identify the complementary nucleotide in the template. Alternatively, multiple reactions can be performed in parallel, each using a single one of the nucleotides, and incorporation of a labeled nucleotide in the reaction that uses a particular nucleotide identifies the complementary nucleotide in the template. (See, e.g., Melamede, U.S. Pat. No. 4,863,849; Cheeseman, U.S. Pat. No. 5,302,509; Tsien et al., International application WO 91/06678; Rosenthal et al., International application WO 93/21340; Canard et al., Gene, 148: 1-6 (1994); Metzker et al., Nucleic Acids Research, 22: 4259-4267 (1994).)
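The way a detected label identifies the complementary template nucleotide can be sketched as follows. This is an illustration only: the dye names in `LABEL_TO_BASE` and the function `template_from_labels` are hypothetical, and the sketch assumes error-free detection with exactly one labeled nucleotide incorporated per cycle.

```python
from typing import Dict, List

# Watson-Crick pairing between an incorporated nucleotide and the template base.
COMPLEMENT: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical assignment of one distinguishable label to each nucleotide.
LABEL_TO_BASE: Dict[str, str] = {"blue": "A", "green": "C", "yellow": "G", "red": "T"}


def template_from_labels(labels: List[str]) -> str:
    """Infer the template sequence (written 5'->3') from per-cycle labels.

    The primer grows 5'->3', so the template is traversed 3'->5'. Each detected
    label names the incorporated nucleotide; its complement is the template
    base at that position.
    """
    incorporated = [LABEL_TO_BASE[label] for label in labels]
    template_3_to_5 = "".join(COMPLEMENT[base] for base in incorporated)
    return template_3_to_5[::-1]  # reverse to the conventional 5'->3' orientation


# Four cycles detect blue, blue, green, yellow: the growing strand is AACG.
print(template_from_labels(["blue", "blue", "green", "yellow"]))  # CGTT
```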
To efficiently sequence polynucleotides of any significant length, it is desirable that the polymerase incorporate exactly one nucleotide in each cycle. Therefore it is generally necessary to use nucleotides that act as chain terminators, i.e., their incorporation prevents further extension by the polymerase. The incorporated nucleotide must then be modified, either enzymatically or chemically, to allow the polymerase to incorporate the next nucleotide. A variety of nucleotide analogs have been proposed that can serve as chain terminators but can be modified after their incorporation so that they can be extended in a subsequent step. Such “reversible terminators” have been described, for example, in U.S. Pat. Nos. 5,302,509; 6,255,475; 6,309,836; 6,613,513. However, it has proven difficult to identify reversible terminators that can be incorporated by polymerase with high efficiency, probably because, given the small size of a nucleotide, modifications that affect the ability of the nucleotide to act as a terminator also affect its incorporation into a growing polynucleotide strand.
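The reason for requiring exactly one incorporation per cycle can be seen in a toy model. The sketch below assumes that all four nucleotides are supplied in every cycle and that incorporation and removal of the terminating group are both perfectly efficient; the function `run_cycles` and its parameters are hypothetical and are not taken from any of the cited references.

```python
from typing import Dict, List

COMPLEMENT: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}


def run_cycles(template_3_to_5: str, reversible_terminators: bool) -> List[str]:
    """Return the nucleotides incorporated in each cycle of primer extension.

    With reversible terminators, extension stops after one nucleotide and
    resumes only after the terminating group is removed at the end of the
    cycle. Without them, the polymerase keeps extending within a single cycle.
    """
    position = 0
    per_cycle: List[str] = []
    while position < len(template_3_to_5):
        added = ""
        while position < len(template_3_to_5):
            added += COMPLEMENT[template_3_to_5[position]]
            position += 1
            if reversible_terminators:
                break  # extension is blocked until the terminator is modified
        per_cycle.append(added)
        # The modification (deblocking) step would occur here; in this
        # idealized model it always succeeds, so nothing needs to be tracked.
    return per_cycle


print(run_cycles("TTGC", reversible_terminators=True))   # ['A', 'A', 'C', 'G']
print(run_cycles("TTGC", reversible_terminators=False))  # ['AACG']
```

In the terminated case each cycle yields a single identifiable nucleotide, whereas in the unterminated case one cycle yields several incorporations whose order cannot be read from a single detection step.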
Other sequencing approaches include pyrosequencing, which is based on the detection of the pyrophosphate (PPi) that is released during DNA polymerization (see, e.g., U.S. Pat. Nos. 6,210,891 and 6,258,568). While avoiding the need for electrophoretic separation, pyrosequencing suffers from a large number of drawbacks that have as yet limited its widespread applicability (Franca, et al., Quarterly Reviews of Biophysics, 35(2): 169-200, 2002). Sequencing by hybridization has also been proposed as an alternative (U.S. Pat. No. 5,202,231; WO 99/60170; WO 00/56937; Drmanac, et al., Advances in Biochemical Engineering/Biotechnology, 77:76-101, 2002) but has a number of disadvantages, including the potential for error in discriminating between highly similar sequences. Single-molecule sequencing by exonuclease, which involves labeling every base in one strand and then detecting sequentially cleaved 3′ terminal nucleotides in a sample stream, is theoretically a very powerful method for rapidly determining the sequence of a long DNA molecule (Stephan, et al., J. Biotechnol., 86:255-267, 2001). However, various technical hurdles remain to be overcome before this potential can be realized (Stephan, et al., 2001).
Diagnostic tests based upon particular sequence variations are already in use for a variety of different diseases. The sequencing of the human genome is widely thought to herald an era of personalized medicine in which therapies, including preventive therapies, will be tailored to the particular genetic make-up of the patient or will be selected based upon the identification of particular alleles or mutations. There is an increasing need for rapid and accurate determination of sequence variants of pathogenic agents such as HIV. Thus it is evident that the demand for accurate and rapid sequence determination will expand greatly in the immediate future. Improved methods for sequence determination of all types are therefore needed.