Completion of the human genome sequence has paved the way for important insights into biological structure and function. Knowledge of the human genome has given rise to inquiry into differences between individuals, as well as differences within an individual, as the basis for differences in biological function and dysfunction. For example, single nucleotide differences between individuals, called single nucleotide polymorphisms (SNPs), can be responsible for dramatic phenotypic differences. Those differences can be outward expressions of phenotype or can involve the likelihood that an individual will get a specific disease or how that individual will respond to treatment. Moreover, subtle genomic changes have been shown to be responsible for the manifestation of genetic diseases, such as cancer. A true understanding of the complexities in either normal or abnormal function will require large amounts of specific sequence information.
An understanding of cancer also requires an understanding of genomic sequence complexity. Cancer is a disease that is rooted in heterogeneous genomic instability. Most cancers develop from a series of genomic changes, some subtle and some significant, that occur in a small subpopulation of cells. Knowledge of the sequence variations that lead to cancer will provide an understanding of the etiology of the disease, as well as ways to treat and prevent it. An essential first step in understanding genomic complexity is the ability to perform high-resolution sequencing.
Various approaches to nucleic acid sequencing exist. One conventional way to do bulk sequencing is by chain termination and gel separation, essentially as described by Sanger et al., Proc. Natl. Acad. Sci. USA, 74(12): 5463–67 (1977). That method relies on the generation of a mixed population of nucleic acid fragments representing terminations at each base in a sequence. The fragments are then run on an electrophoretic gel, and the sequence is revealed by the order of the fragments in the gel. Another conventional bulk sequencing method relies on chemical degradation of nucleic acid fragments. See Maxam et al., Proc. Natl. Acad. Sci. USA, 74: 560–564 (1977). Finally, methods have been developed based upon sequencing by hybridization. See, e.g., Drmanac et al., Nature Biotech., 16: 54–58 (1998). Bulk techniques, such as those described above, cannot effectively detect single nucleotide differences between samples and are not useful for comparative whole genome sequencing. Single molecule techniques are necessary for high-resolution detection of sequence differences.
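By way of illustration only, the chain-termination read-out described above can be sketched in a few lines of Python. This is an idealized toy model, not the procedure of any cited reference: it assumes exactly one terminated fragment per position, one reaction (lane) per terminating base, and a gel that resolves every fragment length. The function names `termination_fragments` and `read_gel` are hypothetical.

```python
def termination_fragments(synthesized_strand):
    """Return, for each base, the lengths of fragments that end in that base.

    Idealized assumption: each of the four termination reactions yields
    one fragment for every position at which its base occurs.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by ordering fragments from shortest to longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = termination_fragments("GATTACA")
print(lanes)
print(read_gel(lanes))
```

In this sketch the sequence is recovered solely from which lane holds the fragment of each length, which is the sense in which the order of fragments in the gel reveals the sequence.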
There have been several recent reports of sequencing using single molecule techniques. Most such techniques propose incorporation of fluorescently-labeled nucleotides in a template-dependent manner. A fundamental problem with conventional single molecule techniques is that the sequencing reactions are run to completion. For purposes of single molecule chemistry, this typically means that the template is exposed to nucleotides for incorporation for about 10 half-lives. This makes it difficult to resolve single nucleotides as they are incorporated into the growing primer strand. The resolution problem becomes extreme when the template comprises a homopolymer region, that is, a continuous sequence consisting of the same nucleotide species. When optical signaling is used as the detection means, conventional optics are able to reliably distinguish one from two identical bases, and sometimes two from three, but rarely more than three. Thus, single molecule sequencing using fluorescent labels in a homopolymer region typically results in a signal that does not allow accurate determination of the number of bases in the region.
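The figure of about 10 half-lives mentioned above can be made concrete with a short calculation. The sketch below assumes simple first-order (exponential) incorporation kinetics, which the text does not state explicitly; under that assumption the fraction of templates that have incorporated an available nucleotide after n half-lives is 1 - (1/2)^n. The function name `fraction_incorporated` is hypothetical.

```python
def fraction_incorporated(n_half_lives):
    """Fraction of templates extended after n half-lives.

    Assumes first-order kinetics: the unreacted fraction halves
    with every half-life of exposure.
    """
    return 1.0 - 0.5 ** n_half_lives


for n in (1, 2, 5, 10):
    print(f"{n:>2} half-lives: {fraction_incorporated(n):.4%} incorporated")
```

Under this assumption, 10 half-lives leaves 1 template in 1024 unextended, which is the sense in which such a reaction is effectively run to completion.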
One method that has been developed to address the homopolymer issue uses nucleotide analogues that have a modification at the 3′ carbon of the sugar that reversibly blocks the hydroxyl group at that position. The added nucleotide is detected by virtue of a label that has been incorporated into the 3′ blocking group. Following detection, the blocking group is cleaved, typically by photochemical means, to expose a free hydroxyl group that is available for base addition during the next cycle.
However, techniques utilizing 3′ blocking are prone to errors and inefficiencies. For example, those methods require excessive reagents, including numerous primers complementary to at least a portion of the target nucleic acids and differentially-labeled nucleotide analogues. They also require additional steps, such as cleaving the blocking group and differentiating between the various nucleotide analogues incorporated into the primer. As such, those methods have only limited usefulness.
A need therefore exists for more effective and efficient methods and devices for single molecule nucleic acid sequencing.