The ability to determine DNA sequences is crucial for understanding the function and control of genes and for applying many of the basic techniques of molecular biology. Native DNA consists of two linear polymers, or strands, of nucleotides. Each strand is a chain of nucleosides linked by phosphodiester bonds. The two strands are held together in an antiparallel orientation by hydrogen bonds between complementary bases of the nucleotides of the two strands: deoxyadenosine (A) pairs with thymidine (T), and deoxyguanosine (G) pairs with deoxycytidine (C).
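The pairing rules above mean that either strand determines the other. A minimal sketch in Python, assuming strands are written as plain strings of A, C, G, and T in the 5' to 3' direction (the names COMPLEMENT and reverse_complement are illustrative only):

```python
# Watson-Crick pairing as stated in the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5' to 3'.

    Because the two strands are antiparallel, the partner is obtained by
    complementing each base and reversing the order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


partner = reverse_complement("GATTACA")
print(partner)
```

Applying the function twice returns the original strand, which reflects the symmetry of the pairing rules.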
Presently there are two basic approaches to DNA sequence determination: the dideoxy chain termination method, e.g. Sanger et al, Proc. Natl. Acad. Sci., Vol. 74, pgs. 5463-5467 (1977); and the chemical degradation method, e.g. Maxam et al, Proc. Natl. Acad. Sci., Vol. 74, pgs. 560-564 (1977). The chain termination method has been improved in several ways, and serves as the basis for all currently available automated DNA sequencing machines, e.g. Sanger et al, J. Mol. Biol., Vol. 143, pgs. 161-178 (1980); Schreier et al, J. Mol. Biol., Vol. 129, pgs. 169-172 (1979); Smith et al, Nucleic Acids Research, Vol. 13, pgs. 2399-2412 (1985); Smith et al, Nature, Vol. 321, pgs. 674-679 (1986); Prober et al, Science, Vol. 238, pgs. 336-341 (1987); Section II, Meth. Enzymol., Vol. 155, pgs. 51-334 (1987); and Church et al, Science, Vol. 240, pgs. 185-188 (1988).
Both the chain termination and chemical degradation methods require the generation of one or more sets of labeled DNA fragments, each having a common origin and each terminating with a known base. The set or sets of fragments must then be separated by size to obtain sequence information. In both methods, the DNA fragments are separated by high resolution gel electrophoresis. Unfortunately, this step severely limits the size of the DNA chain that can be sequenced at one time. Non-automated sequencing can accommodate a DNA chain of up to about 500 bases under optimal conditions, and automated sequencing can accommodate a chain of up to about 300 bases under optimal conditions, e.g. Bankier et al, Meth. Enzymol., Vol. 155, pgs. 51-93 (1987); Roberts, Science, Vol. 238, pgs. 271-273 (1987); and Smith et al, Biotechnology, Vol. 5, pgs. 933-939 (1987).
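The logic of reading a sequence from size-separated fragments can be illustrated with a short Python sketch. It is an idealization and not a model of any cited protocol: it assumes every fragment length is resolved exactly, which is precisely what gel electrophoresis fails to deliver for long chains. The function names and the toy sequence are hypothetical.

```python
def terminated_fragments(sequence: str) -> dict[str, list[int]]:
    """Simulate four sets of fragments sharing a common origin.

    Each fragment is represented only by its length; a fragment of
    length n in the set for base X means position n of the chain is X.
    """
    sets: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(sequence, start=1):
        sets[base].append(position)
    return sets


def read_sequence(sets: dict[str, list[int]]) -> str:
    """Recover the sequence by ordering all fragments by size.

    Sorting by length stands in for the separation step; the terminal
    base of each successive fragment gives the next base of the chain.
    """
    by_size = sorted(
        (length, base) for base, lengths in sets.items() for length in lengths
    )
    return "".join(base for _, base in by_size)


fragment_sets = terminated_fragments("GATTACA")
recovered = read_sequence(fragment_sets)
print(fragment_sets["A"], recovered)
```

In this sketch the read succeeds only because lengths differing by a single base are always distinguishable; once neighboring lengths can no longer be told apart, the ordering, and with it the sequence, is lost.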
This limitation represents a major bottleneck for many important medical, scientific, and industrial projects aimed at unraveling the molecular structure of large regions of plant or animal genomes, such as the project to sequence all or major portions of the human genome, Smith et al, Biotechnology (cited above).
In addition to DNA sequencing, nucleic acid hybridization has also been a crucial element of many techniques in molecular biology, e.g. Hames et al, eds., Nucleic Acid Hybridization: A Practical Approach (IRL Press, Washington, D.C., 1985). In particular, hybridization techniques have been used to select rare cDNA or genomic clones from large libraries by way of mixed oligonucleotide probes, e.g. Wallace et al, Nucleic Acids Research, Vol. 6, pgs. 3543-3557 (1979), and Proc. Natl. Acad. Sci., Vol. 80, pgs. 5842-5846 (1983). Nucleic acid hybridization has also been used to determine the degree of homology between sequences, e.g. Kafatos et al, Nucleic Acids Research, Vol. 7, pgs. 1541-1552 (1979), and to detect consensus sequences, e.g. Oliphant et al, Meth. Enzymol., Vol. 155, pgs. 568-582 (1987). Implicit in all of these applications is the notion that the known probe sequences contain information about the unknown target sequences. This notion apparently has never been exploited to obtain detailed sequence information about a target nucleic acid.
In view of the limitations of current DNA sequencing methods, it would be advantageous for the scientific and industrial communities to have available an alternative method for sequencing DNA which (1) did not require gel electrophoretic separation of similarly sized DNA fragments, (2) had the capability of providing the sequence of very long DNA chains in a single operation, and (3) was amenable to automation.