The basic techniques for sequencing DNA include the Maxam-Gilbert chemical-degradation method (Maxam, A. M. and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U.S.A. 74:560) and the dideoxy termination method of Sanger (Sanger, et al. (1977) Proc. Natl. Acad. Sci. U.S.A. 74:5463-5467). In addition, improvements have been made in conventional electrophoretic and capillary gel-based sequencing methods, and various proposals have been made to use multiplexed vectors (Church, et al. (1988) Science 240:185), fluorescent single-molecule exonuclease digestion (Keller, et al. (1989) J. Biomolecular Structure and Dynamics 7:301), scanning tunneling microscopy (Beebe, et al. (1989) Science 243:370), laser X-ray diffraction (Human Genome News 2, No. 2, page 4 (1990)), laser desorption mass spectrometry (Williams, et al. (1989) Science 246:1585), sequencing by hybridization (Khrapko, et al. (1989) FEBS Letters 256:118; Bains, et al. (1988) J. Theor. Biol. 135:303; and Drmanac, et al. (1989) Genomics 4:114), and array determination of DNA sequence by mass spectrometry (U.S. Pat. No. 5,003,059), as well as suggestions to use isotopic sulfur (Brennan, et al., Biological Mass Spectrometry, page 159, Editor A. L. Burlingame, Elsevier (New York 1990)) or metals (Jacobson, et al. (1991) Anal. Chem. 63:402) and chemiluminescent detection systems (Bronstein, et al. (1990) BioTechniques 8:310). However, none of the foregoing methodologies provides a comprehensive solution to the problem of large-scale sequence analysis of genomes such as that of the human species.
In conventional nucleotide hybridization sequencing, the length of DNA which can be unambiguously reconstructed from a set of all complementary oligonucleotides with one-base offset is limited by the occurrence of repeating patterns of sub-sequence. With 8-mer probes, the average length of fragment which can be reconstructed unambiguously is about 185 bp (Drmanac, et al. (1989) Genomics 4:114).
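The ambiguity introduced by repeated sub-sequences can be illustrated with a minimal Python sketch. It is not taken from the cited work: the function names `kmer_spectrum` and `reconstructions`, the toy 3-mer probe length, and the example sequences are all hypothetical, and the sketch assumes an idealized, error-free hybridization readout that reports only the presence or absence of each probe (not how many times it occurs).

```python
def kmer_spectrum(seq, k):
    """Set of all k-mers read at one-base offsets along seq (presence only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(spectrum, length, k):
    """Enumerate every sequence of the given length whose k-mer spectrum
    equals `spectrum`. Brute-force search, for toy inputs only; assumes k >= 2."""
    results = set()

    def extend(seq):
        if len(seq) == length:
            # Keep only candidates that use exactly the observed probes.
            if kmer_spectrum(seq, k) == spectrum:
                results.add(seq)
            return
        suffix = seq[-(k - 1):]
        for base in "ACGT":
            # Extend only through k-mers that hybridized (overlap of k-1 bases).
            if suffix + base in spectrum:
                extend(seq + base)

    for start in spectrum:
        extend(start)
    return sorted(results)


if __name__ == "__main__":
    # No repeated 2-mer: the 3-mer spectrum determines the sequence.
    print(reconstructions(kmer_spectrum("ATGCA", 3), 5, 3))

    # "AC" occurs three times: several sequences share one spectrum.
    target = "TACGACTACC"
    for candidate in reconstructions(kmer_spectrum(target, 3), len(target), 3):
        print(candidate)
```

In the second case the repeated sub-sequence "AC" lets the intervening segments be reordered without changing the set of probes that hybridize, so the target cannot be distinguished from the alternatives. The same effect, at the scale of 8-mer probes, is what bounds the unambiguous reconstruction length described above.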
Recently, it was proposed that hexamers could be used in sequential primer elongation by ligation (SPEL) as a method to construct specific members of the 12-, 18- and 24-mer library as walking primers (Szybalski (1990) Gene 90:177).
Finally, an oligonucleotide ligation assay (OLA) has been described for detecting point mutations, wherein differentially labeled oligonucleotides that hybridize adjacent to each other in the proper orientation on a target sequence can be covalently linked with T4 ligase (Landegren, et al. (1988) Science 241:1077 and U.S. Pat. No. 4,988,617).
Each of the foregoing sequencing methodologies has inherent limitations. For example, the laser X-ray diffraction, laser desorption and mass spectrometry methods require the use of expensive and sensitive instruments. The fluorescent single-molecule exonuclease digestion method (also referred to as the Los Alamos approach) requires labeling at all nucleotide positions with a fluorescent dye that specifically identifies each base. Yet it has proved very difficult to induce the known DNA polymerases to incorporate multiple base-specific fluorescent labels. A significant limitation of the Sanger and Maxam-Gilbert sequencing protocols is the length of DNA which can be sequenced. This length is limited experimentally by the ability to electrophoretically resolve labeled fragments differing in length from each other by only one nucleotide base. Although there have been reports of occasional read lengths in excess of 1,000 bp on long thin gels (Slightom, et al. (1991) Anal. Biochem. 192:441), the average practical limit is about 500-600 bp (Nishikawa, et al. (1991) Electrophoresis 12:623).
Given the foregoing limitations, it is apparent that simple and cost-effective methods are needed to determine the sequence of nucleic acids. Accordingly, it is an object herein to provide methods and compositions that provide information relating to the sequence or a sub-sequence of a nucleic acid spanning >1 kb. Such methods and compositions are cost-effective and readily adapted to automation.