Ordered arrays of oligonucleotides immobilized on a solid support have been proposed for sequencing DNA fragments. It has been recognized that hybridization of a cloned single-stranded DNA fragment to all possible oligonucleotide probes of a given length can identify the corresponding, complementary oligonucleotide segments that are present somewhere in the fragment, and that this information can sometimes be used to determine the DNA sequence. Use of arrays can greatly facilitate the surveying of a DNA fragment's oligonucleotide segments. There are two approaches currently being employed.
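"All possible oligonucleotide probes of a given length" is a set of 4^k members for probes of length k, which is 65,536 for k = 8. The Python sketch below enumerates that set and gives each probe a predetermined array address. The function names and the base-4 addressing scheme are hypothetical choices made for illustration. They are not part of any described apparatus.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

def all_probes(k: int, alphabet: str = NUCLEOTIDES) -> list[str]:
    """Every oligonucleotide of length k over the alphabet, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def probe_position(probe: str, alphabet: str = NUCLEOTIDES) -> int:
    """Hypothetical array address: the probe read as a number in base len(alphabet).

    With the default alphabet this is A=0, C=1, G=2, T=3, so the address equals
    the probe's index in the list returned by all_probes.
    """
    index = 0
    for base in probe:
        index = index * len(alphabet) + alphabet.index(base)
    return index
```

Because the address is computed from the probe sequence itself, a hybridization signal at a given position identifies the probe directly, and no lookup table is needed.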
In one approach, each oligonucleotide probe is immobilized on a solid support at a different predetermined position, forming an array of oligonucleotides. The array allows one to survey all the oligonucleotide segments in a DNA fragment strand simultaneously. Many copies of the strand are required, of course. Ideally, surveying is carried out under conditions that ensure that only perfectly matched hybrids will form. Oligonucleotide segments present in the strand can be identified by determining the positions in the array where hybridization occurs. The nucleotide sequence of the DNA sometimes can be ascertained by ordering the identified oligonucleotide segments in an overlapping fashion. For every identified oligonucleotide segment except the one at the end of the strand, there must be another identified segment whose sequence overlaps it by all but one nucleotide. The entire sequence of the DNA strand can thus be represented by a series of overlapping oligonucleotides, each of equal length and each located one nucleotide further along the sequence. As long as every overlap is unique, all of the identified oligonucleotides can be assembled into a contiguous sequence block [Bains, W. and Smith, G. (1988). A Novel Method for Nucleic Acid Sequence Determination, J. Theor. Biol. 135, 303-307; Lysov, Yu. P., Florentiev, V. L., Khorlin, A. A., Khrapko, K. R., Shik, V. V. and Mirzabekov, A. D. (1988). Determination of the Nucleotide Sequence of DNA Using Hybridization to Oligonucleotides. A New Method, Doklady Akademii Nauk SSSR 303, 1508-1511]. The practical feasibility of using oligonucleotide arrays for sequencing nucleic acid fragments has been demonstrated in model experiments in which short synthetic DNA strands made of pyrimidines were hybridized to an array containing the 256 possible octapurines [Maskos, U. and Southern, E. M. (1991). Analyzing Nucleic Acids by Hybridization to Arrays of Oligonucleotides: Evaluation of Sequence Analysis, In Genome Mapping and Sequencing (Abstracts of papers presented at the 1991 meeting arranged by M. Olson, C. Cantor and R. Roberts), p. 143, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.].
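The overlap-assembly idea can be sketched in Python. The sketch assumes an ideal, error-free survey that reports each segment present in the strand, already converted from probe sequence to the complementary target sequence. It also assumes the survey does not report how many times a segment occurs. The names `spectrum` and `assemble` are hypothetical. This simple chaining covers only the unique-overlap case described above and is not a general reconstruction algorithm.

```python
from typing import Optional

def spectrum(seq: str, k: int) -> set[str]:
    """The set of length-k segments present in seq (what an ideal array survey reports)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def assemble(segments: set[str]) -> Optional[str]:
    """Chain segments that overlap by all but one nucleotide.

    Returns None when an overlap is not unique or when the segments do not
    form a single contiguous block.
    """
    if not segments:
        return None
    # Map each (k-1)-nucleotide prefix to the segments that begin with it.
    successors: dict[str, list[str]] = {}
    for seg in segments:
        successors.setdefault(seg[:-1], []).append(seg)
    if any(len(v) > 1 for v in successors.values()):
        return None  # an overlap shared by two segments: the next step is ambiguous
    # The first segment is the one whose prefix is no other segment's suffix.
    suffixes = {seg[1:] for seg in segments}
    starts = [seg for seg in segments if seg[:-1] not in suffixes]
    if len(starts) != 1:
        return None  # no unique first segment
    current = starts[0]
    seq, used = current, {current}
    while current[1:] in successors:
        current = successors[current[1:]][0]
        if current in used:
            return None  # cycle
        used.add(current)
        seq += current[-1]  # each step extends the sequence by one nucleotide
    return seq if len(used) == len(segments) else None
```

For example, the 4-nucleotide segments of ATGGCGTGCA chain back into ATGGCGTGCA. A spectrum in which one overlap begins two different segments returns None instead.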
An attractive feature of sequencing by oligonucleotide hybridization is its suitability for automation. Another attractive feature is its tolerance of detection errors: because the oligonucleotides overlap, the data are inherently redundant. In contrast, the currently prevalent sequencing methods read a sequence one nucleotide at a time, and it is common for them to omit a nucleotide that is present or to insert one that is not. There is, however, an important limitation to sequencing by known surveying techniques. As longer DNA strands are surveyed, the probability increases that more than two identified oligonucleotides will share the same overlapping sequence, i.e., that an overlap is not unique. When this occurs, the sequence of the DNA cannot be determined unambiguously. Instead of one contiguous sequence block that contains the entire DNA sequence, the oligonucleotides can only be assembled into a number of smaller sequence blocks whose order is not known. Lysov et al. have estimated that, if oligonucleotide probes 8 nucleotides in length are used, then at least 20 percent of all random sequences merely 200 nucleotides in length cannot be assembled into a single sequence block because of non-unique overlaps. The longer the DNA sequence, the worse this problem becomes. Khrapko et al. suggested that the ambiguities caused by non-unique overlaps between surveyed oligonucleotides could be resolved by a secondary hybridization of the DNA-oligonucleotide complexes to a series of short oligonucleotides. The short oligonucleotide and the immobilized oligonucleotide, hybridized contiguously on the DNA strand, would then stack on each other, producing a longer duplex [Khrapko, K. R., Lysov, Yu. P., Khorlin, A. A., Shik, V. V., Florentiev, V. L. and Mirzabekov, A. D. (1989). An Oligonucleotide Hybridization Approach to DNA Sequencing, FEBS Lett. 256, 118-122].
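A toy case shows why a non-unique overlap defeats reconstruction. It uses 3-nucleotide segments rather than the 8-nucleotide probes discussed above. Two different strands can yield exactly the same set of surveyed segments. The strands below were constructed for illustration as X R Y R Z R W and X R Z R Y R W. Here R is the repeated overlap AC, and the two pieces lying between its copies have been swapped. The helper names are hypothetical.

```python
from collections import Counter

def spectrum(seq: str, k: int) -> set[str]:
    """Set of length-k segments present in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_overlaps(segments: set[str]) -> set[str]:
    """(k-1)-nucleotide overlaps with which more than one identified segment begins."""
    counts = Counter(seg[:-1] for seg in segments)
    return {overlap for overlap, n in counts.items() if n > 1}

# X = "G", R = "AC", Y = "T", Z = "G", W = "T"
first = "GACTACGACT"   # X R Y R Z R W
second = "GACGACTACT"  # X R Z R Y R W
```

Both strands give the segment set {GAC, ACT, CTA, TAC, ACG, CGA}, and the overlap AC begins two of those segments (ACT and ACG). The survey data alone therefore cannot say which strand was present.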
Another way of using arrays for DNA sequencing has been proposed by Drmanac et al. In their method, many different cloned DNA strands are each bound to a solid support at a different position. All are then tested in parallel for their ability to form a hybrid with each of the possible oligonucleotides of a given length, one oligonucleotide at a time. To resolve ambiguities arising from non-unique overlaps between the oligonucleotides revealed in a DNA strand, it has been suggested that a library of densely overlapping cloned fragments be prepared and analyzed. The library would be composed of DNA strands approximately 500 nucleotides long with an average displacement of 40 nucleotides [Drmanac, R., Labat, I., Brukner, I. and Crkvenjakov, R. (1989). Sequencing of Megabase Plus DNA by Hybridization: Theory of the Method, Genomics 4, 114-128]. The feasibility of this method has also been demonstrated [Strezoska, Z., Paunesku, T., Radosavljevic, D., Labat, I., Drmanac, R. and Crkvenjakov, R. (1991). DNA Sequencing by Hybridization: 100 Bases Read by a Non-gel Method, Proc. Natl. Acad. Sci. U.S.A. 88, 10089-10093].
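The redundancy of such a library follows from simple arithmetic. Ignoring the ends of the DNA, each nucleotide lies on average in about L_clone / d clones, where L_clone is the clone length and d is the average displacement between clone start points. This mean holds whether the start points are evenly spaced or random with that average spacing. The estimate below is an illustrative back-of-the-envelope figure and is not a number reported by Drmanac et al.

```latex
\bar{c} \approx \frac{L_{\text{clone}}}{d} = \frac{500\ \text{nt}}{40\ \text{nt}} = 12.5
```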
The sequencing techniques described above, as well as conventional sequencing techniques, rely on cloning the fragments to be sequenced. Cloning of DNA fragments is well known. For cloning, DNA fragments are ligated into cloning vectors (e.g., plasmids or bacteriophage DNAs), which are then introduced by transformation into microbial cells, where they are amplified. At appropriate ratios of fragment to vector and of vector to cell, only one fragment will be ligated into each vector molecule, and only one recombinant molecule will be introduced into each transformed cell. By obtaining progeny from individual transformed cells (clones), individual DNA fragments can be isolated. If a large DNA (e.g., a genome) is to be sequenced, it must first be cleaved into pieces of suitable size, for example by digestion with a restriction endonuclease. The goal of the cloning procedure, in this case, is to obtain a comprehensive library of cloned fragments which, taken together, comprise every segment of the DNA to be sequenced. However, the completion of a clone library is essentially an asymptotic process. Because fragment cloning is intrinsically random, the number of clones that have to be isolated and analyzed is much greater than the number of different restriction fragments produced by digestion of the original DNA [Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, 2nd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.]. Moreover, there is no way to know whether the library is comprehensive until the sequenced fragments are finally assembled. The cloning of fragments of an entire genome is extremely slow and tedious.
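Two standard sampling estimates quantify the asymptotic behavior described above. Both are simplifications that assume every restriction fragment is equally likely to be cloned, which real fragments of different sizes are not. The first is the coupon-collector expectation for seeing every fragment at least once. The second is the Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), for the number of clones needed so that a given fragment is present with probability P, where f is the fraction of the DNA it represents. The function names are hypothetical.

```python
import math

def expected_clones_for_complete_library(n_fragments: int) -> float:
    """Expected clones picked before every fragment appears at least once.

    Coupon-collector result n * (1 + 1/2 + ... + 1/n), assuming all fragments
    are equally likely to be cloned.
    """
    return n_fragments * sum(1.0 / i for i in range(1, n_fragments + 1))

def clones_for_probability(p: float, n_fragments: int) -> float:
    """Clones needed so that one given fragment is present with probability p.

    Clarke and Carbon estimate N = ln(1 - p) / ln(1 - f), with f = 1 / n_fragments.
    Requires n_fragments > 1 and 0 <= p < 1.
    """
    f = 1.0 / n_fragments
    return math.log(1.0 - p) / math.log(1.0 - f)
```

With 1,000 equally likely fragments, for example, about 4,600 clones are needed for a 99% chance of including any particular fragment. On the order of 7,500 clones are expected before every fragment has been seen once, which is far more than the 1,000 distinct fragments.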
Recently, individual DNA fragments have been amplified by the polymerase chain reaction (PCR) in place of classic cloning techniques. Briefly, this method is based on the hybridization of two oligodeoxynucleotide probes (primers) to DNA strands and the extension of these primers by incubation with DNA polymerase. The primers are intended to hybridize to unique locations on complementary strands of the same DNA molecule, with their growing 3' termini directed towards each other, so that their extension results in the replication of the DNA region between them. The DNA template and product strands are then melted apart at an elevated temperature to allow the next round of replication, in which both the product strands and the template strands serve as templates. This process is repeated many times by cycling between the annealing and melting temperatures, resulting in exponential amplification of the target region [see, for example, Mullis et al., U.S. Pat. Nos. 4,800,159 and 4,965,188, incorporated by reference herein]. The advantage of PCR over cloning is that fragment isolation becomes deterministic instead of random. However, to use PCR for preparing DNA fragments, two unique oligonucleotide primers must be synthesized for every new fragment that is amplified. Moreover, the terminal sequences of each fragment must be known in advance. This latter requirement makes PCR, in its current form, of little use for preparing individual fragments of unknown nucleotide sequence.
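The exponential character of the amplification can be seen from the idealized textbook model N = N0 (1 + E)^n. In this model, N0 is the starting number of target copies, n is the number of cycles and E is the fraction of templates copied per cycle. The Python sketch below implements only that model, under a hypothetical function name. It ignores the plateau that real reactions reach when reagents become limiting.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Target-region copies after a number of cycles, N0 * (1 + E) ** n.

    efficiency = 1.0 is the idealized case in which every template strand is
    copied in every cycle, so the number of copies doubles each cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

Under ideal doubling, a single copy of the target region becomes 2^30 (about 1.07 x 10^9) copies after 30 cycles. A per-cycle efficiency below 1 lowers this figure, but growth remains exponential in the number of cycles.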