Increasing the speed of polynucleotide sequencing is at present one of the most pressing problems in molecular biology. Although sequencing speed has increased many-fold due to advances in labeling and detection (e.g., Smith, 1985; Ansorge, 1986), current automatic sequencing machines employ essentially the same principles as originally proposed in 1977 (Maxam, 1977; Sanger, 1977).
In the method of Maxam and Gilbert, a terminally labeled oligonucleotide is cleaved internally, in four separate reaction mixtures under partial cleavage conditions, using chemical reagents which cleave at one or two defined base-types. The truncated reaction products are resolved on the basis of size, and the oligonucleotide sequence is determined from the order of elution of the fragments, taking into account the base-specificities of the cleavage reagents.
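The base-calling logic implied by reagents that cleave at one or two base types can be illustrated with a minimal sketch. It assumes the commonly described reaction set (G, A+G, C+T, and C), which the text above does not itself specify. It also records each band by the position of the cleaved base rather than by physical fragment length. The names `REACTIONS`, `cleavage_lanes`, and `decode` are hypothetical and chosen only for illustration.

```python
# Assumed reaction set: each reagent cleaves at one or two base types.
REACTIONS = {
    "G": {"G"},
    "A+G": {"A", "G"},
    "C+T": {"C", "T"},
    "C": {"C"},
}


def cleavage_lanes(sequence):
    """Return, per reaction, the 1-based positions at which cleavage occurs."""
    return {
        name: {i for i, base in enumerate(sequence, start=1) if base in bases}
        for name, bases in REACTIONS.items()
    }


def decode(lanes):
    """Call each position by comparing the single-base and two-base lanes."""
    positions = sorted(set().union(*lanes.values()))
    calls = []
    for p in positions:
        if p in lanes["G"]:
            calls.append("G")      # band in both G and A+G lanes
        elif p in lanes["A+G"]:
            calls.append("A")      # band in A+G lane only
        elif p in lanes["C"]:
            calls.append("C")      # band in both C and C+T lanes
        else:
            calls.append("T")      # band in C+T lane only
    return "".join(calls)


lanes = cleavage_lanes("GATTACA")
recovered = decode(lanes)
```

In this idealized model every position yields a band. In practice, partial cleavage conditions are what ensure that each labeled molecule is cut, on average, about once.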
The method of Sanger, on the other hand, involves enzymatic extension of a 5'-primer along a target template strand in the presence of the four standard deoxynucleotide bases, plus one base in dideoxy form. Random incorporation of the selected dideoxynucleotide results in a mixture of products of variable length, each terminating at its 3'-end with the dideoxynucleotide. As originally proposed, four sequencing reactions were performed for a given target sequence, one for each dideoxynucleotide base-type. The products from each mixture were then resolved in four separate lanes on the basis of size, and the target sequence was determined in a manner similar to that used in the Maxam and Gilbert method. Variants were later developed which use spectrally resolvable fluorescent dyes attached to either the 5'-extension primer (Smith, 1985) or the 3'-dideoxy terminator bases (Prober, 1987; Bergot, 1991), allowing determination of the target sequence using a single separation path.
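The four-reaction readout described above can be sketched as follows. The sketch is idealized: it assumes that every possible termination product is present and that lengths are counted from the first base added to the primer. The function names `dideoxy_fragment_lengths` and `read_lanes` are hypothetical.

```python
def dideoxy_fragment_lengths(extension):
    """For each base type, list the extension lengths at which the
    corresponding dideoxynucleotide could terminate the growing strand."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(extension, start=1):
        lanes[base].append(length)
    return lanes


def read_lanes(lanes):
    """Rebuild the extended sequence by ordering all products by size and
    noting which of the four reactions produced each one."""
    base_by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_by_length[n] for n in sorted(base_by_length))


lanes = dideoxy_fragment_lengths("GATTACA")
sequence = read_lanes(lanes)
```

With dye-labeled terminators the same ordering step applies, except that the base identity comes from the dye color of each product rather than from the lane it was run in, which is why a single separation path suffices.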
In 1988, Church et al. proposed a "multiplex" sequencing method by which multiple sequences could be determined after coelution of sequencing fragments from different targets in a single gel lane. The separated fragments are transferred to a membrane and then iteratively hybridized with different template probes to obtain sequence data, one sequence at a time. Unfortunately, this method requires time-consuming probing and washing steps and is not efficient for large-scale sequencing projects.
As an alternative to the methods above, a "sequencing by hybridization" approach was proposed wherein groups of consecutive bases are determined simultaneously through hybridization of a target sequence with a complete set of all possible sequences of length k (k-tuples) (e.g., Bains, 1988; Macevicz, 1989). In one approach, a sample polynucleotide is hybridized to a set of all possible k-tuple oligonucleotides immobilized as an ordered array (Macevicz, 1989). The pattern of hybridization on the array allows the sequence to be determined, albeit only for short sequences. In a second approach, multiple sample polynucleotides are immobilized as an ordered array on a support and are hybridized sequentially with a series of k-tuples (Strezoska, 1991). With this method, however, an enormous number of probing steps is required before meaningful sequence information for any of the sample polynucleotides can be obtained. Moreover, both sequencing-by-hybridization approaches are inefficient in terms of the number of k-tuple probes used, most of which do not bind to the sample.
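The k-tuple approach and its two limitations (probe inefficiency and restriction to short sequences) can be illustrated with a minimal sketch. It assumes error-free hybridization, a known starting k-tuple, and a known target length, none of which is stated in the text above. The helper names `all_k_tuples`, `spectrum`, and `reconstruct` are hypothetical.

```python
from itertools import product


def all_k_tuples(k):
    """Enumerate the complete probe set: all 4**k sequences of length k."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def spectrum(target, k):
    """Return the set of k-tuples that occur in (and so hybridize to) target."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(spec, start, length):
    """Extend from `start` through unique (k-1)-base overlaps.

    Returns None when an overlap is missing or matches more than one
    k-tuple, i.e. when the hybridization pattern is ambiguous.
    """
    k = len(start)
    seq = start
    while len(seq) < length:
        suffix = seq[-(k - 1):]
        candidates = [t for t in spec if t.startswith(suffix)]
        if len(candidates) != 1:
            return None
        seq += candidates[0][-1]
    return seq


probes = all_k_tuples(3)
spec = spectrum("GATTACAG", 3)
rebuilt = reconstruct(spec, "GAT", 8)
ambiguous = reconstruct(spectrum("GATGAC", 3), "GAT", 6)
```

In this example only 6 of the 64 possible 3-tuples bind the target, reflecting the probe inefficiency noted above. The second call returns `None` because the overlap `GA` is shared by two k-tuples, which is the kind of ambiguity that restricts the approach to short sequences for a given k.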
In view of the inadequacies of the methods proposed to date, there is a need for new sequencing methods which are capable of providing sequencing data for a large number of target sequences. Ideally, the number of time-consuming or expensive steps will remain relatively constant or increase slowly with the number of templates. In addition, the method should be amenable to automation, so that the involvement of manual steps is reduced.