Until relatively recently, DNA sequencing was a very time-consuming and tedious task. Techniques which greatly simplify DNA sequencing have been reported, including the methods of Sanger, F. et al., Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467 (1977); Maxam, A. M. & Gilbert, W., Proc. Natl. Acad. Sci. U.S.A. 74, 560-564 (1977); and Maat, J. & Smith, A. J., Nucleic Acids Res. 5, 4537-4545 (1978).
The method described in Sanger et al., supra, involves the synthesis of a radioactive complementary copy of a single-stranded target sequence with DNA polymerase, using the directly adjacent annealed strand of a restriction fragment as a primer. Four separate reaction mixtures are used, each containing a different dideoxynucleotide triphosphate (ddNTP) terminator and, in addition, the four deoxynucleoside triphosphates (dNTPs), one or more of which are alpha-32P-labelled. In each mixture the terminator is partially incorporated into the radioactive elongation products. A single reaction therefore produces a size range of labelled oligonucleotides, all with a common 5' end, but with the 3' end terminating at the various sites, throughout the sequence, of the nucleotide to which the terminating ddNTP is analogous. Parallel fractionation by denaturing polyacrylamide gel electrophoresis of the products of the four separate reactions, each of which contains one of the four ddNTP chain terminators, resolves the oligonucleotides of different sizes. These reveal, in order, the positions of each base, allowing a sequence to be determined.
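The four-lane readout described above can be illustrated with a minimal sketch. It is an idealized model, not the published protocol: it assumes that every possible termination product appears as a band, and that fragment lengths are counted from the first base added after the primer. The function names are hypothetical and chosen only for illustration.

```python
def termination_lengths(new_strand):
    """Return, for each ddNTP lane, the lengths of the terminated products.

    new_strand is the newly synthesized (complementary) strand, written 5' to 3'.
    Assumption: length 1 is the first base added after the primer.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A ddNTP analogous to this base can end the chain at this position.
        lanes[base].append(length)
    return lanes


def read_sequence(lanes):
    """Read the gel from shortest to longest fragment across all four lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = termination_lengths("GATTACA")
print(lanes["A"])            # lengths of the bands in the ddATP lane
print(read_sequence(lanes))  # sequence recovered by reading all four lanes
```

Because each fragment length occurs in exactly one lane, sorting the bands by length and noting the lane of each band recovers the sequence, which is why all four lanes must be compared against one another.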
In the protocol of Maat et al., supra, a 5' 32P-labelled fragment is incubated in the presence of DNA polymerase I and pancreatic DNAase I in four separate reactions. Each reaction contains all four dNTPs and, in turn, one different ddNTP. The pancreatic DNAase I introduces nicks, the 3' hydroxyls of which serve as priming points for DNA polymerase I. Chain extension then proceeds from the 3' end of every nick, leading to a base-specific chain termination in each reaction. The products of the reactions are fractionated on denaturing polyacrylamide slab gels, resolving the oligonucleotides with the common labelled 5' end.
Although the nicking and subsequent chain extension occur on both strands, only one strand is labelled, so a pattern of radioactive bands is produced for that strand alone, allowing its sequence to be deduced. Nicking of the fragment by DNAase I does not occur at every residue along its length, but the fact that both deoxy- and dideoxy-NTPs are present in each reaction mixture ensures that chain extension continues through several residues of the same base from the site of the nick. Every residue in the sequence therefore gives rise to a band.
In both the Sanger et al. and the Maat et al. variations of chain elongation with terminators, four lanes are required to deduce the nucleotide sequence. Each lane defines the positions of one of the four nucleotides, each position being determined relative to the bands in the other three lanes. All four lanes must be compared at once, and mental errors frequently occur when reading across the four-lane pattern. Reading the four-lane pattern may be quite difficult if the gel is bowed, as is frequently the case, or if other gel artifacts occur.
The method of Maxam and Gilbert, supra, relies upon chemical reactions which cleave single-chain DNA at base-specific sites and also upon the fact that cleaved nucleotide chains of different lengths migrate at different rates on polyacrylamide gels. A single-chain DNA sequence is tagged, e.g., radioactively, at one end. Then, various aliquots of the tagged chain are subjected to separate reaction mixtures, each of which causes, on average, a single cleavage of the chain at a specific nucleotide or nucleotides, e.g., two of the four. The several aliquots of fragmented chains are then run in side-by-side lanes on a polyacrylamide gel. With a chain of "n" nucleotides (n generally being no greater than 200), the several lanes together provide bands at n distances from the origin; however, as described in Maxam and Gilbert, each lane has only those bands corresponding to its specific cleavage site or sites. The nucleotide corresponding to each band distance from the origin is deduced from the band pattern across the gel, that is, in the direction normal to the migration direction.
Maxam and Gilbert have proposed a four-lane system in which each lane represents a fragmentation at one or two specific nucleotide sites. In one particular embodiment of the Maxam and Gilbert method, the four lanes represent (1) A (adenine) + G (guanine), (2) G, (3) C (cytosine) and (4) C + T (thymine). In this system, a single band in lane 1 is conclusive of A at the cleavage site; side-by-side bands in lanes 1 and 2 are conclusive of G at the cleavage site; a single band in lane 4 is conclusive of T at the cleavage site; and side-by-side bands in lanes 3 and 4 are conclusive of C at the cleavage site.
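The decoding rule for this embodiment can be written as a small lookup, sketched below. The lane order (A+G, G, C, C+T) follows the text; the function names and the use of "N" for any band pattern the rule does not cover are assumptions made for illustration, and band alignment across lanes is taken as already resolved.

```python
# Keys are band presence in lanes (1) A+G, (2) G, (3) C, (4) C+T at one distance.
DECODE = {
    (True, False, False, False): "A",  # single band in lane 1
    (True, True, False, False): "G",   # side-by-side bands in lanes 1 and 2
    (False, False, False, True): "T",  # single band in lane 4
    (False, False, True, True): "C",   # side-by-side bands in lanes 3 and 4
}


def call_base(lanes):
    """Return the base implied by one row of the gel, or 'N' if inconsistent."""
    pattern = tuple(bool(band) for band in lanes)
    return DECODE.get(pattern, "N")  # 'N' is an illustrative placeholder


def read_gel(rows):
    """Decode rows ordered by increasing distance of cleavage from the label."""
    return "".join(call_base(row) for row in rows)


rows = [
    (1, 0, 0, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]
print(read_gel(rows))
```

Only four of the sixteen possible band patterns are valid, which is the redundancy that lets a reader detect a misaligned or spurious band.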
The results are unambiguous, provided that the gel is sufficiently clear; however, as practitioners in the art are well aware, gel resolution is often not of the desired quality. Problems with clarity include compressions, artifacts, pile-ups, and ghost bands. Furthermore, the bands in the different lanes generally do not align as straight across as might be desired, frequently being rather bowed. Because of this, reading of gels requires some experience and is still subject to error.
Several techniques are employed to reduce error. One of these is redundancy. Maxam and Gilbert recognize that their four-lane system contains more information than is necessary to deduce the sequence and that a three-lane system, with each lane representing a single nucleotide cleavage site, would provide sufficient information to deduce the sequence; however, they strongly suggest that such redundancy is required to give reliable results. Reading three lanes, each exhibiting cleavage at a single site (or termination at a single site in the chain elongation procedures described above), would require judgments of spacing to deduce the sites of the fourth nucleotide, which is complicated by the fact that the spacing between bands varies according to the terminal nucleotide. Another way of reducing error is to sequence both the encoding chain and its complementary strand.
While the above-described systems have worked relatively well, greatly reducing the time needed for nucleotide sequencing, there is a demand for even more precise and rapid sequencing. Because the genetic code comprises an arrangement of four nucleotides, and the corresponding band patterns across an electrophoretic gel system are, in principle, simple to interpret, the systems lend themselves to automatic sequencing, including interpretation by computers or microprocessors. Interpretation by microprocessors, in addition to saving time, can also be expected to reduce technician errors caused by misassignment of bands to lanes, a type of mental error that occurs frequently due to the tedious nature of reading an extended sequence.
While computer means can be substantially error-free with respect to assigning bands to their proper lanes and avoiding other such mental errors, automatic equipment may in many cases be less efficient than a technician in correctly identifying which bands are, in reality, colinear across the gel. Because substantial bowing of bands often exists, it is very difficult for optical scanning apparatus to correctly identify which bands should be considered as having migrated equal distances from the origin. This function may be better performed by a skilled technician who can recognize the curvature in a particular gel pattern and adjust his interpretation accordingly.
Rodger Staden in Nucleic Acids Research 12, 499-503 (1984) describes a computer-assisted method for analyzing DNA sequences. In this system, a technician moves his sensing pen along each lane, placing it down at each band location to register the same into a computer. The computer keeps track of the data and analyzes the nucleotide sequence. Even this system produces substantial uncertainty as the system specifically provides for entry of uncertainty codes.
It would be desirable to have a DNA sequencing system that reduces potential errors caused by curvature in the gel. This would permit fully automatic sequencing by optical scanning apparatus assisted by computing means, such as microprocessors.