This application relates to an improved method and apparatus for determining the sequence of nucleic acid polymers, e.g., DNA.
Nucleic acid sequencing is conventionally performed using one of two approaches: the chain extension reaction disclosed by Sanger et al., Proc. Nat'l Acad. Sci. (USA) 74: 5463-5467 (1977); and chain degradation sequencing disclosed by Maxam & Gilbert, Proc. Nat'l Acad. Sci. (USA) 74: 560-564 (1977). Chain extension sequencing, which is the more extensively used of the two approaches, utilizes a primer and a template-dependent polymerase enzyme which extends a template-hybridized primer to produce polynucleotide fragments. Chain-terminating nucleotide analogs, such as dideoxynucleotide triphosphates, are included in the reaction, and these chain terminators, when incorporated, prevent further extension of the primer by the polymerase enzyme. As a result, a chain termination reaction containing, e.g., dideoxyadenosine triphosphate (ddATP) produces a mixture of polynucleotide fragments of differing lengths, each fragment ending in ddA. Other chain terminators produce fragment mixtures ending in other nucleotides. Performing one reaction with a terminator corresponding to each base (A, C, G and T), and evaluating the sizes of the fragments, permits determination of the sequence of the original template polymer.
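By way of illustration only, the size-based readout described above can be sketched in Python. The function name `read_sequence`, the input layout, and the example fragment lengths are hypothetical and are not taken from the cited references; fragment lengths are assumed to be expressed as the number of nucleotides added beyond the primer.

```python
def read_sequence(fragment_lengths):
    """Reconstruct the extended strand from per-terminator fragment lengths.

    fragment_lengths maps each base ("A", "C", "G", "T") to the lengths of the
    fragments observed in the reaction that used that base's chain terminator.
    This layout is an assumption made for the sketch.
    """
    by_length = {}
    for base, lengths in fragment_lengths.items():
        for length in lengths:
            if length in by_length:
                # The same length in two reactions would be an ambiguous call.
                raise ValueError(f"length {length} observed in two reactions")
            by_length[length] = base
    # Reading the fragments from shortest to longest yields the sequence
    # of the extended strand in the 5' to 3' direction.
    return "".join(by_length[n] for n in sorted(by_length))


# Hypothetical data: one list of fragment lengths per terminator reaction.
lanes = {"A": [1, 5], "C": [3], "G": [2, 6], "T": [4]}
print(read_sequence(lanes))  # AGCTAG
```

The sketch makes plain why four channels of data are conventionally needed: each position in the sequence is assigned by finding which of the four reactions contains a fragment of that length.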
Automated apparatus for performing chain extension sequencing is available commercially. For example, single-dye automated sequencers such as the ALF-Express (Pharmacia LKB, Piscataway N.J.) usually run one separate reaction per lane. Methods to increase throughput in such automated DNA sequencers are currently constrained by the fact that in order to obtain a DNA sequence, four channels of data are required, one for each of the nucleotide bases A, C, G and T. Multi-dye sequencers such as the Prism 377 (Applied Biosystems Inc., Foster City Calif.) allow four reactions to be run in one lane. This method improves the throughput of a single gel four-fold but still requires four channels of data per DNA sequence.
Intensity labeling has been proposed as a method for sequencing DNA in a single channel. In these methods, the ladder of all four sequencing reaction products (A, C, G and T) is run in a single lane, and the species are distinguished by the amount of detected reaction product. For instance, U.S. Pat. Nos. 4,962,020, 5,122,345 and 5,409,881 illustrate a chain extension sequencing chromatogram where the relative amounts of the chain terminators and thus the relative intensities of the reaction products are G>A>T>C. Ansorge et al. (1990) "One label, one tube, Sanger DNA sequencing in one and two lanes on a gel", Nuc. Acid. Res. 18: 3419-20 illustrates a chain termination method where the relative amounts of the chain terminators are T>C>G>A. A two-lane intensity labeling method is disclosed in U.S. Pat. No. 5,124,247 issued to Ansorge. Negri et al. (1991) "A Single-Reaction Method for DNA Sequence Determination", Anal. Biochem. 197: 389-395 discloses a chain degradation method of intensity labeling where A=G>C (and T is not analyzed). None of these methods has been employed commercially, perhaps because the intensity of labeling of reaction products changes substantially during the course of a single analysis.
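The principle of intensity labeling, and its sensitivity to changes in signal intensity, may be illustrated by the following minimal Python sketch. The nominal intensity levels in `NOMINAL` and the function `call_bases` are hypothetical values and names chosen only to respect the ordering G>A>T>C; they are not taken from the cited patents.

```python
import math

# Hypothetical nominal relative peak intensities obeying G > A > T > C.
NOMINAL = {"G": 8.0, "A": 4.0, "T": 2.0, "C": 1.0}


def call_bases(peak_intensities):
    """Assign a base to each peak of a single-lane ladder.

    Peaks are given in order of fragment size; every intensity must be
    positive. Each peak is assigned to the base whose nominal level is
    closest on a logarithmic scale.
    """
    calls = []
    for intensity in peak_intensities:
        base = min(
            NOMINAL,
            key=lambda b: abs(math.log(intensity / NOMINAL[b])),
        )
        calls.append(base)
    return "".join(calls)


print(call_bases([7.5, 1.1, 3.6, 2.2]))  # GCAT
# If the signal from a G-terminated fragment falls to half its nominal
# level during the run, the same rule assigns it to A.
print(call_bases([8.0 * 0.5]))  # A
```

The second call shows how a change in labeling intensity during a single analysis can shift a peak from one intensity class to the next, which is the difficulty noted above.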
An alternative method that does not depend on intensity labeling to increase throughput takes advantage of the fact that, at least in the diagnostic setting, the DNA sequence of a diagnostic gene is already known. In this case, the method determines which sequence the patient sample matches from a library of known sequence variants. This frequently can be done on the basis of a single nucleotide chain termination reaction as disclosed in U.S. patent application Ser. Nos. 08/497,202 and 08/577,858, assigned to the assignee of the instant invention. Single nucleotide sequencing may mean testing a known gene, such as an oncogene, for mutations such as nucleotide insertions, deletions, inversions or substitutions. It may mean testing a known polymorphic locus to identify which allelic variant(s) are present. It may mean testing a patient sample for the presence of a known pathogen, or testing for a known variation of a known pathogen. In each of these cases, at least some patient samples can be identified with certainty by determining the positions of fewer than all four nucleotides, hence using fewer than four channels.
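A minimal Python sketch of such library matching follows. It is offered as an illustration under stated assumptions and not as the method of the cited applications: the function names `positions_of` and `matching_variants`, the variant names, and the short example sequences are all hypothetical.

```python
def positions_of(base, sequence):
    """Return the 1-based positions at which `base` occurs in `sequence`."""
    return {i for i, b in enumerate(sequence, start=1) if b == base}


def matching_variants(observed_positions, base, library):
    """List the library variants consistent with a single-base reaction.

    observed_positions is the set of positions at which the single
    terminator reaction (for `base`) produced a fragment.
    """
    return [
        name
        for name, sequence in library.items()
        if positions_of(base, sequence) == observed_positions
    ]


# Hypothetical library of known variants of a short region.
library = {
    "wild_type": "GATTACA",
    "variant_1": "GATTGCA",  # A to G substitution at position 5
    "variant_2": "GATTACT",  # A to T substitution at position 7
    "variant_3": "GATTACG",  # A to G substitution at position 7
}

print(matching_variants({2, 7}, "A", library))  # ['variant_1']
print(matching_variants({2, 5}, "A", library))  # ['variant_2', 'variant_3']
```

In the first call a single channel identifies the variant with certainty. In the second, two variants share the same A positions, so a further reaction would be needed to distinguish them; this is why only some samples can be resolved with fewer than four channels.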
A reduction in the number of data channels required can also be obtained by applying algebraic coding methods from information theory. A method for DNA sequence analysis that resembles data compression techniques used in telecommunications was disclosed by Nelson et al. (1993) "Sequencing Two DNA Templates In Five Channels By Digital Compression" Proc. Nat'l Acad. Sci. (USA) 90: 1647-1651. This paper describes how the sequence of one DNA template can be determined in three channels, and how the sequences of two DNA templates can be determined simultaneously using five channels. The paper does not teach how to reduce the number of channels for a full DNA sequence below these amounts.
It is an object of the instant invention to provide a method of increasing the throughput of an electrophoretic gel by reducing the number of channels necessary to identify the sequence of a DNA fragment.
It is a further object of the instant invention to provide a method of increasing the throughput of an electrophoretic gel that does not require spectrally distinguishable fluorophores or intensity labeling.
It is a further object of the invention to reduce the number of steps required to identify the sequence of a DNA fragment.
It is a further object of the present invention to provide a diagnostic method and apparatus which identifies a DNA sequence in a patient sample but does not require the explicit identification of the location of all four nucleotides of the DNA sequence.