This application relates to a method and apparatus for sequencing of DNA. The method of the invention makes use of an internal calibrant track which is co-electrophoresed with DNA sequencing fragments to facilitate linearization and alignment of the tracks for accurate base calling.
DNA sequencing is becoming an increasingly important diagnostic tool, and also forms an important component of research efforts such as the Human Genome Project. The most common sequencing procedures used today are based on the primer extension or “Sanger” methodology. In the Sanger DNA sequencing method, a 5′-end-labeled oligodeoxynucleotide primer is bound sequence-specifically to a target DNA template which is to be sequenced. The primer is extended by a DNA polymerase enzyme through incorporation of dNTPs. A chain-terminating dideoxy-NTP of one particular base type (A, C, G, or T) is added to the reaction so that DNA chains terminate randomly at positions where that base is incorporated. The nested series of DNA fragments produced in this reaction is loaded on one lane of a thin denaturing polyacrylamide gel and electrophoretically resolved, producing a series of bands in that lane. A set of four reactions (with chain termination occurring via ddA, ddC, ddG, and ddT incorporation) is required for explicit determination of the positions of all four bases in the sequence, and the four reactions are typically run on four adjacent lanes of a sequencing gel.
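The relationship between a sequence and the fragment lengths produced in each of the four reactions can be sketched as follows. This is a minimal illustration, not part of the described method: the function name termination_lengths is hypothetical, the input is assumed to be the sequence synthesized downstream of the primer (the complement of the template, read 5′ to 3′), and each occurrence of a base is assumed to yield one terminated fragment in the corresponding reaction.

```python
def termination_lengths(product_seq, primer_length):
    """Hypothetical helper: fragment lengths produced by each ddNTP reaction.

    product_seq is assumed to be the sequence synthesized 3' of the primer
    (the complement of the template), read 5'->3'. A chain terminated by a
    dideoxy base at position i of product_seq has length primer_length + i.
    """
    lengths = {base: [] for base in "ACGT"}
    for i, base in enumerate(product_seq.upper(), start=1):
        lengths[base].append(primer_length + i)
    return lengths


if __name__ == "__main__":
    print(termination_lengths("GATTACA", primer_length=18))
    # {'A': [20, 23, 25], 'C': [24], 'G': [19], 'T': [21, 22]}
```

Reading these lengths in increasing order across the four lists recovers GATTACA, which is only possible directly when the four lanes have identical mobility.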
Data traces are collected indicating the peak positions in each of the four lanes of a gel. In an ideal system, these four data traces could simply be placed one over another and the sequence read directly. This reading process is called “base calling.” In practice, however, the data traces are not ideal because of a variety of factors, including mobility differences between lanes and changes in resolution that occur as the size of the fragments increases. Prior to the development of automated sequencing apparatus, the data traces were generally aligned by eye before base calling, i.e., a skilled technician inspected the traces and shifted their relative positions based on accumulated experience. One of the challenges of automated DNA sequencing is to align the data traces properly by computer processing rather than by human analysis.
Various approaches have been taken to meet the need for accurate trace alignment, which is an essential prerequisite to accurate base calling. One approach is the use of a multi-dye sequencer, in which the traces for all four bases are obtained from a single lane of a gel (see, for example, U.S. Pat. No. 5,171,534). This reduces many of the sources of variability, but requires four different label types and may increase the complexity of the detection apparatus. Another approach is described in commonly assigned U.S. Pat. No. 5,916,747. The present application provides a further approach to this problem.
The present invention provides a method for evaluation of a target DNA sequence. The first step in the method is the preparation of a sample mixture containing one or more sets of sequencing polynucleotide fragments, each set containing fragments having lengths indicative of the positions of at least one base within the target DNA sequence. These sequencing fragment sets are each labeled with a different type of spectroscopically detectable label (for example, a fluorescent label). The sample mixture also includes a set of calibrant polynucleotide fragments having a plurality of known fragment lengths. The calibrant polynucleotide fragments are labeled with a calibrant label which is spectroscopically distinguishable from the label(s) on the set(s) of sequencing fragments. The sample mixture is then electrophoresed in a separation medium, such as a polyacrylamide electrophoresis gel, to separate the polynucleotide fragments as a function of fragment length. Real-time detection is used to detect the label(s) on the set(s) of sequencing fragments and the calibrant label as they migrate in a common lane of the separation medium, producing a sequencing data trace and a calibrant data trace. The calibrant peaks are then used to define a set of coefficients for linearizing the sequencing data trace from each lane to a common corrected time scale in which the peaks from each lane are evenly spaced. The linearized sequencing data traces are then aligned by assigning a base position number to each peak, and the aligned traces are used for base calling.
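The linearization and position-assignment steps can be sketched as follows. The text above does not specify the functional form of the linearizing coefficients; this sketch assumes a piecewise-linear map (one slope and intercept per pair of adjacent calibrant peaks) and assumes the calibrant peaks have already been detected and matched to their known fragment lengths. The function names are hypothetical. Detection times are mapped to fragment-length units, so the corrected scale has one unit per base and peaks from different lanes become directly comparable.

```python
from bisect import bisect_right


def linearization_coefficients(calibrant_times, calibrant_lengths):
    """Build (t_start, slope, intercept) segments mapping time -> length.

    Assumes calibrant_times are strictly increasing detection times of the
    calibrant peaks and calibrant_lengths are their known fragment lengths.
    """
    if len(calibrant_times) != len(calibrant_lengths) or len(calibrant_times) < 2:
        raise ValueError("need at least two matched calibrant peaks")
    segments = []
    pairs = list(zip(calibrant_times, calibrant_lengths))
    for (t0, n0), (t1, n1) in zip(pairs, pairs[1:]):
        if t1 <= t0:
            raise ValueError("calibrant times must be strictly increasing")
        slope = (n1 - n0) / (t1 - t0)
        segments.append((t0, slope, n0 - slope * t0))
    return segments


def linearize(times, segments):
    """Map detection times onto the corrected (fragment-length) scale.

    Times outside the calibrant range are extrapolated from the first or
    last segment.
    """
    starts = [seg[0] for seg in segments]
    corrected = []
    for t in times:
        i = max(0, bisect_right(starts, t) - 1)  # segment containing t
        _, slope, intercept = segments[i]
        corrected.append(slope * t + intercept)
    return corrected


def assign_positions(times, segments):
    """Assign an integer base position number to each sequencing peak."""
    return [round(x) for x in linearize(times, segments)]


if __name__ == "__main__":
    # Illustrative calibrant: lengths 20, 40, 60, 80 with compressed spacing.
    segs = linearization_coefficients([100, 180, 250, 310], [20, 40, 60, 80])
    print(assign_positions([140, 215, 280], segs))  # [30, 50, 70]
```

A polynomial least-squares fit over all calibrant peaks would be an equally plausible choice of coefficients; the piecewise form is used here only because it passes exactly through every calibrant peak and needs no external library.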
The method of the invention is suitably employed for sample mixtures which contain two sets of sequencing polynucleotide fragments representing the positions of two types of bases in the target DNA sequence. In this case, the two sets of sequencing polynucleotide fragments are labeled with a first label and a second label, respectively, which are spectroscopically distinguishable from each other and from the calibrant label.
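For the two-label case, base calling from aligned traces can be sketched as below. The lane layout is a hypothetical assumption (one lane carrying the A and C fragment sets, a second lane the G and T sets), each peak is assumed to have already received an integer base position number from the linearization step, and the name call_bases is illustrative. A position with no peak, or with peaks from more than one base, is called N.

```python
def call_bases(traces, first, last):
    """Call one base per position from aligned, position-assigned peaks.

    traces: list of dicts, one per lane, each mapping a base letter to the
    integer base positions of that base's peaks. Positions with no peak, or
    with peaks from more than one base, are called "N".
    """
    seen = {}
    for trace in traces:
        for base, positions in trace.items():
            for p in positions:
                seen.setdefault(p, set()).add(base)
    calls = []
    for p in range(first, last + 1):
        bases = seen.get(p, set())
        calls.append(next(iter(bases)) if len(bases) == 1 else "N")
    return "".join(calls)


if __name__ == "__main__":
    lane1 = {"A": [20, 23, 25], "C": [24]}  # hypothetical A/C lane
    lane2 = {"G": [19], "T": [21, 22]}      # hypothetical G/T lane
    print(call_bases([lane1, lane2], 19, 25))  # GATTACA
```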
The method of the invention can be practiced using a sequencing apparatus having a detection system adapted for detection of two or more spectroscopically distinguishable label types. The apparatus of the invention differs from prior art devices, however, because one of the detected labels is the calibrant label, which produces a calibrant data trace rather than a sequencing data trace. The data processing performed on the data traces differs accordingly: the calibrant data trace is used to linearize and align the sequencing data traces rather than being used for base calling.