This application relates to a method for alignment of data traces obtained from a DNA sequencing apparatus, and to apparatus adapted for practicing this method.
DNA sequencing is becoming an increasingly important diagnostic tool, and also forms an important component of research efforts such as the Human Genome Project. The most common sequencing procedures used today are based on the primer extension or “Sanger” methodology. In the “dye primer” version of Sanger DNA sequencing, a 5′-end-labelled oligodeoxynucleotide primer is bound sequence-specifically to a target DNA template which is to be sequenced. The primer is extended by a DNA polymerase enzyme, via incorporation of dNTPs. A chain-terminating dideoxy-NTP of one particular base type (A, C, G, T) is added to the reaction, to effect termination of DNA chains at random positions along the sequence. The nested series of DNA fragments produced in this reaction is loaded on one lane of a thin denaturing polyacrylamide gel, and the bands are electrophoretically resolved, to produce a series of 5′-end-labelled bands in the profile of that lane. A set of four reactions (with chain termination occurring via ddA, ddC, ddG, ddT incorporation) is required for explicit determination of the positions of all four bases in the sequence, and typically is run on four adjacent lanes of a sequencing gel.
Data traces are collected indicating the peak positions in each of the four lanes of a gel. In an ideal system, these four data traces could simply be placed one over another and the sequence could be read. This reading process is called “base-calling.” In practice, however, the data traces are not ideal because of a variety of factors, including mobility differences between lanes and changes in resolution which occur as the size of the fragments increases. Prior to the development of automated sequencing apparatus, the data traces were generally aligned by eye prior to base-calling, that is, by a skilled technician looking at the traces and shifting the relative positions of the traces based on accumulated experience. One of the challenges of automated DNA sequencing is the proper alignment of the data traces using computer processing rather than human analysis.
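By way of illustration only, the ideal case described above, and the effect of a mobility difference between lanes, can be sketched in a few lines of Python. The function names, the dictionary layout, and the numeric peak positions are hypothetical and chosen solely for this example; the sketch shows why alignment must precede base-calling and does not represent the alignment method of this application.

```python
def call_bases_ideal(peaks_by_base):
    """Superimpose the four traces and read bases in order of peak position.

    peaks_by_base maps a base letter to the peak positions found in the
    lane for that base (hypothetical layout, arbitrary position units).
    """
    merged = [
        (position, base)
        for base, positions in peaks_by_base.items()
        for position in positions
    ]
    merged.sort()  # the earliest peak is read first
    return "".join(base for _, base in merged)


def shift_lane(peaks_by_base, base, offset):
    """Return a copy in which one lane is displaced by a constant offset,
    a crude stand-in for a mobility difference between lanes."""
    shifted = {b: list(p) for b, p in peaks_by_base.items()}
    shifted[base] = [position + offset for position in shifted[base]]
    return shifted


# Made-up peak positions for an ideal, perfectly aligned set of traces.
ideal = {"A": [10.0, 40.0], "C": [20.0], "G": [30.0, 60.0], "T": [50.0]}

print(call_bases_ideal(ideal))                         # ACGATG
print(call_bases_ideal(shift_lane(ideal, "C", 12.0)))  # AGCATG
```

In this sketch, displacing the C lane by a little more than one peak spacing moves the C peak past the neighboring G peak, so the naive overlay reads the two bases in the wrong order.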
Various approaches have been taken to address the need for accurate trace alignment, which is an essential prerequisite to accurate base-calling. One approach is the use of a multi-dye sequencer, in which the traces from all four bases are obtained from a single lane of a gel. (See, for example, U.S. Pat. Nos. 5,751,534 and 5,821,058.) This reduces many of the sources of variability, but requires four different label types, and may increase the complexity of the detection apparatus. Another approach is described in commonly assigned U.S. Pat. No. 5,916,747. The present application provides yet another approach to the solution of this problem.