This invention relates to a method of processing output signals from an automated electrophoresis detection apparatus, and to an apparatus which employs this method for sequencing nucleic acids.
One of the steps in nucleotide sequence determination of a subject nucleic acid molecule is interpretation of the pattern of nucleic acid fragments which results from electrophoretic separation of fragments, or reaction products, of a DNA sequencing reaction (the "fragment pattern"). The interpretation, colloquially known as "base calling", involves determination from the recorded fragment pattern of the order of four nucleotide bases, A (adenine), C (cytosine), G (guanine) and T (thymine) for DNA or U (uracil) for RNA in the subject nucleic acid molecule.
The chemistry employed for a DNA sequencing reaction using the dideoxy (or chain-termination) sequencing technique is well known, and was first reported by Sanger et al. (Proc. Natl. Acad. Sci. USA 74: 5463-5467 (1977)). Four samples of nucleic acid fragments (terminating in A, C, G, or T(U) respectively in the Sanger et al. method) are loaded at a loading site at one end of an electrophoresis gel. An electric field is applied across the gel, causing the fragments to migrate from the loading site towards the opposite end of the gel. During this electrophoresis, the gel acts as a separation matrix. The fragments, which in each sample are of an extended series of discrete sizes, separate into bands of discrete species in a lane along the length of the gel. Shorter fragments generally move more quickly than larger fragments.
If the DNA fragments are labeled with a fluorescent label, an automated electrophoresis detection apparatus (also called a "DNA sequencer") can be used to detect the passage of migrating bands in real time. Existing automated DNA sequencers are available from Applied Biosystems, Inc. (Foster City, Calif.), Pharmacia Biotech, Inc. (Piscataway, N.J.), Li-Cor, Inc. (Lincoln, Neb.), Molecular Dynamics, Inc. (Sunnyvale, Calif.) and Visible Genetics Inc. (Toronto). Other methods of detection, based on detection of features inherent to the subject molecule, such as detection of light polarization as disclosed in U.S. patent application Ser. No. 08/387,272, now U.S. Pat. No. 5,543,018, which is incorporated herein by reference, are also possible.
A significant problem in determining a DNA sequence, encountered particularly with high speed DNA sequencing and in sequencing apparatus which do not combine the four sets of sequencing reaction products in a single lane, is alignment of data signals from the four different output channels of an automated DNA sequencing apparatus. Once the data are aligned properly, base-calling is relatively straightforward. However, this initial alignment step can be very challenging, since the output signal of each channel may be erratically shifted and/or stretched as a result of chemistry and gel anomalies. A reliable method of aligning data which takes into account non-linear shifting and stretching of the signal output is highly desirable, particularly for high-speed DNA sequencing.
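The effect of shifting and stretching on a single output channel can be illustrated with a minimal sketch. It is an illustration of the problem only, not the alignment method of the invention, and it assumes a purely linear distortion (time t mapped to stretch * t + shift), whereas the distortions described above may be non-linear. The function names `warp` and `peak_positions`, the synthetic trace, and the parameter values are all hypothetical.

```python
import math

def warp(trace, shift, stretch):
    """Resample a trace as if each time t were recorded at stretch * t + shift.

    Assumption: a linear distortion, used here only for illustration.
    Samples that fall outside the original trace are set to 0.0.
    """
    n = len(trace)
    out = []
    for i in range(n):
        src = (i - shift) / stretch      # position in the undistorted trace
        lo = math.floor(src)
        frac = src - lo
        if lo < 0 or lo + 1 >= n:
            out.append(0.0)
        else:
            # Linear interpolation between the two neighbouring samples.
            out.append(trace[lo] * (1 - frac) + trace[lo + 1] * frac)
    return out

def peak_positions(trace, threshold=0.5):
    """Return indices of local maxima above the threshold."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] > threshold
        and trace[i] >= trace[i - 1]
        and trace[i] > trace[i + 1]
    ]

# Hypothetical lane with two triangular bands centred at samples 10 and 20.
lane = [0.0] * 40
for centre in (10, 20):
    lane[centre - 1] = 0.5
    lane[centre] = 1.0
    lane[centre + 1] = 0.5

warped = warp(lane, shift=3, stretch=1.5)

original_peaks = peak_positions(lane)
warped_peaks = peak_positions(warped)
print(original_peaks, warped_peaks)
```

In this sketch the spacing between the two bands changes as well as their positions, so subtracting a constant offset from the distorted channel cannot bring it back into register with an undistorted channel.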
Prior art approaches to this problem are very limited. Existing automated sequencers traditionally operate at voltages low enough that non-linear shifting is avoided. The use of low voltages, however, limits the speed with which separation of sequencing fragments into discrete bands can be accomplished.
Published methods of computer-assisted base calling include the methods disclosed by Tibbetts and Bowling (U.S. Pat. No. 5,365,455) and Dam et al. (U.S. Pat. No. 5,119,316), which patents are incorporated herein by reference. Both patents assume alignment of output signals and address only aspects of base-calling from the aligned signals.
It is an object of the present invention to provide a method of aligning real-time signals from the output channels of an automated electrophoresis apparatus.
It is a further object of the invention to provide an improved method of base-calling a DNA signal sequence aligned according to the invention.
It is still a further object of the invention to provide an apparatus for sequencing nucleic acids which utilizes the improved method in accordance with the invention for aligning real-time signals from the output channels of an automated electrophoresis apparatus.