The present invention relates generally to the automated determination of the nucleic acid sequence of a polynucleotide, such as DNA. More particularly, the method and apparatus of the present invention relate to automated, real-time processing of raw scanned image data acquired and generated by scanning automated DNA sequencers and similar devices.
Commercially available automated DNA sequencers generate a large image of DNA sequencing gels, in which the components of individual samples are often labeled with fluorescent or radioactive probes. Densitometric film scanners digitize images of sequencing ladders from films or autoradiograms, which are exposed and developed after fixed periods of electrophoresis. Other systems have fixed or scanning detectors which monitor electrophoretic transport of labeled oligomers through the gel, generating digital images of sequencing ladders in real time. In a typical conventional sequencer of this second type, a digital image file of about twenty megabytes is created, representing analysis of up to forty-eight samples over ten to sixteen hours of electrophoresis. Both classes of sequencing instrument use computer software to translate the raw scanned image data, i.e., the digitized images of the sequencing ladders, to specific DNA sequences.
The prior art methods implemented in commercially available base-calling software typically initiate a scan of the ladder image to locate the trace of the next oligomer in the sequence, then evaluate the particular attributes of that oligomer's image that identify its terminal nucleotide. Some real-time systems and film scanners use singly labeled oligomers in familiar arrays of four parallel ladders, to spatially discriminate among the four possible terminal nucleotides. Other real-time scanning instruments employ selective band-pass filters for spectroscopic discrimination of four base-specific fluorescent labels on the terminal nucleotides. The Applied Biosystems 373A automated DNA sequencer is an example of such a machine.
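The scan-then-classify approach described above can be illustrated with a minimal sketch. It is not the algorithm of any commercial base-calling package. It assumes four already-aligned intensity traces, one per terminal nucleotide (either four parallel lanes or four spectral channels), ordered so that earlier scan positions correspond to shorter oligomers. The function name `call_bases`, the `threshold` parameter, and the local-maximum test are all hypothetical choices made for illustration.

```python
from typing import Sequence

BASES = "ACGT"  # assumed order of the four traces (illustrative only)


def call_bases(lanes: Sequence[Sequence[float]], threshold: float = 0.5) -> str:
    """Call one base per band found while scanning four aligned traces.

    lanes[i][t] is the intensity attributed to base BASES[i] at scan
    position t. A band is taken to be a local maximum of the strongest
    signal that is at least `threshold`.
    """
    if len(lanes) != 4:
        raise ValueError("expected four traces, one per base")
    n = len(lanes[0])
    if any(len(lane) != n for lane in lanes):
        raise ValueError("all four traces must have the same length")

    # At each scan position, record which trace is strongest and how strong.
    best = [max(range(4), key=lambda i: lanes[i][t]) for t in range(n)]
    peak = [lanes[best[t]][t] for t in range(n)]

    called = []
    for t in range(1, n - 1):
        # Locate the next band: rising on the left, not rising on the right.
        is_band = peak[t] > peak[t - 1] and peak[t] >= peak[t + 1]
        if is_band and peak[t] >= threshold:
            # Classify the band by the trace that carries it.
            called.append(BASES[best[t]])
    return "".join(called)
```

A sketch of this kind treats each band in isolation, which is the limitation the following paragraph attributes to prior art methods: it uses no information about neighboring bands, band spacing, or band shape.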
Unfortunately, prior art sequencers and their corresponding nucleic acid sequence data processing methods do not generate final DNA sequences in real time, nor do they provide the information necessary to support pattern recognition-based analysis of the sequence data. Accordingly, both the time required for and the accuracy of DNA sequence determination using these prior art methods are less than ideal and, in the case of accuracy, fall short of the base-calling skills of human experts.
What is needed, then, is a method and apparatus for processing nucleic acid sequence raw image data that is faster and more accurate, and that allows for processing in real time as the raw image data is acquired. Such a method and apparatus are lacking in the prior art.