The combined work of Avery et al. in 1944 and of Watson and Crick in 1953 established that DNA contains the information that defines each organism, and that it is a long string of chemical building blocks (denoted by the letters A, T, C, and G) whose precise order, or “sequence”, is ultimately how this information is encoded. Since then, it has been a major goal to develop ways to read this sequence, in order to eventually understand how it relates to the properties of different species, and of individuals within each species.
In 1977, Sanger introduced the first practical biochemical technique for determining the sequence of DNA. This method was refined and automated, and was the basis for all practical DNA sequencing performed in the next 25 years, culminating in the publicly funded International Human Genome Project (and the privately funded parallel effort undertaken by Celera Corporation), which in 2003 produced a reference DNA sequence for the human genome, which is almost 3 billion letters long. This effort required nearly a decade, and billions of dollars in equipment, labor and chemical reagents. Because the Sanger reaction is inherently serial, with each reaction producing only a few hundred letters of sequence, roughly 100 million individual reactions had to be performed in the course of this effort, ultimately using thousands of semi-automated sequencing machines, each of which could perform thousands of separate reactions per day. Since the genomes of most species are large (tens of millions to billions of letters), using Sanger sequencing to determine a reference sequence for a new species, or to obtain the sequence of a specific individual within a species, remains a massive and costly undertaking, even with all the technological improvements made during the course of the Human Genome Project.
Starting in the early 1990s, new approaches to sequencing DNA were under development with the intent of overcoming fundamental limitations of the Sanger technique that made it both inherently serial and difficult to miniaturize in order to reduce reagent usage. The general goal of these methods was to make DNA sequencing massively parallel, so that millions, or billions, of DNA sequences could be read at the same time within a single small reaction volume, thus allowing the large amount of sequence present in typical genomes (millions to billions of letters) to be read quickly, and at greatly reduced cost in terms of labor and chemical reagents.
More recently, several such massively parallel sequencing platforms have been developed to the commercial level, including the 454 system (Roche, 2004), the Solexa system (Illumina, 2005), the SOLiD system (ABI/Life Technologies, 2006), the HeliScope system (Helicos, 2007), and the Polonator system (Danaher Motion, 2008). Others are presently under development, such as the systems at Complete Genomics, Inc., Intelligent BioSystems Inc., and Pacific Bio, Inc. While the Sanger technique remains the gold standard for sequencing in terms of accuracy and read length, these new technologies have rapidly become the preferred method of generating the large amounts of DNA sequence necessary for sequencing new organisms, or for sequencing individuals in order to find individual sequence variations relative to the species reference sequence. In particular, this latter activity will be a key component of personalized medicine, where an individual patient's DNA sequence will be factored into their medical treatment.
A variety of biochemical processes are conducted in current massively parallel sequencers, but one feature they have in common is that they all acquire primary data in the form of millions of digital images, each image showing a field filled with many localized “spots” or “beads.” These images must be processed to extract the bead locations and bead brightness signals, which are then converted to DNA sequence information by rules appropriate for the specific platform. However, as shown in FIG. 1, the beads are usually very small, blurred and densely clustered, so it may be difficult to determine the location and signal strength of each bead. Under these circumstances, the results of DNA sequencing may be adversely affected.
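To make the extraction step described above concrete, the following is a minimal illustrative sketch in Python, not the method of any particular platform or of the present disclosure. It assumes a grayscale image held in a NumPy array and treats a "bead" as any pixel that exceeds a brightness threshold and is the strict maximum of its 3x3 neighborhood. The function name `find_beads`, the `threshold` parameter, and the synthetic `demo` image are all hypothetical, chosen only for illustration.

```python
import numpy as np

def find_beads(image, threshold):
    """Return (row, col, brightness) for each pixel that exceeds `threshold`
    and is the strict maximum of its 3x3 neighborhood (assumed bead model)."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    # Pad with -inf so that pixels on the border can still count as peaks.
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    is_peak = img > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            # View of the image shifted by (dr, dc), same shape as img.
            neighbor = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_peak &= img > neighbor
    peak_rows, peak_cols = np.nonzero(is_peak)
    return [(int(r), int(c), float(img[r, c]))
            for r, c in zip(peak_rows, peak_cols)]

# Synthetic 7x7 field: uniform background with two bright spots,
# one of which has a dimmer "blurred" neighbor pixel.
demo = np.ones((7, 7))
demo[1, 1] = 10.0
demo[4, 5] = 7.0
demo[4, 4] = 3.0
beads = find_beads(demo, threshold=5.0)
```

A per-pixel peak test of this kind works when beads are well separated. When two blurred beads overlap, they can merge into a single local maximum, and the location and brightness reported for that peak no longer correspond to either bead, which is the difficulty noted above for small, blurred, densely clustered beads.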
Therefore, there remains a need for a new and improved method and apparatus for processing image data generated by bioanalytical devices, such as DNA sequencers, to optimize the image with enhanced resolution, accuracy and bead density.