Genetic (DNA) sequencing is revolutionizing many aspects of biology and medicine, and creating an entirely new field of personalized medicine, particularly suited for analysis of cancers and determinations of corresponding treatment options. The cost of gene sequencing has dropped considerably, while new sequencing machines continue to increase the throughput of DNA sequencing, allowing sequencing technology to be applied to a wide variety of disparate fields.
In the process of genetic sequencing, genetic material is typically cleaved at numerous chromosomal locations into comparatively short DNA fragments, and the nucleotide sequence of each fragment is determined. These fragments are typically a few tens to hundreds of bases in length, and are referred to in the field as “short reads”. Such sequencing can be performed rapidly and in parallel, e.g., on the order of tens of billions of bases per day from one sequencing machine, which is several times the size of the human genome (approximately 3 billion bases in length).
For most applications, a complete genetic sequence of an organism is not determined de novo. Rather, in most instances, a “reference” genome sequence has already been determined for the organism in question and is known. As the short reads are typically derived from randomly fragmenting many copies of an individual sample genome of such an organism, the first step of data analysis is typically to order all of these fragments to determine the overall gene sequence of the individual sample, using the reference genome sequence effectively as a template, i.e., mapping the short reads to the reference genome sequence. This analysis determines, for each short read, the best location in the reference genome to which that read maps, and is referred to as the short read mapping problem.
This short read mapping problem is technically challenging, both because of the volume of data and because sample sequences are generally not identical to the reference genome sequence but, as expected, contain a wide variety of individual genetic variations. Due to the sheer volume of data, e.g., a billion short reads from a single sample, the speed or runtime of the data analysis is significant, with the data analysis now becoming the effective bottleneck in gene sequencing. In addition, a successful analysis should be sensitive to genetic variations, so that it can map sequences that are not completely identical to the reference, whether the differences arise from technical errors in the sequencing or from genetic differences between the subject and the reference genome.
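The mapping problem described above can be illustrated with a minimal brute-force sketch. It is offered only to make the problem statement concrete: it is not the method used by any of the tools discussed in this section, and it would be far too slow for a real genome. The function names `hamming` and `map_read`, the `max_mismatches` parameter, and the toy sequences are all assumptions made for illustration. The sketch also considers substitutions only, not insertions or deletions.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(reference: str, read: str,
             max_mismatches: int = 2) -> Optional[Tuple[int, int]]:
    """Return (position, mismatches) for the best placement of `read` on
    `reference`, or None if every placement exceeds `max_mismatches`.
    Ties are resolved in favor of the leftmost position."""
    best = None
    # Try every window of the reference that is as long as the read.
    for pos in range(len(reference) - len(read) + 1):
        d = hamming(reference[pos:pos + len(read)], read)
        if d <= max_mismatches and (best is None or d < best[1]):
            best = (pos, d)
    return best


# Toy data (hypothetical): a 23-base "reference" and three 8-base "reads".
reference = "ACGTTGCAAGGCTTACGATCGGA"
exact_read = "GGCTTACG"      # identical to the reference at position 9
variant_read = "GGCTAACG"    # one substitution relative to the reference
unmapped_read = "TTTTTTTT"   # no placement within the mismatch budget

print(map_read(reference, exact_read))     # (9, 0)
print(map_read(reference, variant_read))   # (9, 1)
print(map_read(reference, unmapped_read))  # None
```

Each read is compared against every reference position, so the cost grows with the product of the reference length and the number of reads. At the scale of a 3 billion base reference and a billion reads, this makes clear why the analysis, rather than the sequencing itself, becomes the bottleneck.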
Short read mapping has traditionally been performed by software tools running on a cluster of processors, such as:
1. Bowtie, Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, p. R25, March 2009;
2. BWA, Heng Li and Richard Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754-1760, July 2009;
3. MAQ, Heng Li, Jue Ruan, and Richard Durbin, “Mapping short DNA sequencing reads and calling variants using mapping quality scores,” Genome Research, vol. 18, no. 11, pp. 1851-1858, November 2008; and
4. BFAST, N. Homer, B. Merriman, and S. F. Nelson, “BFAST: An Alignment Tool for Large Scale Genome Resequencing,” PLoS ONE, vol. 4, no. 11, p. e7767, November 2009.
As mentioned above, the genetic sequencing of short reads has now become a highly automated and rapid process, providing sequencing in a few (e.g., two) hours. As a result, these software analytical tools have become the bottleneck in genetic sequencing analysis, as they may require many hours to many days to generate the resulting aligned sequence (e.g., 7 hours for Bowtie and 56 hours for BFAST).
Accordingly, a need remains for a system having both hardware and software co-design to provide for independent and parallel involvement of multiple configurable logic circuits, such as field programmable gate arrays (“FPGAs”), to accelerate such genomic or other analysis and significantly reduce the time required to sequence a DNA sample. Such a system should also provide for greater sensitivity and accuracy than the prior art software solutions. In addition, supercomputing applications of this kind would be served advantageously by parallel involvement of multiple FPGAs capable of operating independently and without extensive host involvement. Such a system should further provide for highly parallel and rapid data transfers with minimal host involvement, including to and from memory located anywhere within the system.