Search is a widespread area, with many applications beyond internet search engines such as Bing, Yahoo, or Google. For example, meteorologists have been searching for weather patterns, chemists have been searching for the weights and types of atoms contained in an unknown substance, and biologists have been searching for patterns and mutations in our genetic code and comparing those patterns with the genetic sequences of other organisms.
Many search applications involve searching through a very large data set, referred to as a “reference”, such as the set of webpages on the internet. The item being searched for, referred to as the “query”, may or may not match the reference exactly at some point. Indeed, the most interesting cases tend to be those in which the query is close to the reference but does not match it exactly (e.g., as a simple example, an internet search for “Super Computer” returns links to pages containing the word “Supercomputer”). To provide accurate search results, therefore, it is important to support inexact matching of the query to the reference. Inexact matching tends to require a brute-force approach, resulting in much longer search times in a typical computing environment. In many cases, researchers search a very large reference database, or many databases, millions or even billions of times. As a result, the speed or runtime of the search is of paramount importance.
Another significant area for searching is genetic analysis. Genetic (DNA) sequencing is revolutionizing many aspects of biology and medicine and is creating an entirely new field of personalized medicine, particularly suited to analyzing cancers and determining corresponding individualized treatment options. The cost of sequencing has dropped considerably, while new sequencing machines continue to increase the throughput of DNA sequencing, allowing sequencing technology to be applied to a wide variety of disparate fields.
In the process of genetic sequencing, genetic material is typically cleaved at numerous chromosomal locations into comparatively short DNA fragments, and the nucleotide sequence of each of these fragments is determined. These fragments are typically a few tens to hundreds of bases in length and are referred to in the field as “short reads”. Such sequencing can be performed rapidly and in parallel, e.g., on the order of tens of billions of bases per day from one sequencing machine, which is several times the length of the human genome (approximately 3 billion bases).
For most applications, the complete genetic sequence of an organism is not determined de novo. Rather, in most instances, a “reference” genome sequence for the organism in question has already been determined and is known. Short reads are typically derived by randomly fragmenting many copies of the genome of an individual sample of that organism. Typically, the first step of data analysis is to order all of these fragments to determine the overall sequence of the individual sample, using the reference genome sequence effectively as a template, i.e., mapping the short read fragments to the reference genome sequence. This analysis, which determines one or more likely locations in the reference genome to which each short read maps, is referred to as the short read mapping problem.
The short read mapping problem is technically challenging, both because of the volume of data and because sample sequences are generally not identical to the reference genome sequence but, as expected, contain genetic variations. Given the sheer volume of data, e.g., a billion short reads from a single organism (or a single sample), the speed or runtime of the data analysis is critical, and the analysis has now become the effective bottleneck in many bioinformatics pipelines. In addition, successful read mapping should be sensitive to genetic variations, i.e., should perform inexact searching, so that it maps sequences that are not completely identical to the reference, both because of technical errors in the sequencing and because of genetic differences between the sequenced organism and the reference genome.
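As a rough illustration of why inexact mapping is expensive, the following Python sketch scans every offset of a reference and reports each position where a read matches with at most a given number of substituted bases. It is a hypothetical, simplified brute-force approach: it assumes substitutions only (no insertions or deletions), uses a toy reference string, and the function name `map_read` is illustrative. It is not the method of any tool cited in this document.

```python
def map_read(read: str, reference: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Brute-force inexact mapping (hypothetical sketch).

    Returns (position, mismatches) for every offset in `reference` where
    `read` aligns with at most `max_mismatches` substitutions.
    Insertions and deletions are deliberately not modeled.
    """
    hits = []
    m = len(read)
    # Try every possible starting offset in the reference.
    for pos in range(len(reference) - m + 1):
        mismatches = 0
        for read_base, ref_base in zip(read, reference[pos:pos + m]):
            if read_base != ref_base:
                mismatches += 1
                # Stop early once this offset can no longer qualify.
                if mismatches > max_mismatches:
                    break
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


if __name__ == "__main__":
    # Toy example: "ACGA" matches "ACGT" at offsets 0 and 4 with one mismatch.
    print(map_read("ACGA", "ACGTACGTTAGC", 1))
```

Calling `map_read("ACGA", "ACGTACGTTAGC", 1)` returns `[(0, 1), (4, 1)]`. In the worst case each read costs on the order of (reference length × read length) base comparisons, so repeating this for a billion reads against a reference of roughly 3 billion bases is impractical, which is why software mappers such as Bowtie and BWA instead rely on Burrows-Wheeler transform indexes.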
Short read mapping has traditionally been performed by software tools running on a cluster of processors, such as:
1. Bowtie, Ben Langmead, Cole Trapnell, Mihai Pop, and Steven L. Salzberg, “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, p. R25, March 2009;
2. BWA, Heng Li and Richard Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754-1760, July 2009;
3. MAQ, Heng Li, Jue Ruan, and Richard Durbin, “Mapping short DNA sequencing reads and calling variants using mapping quality scores,” Genome Research, vol. 18, no. 11, pp. 1851-1858, November 2008; and
4. BFAST, N. Homer, B. Merriman, and S. F. Nelson, “BFAST: An Alignment Tool for Large Scale Genome Resequencing,” PLoS ONE, vol. 4, no. 11, p. e7767, November 2009.
As mentioned above, because the sequencing of short reads has become a highly automated and rapid process, completing in a few (e.g., two) hours, these software analysis tools have become the bottleneck in genetic sequencing analysis, requiring many hours to days to generate the resulting aligned sequence (e.g., 7 hours for Bowtie and 56 hours for BFAST).
Accordingly, a need remains for a system to provide for independent and parallel involvement of multiple configurable logic circuits, such as field programmable gate arrays (“FPGAs”), to rapidly accelerate such inexact searching, for genomic or other analyses. Such a system would benefit from a software and hardware co-design, where the hardware aspect relies upon FPGAs as an accelerator system. Such a system should also provide for greater accuracy than the prior art software-only solutions. In addition, such supercomputing applications would be served advantageously by parallel involvement of multiple FPGAs capable of operating independently and without extensive host involvement. Such a system should further provide for significantly parallel and rapid data transfers with minimal host involvement, including to and from memory located anywhere within the system.