The amount of biosequence data produced each year is growing exponentially, and efficiently extracting useful information from this massive amount of data is becoming increasingly difficult. Databases of genomic DNA and protein sequences are an essential resource for modern molecular biology. For example, a computational search of these databases can show that a DNA sequence acquired in the lab is similar to other sequences of known biological function. Such a match can reveal both the sequence's role in the cell and its history over evolutionary time. A decade of improvement in DNA sequencing technology has driven exponential growth of biosequence databases such as NCBI GenBank. Over the last decade, GenBank has doubled in size every twelve (12) to sixteen (16) months, and it now stands at over forty-five (45) billion characters. These technological gains have also generated more novel sequences to be searched, including entire mammalian genomes, which places ever-increasing demands on the search engines used to query these databases.
Examples of this type of searching can be found in a paper entitled “Biosequence Similarity Search on the Mercury System”, by P. Krishnamurthy, J. Buhler, R. D. Chamberlain, M. A. Franklin, K. Gyang, and J. Lancaster, In Proceedings of the 15th IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP04), September 2004, pages 365-375. Another example is a paper entitled “NCBI BLASTN Stage 1 in Reconfigurable Hardware”, by Kwame Gyang, Department of Computer Science and Engineering, Washington University, August 2004, Technical Report WUCSE-2005-30. Another example is a paper entitled “BLASTN Redundancy Filter in Reprogrammable Hardware”, by C. Behrens, J. Lancaster, and B. Wun, Department of Computer Science and Engineering, Washington University, Final Project Submission, Fall 2003. These papers are each incorporated by reference in their entirety.
There is a growing need for a fast and cost-effective mechanism for extracting useful information from biosequence data.