In 1983, Doolittle and colleagues reported a similarity between a newly discovered oncogene and a normal gene for growth. Based on this similarity, the researchers concluded that cancer is probably caused by mutation of an otherwise normal growth gene. This finding demonstrates the value of genetic database search techniques.
A pair of similar DNA sequences usually codes for similar strands of amino acids and therefore tends to express similar functions or structures. When a new strand of DNA is sequenced, its homology to a well-studied and well-documented strand stored in the DNA database usually provides the first clue as to the new strand's function. Instead of first testing and analyzing the encoded protein and generations of bacteria, biologists can search the database for similar sequences. The researchers can then design experiments to test the results of the search.
As a result of enormous improvements in DNA sequencing technology, the DNA database has grown exponentially over the last decade, with its rate of growth rising from 1.5 million nucleotides per year in 1989 to an expected 1.6 billion nucleotides per year in 1999. However, this boom in the genetic database poses serious problems for conventional database search methods. These conventional methods are based on heuristic or dynamic programming techniques, which typically require time on the order of N×M, where N is the length of the database sequence being searched and M is the length of the target sequence being searched for. Two examples of heuristic search techniques are FASTP described by D. J. Lipman, et al., in "Rapid and sensitive protein similarity searches", Science 227: 1435-1441, 1985, and BLAST described by Stephen Altschul, et al., in "Basic local alignment search tool", J. of Molecular Biology, 215:403-410, 1990. Dynamic programming techniques are described by T. F. Smith, et al., in "Identification of Common Molecular Subsequences", J. of Molecular Biology, 147:195-197.
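The N×M cost of the dynamic programming approach can be illustrated with a minimal sketch of a local alignment score computation in the general style of the Smith et al. technique. The function name, the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen for illustration and are not taken from the cited references. The sketch fills one table cell for every pairing of a database position with a target position, so the work grows with the product of the two lengths.

```python
def local_alignment_score(database_seq, target_seq, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of table cells filled).

    Hypothetical helper for illustration only; the scoring values are
    assumed, not prescribed by the cited references.
    """
    n, m = len(database_seq), len(target_seq)
    prev = [0] * (m + 1)   # previous row of the dynamic programming table
    best = 0               # best local score seen so far
    cells = 0              # counts inner-loop iterations (N x M in total)
    for i in range(1, n + 1):
        curr = [0] * (m + 1)
        for j in range(1, m + 1):
            same = database_seq[i - 1] == target_seq[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


if __name__ == "__main__":
    score, cells = local_alignment_score("GGACGTGG", "ACGT")
    print(score, cells)
```

Under these assumed scores, doubling the length of the database sequence doubles the number of cells filled, which is why a database growing by more than a billion nucleotides per year strains a search of this kind.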