The invention relates to the field of data processing, and more particularly to the field of dynamic programming.
The problem of finding, within a set of strings (typically stored in a database), the string that best matches a reference string (or ‘query string’), together with the corresponding optimal alignment score, is common in many fields of technology, in particular in bioinformatics and text analysis. Various sequence alignment algorithms for aligning two or more biological sequences, e.g. protein, DNA or RNA sequences, are known. Such algorithms can be used, e.g., for determining evolutionarily conserved sequence homologies or for identifying text documents, stored in databases or on the Internet, that are highly similar to a reference text.
Sequence (or ‘string’) alignment algorithms are computationally complex. Most alignment programs use heuristic methods rather than global optimization because identifying the optimal alignment between more than a few sequences of moderate length is computationally prohibitive. Heuristic approaches, however, increase processing speed at the cost of accuracy, so it is not guaranteed that an optimal alignment will be found. Other approaches, for example dynamic programming approaches (e.g. based on the ordinary Needleman-Wunsch or Smith-Waterman algorithm), are guaranteed to return an optimal sequence alignment. However, said exact approaches are still too slow for many multiple sequence alignment problems of high complexity, even when performed on a graphics processing unit (GPU), as the computational cost of a pairwise sequence alignment is quadratic with regard to the sequences' lengths.
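By way of illustration only, the quadratic cost of an exact pairwise alignment, and the resulting cost of an exhaustive best-match search over a set of strings, can be sketched as follows. The sketch uses the textbook Needleman-Wunsch recurrence with a linear gap penalty. The function names `nw_score` and `best_match` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example and are not taken from any particular implementation.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (Needleman-Wunsch).

    Illustrative scoring scheme (assumed): +1 match, -1 mismatch, -1 per gap.
    The nested loops visit every cell of a (len(a)+1) x (len(b)+1) matrix,
    so the running time is quadratic in the sequence lengths.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a prefix of a against the empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
        prev = curr  # only the previous row is needed to compute the score
    return prev[len(b)]


def best_match(query, database):
    """Return (string, score) for the highest-scoring string in database.

    Exhaustive search: one full quadratic alignment per database entry.
    """
    return max(((s, nw_score(query, s)) for s in database), key=lambda p: p[1])


if __name__ == "__main__":
    candidates = ["GATACA", "CCCC", "GATTACA"]
    print(best_match("GATTACA", candidates))
```

Because every database entry requires its own full score matrix, the total work of such an exhaustive exact search grows with the number of entries multiplied by the product of the query length and the entry length.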
Nevertheless, it is sometimes necessary to find the exact best match quickly, in particular in medicine and the life sciences, where the correctness of an alignment result may be crucial for the diagnosis of a disease or for testing the validity of a scientific hypothesis.