1. Field of the Disclosure
This disclosure relates to alignment or matching of sequences, e.g., strings of characters.
2. Description of the Related Art
It is frequently desired to “align” two sequences for the purpose of determining similar portions of these sequences. Alignment includes introducing “gaps” into one or both sequences in a manner that optimizes the similarity between the two sequences. This functionality is used, for example, to determine similar regions of two nucleotide or protein sequences.
One algorithm that has been used for sequence matching is the Smith-Waterman algorithm. See T. F. Smith and M. S. Waterman, Identification of Common Molecular Subsequences, J. Mol. Biol. (1981) 147, 195-97. Sequence matching is commonly performed on very long sequences, e.g., sequences over 2^23 characters in length. Matching such sequences using the Smith-Waterman algorithm is very computationally intensive: the number of operations is on the order of MN (denoted as “O(MN)”), where M and N are the lengths of the two sequences being matched. As a result, the use of the Smith-Waterman algorithm is not practical in many instances. A less computationally intensive method for sequence matching is therefore desired.
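To illustrate where the O(MN) cost arises, the following is a minimal sketch of Smith-Waterman scoring. It is not taken from the cited reference. The function name smith_waterman_score, the scoring values (match=2, mismatch=-1), and the fixed per-character gap penalty (gap=-1) are assumptions chosen for illustration; the original algorithm allows a more general, length-dependent gap penalty. The sketch returns only the best local-alignment score, together with a count of the matrix cells it computed, and does not reconstruct the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local-alignment score, number of matrix cells computed).

    Simplified sketch: uses a fixed per-character gap penalty and
    illustrative scoring values, which are assumptions of this example.
    """
    m, n = len(a), len(b)
    prev = [0] * (n + 1)   # previous row of the scoring matrix
    best = 0               # highest cell value seen so far
    cells = 0              # counts inner-loop iterations (M * N in total)

    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            # Score for aligning a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A gap in either sequence; 0 restarts the local alignment.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cells += 1
        prev = curr

    return best, cells


if __name__ == "__main__":
    score, cells = smith_waterman_score("ACGT", "ACT")
    print(score, cells)
```

Every one of the M x N cells is visited once, so the work grows with the product of the two sequence lengths, which is what makes the approach costly for very long sequences.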