Sequence alignment involves arranging two or more sequences to identify similar regions within those sequences. For example, protein alignment involves arranging the sequences of two or more proteins to identify similar regions within those sequences. The outcome of a particular protein sequence alignment may indicate functional, structural, or evolutionary relationships between the aligned sequences. Although alignment may be applied to sequences representing any kind of information, some of the description below will refer to alignment of sequences representing proteins merely as one illustrative example.
The result of a particular protein alignment is usually displayed by writing each protein sequence horizontally as a string of letters representing the amino acids in that sequence, with the strings stacked vertically so that similar regions within the sequences line up in the same columns. Although the description herein refers primarily to protein alignment, the same or similar techniques may be used to align other kinds of sequences, such as DNA and RNA sequences. All of these are examples of “sequence alignment.”
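As an illustration of this display convention, the minimal Python sketch below stacks two already-aligned sequences so that aligned positions share a column. It assumes the alignment has already been computed, uses "-" as the gap character, and marks identical residues with "|"; the function name `format_alignment` and the example sequences are hypothetical.

```python
def format_alignment(aligned_a: str, aligned_b: str) -> str:
    """Stack two gapped, equal-length sequences so aligned positions share a column."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # '|' marks identical residues; a space marks a mismatch or a gap ('-').
    marks = "".join(
        "|" if x == y and x != "-" else " "
        for x, y in zip(aligned_a, aligned_b)
    )
    return "\n".join([aligned_a, marks, aligned_b])


print(format_alignment("MKT-AYIAK", "MKTLAY-AK"))
# MKT-AYIAK
# ||| || ||
# MKTLAY-AK
```

Each row is one sequence, and the middle row shows at a glance which columns hold identical residues.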
Alignment typically involves identifying: (1) overlaps (identical or similar regions) between the aligned sequences, also referred to as intersections; (2) differences, such as a region that is contained within one of the aligned sequences but not another; (3) complements, which represent opposites within the aligned sequences, as when one aligned sequence contains a 1 and another aligned sequence contains a −1 at the same or a similar position; and (4) unions, which represent all of the unique elements found across two or more of the aligned sequences.
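The four relationships can be sketched position by position on sequences that have already been aligned. The Python below is one possible interpretation, not a definitive one. It assumes numeric elements (so that complements such as 1 and −1 make sense), represents gaps as `None`, treats "overlap" as exact identity only (ignoring merely similar elements), and treats "difference" as a column where the first sequence has an element and the second has a gap. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical representation: equal-length aligned sequences, with None as a gap.
Aligned = Sequence[Optional[int]]


def overlaps(a: Aligned, b: Aligned) -> List[int]:
    """Columns where both sequences hold the same (non-gap) element."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x is not None and x == y]


def differences(a: Aligned, b: Aligned) -> List[int]:
    """Columns where `a` holds an element but `b` has a gap (directional)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x is not None and y is None]


def complements(a: Aligned, b: Aligned) -> List[int]:
    """Columns where the elements are numeric opposites, e.g. 1 and -1."""
    # Zero is excluded because 0 == -0 would trivially count as its own opposite.
    return [
        i for i, (x, y) in enumerate(zip(a, b))
        if x is not None and y is not None and x != 0 and x == -y
    ]


def union(*seqs: Aligned) -> List[int]:
    """All unique non-gap elements appearing in any of the sequences, sorted."""
    return sorted({x for s in seqs for x in s if x is not None})


a = [1, 2, None, -3, 4]
b = [1, -2, 5, 3, None]
print(overlaps(a, b))     # [0]
print(differences(a, b))  # [4]
print(complements(a, b))  # [1, 3]
print(union(a, b))        # [-3, -2, 1, 2, 3, 4, 5]
```

Because `differences` is directional, `differences(b, a)` reports column 2 instead, where `b` has an element and `a` has a gap.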
A wide variety of techniques for performing sequence alignment have been developed, such as dot-matrix methods, dynamic programming-based methods, progressive methods, methods based on hidden Markov models, and methods that use artificial neural networks. Regardless of the kind of sequence alignment technique that is used, the computational resources (i.e., memory and/or processing) required to perform the alignment grow rapidly as the sequences being aligned grow longer. For example, aligning two sequences of length n with a dynamic programming method requires on the order of n² computations, so doubling the length of the sequences roughly quadruples the work. As a result, traditional sequence alignment techniques quickly become unwieldy as the size of the sequences grows.
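To make the n² growth concrete, the sketch below implements the scoring pass of Needleman-Wunsch global alignment, a standard dynamic programming method, chosen here only as an illustration and not as the technique described in this document. The scoring values (match +1, mismatch −1, gap −1) are assumptions. The function fills one cell per pair of positions and counts the cells it fills; it keeps only two rows of the table, so memory stays linear, but the number of cell computations is still the product of the two lengths.

```python
from typing import Tuple


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Tuple[int, int]:
    """Return (global alignment score, number of DP cells filled)."""
    n, m = len(a), len(b)
    # Row 0: aligning a prefix of b against nothing costs one gap per character.
    prev = [j * gap for j in range(m + 1)]
    cells = 0
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
            cells += 1
        prev = curr
    return prev[m], cells


for n in (100, 200, 400):
    _, cells = nw_score("A" * n, "C" * n)
    print(n, cells)
# 100 10000
# 200 40000
# 400 160000
```

Each doubling of n multiplies the cell count by four, which is the quadratic growth described above.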
What is needed, therefore, are improved techniques for performing sequence alignment efficiently and effectively.