A subsequence is a sequence of a subset of elements within an original sequence, where the elements of the subsequence are in the same order as the original sequence. For example, when the original sequence is a sentence, the elements may be the words of the sentence. The subsequence may be a collection of one or more words from the sentence in the same order as the words appear in the sentence.
Thus, when the original sequence is a string, e.g., a keyword, the subsequence may be a sequence of characters within the keyword, where the characters of the subsequence are in the same order as in the original keyword. In other words, for a given string, S, a subsequence, S′, includes a subset of the characters of S, and the characters in S′ appear in the same order as they are found in S. The characters in S′ need not appear consecutively in S. For example, the string “anna” is a subsequence of the string “banana,” even though the characters in the string “anna” do not appear consecutively in the string “banana.”
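The definition above can be sketched in a few lines of Python. This is a minimal illustration only, not a method described in this document; the function name `is_subsequence` and its parameter names are hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def is_subsequence(candidate: Sequence[T], original: Sequence[T]) -> bool:
    """Return True if the elements of `candidate` appear in `original`
    in the same relative order, not necessarily consecutively."""
    remaining = iter(original)
    # `element in remaining` consumes the iterator up to and including the
    # first match, so each later element can only match a later position.
    return all(element in remaining for element in candidate)


if __name__ == "__main__":
    print(is_subsequence("anna", "banana"))  # True
    print(is_subsequence("nab", "banana"))   # False: "b" does not follow "a"
```

Because the check walks the original string once, this particular test takes time proportional to the length of the original string. It works on any sequence, so a list of words can be passed in place of a string.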
The matching of subsequence strings is widely used in a variety of applications. Examples of areas in which subsequence string matching can be used include indexing XML data, matching patterns in compressed text, graphing databases of chemical compounds, matching patterns in discrete time series data, and mining data. Subsequence string matching is also a basic operation in bioinformatics. For example, DNA sequencing, protein interactions, and protein analysis are areas in which subsequence string matching may be useful.
Subsequence matching may also be used to identify events or activities of interest in a large database that stores long sequences of activities. Moreover, the matching of subsequence strings may be used to determine document similarity. In such an application, subsequence string matching may be based on matching subsequences of words instead of characters.
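As a rough sketch of the activity-matching use described above, the following Python function reports where a pattern of events occurs, in order, within a longer sequence of activities. The function name `find_subsequence_positions`, the greedy leftmost matching strategy, and the sample activity names are all assumptions made for illustration; the text does not prescribe them.

```python
from typing import List, Optional, Sequence


def find_subsequence_positions(
    pattern: Sequence[str], activities: Sequence[str]
) -> Optional[List[int]]:
    """Return the index in `activities` matched by each element of `pattern`
    (greedy, leftmost match), or None if `pattern` is not a subsequence."""
    positions: List[int] = []
    start = 0
    for wanted in pattern:
        # Scan forward from just past the previous match.
        for index in range(start, len(activities)):
            if activities[index] == wanted:
                positions.append(index)
                start = index + 1
                break
        else:
            # Reached the end without finding `wanted`.
            return None
    return positions


if __name__ == "__main__":
    # Hypothetical activity log, for illustration only.
    log = ["login", "view", "add_to_cart", "view", "checkout", "logout"]
    print(find_subsequence_positions(["login", "view", "checkout"], log))  # [0, 1, 4]
    print(find_subsequence_positions(["checkout", "login"], log))          # None
```

The same function applies to word-level matching: splitting two documents into lists of words and passing those lists in place of the activity names matches subsequences of words instead of characters.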
The typical algorithms used to solve subsequence matching problems are measured in terms of time complexity. The time complexity of an algorithm is a way to describe the amount of time taken by the algorithm to solve a problem. Typically, time complexity is described as a function of the size of the input to the problem. In the case of subsequence matching, the size of the input may be considered to be the number of characters in the input string.
Time complexity is commonly estimated by logically counting the number of elementary operations performed by the algorithm, where an elementary operation takes a fixed amount of time to perform. Existing solutions to subsequence matching problems have time complexity of O(n²), where n is the size of the input. This high level of complexity is computationally expensive, especially for large problems.
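To make the idea of counting elementary operations concrete, the sketch below instruments a classic dynamic-programming computation of the longest common subsequence of two strings and counts its character comparisons. This related problem is chosen here only as an illustration of quadratic growth; the text does not say which existing solutions it has in mind, and the function name `lcs_length_with_count` is hypothetical.

```python
from typing import Tuple


def lcs_length_with_count(a: str, b: str) -> Tuple[int, int]:
    """Return (length of the longest common subsequence of a and b,
    number of character comparisons performed)."""
    comparisons = 0
    # previous[j] holds the LCS length of the processed prefix of `a` and b[:j].
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            comparisons += 1  # one elementary operation per table cell
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1], comparisons


if __name__ == "__main__":
    for n in (10, 20, 40):
        _, count = lcs_length_with_count("a" * n, "b" * n)
        print(n, count)  # the count is n * n: 100, 400, 1600
```

Doubling the input length quadruples the number of comparisons, which is the behavior that O(n²) describes when both strings have n characters.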