The invention relates generally to data analysis and, more specifically, to the identification of sequences within a data series.
In various applications, such as information theory, data compression, and intrusion detection, it may be desirable to identify sequences of interest within a larger data series. Identifying such sequences may allow meaningful information to be extracted from them or may facilitate manipulation or analysis of the data series. For example, identification of repetitive sequences in a data series may allow for effective compression of the data or may indicate sequences having particular significance.
In the field of genetics, biologically significant sequences in a DNA strand tend to have higher redundancy than non-meaningful sequences. For genomes that are known or are being sequenced, the purposes of different parts of the genomes are currently unknown. Additionally, the identification of meaningful or interesting sequences within a genome poses a challenge. Hence, it may be desirable to develop techniques that efficiently and accurately recognize sequences of interest within a larger data series.