1. Field of the Invention
This invention generally relates to data sequence processing methodologies, and more particularly, to methods and systems for identifying partial order patterns in sequences of data such as, for example, sequences of characters, character sets and real numbers.
2. Background Art
Given an input sequence of data, a “motif” is a repeating pattern, possibly interspersed with don't-care characters, that occurs in the sequence. The data may be characters, sets of characters, or real values. In the first two cases, the number of motifs can be exponential in the size of the input sequence; in the third case, there can be an uncountably infinite number of motifs. Typically, the higher the self-similarity of the sequence, the greater the number of motifs in the data. Motif discovery is therefore a source of concern for data such as repeating DNA or protein sequences, which exhibit a very high degree of self-similarity (repeating patterns) and hence a very large number of motifs.
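By way of illustration only, the notion of a motif with don't-care characters may be sketched in Python as follows. The function name motif_occurrences, the use of "." as the don't-care symbol, and the example sequence are assumptions made for this sketch and do not appear in the description above; the sketch merely locates occurrences of a given motif and is not a motif discovery method.

```python
def motif_occurrences(sequence, motif, dont_care="."):
    """Return the start positions at which `motif` occurs in `sequence`.

    A `dont_care` symbol in the motif matches any single character.
    (Illustrative helper; the name and the "." convention are assumptions.)
    """
    m = len(motif)
    return [
        i
        for i in range(len(sequence) - m + 1)
        # Every motif position must be a don't-care or equal the sequence character.
        if all(p == dont_care or p == c for p, c in zip(motif, sequence[i:i + m]))
    ]


# "AB.D" repeats in the sequence: once as "ABCD" and once as "ABED".
positions = motif_occurrences("ABCDABEDABC", "AB.D")
```

In this example the pattern "AB.D" occurs more than once and therefore qualifies as a repeating pattern in the sense described above, with the don't-care position absorbing the differing characters.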
Given a body of evidence in the form of sequences of motifs, genes, or markers, the task is to mine information from this data. A maximal Boolean expression pattern gives all the motifs that describe the set of sequences. However, it is sometimes observed that sequences containing the same set of motifs are nevertheless endowed with different functions. A closer look reveals that the partial orders of these motifs differ between the different functions.
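The observation that the same set of motifs may exhibit different partial orders can be illustrated with a minimal Python sketch. The function name common_precedences, the motif labels m1 to m3, and the two example groups are hypothetical, and the sketch assumes that each motif occurs exactly once in each sequence; it is offered as one simple way to view the idea and not as the method of the invention.

```python
from itertools import permutations


def common_precedences(sequences):
    """Return all pairs (a, b) such that a occurs before b in every sequence.

    Assumes every sequence contains the same motif labels, each exactly once.
    The resulting relation is a partial order shared by all the sequences.
    """
    labels = set(sequences[0])
    return {
        (a, b)
        for a, b in permutations(labels, 2)
        if all(seq.index(a) < seq.index(b) for seq in sequences)
    }


# Two hypothetical groups of sequences built from the same three motifs.
group_one = [["m1", "m2", "m3"], ["m2", "m1", "m3"]]
group_two = [["m3", "m1", "m2"], ["m3", "m2", "m1"]]

order_one = common_precedences(group_one)  # m1 and m2 both precede m3
order_two = common_precedences(group_two)  # m3 precedes both m1 and m2
```

Both groups contain exactly the motifs m1, m2 and m3, so a description based only on which motifs are present cannot tell them apart, whereas the precedence relations computed for the two groups are different.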