Sequences of numerical data items can be analyzed in many ways to identify themes, patterns, repetitions, information, encoded data, codes or other signals in the data. It is a common requirement when processing numerical sequences to identify characteristics of the data for classification, categorization, characterization, comparison, grouping, indexing, searching, similarity analysis and the like. For example, in a computer system having a network, it can be desirable to classify data in a sequence of network events or data packets so as to identify data that is malicious in order to provide network protection mechanisms. Numerical sequences can be recorded, modeled and compared using sequence identification, data correlation and/or clustering techniques.
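By way of illustration only, one simple form of data correlation between two numerical sequences of equal length is the Pearson correlation coefficient. The following is a minimal sketch and is not taken from the source: the function name `pearson` and the example sequences are hypothetical, and the coefficient captures only linear similarity.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences.

    Hypothetical helper for illustration; assumes numeric inputs.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Example sequences (made up for this sketch).
rising = [1, 2, 3, 4]
scaled = [2, 4, 6, 8]
falling = [8, 6, 4, 2]

same_trend = pearson(rising, scaled)      # close to +1: same linear trend
opposite_trend = pearson(rising, falling)  # close to -1: opposite linear trend
```

A coefficient near +1 or -1 indicates a strong linear relationship between the two sequences, while a value near 0 indicates no linear relationship. A measure of this kind cannot, on its own, reveal relationships between sequences that are not linear.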
The generation of large quantities of data, such as numerical sequences that are potentially complex and/or unstructured, presents challenges for traditional data processing techniques. Such data can arise from sensors, data collection points, vehicles, people, devices, telecommunications services and facilities, medical services and facilities and many other sources. Data processing operations such as analysis, categorization, classification, search and visualization are difficult to apply to such data. In some contexts such data has been described as “big data” and these challenges can be described as the “big data problem”.
Furthermore, it is increasingly desirable to extract meaning from numerical sequences where traditional data analysis approaches fail to identify meaningful patterns or characteristics. Such meaning can be considered to be a signal residing, encoded or present in a numerical sequence. Such signals can be sparsely distributed across a numerical sequence and/or the sequence may have a low signal-to-noise ratio. A signal of interest corresponding to a characteristic of a numerical sequence may therefore not be readily identifiable.
Accordingly, it would be advantageous to identify signals in numerical sequences.