Many types of data contain sequences. For example, the packets sent over a network interface, the order of function calls made by an application, and the order in which a user clicks links on a website all contain sequences. In each of these examples, typically one event (such as clicking on a link or calling a particular function) occurs at a given point in time. Thus, there is a clear temporal ordering among the events.
A sequence of such temporally ordered events is known as a trace. One way to analyze a large data set for a particular purpose is to analyze it based on the features that best describe the data in a manner relevant to that purpose. This can be achieved by transforming the data into a reduced set of features (called a feature vector). The act of transforming the data into the set of features is known as feature extraction. Feature extraction reduces the amount of resources needed to describe a large set of data accurately.
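As an illustration only, feature extraction from a trace can be sketched by counting how often short runs of consecutive events (n-grams) occur. The function name `extract_features`, the example events, and the choice of n-gram counts as features are all assumptions made for this sketch; the text above does not prescribe any particular feature set.

```python
from collections import Counter

def extract_features(trace, vocabulary, n=2):
    """Map a trace (an ordered list of events) to a fixed-length feature vector.

    Each entry counts how often one n-gram from `vocabulary` occurs in the trace.
    Hypothetical helper: n-gram counting is just one possible feature choice.
    """
    # Slide a window of length n over the trace and count each window.
    ngrams = Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))
    # Counter returns 0 for n-grams that never occurred.
    return [ngrams[gram] for gram in vocabulary]

# Hypothetical trace of function calls, in the order they occurred.
trace = ["open", "read", "read", "close"]

# Hypothetical fixed vocabulary of event pairs; it defines the vector layout.
vocabulary = [
    ("open", "read"),
    ("read", "read"),
    ("read", "close"),
    ("close", "open"),
]

features = extract_features(trace, vocabulary)
print(features)
```

Because the vocabulary is fixed, traces of any length map to vectors of the same length, which is what makes them comparable in later processing.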
Once the features are extracted, they can be used to process the traces. This processing can involve classifying a trace (determining whether it belongs to a certain group or class of traces), clustering similar traces, and fingerprinting traces. Fingerprinting is a process that maps a large amount of data to a much shorter data string that, for practical purposes, uniquely identifies that data.
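A minimal sketch of two of these processing steps follows, assuming a trace has already been reduced to a numeric feature vector. The names `fingerprint` and `classify`, the class labels, the centroid values, and the use of a truncated SHA-256 digest and nearest-centroid matching are all illustrative assumptions, not methods stated in the text. A hash-based fingerprint also identifies data only with high probability, since collisions are possible in principle.

```python
import hashlib

def fingerprint(features):
    """Map a feature vector to a short hex string (hypothetical scheme).

    Uses a SHA-256 digest truncated to 16 hex characters; the truncation
    length is an arbitrary choice for this sketch.
    """
    data = ",".join(str(value) for value in features).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:16]

def classify(features, centroids):
    """Return the label of the class centroid closest to the feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(features, centroids[label]))

# Hypothetical class centroids, e.g. averages of previously labeled traces.
centroids = {
    "file_io": [1.0, 1.0, 1.0, 0.0],
    "idle": [0.0, 0.0, 0.0, 0.0],
}

# Hypothetical feature vector for one trace.
features = [1, 1, 1, 0]

label = classify(features, centroids)
fp = fingerprint(features)
print(label, fp)
```

Identical feature vectors always yield the same fingerprint, so fingerprints can be compared in place of full traces when checking for exact matches.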