Log data records the activities of systems or users over time. Large volumes of log data are often managed by database systems. In log data, every tuple (i.e., an ordered set of data values) corresponds to a logged event, and every event is associated with a timestamp that specifies the event's time of occurrence. We use the term “event sequences” to characterize this data. The set of unique tuples formed by ignoring the occurrence times of the logged events defines a set of different “event types”.
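The definitions of "event sequence" and "event type" can be illustrated with a minimal sketch. The `Event` fields (`timestamp`, `user`, `action`), the integer timestamps, and the sample log are hypothetical and chosen only for illustration; they are not taken from any particular logging system.

```python
from collections import Counter
from typing import NamedTuple


class Event(NamedTuple):
    """One logged event; the field names are illustrative assumptions."""
    timestamp: int  # time of occurrence (hypothetical integer ticks)
    user: str
    action: str


def event_types(events):
    """Return the distinct tuples that remain once timestamps are ignored."""
    return {(e.user, e.action) for e in events}


def count_by_type(events):
    """Count how many logged events fall under each event type."""
    return Counter((e.user, e.action) for e in events)


# Hypothetical log; sorting by timestamp yields the event sequence.
log = [
    Event(5, "alice", "login_ok"),
    Event(1, "alice", "login_failed"),
    Event(3, "bob", "login_ok"),
    Event(2, "alice", "login_failed"),
    Event(8, "bob", "login_ok"),
]
event_sequence = sorted(log, key=lambda e: e.timestamp)

types = event_types(event_sequence)
counts = count_by_type(event_sequence)
```

In this sketch, five logged events collapse into three event types, because events that differ only in their time of occurrence share a type.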
Summarization and analysis of event sequences can provide useful insights in forensic investigation. However, when reviewing activity for a forensic investigation, the volume of information in the event sequences can be overwhelming. Standard SQL methodology is generally inadequate for such complex, large-scale data-analysis tasks. Other work on event sequence mining, including off-the-shelf event sequence data-mining software, has focused on discovering local patterns based on known constraints, which a data analyst typically provides as parameters to the mining method. Such methods find recurring local structures that match the predefined constraints (e.g., episodes of more than three consecutive failed attempts to access a computer system), but they provide neither a global model of the data nor a comprehensive summary of an entire event sequence. Furthermore, because they report every local pattern that satisfies a predefined constraint, these methods tend to discover a prohibitively large number of local patterns. The resulting volume of patterns can overwhelm data analysts and is of little use for spotting general activity trends or pinpointing specific suspicious actions. As a result, the analyst must adjust the parameters iteratively in an attempt to identify general activity trends and/or suspicious actions.
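The kind of constraint-based local pattern search described above can be sketched as follows, assuming a simplified sequence in which each event is reduced to the label "ok" or "fail". The function name `failed_runs`, the `threshold` parameter, and the sample data are hypothetical; this is not the interface of any specific mining tool.

```python
from itertools import groupby


def failed_runs(actions, threshold=3):
    """Return (start_index, length) for each maximal run of consecutive
    "fail" entries whose length is greater than threshold."""
    runs = []
    index = 0
    for action, group in groupby(actions):
        length = len(list(group))
        if action == "fail" and length > threshold:
            runs.append((index, length))
        index += length
    return runs


# Hypothetical access log reduced to outcome labels, in time order.
actions = [
    "ok", "fail", "fail", "fail", "fail", "ok",
    "fail", "fail", "ok",
    "fail", "fail", "fail", "fail", "fail",
]

strict = failed_runs(actions, threshold=3)  # more than three failures in a row
loose = failed_runs(actions, threshold=1)   # more than one failure in a row
```

The output depends entirely on the analyst-supplied `threshold`: the stricter setting reports two episodes and the looser one reports three, and neither result summarizes the sequence as a whole. This sketch reports only maximal runs; a method that enumerated every qualifying sub-episode would return more patterns still.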