Network security is receiving growing attention, and security devices such as firewalls are widely deployed. Deploying security devices alone, however, is not enough to protect a network: the relevant personnel also need to continuously monitor and analyse the logs these devices generate, because the logs contain valuable information. For example, the logs can be used to detect security threats such as network intrusions, virus attacks, abnormal behaviours, and abnormal traffic, so that the overall network security strategy can be configured and adjusted in a targeted way.
One way to analyse logs is to classify log events into a few categories such as “information”, “error”, and “warning”. This approach has limitations. Because logs are numerous and complex, important event information is likely to be buried in a broad category such as “warning” and not handled in a timely manner. Therefore, to facilitate statistics, detect problems promptly, and keep rare events from being buried among the other events in the same category, the logs need to be subdivided further so that the event type can be determined from each log and handled accordingly.
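The limitation can be illustrated with a minimal sketch. The records below are hypothetical and made up for illustration, and the `warning_kind` helper (which keys a message on its first two words) is only a hand-picked stand-in for a real subdivision method, not the method this document goes on to describe.

```python
from collections import Counter

# Hypothetical (severity, message) records, made up for illustration.
records = [
    ("information", "session opened for user alice"),
    ("warning", "high memory usage 81 percent"),
    ("warning", "high memory usage 83 percent"),
    ("warning", "high memory usage 85 percent"),
    ("warning", "port scan detected from 10.0.0.9"),
    ("error", "failed to write to disk"),
]

# Coarse classification: one bucket per severity level.
by_severity = Counter(severity for severity, _ in records)


def warning_kind(message):
    """Crude, assumed subdivision key: the first two words of the message."""
    return " ".join(message.split()[:2])


# Finer subdivision inside the "warning" bucket only.
warning_kinds = Counter(
    warning_kind(message)
    for severity, message in records
    if severity == "warning"
)

print(by_severity)
print(warning_kinds)
```

At the severity level the single port-scan record is indistinguishable from the three memory warnings; it only stands out once the “warning” bucket is subdivided.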
Log formats differ depending on the text content and the source. For example, logs from firewalls are formatted differently from logs from web servers. In addition, logs from the same source can still be subdivided according to their meaning.
The conventional method of subdividing logs is to compute the longest common subsequence (LCS): two log texts are compared and their common sequence is extracted, so as to determine whether the two can be classified into one category. However, this method handles only two texts at a time. With a plurality of log texts, the LCS has to be computed for every pair of texts, that is, n(n-1)/2 computations for n texts, so the amount of computation grows quadratically and quickly becomes very large.
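The pairwise LCS approach can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the texts are split on whitespace and compared token by token, the log lines are hypothetical, and the `same_category` rule with its `threshold` of 0.5 is an assumed way of turning the LCS into a yes/no decision, since the text does not specify one.

```python
from itertools import combinations


def lcs_tokens(a, b):
    """Return the longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    common = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


def same_category(line_a, line_b, threshold=0.5):
    """Assumed rule: same category if the LCS covers enough of the longer text."""
    tokens_a, tokens_b = line_a.split(), line_b.split()
    common = lcs_tokens(tokens_a, tokens_b)
    return len(common) / max(len(tokens_a), len(tokens_b)) >= threshold


# Hypothetical log texts, made up for illustration.
logs = [
    "Accepted connection from 10.0.0.1 port 22",
    "Accepted connection from 10.0.0.7 port 443",
    "Disk usage at 91 percent on /var",
]

# Every pair of texts needs its own LCS computation.
pairs = list(combinations(range(len(logs)), 2))
for i, j in pairs:
    print(i, j, same_category(logs[i], logs[j]))

# Number of pairwise computations for n texts.
pair_count = lambda n: n * (n - 1) // 2
```

Each LCS computation already costs time proportional to the product of the two token counts, and `pair_count(1000)` is 499500, which shows how quickly the pairwise comparison grows with the number of log texts.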