Network forensics refers to the study of network data flows in connection with civil, criminal and regulatory events, with the goal of protecting users and resources and preventing illegal intrusions and other criminal activities arising from ever-expanding network connectivity. With cybercrime now rampant, network forensics plays a decisive role in computer forensics and judicial identification technologies: when a cyber-security incident such as a cyber-attack occurs, it is necessary to investigate how the event took place. In view of this, the state promulgated the “Cyber Security Law”, which stipulates at the legal level that network service providers must retain their network logs for at least 6 months.
Network forensics must collect and store network data streams in a form that supports subsequent forensic analysis. Traditional practices include collecting network service operation logs at a high level of abstraction, such as site access logs, and collecting raw network byte streams at a low level of abstraction, such as PCAP, PCAPNG and other network packet capture files. On the one hand, logs at the high level of abstraction generally contain only simple summary information, and the relevant details of network operations are lost; for example, only the header of an HTTP request is recorded, while the body of the request and the response are not saved. On the other hand, the raw network byte stream at the low level of abstraction carries the most complete information but requires enormous storage capacity: capturing 1 Gbps of network traffic can require up to about 11 TB per day, which consumes massive storage resources. Therefore, how to overcome the resource bottleneck caused by the transmission and storage of massive data is a problem that network forensics urgently needs to solve.
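The daily storage figure quoted above can be checked with simple arithmetic. The following Python sketch assumes a fully utilized link and decimal units (1 TB = 10^12 bytes); the function name, the `utilization` parameter and the 183-day retention period are illustrative assumptions, not part of the original text.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_bytes(link_rate_bps: float, utilization: float = 1.0) -> float:
    """Bytes needed to store one day of raw traffic on a link.

    link_rate_bps: link rate in bits per second.
    utilization:   assumed average load on the link (1.0 = fully saturated).
    """
    bytes_per_second = link_rate_bps * utilization / 8
    return bytes_per_second * SECONDS_PER_DAY


per_day = daily_storage_bytes(1e9)  # 1 Gbps, fully utilized
print(f"{per_day / 1e12:.1f} TB per day")  # 10.8 TB per day

# Assumed retention of roughly six months (183 days) at the same rate.
retention_days = 183
print(f"{per_day * retention_days / 1e15:.2f} PB for {retention_days} days")
```

At full utilization the result is about 10.8 TB per day, consistent with the figure of roughly 11 TB given above; a lightly loaded link would need proportionally less.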
In order to resolve this contradiction, the prior art provides a strategy of compressing and storing the original network data stream, usually by hash-mapping the original network data stream into storage using a Bloom filter algorithm, while ensuring that the compressed data still supports subsequent analysis and the reconstruction of network events. This strategy reduces the storage space required for the data to some extent.
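To illustrate the general idea of hash-mapping stream content into a Bloom filter, a minimal Python sketch follows. It is not the implementation used by any of the cited prior art: the class name `BloomFilter`, the helper `chunks`, the 16-byte chunk size, the filter size and the double-hashing scheme based on SHA-256 are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with k hash positions per item (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def chunks(payload: bytes, size: int):
    """Split a payload into fixed-size chunks (assumed chunking strategy)."""
    for start in range(0, len(payload), size):
        yield payload[start:start + size]


bf = BloomFilter(num_bits=8192, num_hashes=4)
payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"
for chunk in chunks(payload, 16):
    bf.add(chunk)

print(payload[:16] in bf)          # True: a stored chunk is always found
print(b"never-seen-chunk" in bf)   # normally False, but a false positive is possible
```

Only the bit array is kept, not the payload itself, which is where the storage saving comes from. The trade-off is that the filter can answer only "possibly seen" or "definitely not seen" for a candidate chunk.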
On this basis, the prior art also proposes further improved methods. Chinese patent application CN101572633A proposes a network forensics method and system that extracts plaintext segments from the original network data stream, together with the network connection record corresponding to each plaintext segment, and applies Bloom filter mapping to them. This filters out the large amount of network protocol structure data and control data in the original data stream that is irrelevant to content forensics, further reducing storage space occupation and prolonging the retention time of the basic data for network forensics. Chinese patent application CN104794170A proposes a method for tracing network forensics content based on a fingerprint multiple-hash Bloom filter. The method reassembles the captured original network traffic packets and constructs application-layer sessions. At each time interval, the session content is stored in chunks in the enhanced fingerprint multiple-hash Bloom filter, and a session index table is saved. The method enables the communication content in the original data stream to be traced and improves the traceability and accuracy of network forensics content. However, the inventor has found that these prior-art approaches all target the compression and storage of the original network byte stream at the low level of abstraction. Because the original network data stream is massive, these compression and storage methods still consume a large amount of storage space. Moreover, forensic analysis depends on reconstructing network events, and the Bloom filter algorithm itself has a certain false-alarm rate. Therefore, there is a need for a simpler and more efficient compression and storage method for network forensics.
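The false-alarm rate of a Bloom filter mentioned above can be estimated with the standard approximation p ≈ (1 − e^(−kn/m))^k, where m is the number of bits, k the number of hash functions and n the number of stored items. The Python sketch below evaluates this textbook formula; the function names and the sample sizing (10 bits per stored item) are illustrative assumptions and are not taken from the cited applications.

```python
import math


def bloom_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation of the Bloom filter false-positive probability."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def optimal_num_hashes(num_bits: int, num_items: int) -> int:
    """Hash count that minimizes the approximation above: k = (m / n) * ln 2."""
    return max(1, round(num_bits / num_items * math.log(2)))


# Assumed sizing: one million stored chunks at 10 bits per chunk.
n = 1_000_000
m = 10 * n
k = optimal_num_hashes(m, n)
rate = bloom_false_positive_rate(m, k, n)
print(f"k = {k}, estimated false-positive rate = {rate:.4f}")
```

Under these assumptions the estimate is a little under 1%, so a small fraction of queries for content that was never stored would still be reported as present. Lowering the rate requires more bits per item, which works against the storage saving.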