Over the last few years, the general populace has encountered the proliferation of malicious software (sometimes referred to as “malware”) over the Internet. Malware has many forms, including exploits, namely information that attempts to take advantage of a vulnerability in software that is loaded onto an electronic device in order to adversely influence or attack operations of that electronic device. Despite repeated efforts through advanced detection systems and software patches to address software vulnerabilities, malware continues to evade detection and infect electronic devices worldwide.
In combating the spread of malware, it has become paramount that a vast amount of information associated with network traffic, which is propagating to/from/within an enterprise network over a prolonged period of time, is persistently stored. This stored information offers immeasurable value for incident response testing so that security personnel can better understand when and how a network breach (malware infection of one or more endpoint devices within an enterprise network) occurred in order to address current security issues associated with the enterprise network. However, with increasing link speeds at the demilitarized zone (i.e., the physical or logical subnetwork of the enterprise network that interfaces with a larger, untrusted network such as the Internet for example) and with network breaches occurring on average 200 or more days before detection, it is becoming cost-prohibitive for conventional security systems to maintain needed information using conventional packet storage solutions.
Currently, conventional packet storage solutions exist in the marketplace, but these storage solutions acquire packets and write them directly into storage without modification. While some of these conventional packet storage solutions may utilize compression, such compression offers meager storage savings.
It is contemplated that certain redundancy elimination (RE) techniques, such as deduplication for example, are not known to have been used by conventional packet storage solutions. Rather, deduplication has been used in the area of data backup as well as by Wide Area Network (WAN) acceleration products to avoid duplicate transmission of data already sent over the link in the past. In fact, it is believed that deduplication is currently not feasible for packet storage solutions based on significant operational disadvantages that would result.
For instance, packet storage is limited, and thus, as storage reaches capacity, old data would need to be removed (purged) and new data would need to be written into storage. According to conventional deduplication techniques, the stored reference for new data may refer to some purged data. This will render the newly written data useless since portions of the new data will be missing.
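The purge problem described above can be illustrated with a minimal sketch. It assumes a toy store with a fixed record capacity, oldest-first purging, and whole-payload deduplication; the class name `NaiveDedupStore` and its methods are hypothetical and are not drawn from any actual packet storage product.

```python
from collections import OrderedDict


class NaiveDedupStore:
    """Toy packet store (hypothetical): fixed capacity, oldest-first purge,
    whole-payload deduplication by reference to an earlier packet."""

    def __init__(self, capacity):
        self.capacity = capacity
        # packet_id -> ("raw", payload bytes) or ("ref", id of packet holding the bytes)
        self.records = OrderedDict()
        # payload bytes -> id of the packet that holds the raw copy
        self.index = {}

    def write(self, packet_id, payload):
        if payload in self.index:
            # Duplicate payload: store only a reference to the earlier packet.
            self.records[packet_id] = ("ref", self.index[payload])
        else:
            self.records[packet_id] = ("raw", payload)
            self.index[payload] = packet_id
        # Purge the oldest records once capacity is exceeded.
        while len(self.records) > self.capacity:
            _, (kind, value) = self.records.popitem(last=False)
            if kind == "raw":
                self.index.pop(value, None)

    def read(self, packet_id):
        kind, value = self.records[packet_id]
        if kind == "raw":
            return value
        target = self.records.get(value)
        # None signals a dangling reference: the referenced packet was purged.
        return None if target is None else target[1]


store = NaiveDedupStore(capacity=2)
store.write(1, b"GET /index.html")
store.write(2, b"GET /index.html")   # stored as a reference to packet 1
before_purge = store.read(2)         # resolves through packet 1
store.write(3, b"other")             # capacity exceeded, packet 1 is purged
after_purge = store.read(2)          # reference now points at purged data
```

In this sketch, packet 2 remains in storage after the purge, yet its content can no longer be reconstructed because the only raw copy resided in packet 1.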
Also, for packet storage solutions that utilize hard disk drives for example, the scope of collateral data loss caused by disk failures at storage regions with references generated through conventional deduplication techniques is difficult to ascertain, and the loss has to be contained. Hence, any packet storage solution that utilizes disk storage and conventional RE techniques could be rendered completely or substantially inoperable upon experiencing a disk failure.
Lastly, the presence of a chain of references may cause significant delays in packet retrieval. For instance, when a specific packet is retrieved from the packet store, cascaded references between packets could require reconstruction of one or more other packets, resulting in retrieval of the whole cascaded chain. As an illustrative example, a reference in packet B may refer to the same data within packet A and another reference in packet C may refer to the reference in packet B. Hence, the retrieval of packet C will implicitly result in retrieval of packet B and packet A as well, where these ancillary retrievals may unnecessarily increase load on packet processing capabilities and increase the delay in retrieving packet C.
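The cascaded retrieval in the packet A, B, C example can be sketched as follows. This is an illustration only: it assumes, for simplicity, that each reference stands in for an entire packet payload, and the `retrieve` function and the dictionary layout are hypothetical.

```python
def retrieve(store, packet_id):
    """Return (payload, trace), where trace lists every packet that had to be
    read from storage in order to reconstruct the requested packet."""
    trace = [packet_id]
    kind, value = store[packet_id]
    # Follow the chain of references until a raw payload is reached.
    while kind == "ref":
        trace.append(value)
        kind, value = store[value]
    return value, trace


# Packet A holds the raw data, B refers to A, and C refers to B.
store = {
    "A": ("raw", b"payload"),
    "B": ("ref", "A"),
    "C": ("ref", "B"),
}

payload_c, trace_c = retrieve(store, "C")
payload_a, trace_a = retrieve(store, "A")
```

Retrieving packet C touches three stored records while retrieving packet A touches one, so the retrieval cost in this sketch grows with the length of the reference chain.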