There is often a need in electronic systems to monitor network traffic. As such, devices have been developed that analyze data packets within network packet streams that are communicated across networks. These network communications can occur through a number of different systems and can include both wired and wireless communication links. The analysis of network packets can occur both off-line and in real-time. For off-line analysis of network packets, the network packets are stored and then later analyzed without providing real-time data network analysis. For real-time analysis of network packets, the network packets within the packet stream must be analyzed fast enough to keep up with the real-time flow of network traffic. As such, real-time analysis is more difficult to achieve than off-line analysis of network data packets.
One problem associated with analysis of network packets is the large number of packets that are communicated and the speed at which they are communicated. For example, many network systems utilize communication links that operate at speeds of 10 Gbps (gigabits per second) and above, and many network communication systems have large numbers of active communication links at any given time. This problem of analyzing network packets in high-volume and high-speed networks is made worse because many devices that analyze network packets do so using filters and other parameters that can cause duplicate packets to be present in the packet streams being analyzed. For example, monitoring redundant communication links, monitoring both ends of a communication link, and mis-configuration of copy ports (e.g., SPAN ports on Cisco network switches) can lead to duplicate packets in the network packet streams to be analyzed. These duplicate packets increase the bandwidth and processing speed needed by the analyzing devices to process packet streams.
One prior solution to this problem of duplicate packets is to provide off-line removal of duplicate packets followed by off-line analysis of captured network packets. For this off-line solution, packets within a network packet stream can be captured and stored. The captured packet data file can then be processed to remove duplicate packets. For this removal of duplicate packets, for example, the length and MD5 sum for each packet can be compared to those of the previous packets (e.g., previous four packets), and matching packets can be removed as duplicates. It is noted that MD5 is a well-known cryptographic hash algorithm that generates relatively large, 128-bit hash values. Once duplicate packets are removed, the packet data file can then be analyzed for various events and/or occurrences, as desired.
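This off-line approach can be sketched as follows. The sketch below is illustrative only (function and variable names are not from any described implementation); it compares each captured packet's length and MD5 digest against the signatures of the previous four packets, per the example above.

```python
import hashlib
from collections import deque

def deduplicate_capture(packets, window=4):
    """Remove duplicates from a captured packet list by comparing each
    packet's (length, MD5 digest) signature against the signatures of
    the previous `window` packets. Illustrative sketch only."""
    recent = deque(maxlen=window)  # signatures of the last few packets
    output = []
    for pkt in packets:
        sig = (len(pkt), hashlib.md5(pkt).digest())
        if sig not in recent:
            output.append(pkt)  # not seen recently: keep it
        recent.append(sig)      # slide the comparison window forward
    return output
```

Note that because only the previous few packets are compared, a duplicate arriving after more than `window` intervening packets would not be detected; this bounded comparison is what keeps the off-line pass tractable.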
Other prior solutions for removal of duplicate packets are described in U.S. Pat. No. 8,462,781, which is hereby incorporated by reference in its entirety. In part, these solutions provide input packets to deduplication engines that generate hash values associated with the input packets and that use these hash values to identify and remove duplicate packets within the input packet stream.
FIG. 1A (Prior Art) is a block diagram of an embodiment 100 for a packet deduplication system described in U.S. Pat. No. 8,462,781. The input packet stream 102 is provided to a packet buffer 130 that stores the incoming packets pending the decision to delete or pass the packets to the output packet stream 132. The hash generator 103 receives the input packet stream 102 and generates one or more hash values using one or more hash algorithms. The one or more hash values are provided to the deduplication controller 110 through signal lines 104. The one or more hash values, or a subset thereof, are also provided to a deduplication window controller 125 through signal lines 106. The deduplication controller 110 utilizes at least one hash value to locate data for previously received packets 142 that is stored within data storage system 140. The data stored for each previously received packet can be, for example, the hash value(s) generated for the previously received packets. These stored hash value(s) are then obtained from the data storage system 140 by the deduplication controller 110 and used by the deduplication controller 110 for comparison to a hash value associated with the current incoming packet. If a match is found, the current incoming packet is deemed a duplicate packet, and the control signal 131 is a DELETE control signal. If a match is not found, the current incoming packet is deemed not to be a duplicate packet, and the control signal 131 is a PASS control signal. If a match is not found, the deduplication controller 110 also stores data associated with the hash value(s) for the current incoming packet in the data storage system 140. The deduplication controller 110 provides the DELETE/PASS control signal 131 to the packet buffer 130. If the control signal 131 is a DELETE control signal, the current packet is removed or deleted from the packet buffer 130 so that it does not become part of the output packet stream 132. If the control signal 131 is a PASS control signal, the current packet is allowed to pass from the packet buffer 130 so that it does become part of the output packet stream 132.
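The DELETE/PASS decision flow described above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the class and method names are assumptions, a Python set stands in for the data storage system 140, and MD5 stands in for the hash generator 103.

```python
import hashlib

class DeduplicationEngine:
    """Illustrative sketch of the FIG. 1A flow: hash each incoming
    packet, look the hash up among data stored for previously received
    packets, and emit DELETE for a match or PASS (after recording the
    hash) for a new packet."""

    def __init__(self):
        self.seen = set()  # stands in for data storage system 140

    def process(self, packet: bytes) -> str:
        digest = hashlib.md5(packet).digest()  # hash generator role
        if digest in self.seen:
            return "DELETE"    # duplicate: remove from the packet buffer
        self.seen.add(digest)  # store data for this new packet
        return "PASS"          # forward to the output packet stream
```

A first occurrence of a packet yields PASS and stores its hash; any later identical packet yields DELETE.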
The deduplication window controller 125 operates to limit the amount of information stored with respect to incoming packets, and the deduplication window controller 125 can utilize one or more parameters to form a deduplication window (e.g., timestamps, number of packets). The deduplication window controller 125 can also receive hash value(s) through signal lines 106 from the hash generator 103 and can receive memory location information 136 and the DELETE/PASS control signal 131 from the deduplication controller 110. The deduplication window controller 125 utilizes this information to make determinations as to when to remove data stored for previously received packets. In part, the deduplication window controller 125 can send control signals 138 that provide information to the deduplication controller 110 with respect to which packet information to remove. In this way, a deduplication window is created for limiting the amount of information stored for previously received packets.
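A deduplication window bounded by both of the example parameters (timestamps and number of packets) can be sketched as follows. All names and parameter values here are illustrative assumptions, not the patent's implementation; the caller supplies the current time so the sketch stays self-contained.

```python
import hashlib
from collections import OrderedDict

class WindowedDedup:
    """Illustrative sketch of a deduplication window: hashes for
    previously received packets are retained only while they fall within
    a window bounded by entry count and by age (seconds)."""

    def __init__(self, max_entries=1024, max_age=1.0):
        self.store = OrderedDict()  # digest -> timestamp of insertion
        self.max_entries = max_entries
        self.max_age = max_age

    def process(self, packet: bytes, now: float) -> str:
        # Evict oldest stored entries that fall outside the window.
        while self.store:
            oldest_digest, ts = next(iter(self.store.items()))
            if now - ts > self.max_age or len(self.store) > self.max_entries:
                self.store.popitem(last=False)  # remove oldest entry
            else:
                break
        digest = hashlib.md5(packet).digest()
        if digest in self.store:
            return "DELETE"
        self.store[digest] = now
        return "PASS"
```

Once a stored hash ages out of the window, a later identical packet is treated as new again, which is the intended trade-off for bounding memory.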
FIG. 1B (Prior Art) is a block diagram of an embodiment 150 described in U.S. Pat. No. 8,462,781 that includes a load balancer 154. An input packet stream 152 is received by load balancer 154. Load balancer 154 distributes the input packet stream into multiple different packet streams 102A, 102B, 102C . . . and provides these different packet streams to multiple different deduplication engines 100A, 100B, 100C . . . for removal of duplicate packets. The individual output packet streams 132A, 132B, 132C . . . from the multiple deduplication engines 100A, 100B, 100C . . . are then sent to a combiner 156. Combiner 156 combines the different packet outputs into a single output packet stream 158. The multiple deduplication engines 100A, 100B, 100C . . . use local memory storage of hash values associated with their respective individual input packet streams 102A, 102B, 102C . . . received from the load balancer 154. Each of the deduplication engines 100A, 100B, 100C . . . operates to identify duplicate packets in parallel. For certain further embodiments, hash algorithms can also be used to help generate the multiple input packet streams 102A, 102B, 102C . . . so that duplicate packets will end up in the same stream. For example, a pre-hash operation can be performed on the input packet stream 152, and the results can be used to help determine into which input packet streams 102A, 102B, 102C . . . to place each packet, thereby causing duplicate packets to end up in the same packet stream.
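The pre-hash distribution step can be sketched as follows. This is an illustrative sketch only (the function name, engine count, and choice of MD5 for the pre-hash are assumptions): a hash over each packet's contents selects the stream, so identical packets deterministically land in the same deduplication engine's stream.

```python
import hashlib

def prehash_balance(packets, num_engines=3):
    """Illustrative sketch of a pre-hash load balancer: a hash of each
    packet's contents selects the destination stream, so duplicate
    packets always end up in the same stream."""
    streams = [[] for _ in range(num_engines)]
    for pkt in packets:
        # Any stable hash works; the first MD5 digest byte is used here.
        idx = hashlib.md5(pkt).digest()[0] % num_engines
        streams[idx].append(pkt)
    return streams
```

Because each engine then sees every copy of a given packet, its local hash storage suffices to detect the duplicates without any cross-engine coordination.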
While such prior load balancing solutions help to spread individual packets within a packet stream among a plurality of deduplication engines, these prior solutions can also exhaust available memory or logic resources, particularly where packet sizes are large.