Today, information technology professionals often encounter myriad problems and challenges during the operation of a computer network or network of networks. For example, these individuals must often cope with network device failures and/or software application errors brought about by causes such as configuration errors. To help network operators and managers track down the sources of such problems, network monitoring devices capable of recording and logging vast amounts of information concerning network communications have been developed.
Conventional network monitoring devices, however, suffer from scalability problems. For example, these devices are generally limited in the number of applications they can monitor and the number of locations they can monitor from, because deploying and managing the required agents is both expensive and time-consuming. For this reason, flow-based network monitoring has been developed. In essence, a flow is an end-to-end TCP (Transmission Control Protocol) or UDP (User Datagram Protocol) conversation across any type of IP network between entities such as applications, servers, and clients. Flows can be monitored by aggregating data from TCP/IP packet headers as the packets pass through a data aggregation point, such as a SPAN (mirror) port or a network tap. This flow data alone supplies detailed information on network performance and utilization.
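As an illustration only, the following is a minimal sketch of header-based flow aggregation, assuming each packet header has already been parsed into addresses, ports, protocol, and length. The `PacketHeader`, `flow_key`, and `aggregate_flows` names are hypothetical, and folding both directions of a conversation into one flow (by ordering the two endpoints) is a design choice of this sketch rather than something the text prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PacketHeader:
    # Hypothetical, already-parsed TCP/UDP-over-IP header fields.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "TCP" or "UDP"
    length: int    # packet length in bytes


def flow_key(h: PacketHeader) -> tuple:
    """Bidirectional 5-tuple: both directions of a conversation share one key."""
    a = (h.src_ip, h.src_port)
    b = (h.dst_ip, h.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)
    return (h.protocol, lo, hi)


def aggregate_flows(headers):
    """Accumulate packet and byte counts per flow from a stream of headers."""
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for h in headers:
        s = stats[flow_key(h)]
        s["packets"] += 1
        s["bytes"] += h.length
    return dict(stats)
```

For example, a TCP request from `10.0.0.1:51000` to `10.0.0.2:80` and its reply in the opposite direction map to the same key, so `aggregate_flows` reports them as one flow of two packets.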
However, such network monitoring devices may acquire multiple copies of identical packets. These duplicates frequently arise when the same packet is collected at multiple monitoring points along the path of a flow. Duplicate packets introduce errors into network performance and utilization analyses, for example by inflating packet and byte counts. Therefore, it is important that duplicate packets be removed or their occurrence minimized.
Common algorithms for identifying duplicate packets compare each new packet with a list of previously received packets. However, that list is frequently bounded by the fixed queue length of a packet buffer. A duplicate may therefore be missed when the set of packets that should be considered for comparison is larger than the fixed queue can hold. Indeed, the number of packets per second presented to a network monitoring system may vary greatly from site to site and even from day to day; for example, a highly utilized network may present more packets per second than a less utilized one. It is therefore a challenge to retrieve the appropriate packets for comparison. The present invention addresses these needs.
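The conventional fixed-queue comparison described above can be sketched as follows. This is a minimal, hypothetical illustration, not the method of any particular product: it assumes duplicate copies are byte-identical (real copies captured at different points can differ in fields such as the IP TTL, which this sketch ignores), compares packets by a SHA-256 digest, and keeps the comparison list in a fixed-length `collections.deque`. The `FixedQueueDeduplicator` class, `count_duplicates` helper, and `queue_len` parameter are assumptions made for the example.

```python
import hashlib
from collections import deque


class FixedQueueDeduplicator:
    """Hypothetical duplicate filter that compares each packet only against
    the digests of the last `queue_len` distinct packets (a fixed-length FIFO)."""

    def __init__(self, queue_len: int):
        # When full, appending a new digest silently evicts the oldest one.
        self.recent = deque(maxlen=queue_len)

    def is_duplicate(self, packet: bytes) -> bool:
        digest = hashlib.sha256(packet).digest()
        if digest in self.recent:  # linear scan of the "list of existing packets"
            return True
        self.recent.append(digest)
        return False


def count_duplicates(packets, queue_len):
    """Number of packets flagged as duplicates by a fresh fixed-queue filter."""
    dedup = FixedQueueDeduplicator(queue_len)
    return sum(dedup.is_duplicate(p) for p in packets)


if __name__ == "__main__":
    dup = b"packet-X"
    # The same duplicate pair, separated by 2 vs. 5 unrelated packets.
    quiet = [dup] + [b"a%d" % i for i in range(2)] + [dup]
    busy = [dup] + [b"b%d" % i for i in range(5)] + [dup]
    print(count_duplicates(quiet, queue_len=4))  # 1: original still in the queue
    print(count_duplicates(busy, queue_len=4))   # 0: original already evicted
```

Because the window is measured in packets rather than time, the same time gap between two copies spans more intervening packets on a highly utilized network, so a queue length that suffices on a quiet link can miss duplicates on a busy one.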