Almost from the earliest days of computing, users have been connecting devices together to form networks. Types of networks include local area networks (LANs), metropolitan area networks (MANs) and wide area networks (WANs). The Internet is one example of a WAN, connecting millions of computers around the world.
Networks give users the ability to dedicate particular computers to specific tasks and to share resources such as printers, applications and memory among multiple machines and users. Some computers, commonly known as servers, provide functionality to other computers on a network. Communication among computers and devices on a network is typically referred to as traffic.
Of course, networking computers adds a level of complexity that is not present with a single, stand-alone machine. A problem in one area of a network, whether with a particular computer or with the communication media that connect the various computers and devices, can cause problems for all the computers and devices that make up the network. For example, a malfunctioning file server, a computer that provides disk resources to other machines, may prevent the other machines from accessing or storing critical data; it thus prevents machines that depend upon those disk resources from performing their tasks.
Network and MIS managers are motivated to keep business-critical applications running smoothly across the networks separating servers from end-users. They would like to be able to monitor response time behavior experienced by the users, and to clearly identify potential network, server and application bottlenecks as quickly as possible. They would also like the management/maintenance of the monitoring system to have a low man-hour cost due to the critical shortage of human expertise. It is desired that the information be consistently reliable, with alarm generation providing few false positives (else the alarms will be ignored) and few false negatives (else problems will not be noticed quickly).
A proper understanding of network performance requires several metrics, including network latency. Network latency describes the delays between devices on the network. A good solution for measuring network latency covers the entire network, gathers real data, and mitigates the network slowdowns associated with monitoring systems. However, deficiencies in current methods make it difficult to determine network latency accurately and efficiently.
Current methods rely on either active or passive means. Active monitoring systems periodically transmit traffic across the network. Once the active monitoring system gathers data, the system determines several network metrics, sometimes including network latency. Traditional passive monitoring systems observe data being sent on network devices to gather network performance metrics. However, both methods suffer from several drawbacks.
Data sent by active monitoring systems may burden network infrastructure, such as routers or network switches, and network endpoint devices, such as printers, fax machines, or computers. The endpoint devices must have adequate resources to handle incoming test transmissions.
Also, to determine network latency accurately, monitoring systems must conduct a large-scale test of the whole network. Large-scale testing may increase the load on the network and may interrupt or slow down actual network communications. However, if an active monitoring system does not conduct a large-scale test, the system may gather data that does not accurately reflect real network performance. For instance, if the traffic sent by an active monitoring system does not match real network data, the traffic may be routed or prioritized differently, resulting in meaningless metrics. Reliably anticipating the types of data that may be transmitted across a network presents another serious challenge for test designers. Thus, when using active monitoring systems, testers may need to make difficult trade-offs between network performance and data reliability.
IT professionals may consider using traditional passive monitoring systems as well. Traditional passive monitoring systems require probes to be installed throughout the network in order to monitor traffic across the entire network effectively. Each probe is responsible for monitoring a specific location or a few locations (typically 4-8) using span traffic. Many probes are required to provide general visibility into moderate to large computer networks. This approach adds considerable expense in terms of the monitoring equipment and the resources necessary to maintain it. Further, most networking devices have a limited number of span ports available (typically 1-2). Dedicating a span port to latency monitoring makes it unavailable for other uses such as packet capture and traffic analysis.
Network infrastructure devices often aggregate network traffic summary information and export this information in the form of flow-based packets. Flow processors operating on network infrastructure devices generate flow-based packets to record network traffic information. Although processes such as overall traffic monitoring, device resource allocation and accounting often use these packets, current methods do not efficiently and accurately use this flow-based information to calculate network latency.
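As an illustration of the kind of summary information described above, the following is a minimal sketch of how individual packet observations might be aggregated into per-flow records. The names FlowKey, FlowRecord, Packet and aggregate_flows, the choice of fields, and the sample addresses are assumptions made for this example; they do not reproduce the layout of any particular flow export format or the behavior of any specific device.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical key identifying one flow (a common five-tuple)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class Packet(NamedTuple):
    """Hypothetical observation of a single packet."""
    timestamp: float  # seconds
    key: FlowKey
    length: int       # bytes


@dataclass
class FlowRecord:
    """Summary counters kept for one flow."""
    packets: int = 0
    octets: int = 0
    first_seen: float = float("inf")
    last_seen: float = float("-inf")


def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Collapse individual packets into one summary record per flow."""
    flows: Dict[FlowKey, FlowRecord] = {}
    for pkt in packets:
        rec = flows.setdefault(pkt.key, FlowRecord())
        rec.packets += 1
        rec.octets += pkt.length
        rec.first_seen = min(rec.first_seen, pkt.timestamp)
        rec.last_seen = max(rec.last_seen, pkt.timestamp)
    return flows


# Illustrative traffic: three packets of one TCP flow and one UDP packet.
web = FlowKey("10.0.0.5", "10.0.0.9", 51000, 443, "tcp")
dns = FlowKey("10.0.0.5", "10.0.0.2", 53000, 53, "udp")
sample = [
    Packet(0.00, web, 60),
    Packet(0.02, dns, 80),
    Packet(0.05, web, 1500),
    Packet(0.30, web, 40),
]
flows = aggregate_flows(sample)
for key, rec in flows.items():
    print(key.protocol, key.dst_port, rec.packets, rec.octets)
```

Records of this kind carry counts and timestamps for a flow as a whole rather than for each packet, which is one reason they suit accounting and overall traffic monitoring more readily than delay measurement.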