I. Field of the Invention
This invention relates to computer systems, in particular to network environments.
II. Background Information
Organizations use networks consisting of nodes connected by links to share device capabilities and information and to allow users to communicate and exchange information. Each node may include networking equipment such as ports (also known as interfaces), which translate signals between the formats used by links and those used by nodes. Nodes may connect to multiple links, and each node has one port for each link to which it connects. A node may perform various functions; for example, a node may run user applications and also act as a network management console. A node may be termed a host or a device, and may be a PC, workstation or laptop running a user application program, a router, an application gateway, a server, or any other device attached at some time to a network.
As network use and complexity increase, managing networks and diagnosing network problems become more difficult; particularly difficult problems are managing network loads and solving problems caused by excessive loads. The traffic load on a network may be unpredictable, may change over time, and may be unevenly distributed over the network. Links connecting network nodes, and other networking equipment such as routers or the equipment interfacing between nodes and links, have finite capacities. Increases in traffic may overload equipment, resulting in network performance degradation. Additional traffic which causes network slowdowns or breakdowns, which may be termed congestion traffic, may be created by small numbers of nodes and may affect only certain links and equipment. Congestion traffic may be any traffic which may be identified and, possibly, rerouted, halted or altered to alleviate an excess traffic condition on a network. Congestion traffic may be traffic which existed prior to an excess traffic condition, such as in the case where other additional traffic, when added to a network, causes an excess traffic condition. Congestion traffic may be that which is rerouted to a healthy link after an equipment failure (for example, damage to a link) or may be the traffic which existed on the healthy link prior to the failure; both sets of traffic cause an abnormal load on the healthy link. When used herein, a congestion condition may be any network state or condition which is caused by excess traffic on all or a portion of a network. For example, a congestion condition may be a condition where a port is receiving or being forced to send too many packets, so that network performance may be degraded.
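By way of illustration only, the per-port example of a congestion condition given above may be sketched as a simple rate check. The function name, the packets-per-second capacity figure, and the 0.9 threshold fraction are assumptions introduced for this sketch and are not taken from the disclosure.

```python
def is_congested(packets_in_interval, interval_seconds, capacity_pps, threshold=0.9):
    """Return True when the observed packet rate on a port exceeds a fraction
    of the port's capacity.

    All parameters are hypothetical: capacity_pps is an assumed port capacity
    in packets per second, and threshold is an assumed fraction of that
    capacity above which the port is treated as congested.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rate = packets_in_interval / interval_seconds
    return rate > threshold * capacity_pps


# Example: a port assumed to handle 10,000 packets per second.
print(is_congested(9500, 1.0, 10000))
print(is_congested(4000, 1.0, 10000))
```

A deployed monitor would likely also consider queue depth, dropped packets, or sustained duration; the single-threshold form is used here only to keep the example short.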
Network operators may add or reallocate capacity or alter systems to cure congestion conditions. Links may be upgraded or added. Traffic carrying capacity may be moved from one area of a network to another. Routing tables may be altered to reroute congestion traffic along less used paths. Traffic may be divided into classes with different priorities in an effort to provide better service to high priority applications.
Adding or reallocating network equipment or altering systems is expensive and consumes time and resources. Before doing so it is desirable to have accurate information including the source, destination, and type of congestion traffic. Overall, a knowledge of the distributed state of a network (details on traffic flow, paths and type) is required to quickly and efficiently alleviate congestion conditions. Often equipment or systems are added or reallocated based only on an operator's educated guess as to where the problem lies and how big the problem is. This results in inefficiencies. It is desirable, therefore, to have a system which collects detailed and useful data on congestion or other network conditions.
Certain information about the state of a network may only be gathered accurately and quickly at individual nodes distributed throughout a network; for example, statistics on the source of traffic reaching a node or the functioning of node ports. Currently, gathering such information requires that an operator physically access individual nodes, e.g., by using a sniffer, or that a central console query remote nodes.
Systems exist for collecting information about network traffic. The distributed state of a network may be collected at a central management console which polls remote network nodes. The distributed state of a network may also be determined by physically accessing each node. For example, diagnosing a congestion condition may require determining the source of the greatest amount of traffic received at a node and the paths taken by such traffic. A path taken by traffic may be described as the equipment traversed by traffic as the traffic crosses a network or networks (e.g., a series of nodes and links, or a series of sub-networks).
An operator may access a node and analyze incoming traffic using a sniffer, a device recording network statistics. An operator may determine which of the physical links attached to a node is receiving a certain type of traffic, and which node is the source of that traffic. The paths of traffic from the source may be found by traversing the network from node to node, using the sniffer at each node, until the source node is reached. Such a diagnosis is slow and inaccurate. A similar analysis may be performed from a central console which queries remote nodes for information about the source of incoming traffic. This method is also slow and inaccurate, as it requires communication with nodes across the network. The speed at which congestion occurs and must be stopped makes such detection methods ineffective.
Existing methods for analyzing network traffic are slow, inefficient, and inaccurate. The time taken to perform such operations results in inaccuracy, as the state of a network is ascertained over a period of time during which that state may change. Inaccuracies and delays may also occur if, as may happen during a congestion condition, data transmission over links is interrupted or halted. The state of a network is not always accurately viewed from one central point which has only indirect access to the state of remote network nodes.
Therefore there exists a need for a system and method allowing for the distributed state of a network, such as information about congestion traffic, to be quickly and accurately collected. A system and method are needed for quickly and accurately determining information about traffic between nodes, such as the path or paths of such traffic.
A method and system are disclosed for analyzing traffic on a network by monitoring network traffic and, when a particular network condition (for example, network congestion) is detected, gathering information about the traffic on the network by launching an agent and having the agent iteratively identify which of the links on the node on which the agent operates accepts a type or class of traffic, traverse the identified link to the node across the link, and repeat the process.
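The iterative behavior of the agent described above (identify the link accepting a class of traffic, traverse it, and repeat) may be illustrated by the following minimal sketch. It is not the disclosed implementation: the `Node` structure, the per-link packet counters, the rule of following the link with the highest count, and the `max_hops` bound are all assumptions made for the example, and the agent's movement between nodes is simulated within a single in-memory table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical counters: neighbor name -> {traffic class -> packets received}
    inbound: dict = field(default_factory=dict)


def trace_traffic(nodes, start, traffic_class, max_hops=32):
    """Follow one class of traffic upstream from `start`.

    Returns the list of node names visited, beginning with `start`.
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        # Identify which links on the current node accept this class of traffic.
        counts = {
            neighbor: per_class.get(traffic_class, 0)
            for neighbor, per_class in nodes[current].inbound.items()
        }
        counts = {n: c for n, c in counts.items() if c > 0}
        if not counts:
            break  # no link carries the traffic; treat this node as the source
        # Assumed selection rule: follow the link carrying the most such traffic.
        upstream = max(counts, key=counts.get)
        if upstream in path:
            break  # stop rather than revisit a node
        # "Traverse" the identified link to the node across it, then repeat.
        path.append(upstream)
        current = upstream
    return path


nodes = {
    "A": Node("A", {"B": {"bulk": 900, "voice": 10}, "C": {"bulk": 20}}),
    "B": Node("B", {"D": {"bulk": 880}}),
    "C": Node("C", {}),
    "D": Node("D", {}),
}
print(trace_traffic(nodes, "A", "bulk"))  # prints ['A', 'B', 'D']
```

In this example the trace begins at node A, where the condition is assumed to have been detected, and ends at node D, the first node with no inbound link carrying the traced class.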