Network management is a loosely defined field covering areas such as performance management, configuration management, fault management, security management, accounting, and others. Because large IP networks are difficult to manage, network management tools have been used. Network management tools or platforms generally collect and provide information about the current, recent, or historical status of a network, either for presentation to operators or for allowing applications to generate network control operations. Consider the following issues related to network management and tools for network management.
Many autonomous or enterprise IP networks are large, complex, and dynamic, making them difficult to manage. Network management tasks such as monitoring traffic in a network, analyzing the network's performance, or reconfiguring the network for improved performance require information about the network. However, because large IP networks are highly dynamic, it is difficult to acquire information useful for many network management tasks. Consider that a large IP network may have tens of thousands of nodes and hundreds of routers and gateways. A large corporate network may have 300,000 nodes and 2,500 routers. Routers, gateways, switches, and other network devices sometimes fail, go offline, or return to service. Links often fail, return to service, or degrade in performance. For instance, a microwave or satellite link may experience interference that reduces its bandwidth. Protocols such as OSPF and BGP that are used to route traffic in large IP networks are dynamic and change the routing paths in a large network as conditions change in the network. Even relatively stable networks can take a long time to reach a state of routing convergence. By design, the path of communication between two computers on an IP network can change even during the period of a single connection between them. In view of these factors and others discussed below, it has been difficult for network management tools to obtain information that over time paints a somewhat complete and accurate picture of a network.
Another problem with network management has been cost. Network complexity makes managing networks expensive as it has required manual intervention by skilled human operators. Configuration and management of a large IP network has been difficult to automate. This necessity for close human oversight has led many operators to adopt a conservative policy of preferring network stability over frequent reconfiguration to optimize network performance. Thus, another problem in the field of network management has been that IP networks retain suboptimal network configurations for longer than required, leading to inefficient use of expensive bandwidth capacity and potentially higher communication latencies than otherwise possible. Tools for automated management and configuration have not been widely adopted.
Although tools for network management do exist, they are unsophisticated and have many shortcomings. Most network management tools simply discover and poll live network devices to generate reports containing maps, counter values, averages, areas of high traffic, and so on. Current tools tend to ignore the global dynamics of network behavior, concentrating instead on centrally unifying potentially conflicting data taken locally from individual network devices. Current tools do not make it easy for an operator to perform a variety of potentially useful tasks such as discovering the path a particular set of traffic takes through the network, investigating the behavior of the network in ‘what-if’ scenarios, monitoring the evolution of the network as failures and recoveries occur, or analyzing network traffic as it relates to particular applications or services.
For example, consider a company's IT manager who has been asked to consolidate the company's email servers at a single site. No tools exist to help the manager work out the impact on the network and identify any reconfiguration that may be necessary due to the probable change in traffic patterns. There is no information that tells the manager about network traffic for email in view of the topology of the network. Most likely the IT manager would have to build ad hoc simulations of the company's network using generic traffic and failure distributions, possibly estimating parameters from measurement samples if they were available.
There have been attempts to measure network traffic at individual user computers, but host traffic data has been limited in scope and generally cannot reveal information related to traffic flow along particular paths in an IP network. Host or end-system network measurement does not provide useful information about network topology. There are also tools that aggregate IP traffic data at network devices such as routers and switches, for example, NetFlow from Cisco Systems. However, these approaches have proven inadequate for numerous reasons such as opaque (e.g., encrypted, tunneled) traffic, complex application communication patterns, sampling artifacts, load on routers introduced by monitoring, and others.
Network management tools have generally addressed two main areas, among others. First, tools have been used for the definition and handling of management information for use by network management applications. This involves appropriate collection and presentation of data: filtering, storage, liveness, and so on, sometimes using standardized MIBs (management information bases, which are database tables) for TCP/IP. Internet MIBs store information such as the IP addresses a router has observed as active, per-port byte and packet counts, and general configuration information. Traps might be set to notify a listening management system that a particular counter has peaked above, or is averaging more than, a set limit.
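The trap condition described above, in which a counter either peaks above a limit or averages more than a limit, can be sketched as follows. This is a minimal illustration only and not the behavior of any particular SNMP agent; the class name `ThresholdTrap`, its parameters, and the use of a fixed-size moving window for the average are all assumptions made for the example.

```python
from collections import deque


class ThresholdTrap:
    """Hypothetical trap check (not a real SNMP API).

    Reports "peak" when a single sample exceeds peak_limit and
    "average" when the mean of the last `window` samples exceeds
    avg_limit.
    """

    def __init__(self, peak_limit, avg_limit, window):
        self.peak_limit = peak_limit
        self.avg_limit = avg_limit
        # Only the most recent `window` samples are retained.
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record one counter sample and return the list of limits exceeded."""
        self.samples.append(value)
        average = sum(self.samples) / len(self.samples)
        reasons = []
        if value > self.peak_limit:
            reasons.append("peak")
        if average > self.avg_limit:
            reasons.append("average")
        return reasons


if __name__ == "__main__":
    trap = ThresholdTrap(peak_limit=100, avg_limit=50, window=3)
    for sample in [10, 120, 60, 10, 10]:
        print(sample, trap.observe(sample))
```

In this sketch a single high sample keeps the average condition raised for several subsequent samples, until that sample leaves the window. This illustrates how a device-local threshold of this kind reflects only recent local counter history.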
Second, network management tools have been used for the design of automated or adaptive management systems, which utilize the data stored and presented by the MIBs to control the system. Examples include the use of forward and backward inference for prediction and diagnosis in ATM networks, declarative logic rules applied to an object-oriented database network model, and the combination of declarative logic with active and temporal databases. Current IP network management products make extensive use of device MIBs, using ICMP Echo (ping) for initial device discovery, and then SNMP (Simple Network Management Protocol) to get/set MIB entries and to allow devices to asynchronously trigger actions in listening management systems via traps. Cisco routers also support NetFlow, a built-in sampling system able to present data to management systems concerning the current traffic at a router.
Unfortunately, none of these management tools or systems is satisfactory. They require extensive and correct MIB support in the tool and on the devices managed thereby. Existing tools tend not to scale well, generating large volumes of data where core network solutions such as NetFlow are deployed. They typically cannot provide an accurate, detailed view of network behavior, because frequent SNMP polling generates significant CPU and network load. NetFlow also suffers from the limitation that it uses sampling techniques to monitor traffic, giving rise to sampling artifacts and limiting its maximum accuracy in the face of short-lived traffic patterns. Even where these types of tools are successfully deployed, they do not address some of the fundamental problems related to the dynamic behavior of networks.
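The sampling limitation noted above can be illustrated with a small sketch. It assumes a simple deterministic 1-in-n packet sampler whose counts are scaled up by n; this is an illustrative model only and is not presented as the algorithm NetFlow actually uses. The function name `sample_flows` and the flow labels are hypothetical.

```python
from collections import Counter


def sample_flows(packets, n):
    """Estimate per-flow packet counts from 1-in-n sampling.

    `packets` is a sequence of flow identifiers, one per packet, in
    arrival order. Every n-th packet (indices 0, n, 2n, ...) is kept,
    and each kept packet is counted as n packets.
    """
    sampled = Counter()
    for index, flow_id in enumerate(packets):
        if index % n == 0:
            sampled[flow_id] += 1
    return {flow: count * n for flow, count in sampled.items()}


if __name__ == "__main__":
    # A three-packet flow embedded in a long-lived bulk flow.
    packets = ["bulk"] + ["short"] * 3 + ["bulk"] * 96
    print(sample_flows(packets, 1))   # every packet counted
    print(sample_flows(packets, 10))  # 1-in-10 sampling
```

Under these assumptions, the short flow falls entirely between sampled packets and is absent from the 1-in-10 estimate, while the bulk flow is slightly overestimated. Random sampling would change which packets are missed, but a flow much shorter than the sampling interval could still go unobserved.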