Network usage data is useful for many important business functions, such as subscriber billing, marketing & customer care, product development, network operations management, network and systems capacity planning, and security. Network usage data does not include the actual information exchanged in a communications session between parties, but rather includes numerous usage detail records, known as “flow records,” each containing one or more types of metadata (i.e., “data about data”). Known network flow record protocols include Netflow®, sFlow®, jFlow®, cFlow® and Netstream®. As used herein, a flow record is defined as a small unit of measure of unidirectional network usage by a stream of IP packets that share common source and destination parameters during a time interval.
The types of metadata included within each flow record vary based on the type of service and network involved and, in some cases, based on the particular network device providing the flow records. In general, a flow record provides detailed usage information about a particular event or communications connection between parties, such as the connection start time and stop time, source (or originator) of the data being transported, the destination or receiver of the data, and the amount of data transferred. A flow record summarizes usage information for very short periods of time (from milliseconds to seconds, occasionally minutes). Depending on the type of service and network involved, a flow record may also include information about the transfer protocol, the type of data transferred, the type of service (ToS) provided, etc. In telephony networks, the flow records that make up the usage information are referred to as call detail records (CDRs).
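By way of illustration only, the kinds of metadata described above can be sketched as a simple data structure. The class name, field names, and sample values below are hypothetical and are not taken from any particular flow record protocol; actual records vary with the service, network, and exporting device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src_addr: str      # source (originator) of the data
    dst_addr: str      # destination (receiver) of the data
    src_port: int
    dst_port: int
    protocol: str      # transfer protocol, e.g. "tcp"
    start_ms: int      # connection start time, in milliseconds
    end_ms: int        # connection stop time, in milliseconds
    byte_count: int    # amount of data transferred
    tos: int = 0       # type of service (ToS), if reported

    @property
    def duration_ms(self) -> int:
        """Length of the interval summarized by this record."""
        return self.end_ms - self.start_ms

    def key(self):
        """Parameters shared by the packets of one unidirectional flow."""
        return (self.src_addr, self.dst_addr,
                self.src_port, self.dst_port, self.protocol)


# A flow is unidirectional, so each direction of a session is its own record.
record = FlowRecord("10.0.0.5", "192.0.2.10", 49152, 443, "tcp",
                    1_000, 1_250, 4_096)
reverse = FlowRecord("192.0.2.10", "10.0.0.5", 443, 49152, "tcp",
                     1_005, 1_250, 65_536)
```

In this sketch the two directions of one session yield two records with different keys, consistent with the unidirectional definition of a flow record given above.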
In network monitoring, the network flow records are collected, stored and analyzed to produce meaningful results. Network usage analysis systems process these flow records and generate reports or summarized data files that support various business functions. Network usage analysis systems provide information about how network services are being used and by whom. Network usage analysis systems can also be used to identify (or predict) customer satisfaction-related issues, such as those caused by network congestion and network security abuse. In one example, network utilization and performance, as a function of subscriber usage behavior, may be monitored to track a user's experience, to forecast future network capacity, or to identify usage behavior indicative of network abuse, fraud and theft.
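As a minimal sketch of the kind of summarization such a system might perform, the following totals the bytes transferred per source address and ranks the heaviest sources. The function names, the tuple layout, and the sample data are assumptions made for illustration and do not describe any particular analysis product.

```python
from collections import defaultdict


def bytes_by_source(records):
    """Sum transferred bytes per source address.

    Each record is assumed to be a (src_addr, dst_addr, byte_count) tuple.
    """
    totals = defaultdict(int)
    for src_addr, _dst_addr, byte_count in records:
        totals[src_addr] += byte_count
    return dict(totals)


def top_talkers(records, n=1):
    """Return the n sources with the most bytes, largest first."""
    totals = bytes_by_source(records)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical sample data.
sample = [
    ("10.0.0.5", "192.0.2.10", 4_096),
    ("10.0.0.7", "192.0.2.10", 1_024),
    ("10.0.0.5", "198.51.100.3", 8_192),
]
summary = bytes_by_source(sample)
heaviest = top_talkers(sample, n=1)
```

A report of this form answers the question of how the network is being used and by whom, at the granularity of a source address.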
In computer security, an access control list (ACL) is a list of permissions attached to an object. More specifically, in networking, ACL refers to a list of traffic filtering rules. ACLs can permit or deny traffic through a network device. Network ACLs are typically configured on routers and firewalls. Access control lists can generally be configured to control both inbound and outbound traffic.
ACLs are one way to control network traffic by limiting user and device access to and from undesired addresses and/or ports. ACLs filter network traffic by controlling whether routed packets are forwarded or blocked, typically at a router interface, although other devices can filter packets. The router examines each packet to determine whether to forward or drop the packet, on the basis of the criteria specified within the access lists. An access control list criterion could be the source address of the traffic, the destination address of the traffic, the target port, the protocol, or some combination thereof. Typically, Internet Protocol (IP) addresses serve as identifiers of the source device on an IP-based network. Access control lists allow differentiated access based on this IP identifier within the network.
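The packet examination described above can be sketched as follows, using Python's standard ipaddress module. The rule structure, the first-match evaluation order, and the implicit deny when no rule matches are assumptions chosen for illustration; actual devices differ in rule syntax and default behavior.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AclRule:
    """Hypothetical ACL rule; a criterion left as None matches anything."""
    action: str                      # "permit" or "deny"
    src_net: str = "0.0.0.0/0"       # source address criterion
    dst_net: str = "0.0.0.0/0"       # destination address criterion
    protocol: Optional[str] = None   # protocol criterion
    dst_port: Optional[int] = None   # target port criterion

    def matches(self, src, dst, protocol, dst_port):
        return (
            ipaddress.ip_address(src) in ipaddress.ip_network(self.src_net)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst_net)
            and (self.protocol is None or self.protocol == protocol)
            and (self.dst_port is None or self.dst_port == dst_port)
        )


def evaluate(rules, src, dst, protocol, dst_port):
    """Return the action of the first matching rule."""
    for rule in rules:
        if rule.matches(src, dst, protocol, dst_port):
            return rule.action
    return "deny"  # assumption: traffic matching no rule is dropped


# Hypothetical rule set: block Telnet from 10.0.0.0/8, permit the rest of it.
rules = [
    AclRule("deny", src_net="10.0.0.0/8", protocol="tcp", dst_port=23),
    AclRule("permit", src_net="10.0.0.0/8"),
]
telnet = evaluate(rules, "10.1.2.3", "192.0.2.10", "tcp", 23)
https = evaluate(rules, "10.1.2.3", "192.0.2.10", "tcp", 443)
outsider = evaluate(rules, "172.16.0.1", "192.0.2.10", "tcp", 443)
```

Because evaluation stops at the first match in this sketch, rule order matters: the narrower deny rule must precede the broader permit rule for the Telnet traffic to be blocked.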
While ACLs serve useful functions, establishing ACLs may be laborious. In particular, ACLs are currently programmed manually. Furthermore, the selection of IP addresses to place on the ACLs may be arbitrary and unpredictable.
In particular, many autonomous or enterprise IP networks are large, complex, and dynamic, making them difficult to manage. Network management tasks such as monitoring traffic in a network, analyzing the network's performance, or reconfiguring the network for improved performance require information about the network. However, because large IP networks are highly dynamic, it is difficult to acquire information useful for many network management tasks. Consider that a large IP network may have tens of thousands of nodes and hundreds of routers and gateways. A large corporate network may have 300,000 nodes and 2,500 routers. Routers, gateways, switches, and other network devices sometimes fail, go offline, or return to service. Links often fail, return to service, or degrade in performance. For instance, a microwave or satellite link may experience interference that reduces its bandwidth. Protocols such as OSPF and BGP that are used to route traffic in large IP networks are dynamic and change the routing paths in a large network as conditions change in the network. Even relatively stable networks can take a long time to reach a state of routing convergence. By design, the path of communication between two computers on an IP network can change even during the period of a single connection between them. In view of these factors and others discussed below, it has been difficult for network management tools to obtain information that over time paints a somewhat complete and accurate picture of a network.
Network complexity makes managing networks expensive as it has required manual intervention by skilled human operators. Configuration and management of a large IP network has been difficult to automate. This necessity for close human oversight has led many operators to adopt a conservative policy of preferring network stability over frequent reconfiguration to optimize network performance. Thus, another problem in the field of network management has been that IP networks retain suboptimal network configurations for longer than required, leading to inefficient use of expensive bandwidth capacity and potentially higher communication latencies than otherwise possible. Tools for automated management and configuration have not been widely adopted.
Although tools for network management, including monitoring and maintaining ACLs, do exist, the tools are unsophisticated and have many shortcomings. Most network management tools simply discover and poll live network devices to generate reports containing maps, counter values, averages, areas of high traffic, and so on. Current tools tend to ignore the global dynamics of network behavior, concentrating on centrally unifying potentially conflicting data taken locally from individual network devices. Current tools do not make it easy for an operator to perform a variety of potentially useful tasks such as discovering the path a particular set of traffic takes through the network, investigating the behavior of the network in “what-if” scenarios, monitoring the evolution of the network as failures and recoveries occur, or analyzing network traffic as it relates to particular applications or services.
As described above, there have been attempts to measure network traffic at individual user computers, but host traffic data has been limited in scope and generally cannot reveal information related to traffic flow along particular paths in an IP network. Host or end-system network measurement does not provide useful information about network topology. There are also tools that aggregate IP traffic data at network devices such as routers and switches, for example, NetFlow® from Cisco Systems. However, these approaches have proven inadequate for numerous reasons such as opaque (e.g., encrypted, tunneled) traffic, complex application communication patterns, sampling artifacts, load on routers introduced by monitoring, and others.
Furthermore, known techniques for identifying viruses are limited. The known techniques generally look for secondary effects of the virus, such as by monitoring network resource usage and identifying applications requesting an unnaturally large amount of the network resources. However, it may be difficult to differentiate between the virus and legitimate applications that require a large amount of network resources. Also, viruses are becoming more sophisticated at avoiding detection. For example, a virus may sit dormant on a system for some time, waiting for a signal to initiate. As another example, a malicious virus may sit dormant until confidential data is acquired. Thus, while the virus is waiting to act, it would be difficult to detect because it produces minimal side-effects.
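The limitation described above can be illustrated with a minimal sketch of a usage-threshold check. The function name, the application names, the byte counts, and the threshold are all hypothetical and are chosen only to show how such a check can both misfire and miss.

```python
def flag_heavy_users(usage_bytes, threshold_bytes):
    """Return, in sorted order, applications whose usage exceeds the threshold."""
    return sorted(
        app for app, used in usage_bytes.items() if used > threshold_bytes
    )


# Hypothetical per-application network usage, in bytes.
usage = {
    "backup_agent": 5_000_000_000,   # legitimate but heavy
    "video_call": 800_000_000,       # legitimate and moderate
    "dormant_malware": 2_048,        # malicious, waiting for a signal
    "worm": 9_000_000_000,           # malicious and heavy
}
flagged = flag_heavy_users(usage, threshold_bytes=1_000_000_000)
```

In this example the check flags the legitimate backup application alongside the worm, while the dormant malware passes unnoticed because it produces almost no traffic.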