The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
In computer networks such as the Internet, packets of data are sent from a source to a destination via a network of elements including links (communication paths such as telephone or optical lines) and nodes (for example, routers directing the packet along one or more of a plurality of links connected to it) according to one of various routing protocols. Elements in the network are typically identifiable by a unique internet protocol (IP) address.
One routing protocol used, for example, on the Internet is Border Gateway Protocol (BGP). BGP is used to route data between autonomous systems (AS), each comprising networks under a common administrator and sharing a common routing policy. BGP routers exchange full routing information during a connection session, for example using Transmission Control Protocol (TCP), allowing inter-autonomous system routing. The information exchanged includes various attributes including a next-hop attribute. For example, where a BGP router advertises a connection to a network, for instance in the form of an IP address prefix, the next-hop attribute comprises the IP address used to reach the BGP router.
Within each AS, the routing protocol typically comprises an interior gateway protocol (IGP), for example a link state protocol such as Open Shortest Path First (OSPF) or Intermediate System to Intermediate System (IS-IS).
Where the network carries different types of traffic, for example email or video traffic, this may be handled by separate processes or ports on network components.
It is desirable in many instances to monitor the flow of network traffic for various purposes such as security and billing. The information derived can be used to identify, for example, “top talkers”, that is, the noisiest protocol or most prolific addresses used. The information can be employed, for example, for network profiling, traffic analysis or for security purposes such as attack mitigation.
One way of monitoring the flow of network traffic is to categorize data packets forming the traffic as one of a plurality of “flows”. According to this approach, packets with common characteristics or key fields are grouped together as a flow. One example of such an approach is the NetFlow™ product, which is a feature of Cisco IOS® software available from Cisco Systems, Inc., San Jose, Calif., USA. According to this approach, packets sharing a common set of key fields, defined as source and destination IP address, source and destination port, protocol, Type of Service (ToS) and input interface, are classified as a single flow within a router through which the packets pass. By comparing such flows, information such as the flow having the largest number of packets or the largest number of bytes can be identified. In some instances, where a full view of all packets is not required, the packets are randomly sampled rather than all being processed.
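The classification described above can be sketched as follows. This is a minimal illustration only and is not the NetFlow™ implementation; the names `FlowKey`, `classify` and `top_flow_by_bytes`, the field names and the sample packets are assumptions introduced for the example.

```python
from collections import namedtuple

# Hypothetical record holding the seven key fields named in the text.
FlowKey = namedtuple(
    "FlowKey",
    "src_ip dst_ip src_port dst_port protocol tos input_interface",
)


def classify(packets):
    """Group (key, size) pairs into flows: {FlowKey: [packet_count, byte_count]}."""
    cache = {}
    for key, size in packets:
        stats = cache.setdefault(key, [0, 0])
        stats[0] += 1      # one more packet in this flow
        stats[1] += size   # accumulate the flow's byte count
    return cache


def top_flow_by_bytes(cache):
    """Return the key of the flow with the largest byte count."""
    return max(cache, key=lambda k: cache[k][1])


# Illustrative packets; addresses, ports and sizes are made up.
packets = [
    (FlowKey("10.0.0.1", "10.0.1.1", 1025, 80, "tcp", 0, "eth0"), 100),
    (FlowKey("10.0.0.1", "10.0.1.1", 1025, 80, "tcp", 0, "eth0"), 300),
    (FlowKey("10.0.0.2", "10.0.1.1", 2000, 53, "udp", 0, "eth0"), 60),
]
flow_cache = classify(packets)
```

Because the first two packets agree in every key field, they fall into one flow, whereas the third differs in source address, source port, destination port and protocol and therefore forms a flow of its own.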
However, it would be desirable to derive yet further information from the flow profile created. For example, the flows are categorized in too much detail to readily identify a particular source, destination or protocol which is consuming network bandwidth. In the case of attack mitigation in a Denial of Service (DoS) attack, an attacker sending many small flows from a multitude of spoofed source IP addresses may never show as a “top talker” because each separate flow consists of only a few packets and is short-lived.
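The difficulty can be illustrated with a small sketch. The traffic below is entirely hypothetical: 1000 spoofed sources each send two packets to one victim while a single legitimate flow carries 50 packets. Grouping by destination is used here only to make the hidden total visible; it is an assumption of the example rather than a described scheme.

```python
from collections import Counter

# Hypothetical per-flow packet counts keyed by (source, destination).
flows = {(f"spoofed-{i}", "victim"): 2 for i in range(1000)}
flows[("client", "server")] = 50

# Ranking individual flows: the legitimate flow appears to be the top talker.
top_flow = max(flows, key=flows.get)

# Summing the same counts per destination reveals the traffic aimed at the victim.
by_destination = Counter()
for (_source, destination), packet_count in flows.items():
    by_destination[destination] += packet_count
top_destination = by_destination.most_common(1)[0]
```

In this sketch each attack flow holds only two packets and so never ranks highly, even though the attack as a whole accounts for 2000 of the 2050 packets.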
According to existing flow monitoring schemes, flows are cached at the router, allowing the relevant information to be derived from them. For example, referring to FIGS. 1A, 1B and 1C, which are schematic diagrams showing packets and classification of packets into flows to form a flow profile, a packet 10 is shown having a header 12 and a payload 14 (not shown to scale in terms of number of bits). The header 12 includes various fields including source IP address 16, destination IP address 18, ToS 21 and protocol 24.
Referring to FIGS. 1B and 1C, four packets 30, 32, 34, 36 are classified into two flows 38, 40. The first flow 38 comprises two packets 30, 32 with common source IP address SA1, destination IP address DA1, ToS “X” and HTTP protocol, and respective payloads of size 8 bytes and 10 bytes. As a result, the size of the first flow 38 is recorded as two fields containing a count of the number of packets “FLOW COUNT” 25 and a count of the number of bytes “SIZE” 27, in this case two packets and 18 bytes. A second flow 40 comprises packets 34, 36 having source IP address SA1, destination IP address DA2, ToS “X” and HTTP protocol, and respective payloads of size 15 bytes and 7 bytes. As a result, the flow 40 is of size two packets and 22 bytes.
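The arithmetic of this example can be checked with a short sketch. The tuple layout and the `flow_profile` name are assumptions made for illustration; the addresses, ToS value, protocol and payload sizes are those given above.

```python
from collections import defaultdict

# Packets from the example: (source, destination, ToS, protocol, payload bytes).
packets = [
    ("SA1", "DA1", "X", "HTTP", 8),
    ("SA1", "DA1", "X", "HTTP", 10),
    ("SA1", "DA2", "X", "HTTP", 15),
    ("SA1", "DA2", "X", "HTTP", 7),
]

# Each flow records the two fields described: "FLOW COUNT" and "SIZE".
flow_profile = defaultdict(lambda: {"FLOW COUNT": 0, "SIZE": 0})
for *key, payload_bytes in packets:
    entry = flow_profile[tuple(key)]
    entry["FLOW COUNT"] += 1
    entry["SIZE"] += payload_bytes
```

The two flows differ only in destination address, giving two packets and 18 bytes for the flow to DA1 and two packets and 22 bytes for the flow to DA2.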
It will be seen that caching of flows requires significant storage, as a result of which the cached flows are periodically exported to a remote node termed a “collector” node. Export can take place upon various criteria being fulfilled. For example, if a flow is continuing then cached entries for the flow can be exported upon expiry of a timer. If a flow is dormant for a predetermined period or is terminated (for example, the TCP connection is terminated), again the entries can be exported to a collector. At this time the exported flows can be aggregated according to one of various schemes in existing systems. For example, flows with common source and destination AS and interface can be grouped together, the aggregation scheme further containing a record of the number of packets, number of flows, number of bytes and time stamps of the first and last packets in the aggregation. Other schemes have been adopted including prefix aggregations, port or protocol aggregations and type of service (ToS) aggregations.
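The export criteria and the AS-based aggregation described above might be sketched as follows. The timeout values, the `FlowEntry` fields and the function names are hypothetical and chosen only for illustration; existing systems may apply different criteria and record layouts.

```python
from dataclasses import dataclass

ACTIVE_TIMEOUT = 1800.0   # assumed: seconds before a continuing flow is exported
INACTIVE_TIMEOUT = 15.0   # assumed: seconds of dormancy before export


@dataclass
class FlowEntry:
    src_as: int
    dst_as: int
    input_if: str
    output_if: str
    packets: int
    byte_count: int
    first_seen: float
    last_seen: float
    terminated: bool = False  # e.g. the TCP connection has been closed


def should_export(entry, now):
    """Export on termination, dormancy, or expiry of the active timer."""
    if entry.terminated:
        return True
    if now - entry.last_seen >= INACTIVE_TIMEOUT:
        return True
    return now - entry.first_seen >= ACTIVE_TIMEOUT


def aggregate_by_as(entries):
    """Group exported flows by source/destination AS and interface."""
    aggregated = {}
    for e in entries:
        key = (e.src_as, e.dst_as, e.input_if, e.output_if)
        rec = aggregated.setdefault(key, {
            "flows": 0, "packets": 0, "bytes": 0,
            "first": e.first_seen, "last": e.last_seen,
        })
        rec["flows"] += 1
        rec["packets"] += e.packets
        rec["bytes"] += e.byte_count
        rec["first"] = min(rec["first"], e.first_seen)
        rec["last"] = max(rec["last"], e.last_seen)
    return aggregated


# Illustrative cache contents; AS numbers and interface names are made up.
cache = [
    FlowEntry(65001, 65002, "eth0", "eth1", 10, 1000, 0.0, 5.0),
    FlowEntry(65001, 65002, "eth0", "eth1", 4, 400, 2.0, 30.0, terminated=True),
    FlowEntry(65003, 65002, "eth0", "eth1", 1, 60, 29.0, 29.5),
]
now = 30.0
exported = [e for e in cache if should_export(e, now)]
summary = aggregate_by_as(exported)
```

In this sketch the first entry is exported because it has been dormant, the second because it has terminated, and the third remains cached; the two exported flows share an aggregation key and are therefore reported as a single record.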
Although identification of flows and aggregated flows can be used to derive useful network information, the information collected does not allow analysis of certain complex message transactions, for example those involving multiple transactions between first and second network locations in both directions, such as connection sessions.