1. Field of the Invention
This invention relates to network communications and usage analysis, and particularly to methods and apparatus for gathering and processing netflow data. More specifically, the exemplary embodiments of the present invention relate to identifying network attacks using flow records.
2. Background of the Invention
Packetized data networks are in widespread use, transporting data throughout the world. Such networks typically format data into packets for transmission from one computer to another. Each packet includes a header containing information relating to the packet data and its routing. The network sends packets from the originating computer to the destination computer via routers, which forward each packet using the routing information in its header. A flow of packets is a group of packets sent from a particular source network address and port to a particular destination network address and port. These source and destination network addresses and ports may, for example, correspond to different computers. As these networks have expanded, the benefits of using them have increased. However, this expansion has also opened opportunities for attacks on businesses using the networks.
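By way of illustration only, the notion of a flow defined above may be sketched as follows. The `Packet` type, its field names, the `group_into_flows` function and the sample addresses are hypothetical and are not taken from any particular netflow implementation; the sketch merely assumes that packets sharing the same source address, source port, destination address and destination port belong to one flow.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical packet header summary (illustrative field names)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    octets: int


def group_into_flows(packets):
    """Aggregate packets into flows keyed by (src addr, src port, dst addr, dst port)."""
    flows = defaultdict(lambda: {"packets": 0, "octets": 0})
    for p in packets:
        key = (p.src_addr, p.src_port, p.dst_addr, p.dst_port)
        flows[key]["packets"] += 1
        flows[key]["octets"] += p.octets
    return dict(flows)


# Synthetic example data: two packets of one flow and one packet of another.
packets = [
    Packet("10.0.0.1", 40000, "192.0.2.10", 80, 500),
    Packet("10.0.0.1", 40000, "192.0.2.10", 80, 1500),
    Packet("10.0.0.2", 40001, "192.0.2.10", 443, 60),
]
flows = group_into_flows(packets)
```

In this sketch each flow carries a packet count and an octet count, which are the quantities referred to in the discussion of flow size below.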
One type of attack is a distributed denial of service attack. This involves a large number of compromised computers attacking specific computers and overwhelming them by opening huge numbers of network connections. Another type of attack is a port scanning attack. This involves a rogue computer opening connections over a range of network addresses and probing them for weaknesses.
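As a hedged illustration of how these two attack types may appear in flow records, the following sketch counts the distinct sources contacting each destination (high for a distributed denial of service attack) and the distinct targets contacted by each source (high for a port scanning attack). The function names `fan_in` and `fan_out`, the tuple layout and the synthetic addresses are assumptions made for this example only; the sketch does not represent the technique of the invention.

```python
from collections import defaultdict


def fan_in(flows):
    """Number of distinct source addresses seen per destination address."""
    sources = defaultdict(set)
    for src, _sport, dst, _dport in flows:
        sources[dst].add(src)
    return {dst: len(s) for dst, s in sources.items()}


def fan_out(flows):
    """Number of distinct (destination address, port) targets per source address."""
    targets = defaultdict(set)
    for src, _sport, dst, dport in flows:
        targets[src].add((dst, dport))
    return {src: len(t) for src, t in targets.items()}


# Synthetic flows as (src addr, src port, dst addr, dst port) tuples.
# 200 different sources opening connections to one victim.
ddos = [(f"198.51.100.{i}", 50000 + i, "192.0.2.10", 80) for i in range(1, 201)]
# One source probing port 22 on 50 different addresses.
scan = [("203.0.113.7", 40000 + i, f"192.0.2.{100 + i}", 22) for i in range(1, 51)]
flows = ddos + scan

victim_sources = fan_in(flows)["192.0.2.10"]
scanner_targets = fan_out(flows)["203.0.113.7"]
```

Both counts depend on seeing every flow, including those carrying very few octets.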
Netflow is a known network protocol which may be used for collecting and monitoring IP traffic. Some netflow analyzing engines keep only a fixed number (normally up to 1000) of the top source and destination IP addresses, ranked solely by the volume of data associated with each IP address. The disadvantage of filtering the information in this manner is that the actual flow information is lost, in particular the context of the resulting information, e.g. the source and destination ports associated with the source and destination IP addresses. Hence, such engines are unable to identify attacks, as all distribution information is lost.
Other netflow analyzing engines retain only a subset of the flows (normally around 10,000 flows in a time period of one hour). The subset is normally selected on the basis of flow octet size, i.e. the flows with the highest octet counts are kept. This technique reduces the storage required for flows whilst, in contrast to the technique outlined above, still retaining some distribution information. However, flows belonging to denial of service and port scanning attacks can each contain only a small number of octets. Flows associated with such an attack therefore do not appear in the subset, and so the attack would again not be visible to the engine.
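The limitation described above may be illustrated by the following minimal sketch, which retains only the largest flows by octet count. The function `top_n_by_octets`, the dictionary keys and the synthetic flows are hypothetical, and a subset size of 5 is used in place of the figure of around 10,000 mentioned above purely to keep the example small.

```python
def top_n_by_octets(flows, n):
    """Keep only the n flows with the highest octet counts."""
    return sorted(flows, key=lambda f: f["octets"], reverse=True)[:n]


# Synthetic data: five large legitimate transfers ...
bulk = [
    {"src": f"10.0.0.{i}", "dst": "192.0.2.10", "dst_port": 443, "octets": 1_000_000 + i}
    for i in range(1, 6)
]
# ... and a port scan of 100 ports, each probe carrying only 60 octets.
scan = [
    {"src": "203.0.113.7", "dst": "192.0.2.10", "dst_port": port, "octets": 60}
    for port in range(1, 101)
]
flows = bulk + scan

kept = top_n_by_octets(flows, 5)

# Scan flows that survive the filter, and ports probed in the full data set.
scan_flows_kept = [f for f in kept if f["src"] == "203.0.113.7"]
ports_probed = {f["dst_port"] for f in flows if f["src"] == "203.0.113.7"}
```

Under these assumptions the scanning source probes 100 ports in the full data set, yet none of its flows survives the octet-based filter, so the scan is invisible in the retained subset.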
In order to be able to identify the patterns of the attacks described here, it is necessary to analyze all the flows, especially flows with only a small number of octets and packets. Such analysis is generally not undertaken because of its inefficiency, i.e. it is deemed infeasible to analyze such a large volume of data effectively, as this would not only lead to issues regarding storage capacity but also require unrealistic processing times in order to identify attacks.
It is an object of the present invention to overcome these problems by providing a technique for the efficient processing and analysis of netflow data that enables, in particular, the identification of distributed denial of service attacks and port scanning attacks based on analysis of a full set of flow information.