1) Field of the Invention
The invention is related to a method for sample-analysis of data comprising a multitude of data packets showing the steps according to the body of claim 1 and claim 2 respectively. Furthermore, the invention is related to a monitoring system for sample-analysis of data comprising a multitude of data packets showing the features according to the body of claim 14.
2) Description of the Prior Art
In order to monitor data, in particular data streams (traffic) in Internet Protocol (IP) networks, a number of methods and devices are known in the art. An increasing data rate of the data stream necessarily leads to higher computing power required to analyze the single data packets of the stream. Therefore, a number of methods propose not to analyze each data packet but to apply sampling to a data stream for the measurements in IP networks.
Such a sampling method has been described for instance by Nick Duffield et al. in “Properties and Prediction of Flow Statistics from Sampled Packet Streams”, ACM SIGCOMM Internet Measurement Workshop 2002, Marseille, France, Nov. 6-8, 2002. Calculation rules for probabilistic sampling and for estimating the overall volume are introduced. Nevertheless, the problem in this situation is that the sampling method is applied on the traffic mix (e.g. all traffic that is observed on an interface), but the estimation accuracy is required per ‘flow’. Furthermore, the results for volume measurements of the aforementioned article apply to probabilistic and not to so-called n-out-of-N sampling.
Several definitions of the term ‘flow’ exist. Within the scope of the instant invention, a definition following RFC 3917 (Request for Comments) of the IETF (Internet Engineering Task Force) is relevant:
Data packets belonging to a particular flow have a set of common properties. Each property is defined as the result of applying a function to at least one of the values of:
1. one or more packet header fields (e.g., protocol, destination address, source address, Type of Service (TOS) in IP data packets), transport header fields (e.g., destination port number, source port number), or application header fields (e.g., Real-time Transport Protocol (RTP) header fields);
2. one or more characteristics of the packet itself (e.g., number of Multiprotocol Label Switching (MPLS) labels, etc.); and
3. one or more fields derived from packet treatment (e.g., next hop address, the output interface, etc.).
A packet is defined to belong to a particular flow if it completely satisfies all the defined properties of the flow.
In the simplest case, the function applied to the values of the properties given above is multiplication by a factor of one, i.e. the value is used unchanged. However, any mathematical function that is suitable for the values of the given properties may be applied instead.
This definition of a flow covers the range from a flow containing all packets observed for example at a network interface to a flow consisting of just a single packet between two applications.
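The flow definition above can be illustrated with a minimal Python sketch. It is not taken from RFC 3917 or from the invention; the names `Packet`, `belongs_to_flow`, `src_prefix_24` and the example addresses are hypothetical. Each property is modelled as a pair of a function and the value that function must yield, and a packet belongs to the flow only if every property is satisfied.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet model (assumption, not from the source).
    src: str
    dst: str
    protocol: int
    src_port: int
    dst_port: int
    size: int  # bytes


# A property is (function applied to packet values, required result).
FlowProperty = Tuple[Callable[[Packet], Any], Any]


def belongs_to_flow(packet: Packet, properties: List[FlowProperty]) -> bool:
    """A packet belongs to a flow only if it satisfies all defined properties."""
    return all(func(packet) == expected for func, expected in properties)


def src_prefix_24(packet: Packet) -> str:
    """Example of a non-identity function: the first three octets of the source."""
    return ".".join(packet.src.split(".")[:3])


# Flow: TCP traffic to port 80 originating from the (example) 10.0.0.x range.
web_from_subnet: List[FlowProperty] = [
    (lambda p: p.protocol, 6),   # identity function on the protocol field
    (lambda p: p.dst_port, 80),  # identity function on the destination port
    (src_prefix_24, "10.0.0"),   # derived value from the source address
]

pkt_a = Packet("10.0.0.7", "192.0.2.1", 6, 40000, 80, 1500)
pkt_b = Packet("10.0.1.7", "192.0.2.1", 6, 40001, 80, 64)
```

With an empty property list, `belongs_to_flow` accepts every packet, which corresponds to the widest case of a flow containing all packets observed at an interface.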
It is emphasized that the flow definition does not necessarily match a general application-level end-to-end stream. However, an application may derive properties of application-level streams by processing measured flow data. Furthermore, although packet properties may depend on application headers, no requirement related to application headers is defined.
In the following it is described how further important terminology is to be understood within the scope of the instant invention.
Observation Point:
An observation point is a location in a network where data packets can be observed. Examples are a line to which a probe is attached, a shared medium such as, in particular, an Ethernet-based LAN (Local Area Network), a single port of a router, or a set of interfaces (physical or logical) of a router.
It is emphasized that one observation point may be a superset of several other observation points. For example one observation point can be an entire line card. This would be the superset of the individual observation points at the line card's interfaces.
Flow Record:
A flow record contains information about a specific flow that was metered at an observation point. A flow record contains measured properties of the flow (e.g., the total number of bytes of all packets of the flow) and usually characteristic properties of the flow (e.g., the source address, in particular the source IP address).
Metering:
The metering process generates flow records. Inputs to the process are packet headers observed at an observation point and packet treatment at the observation point, for example the selected output interface. The metering process consists of a set of functions that comprise packet header capturing, time stamping, sampling, classifying, and maintaining flow records.
The maintenance of flow records may include creating new records, updating existing ones, computing flow statistics, deriving further flow properties, detecting flow termination, passing flow records to the exporting process, and deleting flow records.
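The maintenance steps listed above can be sketched as follows. This is only an illustration under stated assumptions: the names `FlowRecord` and `FlowCache` are hypothetical, and flow termination is detected here by a simple idle timeout, which is one possible criterion and is not prescribed by the text.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List


@dataclass
class FlowRecord:
    key: Hashable            # characteristic properties of the flow
    packets: int = 0         # measured property: packet count
    octets: int = 0          # measured property: total bytes
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowCache:
    """Hypothetical flow record store; idle-timeout expiry is an assumption."""

    def __init__(self, idle_timeout: float = 30.0) -> None:
        self.idle_timeout = idle_timeout
        self.records: Dict[Hashable, FlowRecord] = {}

    def update(self, key: Hashable, size: int, timestamp: float) -> FlowRecord:
        # Create a new record on the first packet, otherwise update in place.
        record = self.records.get(key)
        if record is None:
            record = FlowRecord(key=key, first_seen=timestamp)
            self.records[key] = record
        record.packets += 1
        record.octets += size
        record.last_seen = timestamp
        return record

    def expire(self, now: float) -> List[FlowRecord]:
        # Detect terminated flows, delete them from the cache, and return
        # them so they can be passed to an exporting process.
        expired = [k for k, r in self.records.items()
                   if now - r.last_seen > self.idle_timeout]
        return [self.records.pop(k) for k in expired]
```

A caller would invoke `update` once per classified packet and call `expire` periodically, forwarding the returned records to whatever export mechanism is in use.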
In the following a number of exemplary applications are described for the benefit of which information on different flows of the data is necessary.
Traffic Profiling:
Traffic profiling is the process of characterizing flows, in particular in IP networks, by using a model that represents key parameters of the flows such as flow duration, volume, time, and burstiness. It is a prerequisite for network planning, network dimensioning and other activities. Which statistics and which accuracy are required from the measurements depends heavily on the particular traffic profiling objective(s). Typical information needed for traffic profiling is the distribution of used services and protocols in the network, the number of packets of a specific type and specific flow profiles.
Since objectives for traffic profiling can vary, this application requires a high flexibility of the measurement infrastructure, especially regarding the options for measurement configuration and packet classification.
Traffic Engineering:
Traffic Engineering (TE) comprises methods for measurement, modeling, characterization and control of a network. The goal of TE is the optimization of network resource utilization and traffic performance. Since control and administrative reaction to measurement results require access to the involved network nodes, TE mechanisms and the required measurement function are usually performed within one administrative domain. Typical parameters required for TE are link utilization, load between specific network nodes, number, size and entry/exit points of the active flows and routing information.
Attack/Intrusion Detection:
Capturing flow information plays an important role in network security, both for the detection of security violations and for subsequent defense. In the case of a Denial of Service (DoS) attack, flow monitoring can allow detection of unusual situations or suspicious flows. In a second step, flow analysis can be performed in order to gather information about the attacking flows and to derive a defense strategy.
Intrusion detection is a potentially more demanding application which would not only look at specific characteristics of flows, but may also use a stateful packet flow analysis for detecting specific, suspicious activities, or unusually frequent activities. Such activities may be characterized by specific communication patterns, detectable by characteristic sequences of certain packet types.
QoS Monitoring:
Quality of Service (QoS) monitoring is the passive measurement of quality parameters in particular for IP flows. In contrast to active measurements, passive measurements utilize the existing traffic in the network for QoS analysis. Since no test traffic is sent, passive measurements can only be applied in situations where the traffic of interest is already present in the network. One example application is the validation of QoS parameters negotiated in a service level specification. Note that passive/active measurement is also referred to as non-intrusive/intrusive measurement or as measurement of observed/synthetic traffic.
Passive measurements cannot provide the kind of controllable experiments that can be achieved with active measurements. On the other hand passive measurements do not suffer from undesired side effects caused by sending test traffic (e.g., additional load, potential differences in treatment of test traffic and real customer traffic).
QoS monitoring often requires the correlation of data from multiple observation points (e.g., for measuring one-way metrics). This requires proper clock synchronization of the involved metering processes. For some measurements, flow records and/or notifications on specific events at the different observation points must be correlated, for example the arrival of a certain packet. For this, the provisioning of post-processing functions (e.g., the generation of packet IDs) at the metering processes would be useful. Since QoS monitoring can lead to a huge amount of measurement result data, it highly benefits from mechanisms to reduce the measurement data, like aggregation of results and sampling.
On the market there are systems available (e.g. NetFlow of Cisco™ Systems) that in particular perform methods of sample-analysis of data comprising a multitude of data packets. In a first step a parent population number of data packets represented by a defined number of data packets from the data is provided. Simultaneously or afterwards a sample number is set, wherein then a sample group comprising the sample number of data packets is sampled from the parent population number of data packets. Usually, this is done by random sampling. The sample group of data packets is classified by classification rules into sample-flow-groups representing different flows. These classification rules divide the parent population number of data packets by at least one of the foregoing properties that may characterize specific flows. Each sample-flow-group then consists of a sample-flow-quantity of data packets. Finally, sample-flow-mean-sizes defined by the mean data size of the data packets in each sample-flow-group are determined.
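The first situation (sampling followed by classification) can be sketched in Python as follows. This is a hedged illustration, not the implementation of any commercial system: the function name `sample_then_classify`, the dictionary-based packet representation and the `classify` callback are assumptions made for the example. Random n-out-of-N sampling is done with the standard library's `random.sample`, which draws without replacement.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional


def sample_then_classify(
    packets: List[Dict[str, Any]],
    n: int,
    classify: Callable[[Dict[str, Any]], Hashable],
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, Dict[str, float]]:
    """Draw n out of N packets at random, then split the sample into flows."""
    rng = rng or random.Random()
    # Step 1: n-out-of-N random sampling from the parent population.
    sample_group = rng.sample(packets, n)
    # Step 2: classify the sampled packets into sample-flow-groups.
    sizes_per_flow: Dict[Hashable, List[int]] = defaultdict(list)
    for pkt in sample_group:
        sizes_per_flow[classify(pkt)].append(pkt["size"])
    # Step 3: sample-flow-quantity and sample-flow-mean-size per group.
    return {
        flow: {"quantity": len(sizes), "mean_size": sum(sizes) / len(sizes)}
        for flow, sizes in sizes_per_flow.items()
    }


# Hypothetical parent population: N = 10 packets in two flows.
parent_population = (
    [{"proto": "tcp", "size": 100} for _ in range(6)]
    + [{"proto": "udp", "size": 50} for _ in range(4)]
)
by_protocol = lambda pkt: pkt["proto"]
```

Note that in this ordering the sample-flow-quantity of each flow is itself random: it depends on how many packets of that flow happen to fall into the sample group.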
The order of steps in the first situation described above, in which sampling is followed by classification, is not mandatory.
A second situation is conceivable, showing a first step of classification of all data packets of the data by classification rules into at least one parent population flow group. The quantity of generated parent population flow groups depends on the parameters of the classification rules as described above. Each of these parent population flow groups represents a particular parent population number of data packets belonging to a particular flow. Simultaneously or afterwards, for each parent population number of data packets a particular sample number is set. Using the respective sample number, a sample-flow-group of data packets is sampled from each parent population number of data packets. Each sample-flow-group consists of a sample-flow-quantity of data packets that is equal to the respective sample number. Finally, sample-flow-mean-sizes defined by the mean data size of the data packets in each sample-flow-group are determined.
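The second situation (classification followed by per-flow sampling) can be sketched in the same style. Again, this is only an illustration: `classify_then_sample` and the `sample_number` callback, which chooses the sample number for each flow from its parent population number, are hypothetical names. Clamping the sample number to the size of the flow is an added safeguard, not something stated in the text.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Optional


def classify_then_sample(
    packets: List[Dict[str, Any]],
    classify: Callable[[Dict[str, Any]], Hashable],
    sample_number: Callable[[Hashable, int], int],
    rng: Optional[random.Random] = None,
) -> Dict[Hashable, Dict[str, float]]:
    """Classify all packets into flows, then sample within each flow."""
    rng = rng or random.Random()
    # Step 1: classify every packet into a parent population flow group.
    parent_groups: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for pkt in packets:
        parent_groups[classify(pkt)].append(pkt)
    result: Dict[Hashable, Dict[str, float]] = {}
    for flow, group in parent_groups.items():
        # Step 2: set the sample number for this flow (clamped: assumption).
        n_f = min(sample_number(flow, len(group)), len(group))
        if n_f <= 0:
            continue  # nothing sampled, so no mean size can be computed
        # Step 3: n_f-out-of-N_f random sampling within the flow.
        sizes = [p["size"] for p in rng.sample(group, n_f)]
        result[flow] = {
            "parent_population": len(group),
            "quantity": n_f,  # equals the sample number by construction
            "mean_size": sum(sizes) / n_f,
        }
    return result


# Hypothetical data: two flows with 6 and 4 packets respectively.
all_packets = (
    [{"proto": "tcp", "size": 100} for _ in range(6)]
    + [{"proto": "udp", "size": 50} for _ in range(4)]
)
half_of_flow = lambda flow, parent_count: max(1, parent_count // 2)
```

In contrast to the first situation, the sample-flow-quantity per flow is fixed in advance here, at the cost of classifying every packet of the data.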
The outcome of the step of classification for the first situation and the step of sampling for the second situation is one entry per flow in the so-called flow cache. Furthermore, this entry is registered in the corresponding flow record. This entry comprises the value of the fields used to distinguish different flows (flowspec) and the total number of packets and bytes for the sampled flow. The values known to the user are only those available after the classification process. That means only the summary per flow (number of packets and bytes) is known. The individual characteristics per packet (bytes x_i per packet i) are no longer known at this stage. As n-out-of-N sampling is applied to the data that are in particular represented by a traffic mix in a data network, the data of the generated sample-flow-groups may be used to estimate the volumes of the flow groups in the parent population number of data packets (parent population flow groups).
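As an illustration of how such per-flow summaries might be used, the following sketch applies the common scale-up estimator for n-out-of-N sampling, in which each sampled total is multiplied by N/n. This particular estimator is an assumption made for the example and is not stated in the text; the function name `estimate_flow_volumes` and the layout of the flow cache entries are likewise hypothetical.

```python
from typing import Dict, Hashable


def estimate_flow_volumes(
    flow_cache: Dict[Hashable, Dict[str, int]],
    parent_population: int,
    sample_number: int,
) -> Dict[Hashable, Dict[str, float]]:
    """Scale per-flow sample totals by N/n (assumed scale-up estimator)."""
    if sample_number <= 0:
        raise ValueError("sample_number must be positive")
    scale = parent_population / sample_number
    # Only the per-flow summary (packets, bytes) is needed; the individual
    # packet sizes are not available at this stage.
    return {
        flow: {
            "est_packets": entry["packets"] * scale,
            "est_bytes": entry["bytes"] * scale,
        }
        for flow, entry in flow_cache.items()
    }


# Hypothetical flow cache after sampling n = 5 out of N = 10 packets.
example_cache = {
    "tcp": {"packets": 3, "bytes": 300},
    "udp": {"packets": 2, "bytes": 100},
}
estimates = estimate_flow_volumes(example_cache, parent_population=10,
                                  sample_number=5)
```

The estimate yields a single value per flow and says nothing by itself about its accuracy, since the per-packet sizes needed for a variance computation are not retained in the flow cache entry.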
Nevertheless, it is desirable that the user gets as much information as possible on the properties of the one or of the plurality of flow groups in the parent population flow groups.