The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for determining a sampling rate from randomly sampled events.
In order to obtain information from network switches for purposes of measuring data flow characteristics, various techniques and protocols have been devised that can generally be classified as counter based or sampling based. With regard to counter based techniques, each port of a switch may have one or more associated counters that measure the number of bytes, packets, or the like that have been sent and dropped since the switch was rebooted or the counter was reset. Although these port counters are maintained on the application specific integrated circuit (ASIC) of the switch, and thus operate at high speed, their values must be provided to the general purpose processor of the switch for processing, which is typically done only approximately once per second. Moreover, this technique monitors a fixed number of ports on the switch and is thus limited in the granularity of the information it provides.
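Because port counters are cumulative, a poller typically derives rates from the difference between two successive reads. The following is a minimal Python sketch under stated assumptions: the PortSnapshot fields, the port_rates helper, and the rule that a decreasing value indicates a counter reset are all hypothetical and do not correspond to any particular switch ASIC or management interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSnapshot:
    """Cumulative counter values read from one switch port (hypothetical layout)."""
    timestamp: float      # seconds at which the counters were read
    tx_bytes: int         # bytes sent since reboot or last counter reset
    tx_packets: int       # packets sent since reboot or last counter reset
    dropped_packets: int  # packets dropped since reboot or last counter reset


def _delta(old: int, new: int) -> int:
    # Assumption: a smaller new value means the counter was reset between
    # polls, so only the post-reset count is known.
    return new - old if new >= old else new


def port_rates(prev: PortSnapshot, curr: PortSnapshot) -> dict:
    """Average per-second rates between two polls of the same port."""
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return {
        "bytes_per_s": _delta(prev.tx_bytes, curr.tx_bytes) / interval,
        "packets_per_s": _delta(prev.tx_packets, curr.tx_packets) / interval,
        "drops_per_s": _delta(prev.dropped_packets, curr.dropped_packets) / interval,
    }


# Two reads taken one second apart, matching the typical polling interval.
before = PortSnapshot(timestamp=10.0, tx_bytes=1_000_000, tx_packets=1_000, dropped_packets=0)
after = PortSnapshot(timestamp=11.0, tx_bytes=2_250_000, tx_packets=2_000, dropped_packets=5)
print(port_rates(before, after))
# {'bytes_per_s': 1250000.0, 'packets_per_s': 1000.0, 'drops_per_s': 5.0}
```

The sketch also makes the granularity limitation visible: each snapshot describes a whole port, so traffic belonging to different data flows on that port cannot be separated.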
Another counter based technique is provided by the NetFlow network protocol developed by Cisco Systems. The NetFlow protocol collects IP and other traffic information using a cache of current data flows, each typically specified by a 5-tuple comprising source address, destination address, source port, destination port, and protocol. That is, when a data packet is received for a particular data flow (i.e., a flow of data packets over an established connection between a source device and a destination device), a lookup in the cache structure is performed to determine whether an entry exists for that data flow and, if so, one or more counter values in the entry are updated to reflect the presence of the data packet. If an entry does not exist, a new entry is created in the cache and its counter values are incremented accordingly. When the cache becomes full, an entry in the cache is evicted to a collector for storage and/or processing. Alternatively, a timer-based eviction may be used that evicts a cache entry, at best, approximately every 30 seconds. Because the NetFlow protocol is cache based, it is not limited, unlike port counters, to monitoring a fixed number of data flows.
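The lookup, create, update, and evict cycle described above can be sketched as a small flow cache keyed by the 5-tuple. This is a hypothetical Python illustration, not Cisco's implementation: the FlowKey and FlowCache names, the choice to evict the least recently updated entry when the cache is full, and the age test based on when a flow was first seen are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """5-tuple identifying a data flow."""
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    protocol: int


class FlowCache:
    """NetFlow-style flow cache sketch (hypothetical, not a vendor implementation).

    Assumptions: when the cache is full, the least recently updated entry is
    evicted; expire() evicts entries first seen at least timeout_s ago.
    Evicted records are appended to `collector`, standing in for export.
    """

    def __init__(self, capacity: int, timeout_s: float = 30.0):
        self.capacity = capacity
        self.timeout_s = timeout_s
        self.entries: OrderedDict = OrderedDict()  # FlowKey -> record dict
        self.collector: list = []

    def update(self, key: FlowKey, nbytes: int, now: float) -> None:
        record = self.entries.get(key)
        if record is None:
            if len(self.entries) >= self.capacity:
                # Cache full: export the least recently updated flow.
                _, evicted = self.entries.popitem(last=False)
                self.collector.append(evicted)
            record = {"key": key, "packets": 0, "bytes": 0, "first_seen": now}
            self.entries[key] = record
        else:
            self.entries.move_to_end(key)  # mark as most recently updated
        record["packets"] += 1
        record["bytes"] += nbytes

    def expire(self, now: float) -> None:
        """Timer-based eviction of entries older than timeout_s."""
        stale = [k for k, r in self.entries.items()
                 if now - r["first_seen"] >= self.timeout_s]
        for k in stale:
            self.collector.append(self.entries.pop(k))
```

An ordered dictionary keeps the least recently updated flow at the front, so capacity eviction is a constant-time pop; real flow caches may organize and age their entries differently. Because entries are created on demand, the number of monitored flows is bounded only by the cache capacity, not by the number of ports.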
Still another counter based technique is provided by OpenFlow protocol flow counters. OpenFlow is a protocol specification promulgated by the Open Networking Foundation (ONF), a user-led organization dedicated to the promotion and adoption of software-defined networking (SDN), which manages the OpenFlow standard. OpenFlow allows the path of network packets through a network of switches to be determined by software running on one or more controllers, which program the switches with forwarding rules. This separation of control from forwarding allows for more sophisticated traffic management than is typically feasible using access control lists (ACLs) and routing protocols. The OpenFlow flow counters give the bytes/packets sent at user-specified granularities, e.g., per 5-tuple data flow specification, per source ID, etc. These flow counters can typically be read no faster than approximately once per second.
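Counting at a user-specified granularity can be illustrated with prioritized rules whose unspecified header fields act as wildcards: a rule listing all five fields counts a single 5-tuple flow, while a rule listing only the source counts everything from that source. The Python sketch below is hypothetical; FlowRule, account, and the header field names are illustrative assumptions and are not the OpenFlow message format or any controller API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowRule:
    """Simplified flow-table entry (hypothetical, not the OpenFlow wire format).

    `match` lists only the header fields the rule cares about; any field
    left out acts as a wildcard, which sets the counting granularity.
    """
    priority: int
    match: dict
    packet_count: int = 0
    byte_count: int = 0


def account(rules: list, headers: dict, nbytes: int) -> Optional[FlowRule]:
    """Charge a packet to the highest-priority rule whose fields all match."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if all(headers.get(f) == v for f, v in rule.match.items()):
            rule.packet_count += 1
            rule.byte_count += nbytes
            return rule
    return None  # no rule matched (table miss)


# A fine-grained per-5-tuple rule takes precedence over a coarse per-source rule.
per_flow = FlowRule(priority=10, match={"src": "10.0.0.1", "dst": "10.0.0.2",
                                        "sport": 1234, "dport": 80, "proto": 6})
per_src = FlowRule(priority=1, match={"src": "10.0.0.1"})
```

Because each packet is charged to exactly one rule, a controller reading these counters sees the fine-grained flow separately from the remaining traffic of the same source.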
With regard to sampling based techniques, sFlow is an industry standard technology promulgated by the sFlow.org consortium for monitoring high speed switched networks. With the sFlow standard, statistical sampling is performed in which 1-in-N packets are sampled and forwarded to a collector, which can analyze the samples and provide information about the state of the network, including a list of data flows, the paths they are taking, their length, etc. However, because the samples must typically be forwarded to the control CPU of the switch, the sFlow technique is limited to a relatively small number of samples, e.g., approximately 300 samples per second. Moreover, the sampling rate must be set a priori and is slow to change. As a result, the sampling rate is forced to a very low value in order to accommodate high loads, e.g., only 1 in approximately 400,000 packets is sampled in order to handle minimum-sized, e.g., 64 byte, packets at line rate on a 64 port, 10 Gbps switch while keeping the number of samples below 300 per second.
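The core idea of 1-in-N statistical sampling, and why a collector can still estimate totals from few samples, can be sketched by keeping each packet independently with probability 1/N and scaling the sampled counts back up by N. This is a simplified Python illustration: independent (Bernoulli) sampling is used here for clarity and is not claimed to reproduce the exact randomized procedure in the sFlow specification, and the function names are hypothetical.

```python
import random


def sample_one_in_n(packet_sizes, n: int, rng: random.Random) -> list:
    """Keep each packet independently with probability 1/n (simplified 1-in-N sampling)."""
    return [size for size in packet_sizes if rng.random() < 1.0 / n]


def estimate_totals(samples: list, n: int) -> tuple:
    """Scale the sampled packet and byte counts by n to estimate the true totals."""
    return len(samples) * n, sum(samples) * n


# One million minimum-sized packets sampled at 1 in 1,000.
rng = random.Random(42)
packets = [64] * 1_000_000
samples = sample_one_in_n(packets, 1_000, rng)
est_packets, est_bytes = estimate_totals(samples, 1_000)
print(len(samples), est_packets, est_bytes)
```

The estimate is unbiased, but its relative error grows as fewer samples are collected, which is why a fixed, very large N chosen to respect a per-second sample budget leaves little information about individual data flows.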