Efficient allocation of network resources, such as available network bandwidth, has become critical as enterprises increase reliance on distributed computing environments and wide area computer networks to accomplish critical tasks. The Transmission Control Protocol (TCP)/Internet Protocol (IP) protocol suite, which implements the world-wide data communications network environment called the Internet and is employed in many local area networks, omits any explicit supervisory function over the rate of data transport over the various devices that comprise the network. While there are certain perceived advantages, this characteristic has the consequence of juxtaposing very high-speed packets and very low-speed packets in potential conflict, which produces certain inefficiencies. Certain loading conditions degrade the performance of networked applications and can even cause instabilities that lead to overloads and temporarily stop data transfer.
Bandwidth management in TCP/IP networks to allocate available bandwidth from a single logical link to network flows is accomplished by a combination of TCP end systems and routers which queue packets and discard packets when some congestion threshold is exceeded. The discarded and therefore unacknowledged packet serves as a feedback mechanism to the TCP transmitter. Routers support various queuing options to provide for some level of bandwidth management including some partitioning and prioritizing of separate traffic classes. However, configuring these queuing options with any precision or without side effects is in fact very difficult, and in some cases, not possible.
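The queue-and-discard behavior described above can be illustrated with a minimal sketch. It assumes the simplest discard rule, in which an arriving packet is dropped whenever the queue already holds a threshold number of packets; actual routers may use other rules. The names `TailDropQueue`, `threshold`, and `simulate` are hypothetical and chosen only for illustration.

```python
from collections import deque


class TailDropQueue:
    """Bounded FIFO that discards arrivals once a congestion threshold is reached."""

    def __init__(self, threshold):
        self.threshold = threshold  # hypothetical limit on queued packets
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        # Discard the arriving packet when the queue is already at the threshold.
        # The sender never receives an acknowledgment for it, which is the
        # feedback signal the TCP transmitter reacts to.
        if len(self.queue) >= self.threshold:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def dequeue(self):
        # Forward the oldest queued packet, or return None if the queue is empty.
        return self.queue.popleft() if self.queue else None


def simulate(arrivals, threshold):
    """Offer every packet to a fresh queue; return (accepted, dropped) counts."""
    q = TailDropQueue(threshold)
    accepted = sum(1 for packet in arrivals if q.enqueue(packet))
    return accepted, q.dropped
```

In this sketch nothing is dequeued during the burst, so a burst of ten packets against a threshold of four leaves four packets queued and six discarded.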
Bandwidth management devices allow for explicit data rate control for flows associated with a particular traffic classification. For example, bandwidth management devices allow network administrators to specify policies operative to control and/or prioritize the bandwidth allocated to individual data flows according to traffic classifications. In addition, certain bandwidth management devices, as well as certain routers, allow network administrators to specify aggregate bandwidth utilization controls to divide available bandwidth into partitions to ensure a minimum bandwidth and/or cap bandwidth as to a particular class of traffic. After identification of a traffic type corresponding to a data flow, a bandwidth management device associates and subsequently applies bandwidth utilization controls (e.g., a policy or partition) to the data flow corresponding to the identified traffic classification or type.
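As a rough illustration of partitions that both guarantee a minimum and cap bandwidth per traffic class, the following sketch divides a link in two passes. It is an assumption-laden example, not the behavior of any particular device: the function name `allocate`, the class names, and the rule of handing out spare bandwidth in declaration order are all hypothetical, and the sketch assumes the configured minimums together do not exceed the link capacity.

```python
def allocate(link_kbps, classes, demand):
    """Divide link bandwidth among traffic classes.

    classes: {name: (min_kbps, max_kbps)}  hypothetical partition settings
    demand:  {name: kbps currently requested by that class}
    """
    # Pass 1: each class receives its demand up to its guaranteed minimum.
    alloc = {name: min(demand.get(name, 0), lo) for name, (lo, hi) in classes.items()}
    spare = link_kbps - sum(alloc.values())

    # Pass 2: leftover bandwidth is handed out in declaration order,
    # never exceeding a class's cap or its actual demand.
    for name, (lo, hi) in classes.items():
        want = min(demand.get(name, 0), hi) - alloc[name]
        extra = max(0, min(want, spare))
        alloc[name] += extra
        spare -= extra
    return alloc


partitions = {"voip": (200, 300), "web": (300, 800), "bulk": (100, 500)}
requested = {"voip": 250, "web": 900, "bulk": 600}
shares = allocate(1000, partitions, requested)
```

With these example numbers, `voip` receives its full 250 kbps demand, `web` receives 650 kbps, and `bulk` is held at its 100 kbps minimum because no spare bandwidth remains.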
More generally, gaining an in-depth understanding of a packet traffic flow's profile is a challenging task, but it is nevertheless a requirement for many Internet Service Providers (ISPs). Deep Packet Inspection (DPI) may be used to perform such profiling to allow ISPs to apply different charging policies, perform traffic shaping, and offer different quality of service (QoS) guarantees to selected users or applications. However, DPI has a number of disadvantages: it is slow, it consumes significant resources, and it cannot recognize types of traffic for which no signature set exists. Many critical network services may rely on the inspection of packet payload content, but there can be use cases in which only the structured information found in packet headers can feasibly be examined.
Traffic classification systems may include a training phase and a testing phase, during which traffic is actually classified based on the information acquired in the training phase. FIG. 1 is a diagram of a training operation to create multiple packet traffic flow models. The input of the training phase includes known packet traffic flows, and the output includes multiple packet traffic flow models. Packet traffic flow descriptors, such as average payload size (described in more detail below), are determined from the known packet traffic flows and used to generate clusters, which in turn are used to create the multiple packet traffic flow models. The models are stored for later use to profile unknown packet traffic flows.
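The training steps above (descriptors, then clusters) can be sketched as follows. This is only an illustration under stated assumptions: the text names average payload size as a descriptor, while average inter-arrival time is added here purely as a hypothetical second descriptor; the clustering method is not specified in the text, so plain k-means with a deterministic start is swapped in as a stand-in. The names `descriptors` and `kmeans` are hypothetical.

```python
from statistics import mean


def descriptors(flow):
    """flow: list of (timestamp_s, payload_bytes) tuples for one known flow.

    Returns (average payload size, average inter-arrival time).
    The second descriptor is an assumption made for illustration.
    """
    sizes = [size for _, size in flow]
    times = [t for t, _ in flow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return (mean(sizes), mean(gaps) if gaps else 0.0)


def kmeans(points, k, iterations=20):
    """Cluster descriptor tuples with k-means; returns the k centroids."""
    centroids = list(points[:k])  # deterministic start: the first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(mean(axis) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids
```

In practice the descriptors would likely need to be scaled to comparable ranges before clustering, since payload sizes in bytes would otherwise dominate inter-arrival times in seconds; that step is omitted here for brevity.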
FIG. 2 is a diagram of packet traffic flow profiling using the multiple packet traffic flow models created in FIG. 1. Unknown packet traffic flows are received and processed to determine multiple flow descriptors (in a similar way as in the training phase) with a particular accuracy and confidence level. The multiple packet traffic flow models created in the training phase are loaded and tested on the input data, and one of them is selected to profile a particular one of the unknown traffic flows.
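One simple way to picture the selection step is a nearest-model lookup, sketched below. The text does not state how a model is tested or selected, so this is an assumption: each stored model is represented by a single centroid of descriptor values, and the model closest to the unknown flow's descriptors is chosen. The function `select_model`, the optional `max_distance` cutoff, and the example model names and values are all hypothetical.

```python
import math


def select_model(flow_descriptor, models, max_distance=None):
    """Pick the stored model whose centroid is closest to the flow's descriptors.

    models: {model_name: centroid tuple}  hypothetical stored training output
    Returns (model_name, distance); model_name is None when the best match
    is farther away than max_distance.
    """
    best_name, best_dist = None, math.inf
    for name, centroid in models.items():
        dist = math.dist(flow_descriptor, centroid)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist  # no stored model fits well enough
    return best_name, best_dist


# Example values are illustrative only, not measurements.
stored_models = {"voip": (160.0, 0.02), "bulk": (1400.0, 0.001)}
name, distance = select_model((180.0, 0.02), stored_models)
```

The optional cutoff lets a caller treat a poor best match as unclassified instead of forcing every flow into some profile.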
Unfortunately, in existing packet header-based traffic classification systems, the effects of network environment changes and the characteristic features of specific communications protocols are not identified and then considered together. Because each change and characteristic feature affects one or more of the other changes and characteristic features, the failure to consider them together, along with their respective interdependencies, results in reduced accuracy when testing traffic on a network different from the one used in the training phase.
Packet inspection methods typically use either supervised machine learning or unsupervised machine learning, but not both together. One type of machine learning may perform well on one particular network but perform less accurately on another network. However, the above-identified application describes an approach where both supervised machine learning and unsupervised machine learning are used together in order to classify traffic with improved accuracy and performance. The inventors recognized that unsupervised learning has certain advantages and disadvantages which differ from those associated with supervised learning, and that even better accuracy and performance may be achieved by exploiting those advantages and minimizing the disadvantages in the creation and use of traffic profiling models.