The trend in modern telecommunications is toward a common transport architecture that carries traffic from many diverse sources. As bandwidths grow, with concomitant increases in signaling and data transport rates, verifying that transported traffic will not harm the network becomes an ever-increasing challenge. There is therefore a need for new methods and apparatus that can screen traffic at present and growing data rates to identify and cull those messages that threaten the integrity of the network.
One of the first steps in assessing the potential harm that a particular piece of traffic might do is to classify the traffic by type, i.e., to determine whether it is voice, data, video, executable code, document, or another genre. In addition to enabling the assessment of potentially harmful messages, such classification is also useful in prioritizing message transmission.
Effective message classification requires an efficient and accurate processing method and system capable of analyzing bit stream extracts from a data channel.
There have been numerous attempts to classify digital traffic automatically. Most have relied on packet characteristics and arrival times. In a communications network such as the Internet, portions of messages are transmitted as discrete packets and intermixed in the communications channel with packets from unrelated messages; each packet is ultimately directed to its proper destination and reassembled with the other packets that form the complete message.
For example, in U.S. Patent Application 2003/0108042, Jun. 12, 2003, “Characterizing Network Traffic from Packet Parameters,” Skillicorn et al. describe a method that maps the headers of new packets into a low-dimensional memory space and compares them with a subset of headers from previously classified traffic mappings. One claimed advantage is that the classification may be carried out in the transport control protocol layer, which offers speed and reduced complexity.
In U.S. Pat. No. 6,597,660, Jul. 22, 2003, “Method for Real-Time Traffic Analysis on Packet Networks,” Rueda et al. disclose a method and implementing architecture for characterizing, predicting, and classifying packet network traffic using time scale analysis of packet arrival times. Packet arrival time is the only parameter used to classify traffic.
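As a rough illustration of what analyzing arrival times at several time scales can look like, the following Python sketch bins packet arrival timestamps at different bin widths and computes the index of dispersion for counts (variance divided by mean) at each width. It is not the method of Rueda et al.; the function names, the use of integer microsecond timestamps, and the choice of statistic are all assumptions made for illustration.

```python
def counts_per_bin(arrivals_us, bin_us, duration_us):
    """Count packet arrivals (integer microsecond timestamps) in fixed-width bins."""
    n_bins = duration_us // bin_us
    counts = [0] * n_bins
    for t in arrivals_us:
        idx = t // bin_us
        if 0 <= idx < n_bins:  # ignore arrivals outside the observation window
            counts[idx] += 1
    return counts


def index_of_dispersion(counts):
    """Variance-to-mean ratio of the per-bin counts (0.0 if no packets were seen)."""
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0
    var = sum((c - mean) ** 2 for c in counts) / len(counts)
    return var / mean


def dispersion_by_scale(arrivals_us, duration_us, bin_sizes_us):
    """Map each bin width (time scale) to the index of dispersion at that scale."""
    return {
        b: index_of_dispersion(counts_per_bin(arrivals_us, b, duration_us))
        for b in bin_sizes_us
    }


# Hypothetical traffic: 20-packet bursts once per second for 10 seconds.
bursty = [s * 1_000_000 + k * 1_000 for s in range(10) for k in range(20)]
profile = dispersion_by_scale(bursty, 10_000_000, [100_000, 1_000_000])
```

In this sketch, evenly spaced arrivals give a ratio near zero at every scale, while bursty arrivals give a large ratio at scales shorter than the burst spacing, so the profile across scales carries information that a single average rate does not.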
In “Automatic model classification of measured Internet traffic,” by Yi Zeng and Thomas M. Chen, published in the 2002 IEEE Workshop on IP Operations and Management, pp. 197-201, the authors propose a method based on the Hurst parameter (a parameter that estimates the long-range dependence of a traffic stream) that uses traffic statistics to identify one of two traffic generation models.
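For readers unfamiliar with the Hurst parameter, the sketch below estimates it with the aggregated-variance method, one common generic estimator. This is not necessarily the estimator used by Zeng and Chen; the function name, the block sizes, and the synthetic input are assumptions chosen for illustration. The method averages the series over blocks of size m, measures how the variance of the block means decays with m, and reads H from the log-log slope, which equals 2H - 2.

```python
import math
import random
import statistics


def aggregated_variance_hurst(series, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Estimate H from the slope of log Var(block means) versus log(block size)."""
    xs, ys = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        var = statistics.pvariance(means)
        if var > 0:
            xs.append(math.log(m))
            ys.append(math.log(var))
    if len(xs) < 2:
        raise ValueError("series too short or too flat for a slope estimate")
    # Ordinary least-squares slope of ys against xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return 1.0 + slope / 2.0


# Synthetic, uncorrelated input (an assumption, not measured traffic).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
h_estimate = aggregated_variance_hurst(noise)
```

For uncorrelated input such as the synthetic noise above, the estimate should fall near 0.5; values approaching 1 indicate long-range dependence.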
One published approach that attempts to identify traffic type through analysis of the actual content of a traffic channel is found in “Characterizing DS0-rate traffic using neural networks,” by Ben P. Yuhas and Charles M. Humphries, published in the Conference Record of the 1992 Global Telecommunications Conference (GLOBECOM'92), pp. 1319-1323, Vol. 3. The authors report on their attempt to characterize traffic over a DS0 line, a 64 kilobit/second PCM channel used in the public switched network. Their technique includes the intermediate step of computing a short-term Hamming-windowed Fourier power spectral density from data formatted as multiple-bit samples of pre-selected quantized voice.
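The intermediate step named above, a short-term Hamming-windowed Fourier power spectral density, can be sketched generically as follows. This is a minimal illustration and not the authors' implementation: the frame length, hop size, normalization by window energy, and function names are assumptions, and the input is taken to be a sequence of already decoded numeric samples.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_psd(frame, window):
    """One-sided power spectral density of a single windowed frame (direct DFT)."""
    n = len(frame)
    xw = [x * w for x, w in zip(frame, window)]
    norm = sum(w * w for w in window)  # normalize by window energy
    psd = []
    for k in range(n // 2 + 1):
        acc = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        psd.append(abs(acc) ** 2 / norm)
    return psd


def short_term_psd(samples, frame_len=64, hop=32):
    """PSD of each overlapping frame; the result has one row per frame."""
    window = hamming(frame_len)
    return [
        frame_psd(samples[s:s + frame_len], window)
        for s in range(0, len(samples) - frame_len + 1, hop)
    ]


# Hypothetical input: a pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
frames = short_term_psd(tone)
```

Each row of the result describes how signal power is distributed over frequency within one short frame, which is the kind of feature vector a downstream classifier such as a neural network could consume.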
While individual data analysis algorithms may classify data type with an accuracy better than random chance, their accuracy thus far falls significantly short of what is needed in practice. Hence a need exists for an approach that provides more reliable information about a data message's content while still being reasonably quick to process.