Many data processing systems require the content type of data to be determined before the data can be processed further. For example, in malicious content detection systems, such as anti-virus systems and anti-spam systems, received data generally needs to be classified before it can be scanned for malicious content. Intrusion detection/prevention systems, application-based traffic shaping devices or load balancers, IM proxies, and application accelerators may also require data to be classified. If the data is classified as Skype data, then a content detection module may apply one set of algorithms to scan the data for malicious content. On the other hand, if the data is classified as BitTorrent data, then the content detection module may apply a different set of algorithms to scan the data for malicious content. As such, determining the content type of data is an important step before the data is scanned.
Existing systems determine content type by using the port number of the port through which data is transmitted. For example, the well-known port for the HTTP protocol is “80,” the well-known port for the SMTP protocol is “25,” and the well-known port for the POP3 protocol is “110.” In such systems, data belonging to a certain type is transmitted to a dedicated port. As such, by determining the port number of the port through which data is transmitted, and knowing the content type that is associated with that port number, a system can determine the content type of the data. However, using a port to transmit only one type of data is restrictive. Sometimes, it may be desirable to allow a port to transmit more than one type of data. Existing systems do not allow a content type to be determined if data is transmitted through a port that is not data type specific (i.e., a port that is allowed to transmit more than one type of data).
Also, some types of data, such as IM data and P2P data, may not go to any specific port, and can be transmitted through different ports. In such cases, existing systems may not be able to classify IM data and P2P data using the port number.
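By way of illustration only, the port-based approach described above may be sketched as a simple table lookup. The sketch below is not taken from any particular existing system; the names `WELL_KNOWN_PORTS` and `classify_by_port` are hypothetical, and port 51413 is an arbitrary example of a port with no dedicated content type.

```python
from typing import Optional

# Well-known port assignments named in the text above.
WELL_KNOWN_PORTS = {
    80: "HTTP",
    25: "SMTP",
    110: "POP3",
}


def classify_by_port(dst_port: int) -> Optional[str]:
    """Return the content type mapped to dst_port, or None if the port is unmapped."""
    return WELL_KNOWN_PORTS.get(dst_port)


# Data sent to a dedicated port is classified by the lookup alone.
print(classify_by_port(80))

# Data sent to a port with no dedicated type (hypothetical example) cannot be classified.
print(classify_by_port(51413))
```

Because such a lookup inspects only the port number, it also reports the same content type for any data arriving at a mapped port, whatever that data actually is. This is why a port that carries more than one type of data, or a type of data that uses varying ports, cannot be handled by this approach.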