Because Internet Protocol (IP) networks are connectionless, and to allow networks to scale, IP routers typically do not maintain per-flow state for traffic flows. Real-time monitoring of traffic flows, however, is required for many network management functions, such as traffic planning and management, network monitoring, and network security. Most network management functions, and therefore most existing flow estimation techniques, define flows using packet header information to determine flow membership. Some network management functions (e.g., network security), however, require flow definitions that use packet payload information to determine flow membership. For example, detecting potential virus signatures may require processing both packet header information and packet payload information.
In one example, logging an anonymous File Transfer Protocol (FTP) attempt on a server in an enterprise network may require the following matches: (1) source address (i.e., any source address outside the enterprise network); (2) destination address (i.e., any destination address inside the enterprise network); (3) protocol (i.e., TCP); (4) destination port (i.e., port 21 (FTP)); and (5) payload (i.e., contains the string “ftp” or “anonymous”). While the first four fields can be matched using header-based flow identification algorithms, the fifth field must be matched using a payload-based flow identification algorithm. Payload-based algorithms are generally more difficult, and therefore more expensive, to implement than header-based algorithms.
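The five-part FTP rule can be sketched in Python to show the split between header and payload matching. This is a minimal illustration, not the method described here: the `Packet` type, the helper function names, and the enterprise prefix `10.0.0.0/8` are all hypothetical assumptions. Fields (1) through (4) reduce to direct comparisons on parsed header values, whereas field (5) must search the whole payload because the string may occur at any offset.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical enterprise prefix (assumption); the text does not specify one.
ENTERPRISE_NET = ipaddress.ip_network("10.0.0.0/8")


@dataclass
class Packet:
    """Hypothetical, already-parsed packet used only for illustration."""
    src: str
    dst: str
    proto: str      # e.g. "TCP" or "UDP"
    dst_port: int
    payload: bytes


def header_match(pkt: Packet) -> bool:
    """Fields (1)-(4): simple comparisons on header values."""
    src = ipaddress.ip_address(pkt.src)
    dst = ipaddress.ip_address(pkt.dst)
    return (
        src not in ENTERPRISE_NET      # (1) source outside the enterprise
        and dst in ENTERPRISE_NET      # (2) destination inside the enterprise
        and pkt.proto == "TCP"         # (3) protocol
        and pkt.dst_port == 21         # (4) FTP control port
    )


def payload_match(pkt: Packet) -> bool:
    """Field (5): the strings may appear anywhere, so the payload is searched."""
    return b"ftp" in pkt.payload or b"anonymous" in pkt.payload


def log_anonymous_ftp(pkt: Packet) -> bool:
    # Run the cheap header test first; scan the payload only for candidates.
    return header_match(pkt) and payload_match(pkt)


if __name__ == "__main__":
    pkt = Packet("203.0.113.5", "10.1.2.3", "TCP", 21, b"USER anonymous\r\n")
    print(log_anonymous_ftp(pkt))
```

Evaluating the header fields first is a design choice of this sketch: it keeps the more expensive payload search off the path for most packets, which reflects the cost difference described above.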
In general, payload-based flow identification requires computationally expensive string-matching algorithms. For security-related monitoring, it is further complicated by two factors: (1) the starting point of a suspicious pattern within the payload is generally not known (e.g., the signature of a virus or worm may appear anywhere in the payload); and (2) the suspicious pattern itself may not be known (e.g., when a new, unknown worm is beginning to propagate). Disadvantageously, existing payload-based monitoring techniques are inefficient in situations where the starting point of a suspicious pattern, or even the pattern itself, is unknown.
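The first complication can be made concrete. If the starting point is unknown, a matcher has to test every offset in the payload. One standard way to do this in a single pass is a Rabin-Karp rolling hash over fixed-length windows. The Python sketch below uses that technique; it is not the approach of this document. The function names, the base and modulus, and the restriction to signatures of a single length `k` are all illustrative assumptions.

```python
from typing import Iterable, Iterator

BASE = 256
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash (assumed choice)


def rolling_hashes(data: bytes, k: int) -> Iterator[tuple[int, int]]:
    """Yield (offset, hash) for every k-byte window of data (Rabin-Karp)."""
    if k <= 0 or len(data) < k:
        return
    top = pow(BASE, k - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:k]:
        h = (h * BASE + b) % MOD
    yield 0, h
    for i in range(k, len(data)):
        # Drop data[i - k], shift left by one byte, append data[i].
        h = ((h - data[i - k] * top) * BASE + data[i]) % MOD
        yield i - k + 1, h


def find_signatures(payload: bytes, signatures: Iterable[bytes]) -> list[tuple[int, bytes]]:
    """Return (offset, signature) for every occurrence, at any offset."""
    sigs = list(signatures)
    if not sigs:
        return []
    k = len(sigs[0])
    if k == 0 or any(len(s) != k for s in sigs):
        raise ValueError("this sketch assumes non-empty signatures of one length k")
    table: dict[int, list[bytes]] = {}
    for s in sigs:
        _, h = next(rolling_hashes(s, k))
        table.setdefault(h, []).append(s)
    hits = []
    for off, h in rolling_hashes(payload, k):  # every one of len - k + 1 offsets
        for s in table.get(h, ()):
            if payload[off:off + k] == s:  # verify bytes to rule out collisions
                hits.append((off, s))
    return hits


if __name__ == "__main__":
    print(find_signatures(b"xxEVILyyEVIL", [b"EVIL"]))
```

This makes both difficulties visible. Every payload costs one hash update per byte, whatever position the signature occupies. The lookup table also has to be filled with known signatures in advance, so when the pattern itself is unknown (complication (2)), this kind of matcher has nothing to search for.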