Packet-based computer networks transmit information in packets that include packet contents and header information for routing the packets. Packet headers are formatted with a sequence of well-known header fields that direct the packets through the network. For instance, network computing devices perform routing and switching functions based upon computations using header field values. Routers are an example of a network computing device that rapidly directs packets with computations based on the destination address in the packet's IP header. Modern routers rapidly compute an output interface by performing hard-wired fixed functions rather than relying on slower software functions. Although hard-wired functions perform at greater speeds than software functions, hard-wired functions lack the flexibility of software functions and are difficult to modify or change.
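As an illustration of a routing computation based on the destination address, the following minimal sketch selects an output interface by longest-prefix match. The text above does not name a lookup method, so longest-prefix matching is an assumption here, and the table contents, interface names and the function name select_interface are hypothetical. A hardware router would perform the equivalent lookup in fixed-function logic rather than in a software loop.

```python
import ipaddress

# Hypothetical forwarding table of (prefix, output interface) pairs.
# The prefixes and interface names are illustrative only.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),    # default route
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("10.1.0.0/16"), "eth2"),
]


def select_interface(dst, routes=ROUTES):
    """Return the interface of the most specific prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, iface in routes:
        # Keep the matching entry with the longest prefix seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None
```

With this table, an address inside 10.1.0.0/16 resolves to the most specific entry, while an address outside every listed prefix falls through to the default route.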
Following the packet header fields are the contents of the packet. The header fields indicate the type of content. For instance, an EtherType field indicates that a packet contains an IP datagram and the packet field values for an IP datagram allow determination of the type of data, such as TCP, UDP, RTP, etc. Typically, the packet contents are not referenced for routing or switching operations through packet-based networks since functions performed on packet contents would slow packet transfers through the network. For instance, the Ethernet, IP and TCP layers, known as layers 2, 3 and 4 respectively, are used but deeper layers are not. However, packet header fields do sometimes indicate that the contents of a flow of packets are related by providing ordered sequencing information that relates packets within a network flow. For instance, one type of network packet flow is a TCP stream, which includes header fields indicating the TCP sequence order of packets. With network packet flows having an ordered sequence of packets identified with sequencing information in the packet header fields, the destination device is able to reassemble the contents and determine if packets are missing from the stream so that the missing packets may be resent.
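The reassembly described above can be sketched as follows. This is a simplified illustration rather than a TCP implementation: it assumes each segment is given as a (sequence number, payload) pair, that segments do not partially overlap, and that sequence numbers do not wrap around. The function name reassemble and its return shape are hypothetical.

```python
def reassemble(segments, initial_seq):
    """Order segments by sequence number and report whether a gap remains.

    segments: iterable of (seq, payload_bytes) pairs in any arrival order.
    Returns (contiguous_bytes, next_expected_seq, has_gap).
    """
    pending = {}
    for seq, payload in segments:
        # A retransmitted duplicate carries the same seq; keep the first copy.
        pending.setdefault(seq, payload)

    out = bytearray()
    expected = initial_seq
    # Consume segments only while the next expected sequence number is present.
    while expected in pending:
        payload = pending.pop(expected)
        out += payload
        expected += len(payload)

    # Anything still pending lies beyond a missing segment.
    return bytes(out), expected, bool(pending)
```

For example, segments arriving as (105, b"world") and then (100, b"hello") with an initial sequence number of 100 yield the contiguous bytes b"helloworld". If the segment starting at 100 were shorter, the later segment would stay pending and the gap flag would signal that a packet needs to be resent.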
To provide services to packet-based networks, network processors have been developed to include programmable functions for classifying, modifying and shaping packets at network line speeds. These network processors include specialized hardware to provide rapid processing of packet header field information in a programmable manner so that packet-based networks may provide services without substantial impact on data transfer rates. However, classification of packets based on deep layers of header fields and actual packet content presents a difficult problem since in-depth review of packet contents requires greater processing and tends to slow packet transfer rates through the network.
To address content-based classification, application-specific processors are available to aid network processor functionality. For instance, Raqia Networks, Inc. sells classification co-processors that classify packets using regular expressions and subexpressions for packet content payload. By supporting network processor functionality with function-specific hardware that classifies based on content, content classification of packets is possible at line speeds. However, integration of content classification into a packet-based network remains a complex problem, particularly when processing streams of packet content.
One difficulty with classifying packets by content is that packet content typically spans more than one packet of a packet flow. Thus, packet content searches that span only a single packet may miss desired content sent in two or more different packets of a stream. Further, packets of a stream are sometimes sent out of order so that content classification cannot be completed absent the missing packet or packets.
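The following minimal sketch illustrates why a search confined to a single packet can miss content that straddles a packet boundary, and one way to carry state between packets. It assumes payloads are fed in stream order and searches for a fixed byte string rather than the regular expressions mentioned earlier. The class name StreamMatcher is hypothetical.

```python
class StreamMatcher:
    """Find a fixed byte pattern across consecutive in-order payloads."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.tail = b""  # trailing bytes carried over from earlier payloads

    def feed(self, payload: bytes) -> bool:
        """Return True if the pattern ends somewhere inside this payload."""
        window = self.tail + payload
        found = self.pattern in window
        # Keep len(pattern) - 1 bytes: enough to complete a split match,
        # too few to report the same complete match a second time.
        keep = len(self.pattern) - 1
        self.tail = window[-keep:] if keep else b""
        return found
```

Feeding b"xxforb" and then b"iddenyy" to a matcher for b"forbidden" reports a match on the second call, although neither payload contains the pattern on its own. If the second payload arrived first, the matcher would see the bytes in the wrong order, which is why out-of-order packets have to be resequenced before content classification can complete.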
Another difficulty with content classification is that the process of searching packet contents risks slowing network traffic to an unacceptable level. This problem becomes particularly acute where the content search is complex, involving multiple expressions and subexpressions. Further, to the extent that current systems are able to classify based on content, such systems lack scalability. For instance, the systems available from Raqia Networks, Inc. are able to search for expressions numbering in the thousands, but millions of expressions are required to classify effectively on content, such as for blocking access to pornography sites.