Pattern matching is useful for inspecting and classifying packets sent over a network. For example, a network firewall or an intrusion detection system may inspect packets for computer virus patterns and filter out the packets that contain them. A server load balancer may compare text within the packets to a list of Uniform Resource Locators (URLs), then classify and redirect the packets to specific servers based on the URLs. Such classification requires comparing the packets with a set of predefined patterns (e.g., computer virus patterns or URLs).
One method of inspecting the packets is the “brute force” approach, which compares each packet with every pattern in the predefined set. This method is easy to implement, but its computation time increases in proportion to both the amount of incoming data and the number of patterns. This method is often used when only the header portions of the packets need to be inspected or when the transmission rate of the packets is low.
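The brute force approach can be illustrated with a minimal sketch. The function name `brute_force_match` and the sample patterns are hypothetical, chosen only for illustration, and the sketch assumes that a pattern may occur anywhere in the packet payload.

```python
def brute_force_match(payload: bytes, patterns: list[bytes]) -> list[bytes]:
    """Return every pattern that occurs somewhere in the payload.

    Hypothetical helper: each pattern is checked against the payload in
    turn, so the work grows with both the payload size and the number
    of patterns.
    """
    matches = []
    for pattern in patterns:
        # Substring search over the whole payload for this one pattern.
        if pattern in payload:
            matches.append(pattern)
    return matches


# Illustrative patterns only; a real pattern set would be far larger.
example_patterns = [b"/index", b"virus-signature", b"/images/"]
found = brute_force_match(b"GET /index.html HTTP/1.1", example_patterns)
```

Because the loop visits every pattern for every payload, doubling either the traffic or the pattern set roughly doubles the work, which is why this approach suits header-only inspection or low transmission rates.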
Another method of inspecting packets is to use a hash table to reduce the number of comparisons required. A hash table is constructed by applying a hash function to the predefined byte patterns to generate “keys,” which are used as indices in the hash table. A key may correspond to several byte patterns. For example, if there are 1000 byte patterns to be compared, a hash function may map the 1000 byte patterns to 100 keys, each key corresponding to about 10 byte patterns on average. To compare a text string with the 1000 byte patterns, the hash function is applied to the text string to generate a key value. This key value is compared with the 100 keys in the hash table. If no match is found, then the text string cannot match any of the 1000 byte patterns. If a match is found (i.e., the text string “hashes” into the hash table), then the text string is compared with the 10 or so byte patterns that correspond to the matching key to see whether it matches any of them. A condition in which two or more text strings hash to the same key value is called a “collision”.