Network security systems, often described as intrusion prevention systems (IPS) or intrusion detection systems (IDS), commonly employ both pattern matching, performed on a data stream represented by the packet payload, and the checking of headers to detect unwanted or undesirable digital signatures which may represent a security threat. Within the security rules used by such systems there are normally links between specific header values and the security threat content in the payload. For example, a particular signature that may be significant (for example, because it represents a potential threat) in one type of packet (e.g. a UDP packet) may not be important in another type (e.g. a TCP packet). When a pattern is detected but, having regard to its context, is not significant, it is generally termed a ‘false positive’. The production and elimination of false positives represent a severe processing overhead in detection systems.
It is accordingly necessary not only to detect the signature but also to ‘post-process’ the header to check for the header value qualifiers which confirm the ‘authenticity’ of a potential violation of security. In many cases there are several header fields, each of which must match a specific value before it can be determined that a genuine positive match has been obtained.
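The two-step check described above (signature detection followed by post-processing of the header) can be sketched as follows. This is only an illustration under stated assumptions: the `Rule` class, the `classify` function, and the header field names `protocol` and `dst_port` are hypothetical and are not taken from any particular detection system.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    # Hypothetical rule format: a payload signature plus the header
    # qualifiers that must all match for the hit to be genuine.
    signature: bytes
    qualifiers: dict = field(default_factory=dict)


def classify(rule, header, payload):
    """Return 'no match', 'false positive' or 'genuine positive'."""
    if rule.signature not in payload:
        return "no match"
    # Post-processing: every header qualifier must hold its required value.
    for name, required in rule.qualifiers.items():
        if header.get(name) != required:
            return "false positive"
    return "genuine positive"


# Illustrative values only: the signature is significant in a UDP packet
# sent to port 53, but not in a TCP packet.
rule = Rule(signature=b"\x90\x90\x90\x90",
            qualifiers={"protocol": "UDP", "dst_port": 53})
payload = b"abc\x90\x90\x90\x90xyz"
udp_header = {"protocol": "UDP", "dst_port": 53}
tcp_header = {"protocol": "TCP", "dst_port": 53}

print(classify(rule, udp_header, payload))
print(classify(rule, tcp_header, payload))
```

In this sketch the payload is scanned in full before any header field is consulted, so the work spent on a hit that turns out to be a false positive is wasted.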
A DFA, otherwise called a deterministic finite automaton or deterministic finite state machine, as represented in graphical form, has a plurality of states each of which has an exit or transition dependent on an examination of the next ‘character’ or ‘byte’ in a string of characters that the DFA examines. In one practical example, each state of the DFA is represented by a group of locations in a memory. The action of examining an incoming character to determine what action, and in particular what transition, if any, is required is preferably performed in practice by adding an offset, particular to the state, to the current character to access one of the respective group of memory locations. The locations may contain at least one pointer which determines the next transition of the machine. In its simplest practical form, termed a single table machine, a DFA comprises, for each state, a multiplicity of locations showing the next state for each of the possible values of an input character. Where, as is typical, an input character is a byte, a single table machine requires 256 locations for each state, of which typically only one will identify a state other than the initial or default state. Thus the memory space required for a single table machine is in practice unmanageably large. A great reduction in the required memory space can be achieved by means of a dual table machine, wherein one table contains ‘default state’ and ‘offset’ information and the other contains ‘next state’ and ‘check state’ information, as will be described in more detail later.
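The difference between the two table layouts can be illustrated with a short sketch. It is an assumption-laden illustration rather than the machine described later: the function names are hypothetical, the single table is built with a standard KMP-style construction for one pattern, and the dual table is packed with a simple first-fit row displacement, which is one common way of realising ‘default state’/‘offset’ and ‘next state’/‘check state’ tables.

```python
def build_single_table(pattern):
    """One row of 256 next-state entries per state (pattern must be non-empty)."""
    n = len(pattern)
    table = [[0] * 256]
    table[0][pattern[0]] = 1
    restart = 0  # state reached by the pattern with its first byte dropped
    for s in range(1, n + 1):
        table.append(list(table[restart]))
        if s < n:
            table[s][pattern[s]] = s + 1
            restart = table[restart][pattern[s]]
    return table


def compress(table, default=0):
    """Pack only the non-default entries of each row into shared slots."""
    state_tab = []                  # per state: (default state, offset)
    next_tab, check_tab = [], []    # per slot: next state, owning state
    for s, row in enumerate(table):
        entries = [(c, t) for c, t in enumerate(row) if t != default]
        offset = 0
        # First fit: slide the row until none of its entries hits a used slot.
        while any(offset + c < len(check_tab)
                  and check_tab[offset + c] is not None
                  for c, _ in entries):
            offset += 1
        for c, t in entries:
            idx = offset + c
            if idx >= len(check_tab):
                grow = idx + 1 - len(check_tab)
                next_tab.extend([default] * grow)
                check_tab.extend([None] * grow)
            next_tab[idx] = t
            check_tab[idx] = s
        state_tab.append((default, offset))
    return state_tab, next_tab, check_tab


def step(state_tab, next_tab, check_tab, state, byte):
    """Dual table lookup: add the state's offset to the input byte."""
    default, offset = state_tab[state]
    idx = offset + byte
    if idx < len(check_tab) and check_tab[idx] == state:
        return next_tab[idx]
    return default  # slot is empty or belongs to another state


single = build_single_table(b"abc")
state_tab, next_tab, check_tab = compress(single)
print(len(single) * 256, "single table locations")
print(len(next_tab), "dual table transition slots")
```

The ‘check state’ entry is what makes the sharing safe: a slot is trusted only if it records the state that is doing the lookup, and otherwise the machine falls back to that state's default.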
The requirements for the detection of significant patterns in the payloads of data packets differ substantially from those relating to patterns in a header. A significant pattern may occur anywhere in a payload, may represent the same potential threat wherever it occurs, and may extend across packet boundaries, for example by beginning in the payload of one packet and ending in the payload of a subsequent packet. A DFA is well adapted for searching for such patterns.
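A pattern that straddles a packet boundary is found naturally if the current state of the DFA is simply retained between payloads. The following sketch shows the idea under stated assumptions: the `StreamScanner` class and `build_dfa` function are hypothetical names, a single pattern is searched for, and the table is a plain single table machine built with a KMP-style construction.

```python
def build_dfa(pattern):
    """Single table DFA that finds `pattern` (non-empty) anywhere in a stream."""
    n = len(pattern)
    table = [[0] * 256]
    table[0][pattern[0]] = 1
    restart = 0  # state reached by the pattern with its first byte dropped
    for s in range(1, n + 1):
        table.append(list(table[restart]))
        if s < n:
            table[s][pattern[s]] = s + 1
            restart = table[restart][pattern[s]]
    return table


class StreamScanner:
    """Keeps the DFA state between payloads of the same stream."""

    def __init__(self, pattern):
        self.table = build_dfa(pattern)
        self.accept = len(pattern)
        self.state = 0

    def feed(self, payload):
        """Consume one payload; return the number of matches that end in it."""
        hits = 0
        for byte in payload:
            self.state = self.table[self.state][byte]
            if self.state == self.accept:
                hits += 1
        return hits


scanner = StreamScanner(b"attack")
first = scanner.feed(b"xxatt")    # pattern begins in this payload
second = scanner.feed(b"ackyy")   # and ends in the next one
print(first, second)
```

Scanning each payload with a fresh scanner would miss the pattern, because neither payload contains it in full.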
As indicated above, any given pattern (whether representing a threat or not) in the header of a packet varies in significance because headers are necessarily organised such that different fields have a meaning dependent not only on their content but also on their location, i.e. their offset from the start of the packet. For example, the header of a packet conforming to IPv6 (Internet Protocol, version six) has 40 bytes which comprise, in order, a 4-bit field identifying the version (in this case the binary equivalent of 6), an 8-bit field identifying a traffic class and a 20-bit flow label (these three fields together occupying the first four bytes), two bytes specifying the length of the payload, a ‘next header’ byte identifying the protocol (e.g. TCP or UDP) to which the contents (payload) of the packet will be delivered, a byte specifying the hop limit, a 16-byte source address and a 16-byte destination address. It follows that the significance of a given pattern of characters (i.e. bytes) in a header cannot be determined without knowledge of its offset from the start of the packet.
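The dependence of meaning on offset can be made concrete by decoding the fixed IPv6 header by position. The sketch below follows the field widths of the IPv6 specification (RFC 8200), in which the version, traffic class and flow label share the first four bytes; the function name `parse_ipv6_header` and the sample addresses are illustrative assumptions.

```python
import ipaddress
import struct


def parse_ipv6_header(packet):
    """Decode the 40-byte fixed IPv6 header purely by byte offset."""
    if len(packet) < 40:
        raise ValueError("an IPv6 header needs at least 40 bytes")
    # Offsets 0-3: version / traffic class / flow label; 4-5: payload
    # length; 6: next header; 7: hop limit (all in network byte order).
    first_word, payload_len, next_header, hop_limit = struct.unpack_from(
        "!IHBB", packet, 0)
    return {
        "version": first_word >> 28,
        "traffic_class": (first_word >> 20) & 0xFF,
        "flow_label": first_word & 0xFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,
        "hop_limit": hop_limit,
        "src": ipaddress.IPv6Address(bytes(packet[8:24])),
        "dst": ipaddress.IPv6Address(bytes(packet[24:40])),
    }


# Build a sample packet: UDP (next header 17) with a hop limit of 64.
sample = (
    struct.pack("!IHBB", (6 << 28) | 0x12345, 8, 17, 64)
    + ipaddress.IPv6Address("2001:db8::1").packed
    + ipaddress.IPv6Address("2001:db8::2").packed
    + b"\x00" * 8
)
fields = parse_ipv6_header(sample)
print(fields["version"], fields["next_header"], fields["hop_limit"])
```

The byte value 17 means ‘UDP’ only because it sits at offset 6; the same value one byte later, at offset 7, would be a hop limit of 17.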
Current methods for the detection of digital signatures in addressed packets separate the analyses of the payload and the header of a packet. Such a separation is inefficient and significantly increases the number of false positives detected by the system. Post-processing also increases latency through the detection system. The main reason for separating the analysis of the payload from the analysis of the header lies in the characteristics of a standard DFA graph, which does not support location-based searching. A standard DFA searches for all patterns in the DFA graph in a continuous stream but cannot stop searching for a pattern after, for example, a specific number of bytes. An ordinary graph necessarily includes return transitions from many states at least to a default state.