It is known to provide a sizeable database of security threats, represented by patterns or signatures for which a security scanner in a unit connected to a network constantly searches in data streams received or monitored by the unit. It is convenient to store the database of signatures in a memory element in the form of a table defining a deterministic finite state machine or automaton, usually termed a DFA. The number of signatures for which a scanner can search is inherently limited by the size of the memory used to store these signatures. A DFA table is set up by means of a DFA compiler which, in accordance with the signatures that are to be detected, determines the state sequences and transitions that are to be used to detect those signatures. DFA algorithms for such compilers are known in the art.
In the present context, a ‘signature’ comprises a sequence of characters. In a typical example, a ‘character’ may be an ASCII character (of length one byte) and a typical length of a sequence of characters may be several hundred characters. Even so, one of the advantages of the use of a DFA is that the length of the signature does not matter; the operation of the DFA at any stage is dependent only on the current state and the next character.
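The property that each step depends only on the current state and the next character can be illustrated with a minimal sketch. It assumes a single non-empty signature of one-byte characters and uses the standard textbook (Knuth-Morris-Pratt style) table construction, which is not necessarily the compiler algorithm contemplated here. The names `build_dfa` and `scan` are hypothetical.

```python
def build_dfa(signature: bytes):
    """Build a dense next-state table for one signature (assumed non-empty).

    State j means "the last j bytes seen match the first j bytes of the
    signature"; state len(signature) is the accepting state.
    """
    if not signature:
        raise ValueError("signature must be non-empty")
    m = len(signature)
    table = [[0] * 256 for _ in range(m + 1)]
    table[0][signature[0]] = 1
    restart = 0  # state reached by feeding signature[1:j] to the table
    for j in range(1, m):
        table[j] = list(table[restart])   # mismatch: behave like restart state
        table[j][signature[j]] = j + 1    # match: advance along the sequence
        restart = table[restart][signature[j]]
    table[m] = list(table[restart])       # keep scanning after a match
    return table


def scan(table, data: bytes):
    """Return the index of the last byte of every match in data."""
    accept = len(table) - 1
    state = 0
    hits = []
    for i, byte in enumerate(data):
        # One table lookup per character, whatever the signature length.
        state = table[state][byte]
        if state == accept:
            hits.append(i)
    return hits


table = build_dfa(b"abab")
print(scan(table, b"xxababab"))
```

The scanning loop performs one lookup per input byte regardless of how long the signature is; only the number of rows in the table grows with the signature.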
To detect security threat signatures, particularly in the detection of network intrusion, it is desirable to scan every character of every packet's payload to find regular signatures, to discard packets that match or contain a given signature, to generate an alert message to identify which signatures in a given set have been matched, and to send an alert message to a log server when a match is detected. It is further desirable to be able to reconfigure the scanner so that it can detect new signatures. The quantity of signatures that require detection continually increases as more threats are identified. A DFA table which stores such signatures and defines transitions between states must also increase in size as the number of signatures increases. Because these signatures are kept in memory, the more signatures for which a search is made, the greater the size of the memory required.
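The growth in memory can be estimated with a rough calculation. The sketch below rests on assumptions that do not come from the text above: the signatures are literal byte strings, the table is dense (one entry per state per possible byte value), each entry occupies four bytes, and the DFA has at most one state per signature character plus a default state. The function names are hypothetical.

```python
def max_states_for_literals(signature_lengths):
    """Upper bound on states: one per signature character plus the default
    state (assumes literal signatures and no shared prefixes)."""
    return 1 + sum(signature_lengths)


def dense_table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Memory for a dense table: one next-state entry per (state, character)."""
    return num_states * alphabet_size * entry_bytes


# Illustrative figures only: 1000 signatures of 200 one-byte characters each.
states = max_states_for_literals([200] * 1000)
print(states, dense_table_bytes(states))
```

Under these assumptions the table size is proportional to the total number of signature characters, so the illustrative set above would already require roughly 205 megabytes.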
It is customary to organize a DFA so that there is a possible next-state transition from each of a multiplicity of states in a sequence to a state in at least one other sequence. This is inherently more efficient than a direct return to the default state for all but one character in a respective sequence. However, the amount of memory needed to accommodate all these transitions is very large.
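One well-known construction with this property is the Aho-Corasick automaton expanded into a dense table. The following is a minimal sketch of it, offered as an illustration and not as the particular organization referred to above. It assumes literal signatures of one-byte characters, and the names `build_multi_dfa` and `scan_multi` are hypothetical.

```python
from collections import deque


def build_multi_dfa(signatures):
    """Return (table, outputs) for a list of byte-string signatures."""
    # Step 1: build one state sequence per signature, sharing common prefixes.
    goto = [{}]
    outputs = [set()]
    for index, signature in enumerate(signatures):
        state = 0
        for byte in signature:
            if byte not in goto[state]:
                goto.append({})
                outputs.append(set())
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        outputs[state].add(index)

    # Step 2: fill every (state, byte) entry, breadth first. A mismatch in
    # one sequence jumps to the state of another sequence that matches the
    # longest suffix of the input seen so far, not straight to state 0.
    table = [[0] * 256 for _ in goto]
    fallback = [0] * len(goto)
    queue = deque()
    for byte, target in goto[0].items():
        table[0][byte] = target
        queue.append(target)
    while queue:
        state = queue.popleft()
        for byte in range(256):
            target = goto[state].get(byte)
            if target is None:
                table[state][byte] = table[fallback[state]][byte]
            else:
                table[state][byte] = target
                fallback[target] = table[fallback[state]][byte]
                outputs[target] |= outputs[fallback[target]]
                queue.append(target)
    return table, outputs


def scan_multi(table, outputs, data: bytes):
    """Return (end_index, signature_index) for every match in data."""
    state = 0
    hits = []
    for i, byte in enumerate(data):
        state = table[state][byte]
        hits.extend((i, index) for index in sorted(outputs[state]))
    return hits


signatures = [b"he", b"she", b"his", b"hers"]
table, outputs = build_multi_dfa(signatures)
print(scan_multi(table, outputs, b"ushers"))

# Entries that lead somewhere other than the default state 0.
non_default = sum(1 for row in table for nxt in row if nxt != 0)
print(len(table), non_default)
```

In this sketch every state stores a full row of 256 entries, and many of those entries point into other sequences. That is what allows a single pass over the data to find overlapping signatures, and it is also why the table occupies so much memory.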