Due to the widespread development of new protocols, knowledge of application-level protocols is becoming increasingly important for network security. However, many of the applications being developed are closed-source, and little or no information is available about the protocols they use. Message format reverse engineering, particularly as part of protocol reverse engineering, can be used in such a scenario to deduce a description of the protocols used by these applications.
One use of the protocol descriptions produced by protocol reverse engineering is penetration testing of network applications. Penetration testing involves generating test inputs for an application and observing its behavior to identify vulnerabilities or bugs. Such testing is highly inefficient when test inputs are generated randomly; protocol knowledge instead allows the generation of inputs that explore the program's operations more thoroughly. A protocol description also aids protocol fingerprinting, which aims to identify, through content analysis, the protocol that a particular connection belongs to. It likewise aids encapsulation detection, where the goal is to identify when one protocol (e.g., P2P) is encapsulated in another (e.g., HTTP). Another use is in building protocol analyzers that help make deep packet inspection more practical and usable.
Protocol reverse engineering comprises two main steps: message format inference and protocol state machine inference. Existing tools that implement protocol reverse engineering generally require painstaking manual analysis of network traces for message format inference. Even automatic protocol analysis tools rely mostly on bioinformatics techniques for this step. These techniques align messages using sequence alignment, which looks for exact byte matches and is therefore suited to aligning messages with similar byte sequences rather than messages with similar formats.
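
The limitation of byte-level alignment can be illustrated with a minimal sketch. It uses Needleman-Wunsch global alignment, a standard bioinformatics algorithm chosen here only as a representative; the text above does not state which alignment algorithm any particular tool uses. The scoring values (+1 match, -1 mismatch, -1 gap), the function name `nw_score`, and the three message layouts are hypothetical and exist purely for illustration.

```python
def nw_score(a: bytes, b: bytes, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal Needleman-Wunsch global alignment score of two byte strings.

    The scoring parameters are illustrative assumptions, not values from any tool.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Hypothetical messages (layouts invented for this example):
#   m1, m2: [type=0x01][length=0x03][3-byte value]   -> same format, different values
#   m3:     [type=0x02][value bytes][0x00 terminator] -> different format, same value bytes
m1 = b"\x01\x03abc"
m2 = b"\x01\x03xyz"
m3 = b"\x02abc\x00"

same_format = nw_score(m1, m2)   # -1: only the two header bytes match
same_bytes = nw_score(m1, m3)    #  0: the shared "abc" bytes match despite the different layout

print("same format, different values:", same_format)
print("different format, shared bytes:", same_bytes)
```

Under these assumed scores, the message with a different layout but shared payload bytes (`m3`) aligns better with `m1` than the message that has the identical layout (`m2`), which is the mismatch between byte similarity and format similarity described above.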