The IPsec (IP security) ESP (Encapsulating Security Payload) is currently the IETF (Internet Engineering Task Force) security protocol for the encryption of IP datagrams. IPsec implementation is mandated for every IPv6 (Internet Protocol Version 6) node.
As described by S. Kent and R. Atkinson in “IP Encapsulating Security Payload”, Network Working Group, RFC 2406, November 1998, when ESP is used the ESP header must be placed after the IP header, and the whole payload after the ESP header is encrypted. This provides a good level of confidentiality since, except for the IP header, an eavesdropper cannot access any other data such as the transport protocol header, the payload, etc.
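The effect on an on-path observer can be sketched as follows. This is a minimal illustration, assuming IPv4 without options in the test data and the ESP header layout of RFC 2406 (a 32-bit SPI followed by a 32-bit sequence number). The function names `ipv4_header` and `visible_fields` are hypothetical and introduced only for this example.

```python
import struct

ESP, TCP, UDP = 50, 6, 17  # IANA-assigned IP protocol numbers


def ipv4_header(protocol: int, payload_len: int = 0) -> bytes:
    """Build a bare 20-byte IPv4 header (checksum left at zero for the sketch)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,   # version/IHL, TOS, total length
        0, 0,                        # identification, flags/fragment offset
        64, protocol, 0,             # TTL, protocol, header checksum
        bytes([192, 0, 2, 1]),       # source address (documentation range)
        bytes([198, 51, 100, 7]),    # destination address (documentation range)
    )


def visible_fields(packet: bytes) -> dict:
    """Return what an observer without the keys can read from the packet."""
    ihl = (packet[0] & 0x0F) * 4     # IPv4 header length in bytes
    protocol = packet[9]
    info = {
        "src": ".".join(map(str, packet[12:16])),
        "dst": ".".join(map(str, packet[16:20])),
        "protocol": protocol,
        "ports": None,
    }
    if protocol in (TCP, UDP):
        # Transport header is in the clear: ports are readable.
        info["ports"] = struct.unpack("!HH", packet[ihl:ihl + 4])
    elif protocol == ESP:
        # Only the SPI and sequence number precede the encrypted payload.
        info["spi"], info["seq"] = struct.unpack("!II", packet[ihl:ihl + 8])
    return info
```

For a plain TCP packet the port pair is returned; for an ESP packet only the addresses, the SPI and the sequence number are available, and `ports` stays `None`.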
In IP networks, intermediate entities, e.g. firewalls, performance enhancing proxies, etc., are being developed and deployed to add security or improve performance. These network entities typically need access to specific fields located after the IP header in order to perform the functions they are designed for. For example, firewalls need to filter packets based on the TCP/UDP port numbers, and need to read the content of SDP (Session Description Protocol) fields in SIP (Session Initiation Protocol) signaling to open pinholes for SIP communications. For more details regarding SDP and SIP, see M. Handley, V. Jacobson: “Session Description Protocol”, Network Working Group, RFC 2327, April 1998, and J. Rosenberg et al.: “Session Initiation Protocol”, Network Working Group, RFC 3261, June 2002.
The presence of IPsec ESP and the resulting encryption of the above-mentioned fields prevent these intermediate nodes from accessing the necessary information. Usually, the intermediate node needs to read the packet payload in clear text before applying any optimization. Furthermore, firewalls are typically configured to drop packets according to policy rules, or when packets cannot be inspected successfully. The presence of IPsec ESP will therefore result in packets being dropped, or in a loss of performance optimization, which has a considerable impact on cellular links.
In “Transport-Friendly ESP”, December 1998, Steven M. Bellovin proposes an ESP format that leaves the first part of the payload in clear text and encrypts only the last part. This proposal may be useful for an intermediate node that needs to obtain the TCP/UDP (Transmission Control Protocol/User Datagram Protocol) port numbers. However, for other intermediate nodes, such as SIP-aware firewalls that need access to the SDP fields to open the appropriate pinholes for the media stream, this solution means that all the data up to and including the SDP has to stay in clear text. This condition may not be acceptable: the sender of the SIP message will most probably want to hide its identity, which is carried in the SIP header fields that precede the SDP in the message.
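The limitation of a single clear/encrypted boundary can be illustrated with a short sketch. It does not reproduce the wire format of the cited proposal; it only models a payload split at one offset. The abbreviated SIP message, the addresses and the function name `split_clear_prefix` are assumptions made for this example.

```python
from typing import Tuple

# Abbreviated SIP INVITE with an SDP body (illustrative, not a complete message).
SIP_MESSAGE = (
    b"INVITE sip:bob@example.com SIP/2.0\r\n"
    b"From: <sip:alice@example.com>\r\n"
    b"Content-Type: application/sdp\r\n"
    b"\r\n"
    b"v=0\r\n"
    b"m=audio 49170 RTP/AVP 0\r\n"
)


def split_clear_prefix(payload: bytes, clear_len: int) -> Tuple[bytes, bytes]:
    """Split a payload into (clear prefix, remainder to be encrypted)."""
    if not 0 <= clear_len <= len(payload):
        raise ValueError("clear_len outside payload")
    return payload[:clear_len], payload[clear_len:]


# The SDP is the last part of the message, so exposing it with a single
# boundary forces the boundary to the end of the SDP.
sdp_end = len(SIP_MESSAGE)
clear, to_encrypt = split_clear_prefix(SIP_MESSAGE, sdp_end)

# The From header precedes the SDP and therefore ends up in the clear part.
identity_exposed = b"From:" in clear
```

With the boundary placed after the SDP, `identity_exposed` is true and nothing of the SIP message remains to be encrypted, which is the drawback described above.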
In “A Multi-Layer IPsec Protocol”, Y. Zhang, B. Singh, Proc. of the 9th USENIX Security Symposium, Denver, Colo., USA, August 2000, a protocol is proposed that offers more flexibility than the Transport-Friendly ESP method by allowing, for example, a sequence of encrypted, clear text, and encrypted data. Basically, the protocol divides an IP datagram into several parts (or zones) and applies different forms of protection to different zones. However, the negotiation of the zones is not defined, and in the prototype the zones were configured manually. Such a manual, i.e. fixed, zone definition does not allow this method to solve the above-described problem of accessing the necessary information.
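The zone idea, and one possible difficulty with fixed zones, can be sketched as follows. The cited work does not state why manually configured zones are insufficient; the sketch assumes that zones are fixed byte offsets and shows that such offsets stop matching once a header changes length. The `Zone` representation and the functions `build_message` and `clear_bytes` are hypothetical.

```python
from typing import List, Tuple

Zone = Tuple[int, int, str]  # (start offset, end offset, "encrypted" or "clear")


def build_message(from_uri: bytes) -> Tuple[bytes, int]:
    """Return an abbreviated SIP message and the offset where its SDP starts."""
    headers = (
        b"INVITE sip:bob@example.com SIP/2.0\r\n"
        b"From: <" + from_uri + b">\r\n\r\n"
    )
    sdp = b"v=0\r\nm=audio 49170 RTP/AVP 0\r\n"
    return headers + sdp, len(headers)


def clear_bytes(payload: bytes, zones: List[Zone]) -> bytes:
    """Concatenate the bytes that fall in zones marked "clear"."""
    return b"".join(payload[s:e] for s, e, kind in zones if kind == "clear")


# Zones configured by hand for one particular message:
# SIP headers encrypted, SDP body left in the clear.
msg1, sdp_start1 = build_message(b"sip:alice@example.com")
zones = [(0, sdp_start1, "encrypted"), (sdp_start1, len(msg1), "clear")]
exposed1 = clear_bytes(msg1, zones)   # exactly the SDP of msg1

# A second message with a longer From URI shifts the SDP, but the
# fixed zones do not move with it.
msg2, _ = build_message(b"sip:a-much-longer-name@example.com")
exposed2 = clear_bytes(msg2, zones)   # begins inside the From header
```

For `msg1` the clear zone covers the SDP and hides the headers. For `msg2` the same offsets expose the tail of the From header and cut off the end of the SDP.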
In addition to not solving the addressed problem, these two protocols do not provide any other signaling that intermediate nodes could use to perform optimizations.