The Real-time Transport Protocol (RTP) is a protocol for delivering audio and video media data over a packet-switched network. RTP is used for transporting real-time and streaming media data, such as interactive audio and video. It is therefore used in applications such as IPTV, conferencing, and Voice over IP (VoIP).
Secure Real-time Transport Protocol (SRTP), specified in IETF RFC 3711, is a transport security protocol that provides a form of encrypted RTP. In addition to encryption, it provides message authentication and integrity, and replay protection, in unicast, multicast and broadcast applications. SRTP is used to protect content delivered between peers in an RTP session. As a transport security protocol, it is only intended to protect data during transport between two peers running SRTP. In particular, it does not protect data once it has been delivered to the endpoint of the SRTP session. In addition, the sending peer provides the protection by encrypting the media data; in other words, it is assumed that the sending peer has knowledge of all keying material and is the one applying the protection to the data.
RTP is closely related to RTCP (RTP control protocol), which can be used to control the RTP session, and similarly SRTP has a sister protocol, called Secure RTCP (or SRTCP). SRTCP provides the same security-related features to RTCP as the ones provided by SRTP to RTP.
Use of SRTP or SRTCP is optional when RTP or RTCP is used, and even if SRTP/SRTCP are used, all provided features (such as encryption and authentication) are optional and can be separately enabled or disabled. The only exception is the message authentication/integrity feature, which is mandatory when using SRTCP. Confidentiality protection in SRTP and SRTCP covers the payload, while integrity protection covers the payload and the full packet header.
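The difference in coverage between confidentiality and integrity protection can be illustrated with a minimal sketch. It is not an SRTP implementation: the function names (`keystream`, `protect`, `unprotect`), the HMAC-based toy keystream that stands in for the SRTP cipher, the use of the header as a nonce, and the 10-byte tag length are all assumptions made for illustration only.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # illustrative tag length in bytes (assumption)


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC-SHA256 in counter mode); a stand-in for the real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + struct.pack(">I", counter), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def protect(header: bytes, payload: bytes, enc_key: bytes, auth_key: bytes) -> bytes:
    # Confidentiality covers the payload only; the header stays in the clear.
    ks = keystream(enc_key, header, len(payload))
    ciphertext = bytes(p ^ k for p, k in zip(payload, ks))
    # Integrity covers the full header plus the (encrypted) payload.
    tag = hmac.new(auth_key, header + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    return header + ciphertext + tag


def unprotect(packet: bytes, header_len: int, enc_key: bytes, auth_key: bytes) -> bytes:
    header = packet[:header_len]
    ciphertext = packet[header_len:-TAG_LEN]
    tag = packet[-TAG_LEN:]
    expected = hmac.new(auth_key, header + ciphertext, hashlib.sha256).digest()[:TAG_LEN]
    # Verify the tag before decrypting; any change to header or payload is rejected.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    ks = keystream(enc_key, header, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))
```

In this sketch a modified header bit causes `unprotect` to raise, even though the header itself was never encrypted, which mirrors the coverage described above.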
Referring to FIG. 1, many content delivery systems and communication services include Store and Forward (SaF) mechanisms and require end-to-end confidentiality and integrity protection of media. In this scenario, a sender unit 1 sends media on a first hop between the sender unit 1 and an intermediate storage entity 2, and then (almost immediately or after some time) the media traverses a second hop from the storage entity 2 to the intended receiver unit 3. Note that the media may traverse more than one intermediate unit. However, each hop at an intermediate unit (such as a SaF Server) should be integrity protected. (The term “hop” is used herein to denote a logical link between two logically adjacent nodes in a network.) Hop integrity protection allows an intermediate unit 2 to check the authenticity of arriving media data packets, for example where a mailbox or network answering machine stores media, and so protects against an attacker filling up the storage on the device with garbage. However, the keys necessary to decrypt the media or calculate/modify end-to-end (e2e) integrity protection should not be available to the intermediate unit, to prevent the intermediate unit from manipulating or having access to the plaintext media data.
It is possible for SRTP data to be sent via an intermediate unit, whilst ensuring that the intermediate unit does not have access to the data, by providing the media data with end-to-end protection between the sender unit 1 and the receiver unit 3.
A possible way to forward the media data from the intermediate unit 2 to the receiver unit 3 is by file transfer, for example by sending the media data in the format that was used to store it. However, this type of forwarding is not supported by SRTP.
To prevent attacks that may fill the intermediate unit 2 with bogus data, it is advantageous to ensure that the intermediate unit 2 authenticates the sender unit 1 of the media data and, furthermore, verifies the authenticity of the media data. This requires that a hop-by-hop (hbh) key/security context is established between the sender unit 1 and the intermediate unit 2.
There is therefore a need for two independent security contexts: one end-to-end (e2e) security context to protect (by encryption or integrity protection) the data between the sender unit 1 and the receiver unit 3, and one hbh security context between the sender unit 1 and the intermediate unit 2 to allow the intermediate unit 2 to authenticate the sender unit 1 and to authenticate the media data. For hbh key establishment, any existing key management scheme such as Multimedia Internet Keying (MIKEY) or Datagram Transport Layer Security-SRTP (DTLS-SRTP) could be used. Minor extensions to handle an SRTP (SaF) operation may be required. These key management protocols can also be used for e2e key management in certain situations.
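The separation of the two security contexts can be sketched as follows. This is an illustrative model under stated assumptions, not a packet format defined by SRTP or by this document: the function names (`sender_protect`, `intermediate_store`, `receiver_open`), the toy XOR stream cipher, the tag length, and the layering of an inner e2e tag under an outer hbh tag are all hypothetical. The sketch also assumes that the intermediate unit forwards the stored bytes unchanged and omits any protection of the second hop.

```python
import hashlib
import hmac

TAG_LEN = 10  # illustrative tag length in bytes (assumption)


def _tag(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:TAG_LEN]


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT the SRTP cipher): XOR with an HMAC-derived keystream."""
    stream = b""
    block = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def sender_protect(header, payload, e2e_enc, e2e_auth, hbh_auth):
    # e2e context: encrypt and tag with keys known only to sender and receiver.
    ciphertext = _xor_stream(e2e_enc, header, payload)
    inner = ciphertext + _tag(e2e_auth, header + ciphertext)
    # hbh context: outer tag with the key shared with the intermediate unit.
    return header + inner + _tag(hbh_auth, header + inner)


def intermediate_store(packet, hbh_auth):
    # The intermediate unit holds only the hbh key: it can reject bogus
    # packets but cannot decrypt the payload or recompute the e2e tag.
    body, hbh_tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(hbh_tag, _tag(hbh_auth, body)):
        raise ValueError("hop-by-hop authentication failed; packet discarded")
    return body  # stored as opaque bytes


def receiver_open(stored, header_len, e2e_enc, e2e_auth):
    header = stored[:header_len]
    ciphertext = stored[header_len:-TAG_LEN]
    e2e_tag = stored[-TAG_LEN:]
    if not hmac.compare_digest(e2e_tag, _tag(e2e_auth, header + ciphertext)):
        raise ValueError("end-to-end authentication failed")
    return _xor_stream(e2e_enc, header, ciphertext)
```

The point of the design is that `intermediate_store` takes only the hbh key as input, so a compromise of the intermediate unit would expose neither the plaintext nor the ability to forge the e2e tag.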
Whilst establishing a key for hbh protection is relatively straightforward, establishing a key for e2e protection between the sender unit 1 and receiver unit 3 is not straightforward when the media data traverses an intermediate unit 2. If MIKEY (see RFC 3830 for a description of MIKEY) is used to establish keys for e2e protection, then there are different options for key management as follows:
1. Using MIKEY in pre-shared key mode. In this scenario, an e2e pre-shared key must be available to the sender unit 1 at the time of making the call. This implies that it must be distributed by a secure method in advance of the data being sent, which leads to a key bootstrapping problem. The sender unit 1 must use a special procedure to get hold of the key for the intended receiver unit 3. This procedure could be an online procedure allowing the sender unit 1 to fetch a key when needed, or it could be, for example, a download of keys for all possible receiver units in the sender's phone book.
2. Using MIKEY in Rivest-Shamir-Adleman (RSA) mode. In this scenario, the credentials of the receiver unit 3 must be available in advance at the sender unit 1 side, and the same problems arise as for using MIKEY in pre-shared key mode.
3. Using MIKEY in Diffie-Hellman mode. This scenario does not work, as it requires both the sender unit 1 and receiver unit 3 to participate in a message exchange.
4. Using MIKEY in Reverse RSA (RSA-R) mode (see RFC 4738). This scenario also does not work, as the end-point would have to be the intermediate unit 2, and so a key would be generated that would allow the intermediate unit 2 to access the media data.
One option that could be used together with different key management schemes is to exchange credentials when the sender unit 1 connects directly to the receiver unit 3. The parties could then store the credentials and use them when a call is redirected to an intermediate unit 2 such as a mailbox. This would work for MIKEY RSA-R and DTLS-SRTP distribution of certificates. The problem with this option is that at least one direct connection between the sender unit 1 and receiver unit 3 must have taken place before any data is redirected to an intermediate unit 2. A further problem is that both the sender unit 1 and receiver unit 3 must store the credentials, and the credentials can only be updated when the sender unit 1 and receiver unit 3 connect directly.