The Real-time Transport Protocol (RTP) is a protocol for delivering audio and video media data over a packet-switched network. RTP is used for transporting real-time and streaming media data, such as interactive audio and video. It is therefore used in applications such as IPTV, conferencing, and Voice over IP (VoIP).
Secure Real-time Transport Protocol (SRTP), specified in IETF RFC 3711, is a transport security protocol that provides a form of encrypted RTP. In addition to encryption, it provides message authentication and integrity, and replay protection, in unicast, multicast and broadcast applications. SRTP is used to protect content delivered between peers in an RTP session. As a transport security protocol, it is only intended to protect data during transport between two peers running SRTP. In particular, it does not protect data once it has been delivered to the endpoint of the SRTP session. In addition, the sending peer provides the protection by encrypting the media data; in other words, it is assumed that the sending peer has knowledge of all keying material and is the one applying the protection to the data.
RTP is closely related to RTCP (RTP control protocol), which can be used to control the RTP session, and similarly SRTP has a sister protocol, called Secure RTCP (or SRTCP). SRTCP provides the same security-related features to RTCP as the ones provided by SRTP to RTP.
The use of SRTP or SRTCP is optional when using RTP or RTCP. Even if SRTP/SRTCP are used, all provided features (such as encryption and authentication) are optional and can be separately enabled or disabled. The only exception is the message authentication/integrity feature, which is mandatory when using SRTCP. Confidentiality protection in SRTP and SRTCP covers the payload, while integrity protection covers the payload and the full packet header.
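The difference in coverage between confidentiality and integrity protection can be illustrated with a minimal Python sketch. It assumes the default authentication transform of RFC 3711 (HMAC-SHA1 truncated to an 80-bit tag, computed over the packet header, the already-encrypted payload and the rollover counter). The function names, the key and the packet bytes are hypothetical and chosen only for illustration, and the payload encryption itself is not shown.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit tag, the default length in RFC 3711


def srtp_auth_tag(auth_key: bytes, header: bytes, enc_payload: bytes, roc: int) -> bytes:
    """Compute a tag over header || encrypted payload || ROC (hypothetical helper)."""
    # The header is authenticated although it is never encrypted.
    authenticated_portion = header + enc_payload + struct.pack("!I", roc)
    digest = hmac.new(auth_key, authenticated_portion, hashlib.sha1).digest()
    return digest[:TAG_LEN]


def srtp_verify(auth_key: bytes, header: bytes, enc_payload: bytes, roc: int, tag: bytes) -> bool:
    """Check a received tag in constant time (hypothetical helper)."""
    expected = srtp_auth_tag(auth_key, header, enc_payload, roc)
    return hmac.compare_digest(expected, tag)


# Illustrative values only: a 12-byte RTP header and an opaque encrypted payload.
key = bytes(range(20))
header = bytes.fromhex("8060000100000000deadbeef")
enc_payload = b"\x01\x02\x03\x04"
tag = srtp_auth_tag(key, header, enc_payload, roc=0)

# Changing a single header byte invalidates the tag.
tampered_header = header[:3] + b"\x02" + header[4:]
```

Because the tag is computed over the encrypted payload, a node holding only the authentication key can check a packet without being able to read the media data.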
In some cases it is desirable to introduce an intermediate node between the SRTP sender and the SRTP receiver. Examples of such situations include the use of Store and Forward mailboxes, network answering machines, and caching of IPTV content for more efficient, local access to content. The SRTP intermediate node must be able to resend received SRTP protected packets in an independent RTP session with new session parameters, without having to decrypt and re-encrypt each packet. This causes problems, since some session parameters affect the encryption and integrity protection. Furthermore, the SRTP intermediate node should not be able to access the plaintext media data in each packet, but should be able to check the integrity of each packet so as not to waste storage space on non-authentic content.
It should be noted that transparent end-to-end Packet Switched Streaming Service (PSS), described in TS 26.234, defines an encryption transform which is applied to media data before SRTP is involved in the processing of the data. This corresponds to tunnelling of the PSS protocol inside the SRTP protocol. In this case, an intermediate node in the form of a streaming server is located between the SRTP sender and the SRTP receiver, and the streaming server has no knowledge of the encryption key for the encryption transform. The content is therefore integrity protected using SRTP between the streaming server and the SRTP receiver. The encryption transform specified for PSS encrypts each data unit using 128-bit AES in counter mode (AES-CTR). Each SRTP packet delivered from the streaming server to the SRTP receiver contains one encrypted data unit. Data is thus encrypted between the SRTP sender and the SRTP receiver using AES-CTR, and integrity protected between the streaming server and the SRTP receiver using SRTP. A session nonce is sent from the SRTP sender to the streaming server, and a further session nonce is sent from the streaming server to the SRTP receiver. Assuming that the SRTP receiver shares the encryption key with the SRTP sender, the SRTP receiver only needs to be able to reconstruct the same initialization vector (IV) as was used by the SRTP sender when the data unit was encrypted. The IV for each data unit is constructed from a sequence number (IVSN), which is transported with each data unit, combined with a nonce (which remains the same throughout the entire session and is signalled out of band before the streaming begins). The construction of the 128-bit IV is defined as:
IV = (nonce * 2^16) XOR (IVSN * 2^16)  (Math 1)
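As an illustration of Math 1, the following Python sketch computes the IV exactly as the formula is written above. The bit widths of the nonce and of the IVSN are not stated here, so both are treated as arbitrary non-negative integers and the result is truncated to 128 bits; that truncation, the big-endian byte order and the function name pss_iv are assumptions made for this sketch.

```python
MASK_128 = (1 << 128) - 1  # keep only the low 128 bits of the result


def pss_iv(nonce: int, ivsn: int) -> bytes:
    """Build the IV of Math 1 as a 16-byte big-endian value (hypothetical helper)."""
    if nonce < 0 or ivsn < 0:
        raise ValueError("nonce and IVSN must be non-negative")
    # Multiplying by 2^16 is the same as shifting left by 16 bits.
    iv = ((nonce << 16) ^ (ivsn << 16)) & MASK_128
    return iv.to_bytes(16, "big")


# Illustrative values only: one session nonce, two consecutive sequence numbers.
session_nonce = 0x1234 << 32
iv_first = pss_iv(session_nonce, 1)
iv_second = pss_iv(session_nonce, 2)
```

A receiver that knows the session nonce and reads the IVSN carried with each data unit can call the same function to obtain the IV used by the sender. The two least significant bytes of the result are always zero, as a direct consequence of the 16-bit shift.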
However, a problem with PSS is that it does not provide the same information as SRTP (for example, time stamps), as it is only applied to codec data. Furthermore, PSS can be inefficient in terms of bandwidth, as each payload protected using PSS requires its own header. Similarly, tunnelling data using one protocol within another to transport the data requires different headers for each data layer.