With the proliferation of the Internet and the continuing development of the multimedia field, most voice and video data communication today is carried over packet-switched networks such as the Internet, which utilize Internet Protocol (IP) standards. IP-enabled devices designed for transferring voice and video data over the Internet have become very popular.
Hot-Swap
An important concern of a communication system such as the Internet is its High Availability (HA). That is, network devices are required to operate, on average, at least 99.999% of the time for the communication system to be considered reliable. A communication system that fails to meet that requirement is often considered unreliable. HA is ensured by allowing a malfunctioning active (primary) network device, which until the instant of failure rendered a service to one or more network users, to be replaced with a back-up, or redundant, device that continues to provide the service previously provided by the failing device. Replacing a malfunctioning device with a back-up, or redundant, device is a feature known in the field as “hot swap”, also referred to as “fail over”, “switch over” and “switchover”.
Switchover generally refers to the transfer of control from an active device to a redundant device. Switchover typically involves sending (directly or indirectly) communication parameters (switchover parameters) associated with an active service-rendering device to a redundant device. A successful switchover transfers control from the active device to the redundant device in a way that obviates the need to break off the communication and renegotiate it with the user to whom the service is rendered; the redundant device simply resumes control over the communication.
Switching over from a primary VoIP network device to a redundant VoIP network device (which backs up the primary network device) is relatively simple when a non-secure communication is involved because, in this case, switching over involves copying relatively non-challenging communication parameters from the primary network device to the redundant network device. “Non-challenging” parameters are static parameters, that is, parameters that do not substantially change during an entire communication session with a user (a remote network device), though they may change from one communication session to the next; a communication session typically involves thousands of data packets traveling in both directions. Exemplary communication parameters that are copied from a primary network device to a back-up network device are IP addresses, UDP (User Datagram Protocol) ports, frame size, and encoder and decoder types.
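To make the 99.999% figure concrete, the following minimal sketch converts an availability target into the total downtime it allows per year. The function name and the use of a 365-day year are assumptions made for illustration only.

```python
def max_downtime_seconds(availability: float, period_seconds: float) -> float:
    """Total outage time permitted within a period for a given availability."""
    return (1.0 - availability) * period_seconds


# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

five_nines = max_downtime_seconds(0.99999, SECONDS_PER_YEAR)
print(f"{five_nines:.2f} seconds, or {five_nines / 60:.2f} minutes, per year")
```

Under these assumptions, "five nines" leaves a budget of roughly five minutes of downtime per year, which is why a switchover that avoids renegotiating the session matters.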
Performing switchover between network devices when secured data is involved is not trivial because of the security features involved. For example, security features include use of sequence numbers and/or timestamps that are uniquely assigned to the outgoing packets, encryption/decryption and authentication schemes, and measures useful for replay protection. Therefore, whenever control is transferred from a primary device to a redundant device, there is an additional requirement (compared with the requirements for switching over in non-secure communication) to ensure transparent switching over: data relating to security features has to be copied from the primary/active device to the redundant device to ensure successful switchover. As a general rule, the more complex the (security) features involved in a data communication, the less trivial switching over becomes.
The difficulty involved in copying security-related data lies in that security-related data includes dynamic parameters, as opposed to the static parameters used in non-secure switchover(s). By “dynamic parameter” is meant a parameter whose value changes per packet in a given communication session, usually for uniquely characterizing each packet within a packet stream. Dynamic parameters are, for example, the time stamp (TimeStamp), which is the relative time at which the packet is generated; the sequence number (SEQ), which is a serial number assigned to a packet; the SRTP rollover counter (ROC); and the SRTP RTCP index.
The IP protocol does not inherently provide security capabilities. Sometimes, data, and audiovisual related data in particular, traversing a data network may include personal and confidential information. Therefore, and since secure transfer of data (over any type of data network) is of concern, it may be desirable to utilize security features, such as authentication and encryption, to protect any personal and confidential information from being intercepted by unauthorized users. One suite of protocols for implementing such security, generally known as IP security (IPsec or IPSEC), has been defined by the Internet Engineering Task Force (IETF). IPSEC is described more fully in Requests For Comments (RFCs) 2401-2412 (by S. Kent et al., November 1998), the content of which is incorporated by reference herein in its entirety. A newer protocol, known as Secure Real-time Transport Protocol (SRTP), has also been defined by the IETF for providing security tailored for the RTP protocol, which is the protocol used for media transport over VoIP networks. Both security suites (IPSEC and SRTP) provide protection against replay attacks, as is described more fully, for example in “Using ESP to Prevent Replay Attacks” (Brien M. Posey, MCSE, 2002 Posey Enterprises), “The longest short IP Sec Paper” (Walberts, 10 Jan. 2005), and in “How Secure Is VoIP?” (Ahmar Ghaffar, November 2004).
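The per-packet nature of these parameters can be sketched as follows. The sketch assumes the RFC 3711 convention that the SRTP packet index combines a 32-bit ROC with the 16-bit SEQ (index = 2^16 × ROC + SEQ); the class and function names are hypothetical and do not come from any particular implementation.

```python
from dataclasses import dataclass


@dataclass
class DynamicState:
    """Hypothetical snapshot of per-packet state a redundant device would need."""
    seq: int          # 16-bit RTP sequence number of the last packet sent
    timestamp: int    # 32-bit RTP timestamp of the last packet sent
    roc: int          # SRTP rollover counter (counts SEQ wraparounds)
    srtcp_index: int  # index of the last SRTCP packet sent


def srtp_packet_index(roc: int, seq: int) -> int:
    """Combine ROC and SEQ into one index: 2**16 * ROC + SEQ."""
    return (roc << 16) | (seq & 0xFFFF)


def advance(state: DynamicState) -> int:
    """Move to the next packet, bumping ROC when SEQ wraps past 65535."""
    state.seq = (state.seq + 1) & 0xFFFF
    if state.seq == 0:
        state.roc = (state.roc + 1) & 0xFFFFFFFF
    return srtp_packet_index(state.roc, state.seq)


state = DynamicState(seq=0xFFFF, timestamp=0, roc=0, srtcp_index=0)
print(advance(state), state.roc, state.seq)
```

Because every outgoing packet changes this state, a copy taken by the redundant device becomes stale almost immediately, unlike the static parameters of a non-secure session.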
Replay Attack and Protection
A replay attack is a form of network attack in which a valid data transmission is maliciously or fraudulently repeated or delayed. This is carried out either by the originator or by an adversary who intercepts or copies the data and retransmits it, possibly as part of a masquerade attack. Replay prevention means are, therefore, used to prevent non-authorized network devices from copying and “replaying” packets exchanged by two network devices. Replay prevention is described more fully in RFCs 2402 and 2406 (by S. Kent et al., November 1998), the entire content of which is incorporated herein by reference.
OSI
The OSI (Open Systems Interconnection) model is a layered abstract description for communications and computer network protocol design, developed as part of the Open Systems Interconnection initiative. It is also called the OSI 7-layer model. Briefly, layer 1 is the Physical layer; layer 2 is the Data Link layer; layer 3 is the Network layer, whose best known example is the Internet Protocol (IP); layer 4 is the Transport layer, whose best known examples are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP); layer 5 is the Session layer; layer 6 is the Presentation layer; and layer 7 is the Application layer.
IPSEC
Since the IP protocol does not inherently provide any security capabilities, IP security (IPsec or IPSEC) was introduced to provide security services such as: (1) Encrypting traffic (so that it cannot be read in transit), (2) Integrity validation (hence ensuring traffic has not been modified along its path), (3) Authenticating the Peers (hence both ends are sure they are communicating with the trusted entity the traffic is intended for), and (4) Anti-Replay (hence protecting against session replay).
Generally, IPSEC is a standard for securing Internet Protocol (IP) communications by encrypting and/or authenticating all IP packets. IPSEC provides security at the network layer (layer 3 of the OSI model), which makes IPSEC relatively flexible, as it can be used for protecting both TCP and UDP based protocols, but increases its complexity and processing overhead because IPSEC cannot rely on TCP (layer 4 of the OSI model) to manage reliability and fragmentation. IPSEC is a set of cryptographic protocols used for securing packet flows and communication key(s) exchange. There are two cryptographic protocols used for securing packet flows: (1) Encapsulating Security Payload (ESP), which provides authentication, data confidentiality and message integrity, and (2) Authentication Header (AH), which provides authentication and message integrity, but does not offer confidentiality. Originally, AH was used only for integrity and ESP was used only for encryption; authentication functionality was added subsequently to ESP. Currently, only one key exchange protocol is defined, the IKE (Internet Key Exchange) protocol.
In IPSEC, a Security Association (SA) describes a unidirectional secured flow of data between two network devices such as gateways. SAs are usually established automatically on demand using IKE, but some implementations of IPSEC permit manual establishment of SAs. An SA is defined by a destination address, a Security Parameter Index (SPI) and a security protocol. The SPI identifies the security parameters in combination with the IP address. IPSEC protocols are defined by RFCs 2401-2412. In order to establish a secure communication between two network devices using IPSEC, which is used in the network layer (layer 3) of the OSI model, a security association (SA) may be negotiated and set up between the two network devices involved. An SA typically includes information such as key lifetime, encryption algorithm, authentication algorithm, and so on. SAs are more fully described in RFC 2409, the content of which is incorporated herein by reference. In addition to establishing an SA, the two network devices may enable replay prevention to enhance security.
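As an illustration of how an SA is identified by the triple of destination address, SPI and security protocol, the following sketch models an SA table as a dictionary keyed by that triple. All names, field choices and example values are assumptions for illustration; the addresses are taken from ranges reserved for documentation, and the algorithm strings are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (destination address, SPI, security protocol such as "ESP" or "AH")
SaKey = Tuple[str, int, str]


@dataclass
class SecurityAssociation:
    """Hypothetical container for parameters negotiated for one direction."""
    encryption_algorithm: str
    authentication_algorithm: str
    key_lifetime_seconds: int
    replay_prevention: bool = True


class SaTable:
    """Toy SA table: one entry per unidirectional secured flow."""

    def __init__(self) -> None:
        self._entries: Dict[SaKey, SecurityAssociation] = {}

    def add(self, dst: str, spi: int, protocol: str, sa: SecurityAssociation) -> None:
        self._entries[(dst, spi, protocol)] = sa

    def lookup(self, dst: str, spi: int, protocol: str) -> Optional[SecurityAssociation]:
        return self._entries.get((dst, spi, protocol))


table = SaTable()
outbound = SecurityAssociation("example-cipher", "example-mac", 3600)
table.add("198.51.100.7", 0x1001, "ESP", outbound)

print(table.lookup("198.51.100.7", 0x1001, "ESP"))
print(table.lookup("192.0.2.1", 0x1001, "ESP"))  # reverse direction: no entry
```

Since an SA is unidirectional, a two-way exchange between the same pair of devices needs a second entry keyed by the other device's address.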
RTP
RTP (Real-time Transport Protocol) protocols operate at the Session layer (layer 5 of the OSI model). RTP is defined in RFC 3550 (which obsoletes RFC 1889). RFC 3551 (which obsoletes RFC 1890) defines a specific profile for Audio and Video Conferences with Minimal Control. RFC 3711, the entire content of which is incorporated herein by reference, defines the Secure Real-time Transport Protocol (SRTP) profile (actually an extension to RTP Profile for Audio and Video Conferences) which can be used (optionally) to provide confidentiality, message authentication, and replay protection for audio and video streams being delivered.
According to RFC 3550, the entire content of which is incorporated herein by reference, the services provided by RTP include: (1) Payload-type identification; (2) Sequence numbering, which is a monotonically increasing number assigned by a transmitter to a packet before its transmission; (3) Time stamping, which refers to the assignment of generation time to each transmitted packet; and (4) Delivery monitoring (RTCP).
Since RTP is designed to deal with voice and video packets and, therefore, timing aspects of packets are of concern (to allow packets' reordering and to avoid unacceptable delays and jitter), protocols associated with RTP deliver the necessary data to the application to make sure it can put the received packets in the correct order. Also, RTCP provides information about reception quality (through intermittent transmission of RTCP packets), which the application can use to make local (temporal and other) adjustments. For example, if congestion is forming, the application can decide to lower the data rate.
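The fields behind these services sit in the 12-byte fixed RTP header of RFC 3550. The following sketch shows one way to read them with Python's standard `struct` module; the type and function names are hypothetical, and header extensions and CSRC entries are deliberately ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte fixed header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,           # top two bits of the first byte
        payload_type=second & 0x7F,   # low seven bits; top bit is the marker
        sequence_number=seq,          # 16 bits, wraps at 65536
        timestamp=ts,                 # 32 bits
        ssrc=ssrc,                    # synchronization source identifier
    )


# Hand-built sample: version 2, payload type 0, arbitrary SEQ/timestamp/SSRC.
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160000, 0x12345678)
header = parse_rtp_header(sample)
print(header)
```

The sequence number and timestamp extracted here are exactly the values that change with every packet and that a receiver uses for reordering and jitter handling.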
SRTP
SRTP defines a profile of RTP, which is intended to provide encryption, message authentication and integrity, and replay protection to the RTP data in both unicast and multicast applications. SRTP facilitates high throughput, and it also seems to provide suitable protection in a heterogeneous environment that consists of both wired and wireless communication network elements. Generally, on the sending (transmitting) side, SRTP intercepts RTP packets and then forwards, per intercepted RTP packet, an equivalent, or associated, SRTP packet. On the receiving side, SRTP intercepts SRTP packets and forwards an equivalent RTP packet up the stack. The relationship between secure RTCP (SRTCP) and RTCP is similar to the relationship between SRTP and RTP; namely, SRTCP provides similar security services to RTCP. Using SRTCP message authentication is mandatory, for example for protecting the RTCP fields that keep track of the stream membership, provide feedback data to RTP senders, and maintain the packet sequence counters.
Security protocols such as SRTP typically include a feature known as replay protection. According to RFC 3711, a packet is said to be “replayed”, or a “replay attack” occurs, when the packet is stored by an adversary and then re-injected by the adversary into the data network from which it was intercepted. According to RFC 3711, the SRTP protocol utilizes a replay list as a means for protecting a receiving device from replay attacks. The receiving device has a replay list that contains indices of all of the packets it receives, and uses a “sliding window” for distinguishing between legitimate and non-legitimate (replayed) packets. The receiving device recognizes replayed packets by comparing the index of an incoming packet against the indices stored in the replay list. If the index of the received packet resides within the sliding window but the packet is received for the first time, then the packet is considered legitimate. This implies that whenever a switchover occurs, the index of the next received packet should be greater than the last one used/known.
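The sliding-window check described above can be sketched as follows. This is a simplified illustration and not the RFC 3711 procedure in full: the class name is hypothetical, the window size of 64 is an assumed value, and a real receiver would update the window only after the packet has passed authentication.

```python
class ReplayWindow:
    """Toy sliding-window replay check over packet indices."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = -1  # highest index accepted so far (-1: none yet)
        self.bitmap = 0    # bit k set means index (highest - k) was seen

    def check_and_update(self, index: int) -> bool:
        """Return True and record the index if the packet is acceptable."""
        if index > self.highest:
            # Newer than anything seen: slide the window forward.
            shift = index - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
            return True
        offset = self.highest - index
        if offset >= self.size:
            return False  # older than the window: reject
        if self.bitmap & (1 << offset):
            return False  # already seen: replayed packet
        self.bitmap |= 1 << offset  # late but new: accept and record
        return True


window = ReplayWindow()
for idx in (10, 12, 11, 11, 200, 100):
    print(idx, window.check_and_update(idx))
```

In this sketch a redundant device that starts with a stale `highest` value would either reject legitimate packets or accept replayed ones, which illustrates why the window state is part of what a secure switchover has to account for.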
PacketCable™
PacketCable™ is an initiative started by Cable Television Laboratories, Inc. (CableLabs®), and it is aimed at identifying, qualifying and supporting packet-based voice and video products over cable systems. CableLabs leads this initiative to develop interoperable interface specifications for delivering real-time multimedia services over two-way cable networks. Built on top of the industry's DOCSIS™ 1.1 (Data Over Cable Service Interface Specifications) cable modem infrastructure, PacketCable networks use Internet Protocol (IP) to enable a wide range of multimedia services, such as IP telephony, multimedia conferencing, interactive gaming, and general multimedia applications. A DOCSIS 1.1 network with PacketCable extensions enables cable operators to deliver data and voice traffic efficiently using a single high-speed, quality-of-service (QoS)-enabled broadband (cable) architecture.
PacketCable interconnects three networks: Hybrid Fiber Coaxial (HFC) Access Network; Public Switched Telephone Network (PSTN); and TCP/IP Managed IP Networks. PacketCable Protocols are: DOCSIS, which is a standard for data over cable; Real-time Transport Protocol (RTP) and Real Time Control Protocol (RTCP), which are required for media transfer; PSTN Gateway Call Signaling Protocol Specification (TGCP), which is an MGCP (Media Gateway Control Protocol) extension for Media Gateways; and Network-Based Call Signaling Protocol Specification (NCS), which is an MGCP extension for analog residential Media Gateways. The NCS specification, which is derived from the IETF MGCP RFC 2705, details VoIP signaling.
Instead of using the ROC value as in SRTP (ROC being associated with the number of packets communicated), PacketCable uses a parameter called Nwrap, which is a counter that counts the number of times the RTP timestamp wraps around. The transmitting and the receiving devices have to maintain a count (Nwrap) of RTP timestamp wraparounds within the range 0 to 2^16−1, which means that every time the value of an RTP packet's timestamp wraps around, the value of Nwrap is incremented by one. The value of Nwrap is initialized to zero at the time the communication connection is established. Nwrap also has to be incremented (in synchronization) at the receiver before the receiver can correctly decrypt RTP packets that are received past a wraparound point. The meaning of Nwrap will be explained by using the following example.