Packetized Voice
In one known approach, packetized voice information is transmitted over Internet Protocol (“IP”) networks using the Real-time Transport Protocol (“RTP”). Each packet comprises one or more headers and a payload of voice information. In one approach, the headers consist of an IP header, a User Datagram Protocol (“UDP”) header and an RTP header, which together occupy 40 bytes of the packet. The payload is typically 10 to 20 bytes, depending on the type of coders/decoders (“codecs”) that are used by the call endpoints. Thus, the headers represent significant overhead compared to the payload size. The large comparative size of the headers introduces inefficiency, and might result in effective utilization that is as low as 20% of the total bandwidth of the network links that carry voice traffic.
FIG. 1 is a block diagram illustrating the structure of an RTP packet. In FIG. 1, RTP packet 100 comprises IP header 102, UDP header 104, RTP header 106 and media payload 108. IP header 102 is 20 bytes long, UDP header 104 is 8 bytes long, RTP header 106 is 12 bytes long and media payload 108 is 10 to 20 bytes long. Thus, a network link that is carrying a significant amount of voice traffic ends up with an effective bandwidth utilization that is roughly 20-30% of the actual capacity of the network link. One example is the link serving a Voice Point Of Presence (POP) that hosts a farm of Media Gateways and therefore mostly generates voice traffic.
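The utilization figures above follow directly from the header and payload sizes of FIG. 1. The following is a minimal sketch of that arithmetic. The function name payload_fraction is illustrative only, and link-layer framing is ignored for simplicity.

```python
# Header sizes in bytes, as given for the RTP packet of FIG. 1.
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
HEADER_BYTES = IP_HEADER + UDP_HEADER + RTP_HEADER  # 40 bytes in total


def payload_fraction(payload_bytes: int) -> float:
    """Return the share of one RTP packet that is occupied by media payload."""
    return payload_bytes / (HEADER_BYTES + payload_bytes)


if __name__ == "__main__":
    # Payload sizes at the two ends of the 10 to 20 byte range.
    for payload in (10, 20):
        share = payload_fraction(payload)
        print(f"{payload}-byte payload: {share:.0%} of the packet is voice")
```

Under these assumptions, a 10-byte payload occupies one fifth of the packet and a 20-byte payload occupies one third, which is consistent with the rough 20-30% range stated above.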
When Time Division Multiplexing is used for voice transmission, as in a conventional circuit-switched network such as the public switched telephone network, the network transports voice in uncompressed samples. For example, following recommendation G.711 of the International Telecommunication Union, each sample represents 125 microseconds of voice. In this approach, end-to-end latency is close to wire speed.
In contrast, in IP networks, voice is transmitted by sending the media payloads encapsulated in RTP packets of the type shown in FIG. 1. Transporting RTP packets with payloads consisting of small samples of a single Pulse Code Modulation (“PCM”) voice channel, such as uncompressed G.711 samples, can be very inefficient and expensive due to the overhead caused by the packet headers. In order to improve efficiency, voice-over-IP (VoIP) hardware and software can incorporate larger samples of a PCM channel in the payload by applying complex compression algorithms, or codecs.
Examples of relevant codecs that can increase the amount of voice information carried in the payload include G.723.1, G.729, G.729a and AudioCodes' Netcoder. Table A lists some of the codecs along with their typical frame size, packets generated per second (pps), required bandwidth without headers, and payload size.
TABLE A
Codec      Frame size (ms)   pps   Bit rate (Kbps)   Payload size (bytes)
Netcoder   20                50    4.8-9.6           12-24
G.723.1    30                33    5.3-6.3           20-24
G.729      10                100   8                 10
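The columns of Table A are related by simple arithmetic, which can be sketched as follows. The sketch assumes one codec frame per packet, which is what the pps column implies, and adds a hypothetical helper, on_wire_kbps, that estimates the bandwidth once the 40 bytes of IP, UDP and RTP headers are included. None of the function names come from the original text.

```python
HEADER_BYTES = 40  # IP (20) + UDP (8) + RTP (12), as in FIG. 1


def packets_per_second(frame_ms: float) -> float:
    """One frame per packet (an assumption), so pps is 1000 ms / frame size."""
    return 1000.0 / frame_ms


def payload_bytes(bit_rate_kbps: float, frame_ms: float) -> float:
    """Kbps multiplied by milliseconds gives bits; divide by 8 for bytes."""
    return bit_rate_kbps * frame_ms / 8.0


def on_wire_kbps(bit_rate_kbps: float, frame_ms: float) -> float:
    """Bandwidth in Kbps including the uncompressed 40-byte headers."""
    packet_bytes = payload_bytes(bit_rate_kbps, frame_ms) + HEADER_BYTES
    return packet_bytes * 8.0 * packets_per_second(frame_ms) / 1000.0


if __name__ == "__main__":
    # (codec, bit rate in Kbps, frame size in ms), taken from Table A.
    rows = [("Netcoder", 4.8, 20), ("Netcoder", 9.6, 20),
            ("G.723.1", 5.3, 30), ("G.723.1", 6.3, 30),
            ("G.729", 8.0, 10)]
    for codec, rate, frame in rows:
        print(codec, rate, "Kbps:",
              round(packets_per_second(frame), 1), "pps,",
              round(payload_bytes(rate, frame), 3), "payload bytes,",
              round(on_wire_kbps(rate, frame), 1), "Kbps with headers")
```

For G.729, for example, the 8 Kbps codec stream grows to roughly 40 Kbps once headers are counted. The G.723.1 rows compute to slightly under the 20 and 24 bytes listed in Table A, because the listed bit rates are rounded.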
However, larger samples and complex compression algorithms increase latency. Thus, there is a need for a packetized voice transmission approach in which a large amount of voice information is carried, without adversely affecting latency.
Header Compression—Using Compressed RTP
One method of resolving the overhead problem associated with media traffic over a network link, without increasing latency, is to compress the headers of an RTP packet. Certain parts of the headers are constant throughout a session, or at least through sufficiently long portions of the session. Even where parts of the header change, they change in some deterministic way, so that the receiver can predict the new values from the previous packet.
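The idea that constant and predictably changing header fields need not be retransmitted can be illustrated with a minimal sketch. This is not the CRTP wire format. The RtpFields structure, the predict, compress and decompress functions, and the timestamp step of 80 (10 ms frames on an assumed 8 kHz RTP clock) are all hypothetical choices made for illustration.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class RtpFields:
    """A hypothetical subset of the RTP header fields."""
    ssrc: int          # constant for the lifetime of a stream
    payload_type: int  # normally constant during a session
    sequence: int      # 16-bit counter, normally increases by 1
    timestamp: int     # 32-bit counter, normally increases by a fixed step


def predict(prev: RtpFields, ts_step: int) -> RtpFields:
    """Predict the next header from the previous one (with wrap-around)."""
    return replace(prev,
                   sequence=(prev.sequence + 1) % 2**16,
                   timestamp=(prev.timestamp + ts_step) % 2**32)


def compress(prev: RtpFields, cur: RtpFields, ts_step: int) -> dict:
    """Keep only the fields whose values differ from the prediction."""
    expected = asdict(predict(prev, ts_step))
    return {name: value for name, value in asdict(cur).items()
            if expected[name] != value}


def decompress(prev: RtpFields, delta: dict, ts_step: int) -> RtpFields:
    """Rebuild the full header from the prediction plus the sent fields."""
    return replace(predict(prev, ts_step), **delta)


if __name__ == "__main__":
    prev = RtpFields(ssrc=0x1234, payload_type=18, sequence=7, timestamp=560)
    cur = RtpFields(ssrc=0x1234, payload_type=18, sequence=8, timestamp=640)
    delta = compress(prev, cur, ts_step=80)
    print(delta)                                   # nothing needs to be sent
    print(decompress(prev, delta, ts_step=80) == cur)
```

In this sketch a packet that follows the expected pattern compresses to an empty set of fields, while an unexpected change, such as a timestamp jump after a silence period, causes only the affected field to be sent. Both sides must hold the same previous header for the reconstruction to be correct.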
One approach to header compression is the Compressed RTP protocol (“CRTP”) as defined in RFC 2508. CRTP is a link-by-link compression mechanism for RTP packets running directly over PPP. CRTP was designed explicitly for slow-speed links.
Under the CRTP protocol, compressor and de-compressor devices must maintain a collection of shared information in a consistent state between the compressor and de-compressor. A separate session context is stored for each IP/UDP/RTP packet stream, as defined by a particular combination of the IP source and destination addresses, UDP source and destination ports, and the RTP SSRC field. The number of session contexts to be maintained may be negotiated between the compressor and de-compressor.
Each session context is identified by an 8-bit or 16-bit Context Identifier (CID), depending upon the number of session contexts negotiated. Thus, at most 65,536 session contexts can be maintained. Both uncompressed and compressed packets must carry the CID and a 4-bit sequence number used to detect packet loss between the compressor and de-compressor. Each context has its own separate sequence number space so that a single packet loss need only invalidate a single context. Creating software and hardware products compatible with CRTP is difficult and complicated due to the number of specialized formats that are defined.
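The session context bookkeeping described above can be sketched as follows. This is a simplified illustration, not an implementation of RFC 2508. The class names FlowKey, ContextTable and LossDetector, the sequential assignment of CIDs, and the default of 256 contexts are assumptions introduced here.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """The combination of fields that defines one IP/UDP/RTP packet stream."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    ssrc: int


class ContextTable:
    """Compressor side: map each packet stream to a Context Identifier."""

    def __init__(self, max_contexts: int = 256) -> None:
        if not 1 <= max_contexts <= 65536:
            raise ValueError("a 16-bit CID allows at most 65536 contexts")
        self.max_contexts = max_contexts
        self._cids: Dict[FlowKey, int] = {}

    @property
    def cid_bits(self) -> int:
        """An 8-bit CID suffices for up to 256 contexts, otherwise 16 bits."""
        return 8 if self.max_contexts <= 256 else 16

    def cid_for(self, key: FlowKey) -> int:
        """Return the CID of a stream, assigning the next free one if new."""
        if key not in self._cids:
            if len(self._cids) >= self.max_contexts:
                raise RuntimeError("no free session context")
            self._cids[key] = len(self._cids)
        return self._cids[key]


class LossDetector:
    """De-compressor side: one 4-bit sequence number space per context."""

    def __init__(self) -> None:
        self._expected: Dict[int, int] = {}

    def accept(self, cid: int, seq4: int) -> bool:
        """Return False if a gap is seen in this context's 4-bit sequence."""
        expected = self._expected.get(cid)
        self._expected[cid] = (seq4 + 1) % 16
        return expected is None or expected == seq4
```

Because each CID has its own counter, a gap detected in one context leaves the other contexts untouched. A 4-bit counter wraps after 16 packets, so a burst that loses an exact multiple of 16 packets would go unnoticed in this sketch.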
Further, because CRTP is a link-layer protocol, the header has to be compressed and then decompressed at every intermediate router to achieve an end-to-end effect. Accordingly, CRTP is not a scalable solution, because the compression and decompression operations are CPU intensive and have to be performed for every RTP packet. Also, every router along the path is required to support the CRTP protocol.
The compression method used by CRTP is very efficient. However, it assumes no loss at the link layer. The assumption of no loss at the link layer is not acceptable when compressing RTP packets end-to-end because the RTP packets can often be dropped or delayed. A different mechanism that is less sensitive to loss is therefore required.
UDP/RTP Header Compression
An alternative solution for supporting an end-to-end operation is to compress only the UDP and RTP headers while leaving the IP header in place (possibly after some modifications). However, the savings garnered by compressing only the UDP and RTP headers are not as substantial as the savings garnered by using the compression method of CRTP.
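The difference in savings can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that the compressed portion of the headers shrinks to 2 bytes in both schemes. That figure, and the function names utilization and compare, are assumptions rather than values taken from the text above.

```python
IP_HEADER = 20   # bytes, left uncompressed in the UDP/RTP-only scheme
UDP_HEADER = 8
RTP_HEADER = 12


def utilization(payload_bytes: int, header_bytes: int) -> float:
    """Share of each packet that carries voice payload."""
    return payload_bytes / (payload_bytes + header_bytes)


def compare(payload_bytes: int, compressed_bytes: int = 2) -> dict:
    """Payload share under three header treatments.

    compressed_bytes is an assumed size for whatever part of the headers
    is compressed; it is a placeholder, not a measured value.
    """
    return {
        "uncompressed": utilization(
            payload_bytes, IP_HEADER + UDP_HEADER + RTP_HEADER),
        "udp_rtp_only": utilization(
            payload_bytes, IP_HEADER + compressed_bytes),
        "full_header": utilization(payload_bytes, compressed_bytes),
    }


if __name__ == "__main__":
    for name, share in compare(payload_bytes=10).items():
        print(f"{name}: {share:.0%}")
```

With a 10-byte payload and the assumed 2-byte compressed size, the payload share rises from 20% to about 31% when only the UDP and RTP headers are compressed, compared with about 83% when all three headers are compressed. The uncompressed 20-byte IP header dominates the packet in the end-to-end case.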
Based on the foregoing, there is a clear need for an improved method for transmitting media packets that makes effective use of the available bandwidth in IP and VoIP networks.
There is a specific need for such an improved method that does not increase packet latency, and which is an end-to-end solution rather than a link-by-link solution.
There is also a specific need for an improved method that is simpler to implement than the CRTP approach.