1. Technical Field
The present invention relates generally to source and channel coding for speech codecs and, more particularly, to a method and system for providing an optimal choice of source and channel code bit rates given information on the packet loss and available bandwidth of an IP access network.
2. Related Art
Existing speech coders were not designed for use over IP packet networks. A packet-switched network is a shared medium designed for asynchronous transmission on a best-effort basis. In IP networks, the available bandwidth and delay vary over time. Time-critical applications such as voice and video have traditionally assumed guaranteed bandwidth, bounded delay and synchronous transmission. Most speech coders operate under preset schemes for data and channel code rates, making them vulnerable to the varying conditions on wired and wireless IP-based hops. Some form of adaptation is therefore needed to adjust the codec bit rate dynamically to quickly changing network conditions and so preserve acceptable levels of reliability and quality.
Several types of degradation occur in IP networks, among them: 1) packet loss due to network congestion resulting from a lack of available bandwidth, 2) packet loss due to network jitter, 3) delay due to packetization, 4) delay due to congestion, and 5) packet loss due to random or bursty communication noise. The first four degradations are prominent in wired IP networks, while the last one occurs primarily in wireless networks and is due to residual bit errors at the link layer. The retransmission mechanism used in the TCP protocol for error control cannot be used because its inherent delay may be unacceptable for real-time, interactive voice applications.
Codec rate adaptation is an effective method to cope with network congestion and mitigate the effects of packet loss. The European Telecommunications Standards Institute/3rd Generation Partnership Project (“ETSI/3GPP”) adaptive multi-rate (AMR) speech codec is suitable for this purpose. In some systems, packet loss is reduced by dividing the network conditions into eight states and assigning each state to one of the eight bit rates of the AMR codec. Network conditions are monitored using the difference in timestamps between successive speech frames at the receiving side (i.e., the system monitors the network jitter). The reported results show a drastic reduction in packet loss rate when compared to fixed-rate codecs such as G.711 and G.723.1.
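The state-to-rate assignment described above can be illustrated with a minimal Python sketch. The eight AMR bit rates are those of the AMR narrowband codec; the jitter boundaries, the jitter estimator, and the function names are hypothetical and are chosen only for illustration, since the systems referred to above do not specify them here.

```python
from bisect import bisect_right

# The eight AMR narrowband bit rates in kbit/s, lowest to highest.
AMR_RATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]

# Hypothetical jitter boundaries (ms) that split network conditions
# into eight states; state 0 is the best, state 7 the worst.
JITTER_BOUNDS_MS = [5, 10, 20, 30, 45, 60, 80]


def interarrival_jitter(send_ts_ms, recv_ts_ms):
    """Mean absolute deviation of receive spacing from send spacing.

    This is one assumed way of turning timestamp differences between
    successive frames into a single jitter figure.
    """
    deviations = []
    for i in range(1, len(send_ts_ms)):
        recv_gap = recv_ts_ms[i] - recv_ts_ms[i - 1]
        send_gap = send_ts_ms[i] - send_ts_ms[i - 1]
        deviations.append(abs(recv_gap - send_gap))
    return sum(deviations) / len(deviations) if deviations else 0.0


def select_amr_rate(jitter_ms):
    """Map a jitter value to one of eight states, then to an AMR rate."""
    state = bisect_right(JITTER_BOUNDS_MS, jitter_ms)  # 0 .. 7
    # Worse network state -> lower bit rate.
    return AMR_RATES_KBPS[len(AMR_RATES_KBPS) - 1 - state]
```

In this sketch a jitter-free stream keeps the highest rate, and each boundary crossed steps the codec down by one mode.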
In some situations, reducing the source bit rate alone does not help. Such situations include short-term transient congestion, congestion caused by other users' traffic, and residual bit errors caused by a noisy wireless link. Channel coding, or forward error correction (FEC), can then be used in conjunction with an error control scheme to optimally allocate the amount of redundant bits and information bits in response to varying channel conditions (available bandwidth, loss rate, delay, etc.).
Others have proposed a flexible scheme for voice transmission over the mobile Internet in which the AMR codec is combined with a systematic convolutional code of rate 1/n. Packetization follows the optimal puncturing patterns of the code; that is, all bits stemming from the same generator polynomial are put into the same packet. Hence, one media packet is followed by n−1 forward error correction (FEC) packets. Packet loss is treated as puncturing of the convolutional code, for which decoding methods are known. The code reduces both random packet loss in the wired IP network and random bit errors on the wireless link (similar to a physical layer FEC code). Although the authors mention that appropriate feedback can adaptively control the amount of source and channel bits, they do not implement any rate adaptation scheme.
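The packetization rule described above, one packet per generator polynomial, can be sketched in Python as follows. The generator polynomials, the constraint length, and the function names are hypothetical examples and are not taken from the cited scheme; the sketch uses a simple non-recursive systematic code of rate 1/3.

```python
def conv_encode_streams(bits, generators, constraint_len):
    """Encode bits with a rate-1/n convolutional code.

    Returns n output streams, one per generator polynomial. The newest
    input bit sits in the least significant bit of the shift register,
    so the generator 0b001 reproduces the input (systematic stream).
    """
    mask = (1 << constraint_len) - 1
    state = 0
    streams = [[] for _ in generators]
    for b in bits:
        state = ((state << 1) | b) & mask
        for k, g in enumerate(generators):
            # Parity (XOR) of the register taps selected by g.
            streams[k].append(bin(state & g).count("1") & 1)
    return streams


def packetize(streams):
    """Put all bits from the same generator into the same packet."""
    return [
        {"type": "media" if k == 0 else "fec", "bits": s}
        for k, s in enumerate(streams)
    ]


# Hypothetical rate-1/3 code: systematic tap plus two parity generators.
GENERATORS = [0b001, 0b111, 0b101]
packets = packetize(conv_encode_streams([1, 0, 1, 1], GENERATORS, 3))
```

With n = 3 this yields one media packet followed by two FEC packets; dropping any FEC packet removes exactly one generator's output, which is why the loss can be handled by the decoder as puncturing.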
Another solution proposes using redundant audio coding to deliver decent audio quality to a destination. In this scheme, an audio packet includes the encoded main information as well as a highly compressed version of previous packets (the redundant information). For instance, packet n includes, in addition to the PCM-encoded samples, LPC or GSM versions of packets n−1 and n−2. The perceived loss rate is used as the metric, and the amount of redundancy is changed gradually according to a fixed threshold (e.g., a 3% packet loss rate). Loss rates are sent back to the encoders using RTCP reports. However, this algorithm suffers from several shortcomings, such as cyclical behavior (continuously increasing and decreasing the redundancy although the network loss rate is nearly constant) that results in poor performance.
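A minimal Python sketch of such threshold-driven redundancy control is given below. The function names, the step size of one redundancy level per report, and the maximum of two redundant copies are assumptions made for illustration; only the 3% threshold and the packet layout (main PCM payload plus compressed copies of earlier packets) come from the description above.

```python
def adjust_redundancy(level, loss_rate, threshold=0.03, max_level=2):
    """Step the number of redundant copies up or down by one.

    level is the current number of redundant copies carried per packet;
    loss_rate is the perceived loss rate from the latest RTCP report.
    """
    if loss_rate > threshold:
        return min(level + 1, max_level)
    if loss_rate < threshold:
        return max(level - 1, 0)
    return level


def build_packet(frames, n, level):
    """Packet n carries frame n plus compressed copies of earlier frames."""
    redundant = [
        {"seq": n - k, "codec": "LPC", "frame": frames[n - k]}
        for k in range(1, level + 1)
        if n - k >= 0
    ]
    return {
        "seq": n,
        "main": {"codec": "PCM", "frame": frames[n]},
        "redundant": redundant,
    }
```

Because the controller in this sketch reacts only to which side of the threshold the perceived loss falls on, adding redundancy lowers the perceived loss, which then triggers removal of redundancy, and so on. This is one way the cyclical behavior noted above can arise.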
In a modification of the above-referenced approach, the problem is stated as a constrained optimization: given the available bandwidth, which combination of main and redundant information provides the best perceived audio quality? This approach still uses redundant audio coding, but also tries to optimize a subjective measure of quality while taking into account the rate and delay constraints. In this approach, it is found that the main information should be encoded using the highest quality coder and copies located towards the end should be encoded with the next highest quality coders. The available bandwidth and loss rate are obtained using TCP-friendly rate control and RTCP feedback, respectively.
The need to distinguish network congestion from bit errors on radio links serves as the basis for a new quality of service (QoS) control architecture described in an article entitled “Rate and robustness control with RTP monitoring agent for mobile multimedia streaming.” In this article, a new type of proxy called an “RTP monitoring agent” is introduced, located at the boundary between the wired network and the wireless link. The RTP monitoring agent sends feedback reports about the wired network conditions, such as jitter and loss, to the media server. The media server also receives RTCP reports from the media receiver containing statistics about both the wired and the wireless networks. The media server is then able to apply the appropriate strategy depending on whether packet losses are due to network congestion (reduce the encoder bit rate) or radio link errors (increase robustness by adding more FEC). The adaptation algorithm consists of pre-defined combinations intended to keep the total packet loss rate below 1%.
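The decision made by the media server can be sketched in Python as follows. This is an illustrative sketch only: the function names, the assumption that wired and wireless losses are independent, and the rule of acting on the larger loss component are hypothetical and do not reproduce the pre-defined combinations of the cited article.

```python
def estimate_wireless_loss(total_loss, wired_loss):
    """Infer the wireless loss rate from the two feedback reports.

    Assumes independent losses on the two segments, so that
    (1 - total) = (1 - wired) * (1 - wireless).
    """
    if wired_loss >= 1.0:
        return 0.0
    return max(0.0, 1.0 - (1.0 - total_loss) / (1.0 - wired_loss))


def choose_action(total_loss, wired_loss, target=0.01):
    """Pick a strategy from end-to-end and wired-segment loss rates.

    total_loss comes from the receiver's RTCP report; wired_loss comes
    from the RTP monitoring agent at the wired/wireless boundary.
    """
    if total_loss < target:
        return "hold"
    wireless_loss = estimate_wireless_loss(total_loss, wired_loss)
    if wired_loss >= wireless_loss:
        return "reduce_bit_rate"  # losses dominated by congestion
    return "increase_fec"         # losses dominated by radio errors
```

For example, a 5% total loss with 4% wired loss points to congestion and a lower encoder bit rate, whereas a 5% total loss with no wired loss points to the radio link and more FEC.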