1. Technical Field
The invention is related to receipt and playback of packet-based audio signals, and in particular, to a system and method for providing automatic jitter control and packet loss concealment for audio signals broadcast across a packet-based network or communications channel.
2. Related Art
Conventional packet communication systems, such as the Internet or other broadcast networks, are typically lossy. In other words, not every transmitted packet can be guaranteed to be delivered error free, on time, or even in the correct sequence. Further, any delay in delivery time is usually variable. If the receiver can wait for packets to be retransmitted, correctly ordered, or corrected using some type of error correction scheme, then the fact that such networks are inherently lossy and delay prone is not an issue. However, for near real-time applications, such as, for example, voice-based communications systems across such packet-based networks, the receiver cannot wait for packets to be retransmitted, correctly ordered, or corrected without causing undue, and noticeable, lag or delay in the communication.
Many conventional schemes address minor delays in packet delivery time by simply providing a temporary buffer of received packets in combination with a delayed playback of the received packets. Such schemes are typically referred to as “jitter control” schemes. In general, most such schemes address delay in packet receipt by using a “jitter buffer” or the like which temporarily stores incoming packets or signal frames and provides them to a decoder with sufficient delay that one or more subsequent packets should have already been received. In other words, the jitter buffer simply keeps one or more packets in a buffer for delaying playback of the incoming signal for a period long enough to ensure that a majority of packets are actually received before they need to be played.
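By way of illustration only, the buffering behavior described above can be sketched in Python as follows. This is a minimal sketch rather than any particular conventional implementation; the class name `JitterBuffer`, the `depth` parameter, and the use of integer sequence numbers are assumptions introduced for the example.

```python
import heapq


class JitterBuffer:
    """Hold packets briefly and release them in sequence order (illustrative)."""

    def __init__(self, depth):
        self.depth = depth    # assumed: packets to accumulate before playback starts
        self.heap = []        # min-heap of (sequence number, payload)
        self.next_seq = None  # sequence number due at the next playback slot

    def push(self, seq, payload):
        """Store an arriving packet; return False if its playback slot has passed."""
        if self.next_seq is not None and seq < self.next_seq:
            return False
        heapq.heappush(self.heap, (seq, payload))
        return True

    def pop(self):
        """Return the payload for the next slot, or None if it is not available."""
        if self.next_seq is None:
            if len(self.heap) < self.depth:
                return None  # still pre-buffering, playback not started
            self.next_seq = self.heap[0][0]
        seq = self.next_seq
        self.next_seq += 1
        # Discard stale entries, such as duplicates of packets already played.
        while self.heap and self.heap[0][0] < seq:
            heapq.heappop(self.heap)
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)[1]
        return None  # packet missing at its playback time
```

In this sketch, a `None` result from `pop` after playback has started marks the point at which a decoder would need to conceal a missing packet.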
A sufficient increase in the length of the buffer allows virtually all packets to be received before they need to be played back. In fact, if the size of the jitter buffer is at least as long as the difference between the smallest and largest possible packet delays, then all packets could be played without any apparent gap or delay between packets. Unfortunately, as the length of the buffer increases, playback of the signal increasingly lags real-time. In a one-way audio signal, such as a music broadcast, for example, this is typically not a problem. However, in systems such as real-time or two-way conversations, temporal lag resulting from the use of such buffers becomes increasingly apparent, and undesirable, as the buffer length increases.
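The trade-off between buffer length and late arrival can be made concrete with a short Python sketch. The function names and the delay values used in the usage note are hypothetical, and the sketch assumes, as a simplification, that playback is scheduled relative to the smallest observed packet delay.

```python
def lossless_buffer_ms(delays_ms):
    """Buffer length that covers the full spread of observed packet delays."""
    return max(delays_ms) - min(delays_ms)


def late_loss_fraction(delays_ms, buffer_ms):
    """Fraction of packets that arrive after their playback deadline.

    Assumes each packet's deadline is the smallest observed delay plus
    buffer_ms (a simplification made for this example).
    """
    base = min(delays_ms)
    late = sum(1 for d in delays_ms if d - base > buffer_ms)
    return late / len(delays_ms)
```

For example, with made-up delays of 40, 55, 42, 120, and 47 ms, `lossless_buffer_ms` gives 80 ms, while a 20 ms buffer leaves one packet in five late. The larger buffer removes late arrivals at the cost of an additional 60 ms of playback lag.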
In addition, the basic idea of using a buffer has been improved in many modern communications systems by using compression and stretching techniques for providing temporal adjustment of the playback duration of signal frames. As a result, the jitter buffer length can be adapted during speech utterances by stretching or compressing the currently playing audio signal, as necessary, for reducing the average delay without incurring as many late losses. Unfortunately, the use of temporal stretching and compression techniques for frames in an audio signal often results in audible artifacts which may be objectionable to the human listener.
An additional conventional technique, commonly referred to as “packet loss concealment,” has also been used to improve the perceived speech quality. For example, as noted above, packet loss may occur when overly delayed packets are not received in time for playback. Typically, such overly delayed packets are referred to as “late loss” packets. Similarly, packet loss may also occur simply because the packet was never received. Conventional packet loss concealment schemes typically address such overly delayed and lost packets in the same manner by using some sort of packet loss concealment technique.
Further, many such schemes provide a combination of both jitter control and packet loss concealment. With respect to jitter control, most schemes determine the size of the jitter buffer by determining a minimum buffer size as a compromise between late or actual loss and packet delay. Further, a number of conventional schemes offer some sort of network analysis for further optimizing buffer size for minimizing delay and maximizing timely packet receipt. Packets that are determined to be late loss packets are typically handled in the same way as if they were actually lost. In fact, actually lost packets are typically declared to be a late loss anyway, as whatever delay criterion is used for determining a late loss will also be met by an actually lost packet. In either case, conventional decoders implement some sort of error concealment to hide the fact that the packet that should be played has not been received.
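The observation that an actually lost packet meets the same delay criterion as a late packet can be illustrated with a short Python sketch. The function name `classify_packet`, the use of `None` to represent a packet that never arrives, and the returned labels are assumptions made for this example.

```python
def classify_packet(arrival_ms, deadline_ms):
    """Label a packet relative to its playback deadline (illustrative only).

    arrival_ms is None for a packet that was never received. Such a packet
    can never meet the deadline, so it receives the same label as a packet
    that arrived too late.
    """
    if arrival_ms is None or arrival_ms > deadline_ms:
        return "late_loss"
    return "on_time"
```

Because both cases return the same label, a decoder following this sketch would apply the same concealment regardless of whether the packet was delayed or never delivered.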
One conventional scheme uses both jitter control and packet loss concealment. In general, this scheme minimizes the length of the jitter buffer by allowing each packet to be stretched and/or compressed, as needed to account for delayed packet receipt while still maintaining one or more packets in the jitter buffer. In particular, this scheme first introduces a one-packet delay, in order to wait for a packet to be either received, or declared lost, before deciding on whether the packet to be played currently should be stretched or compressed. Further, this scheme analyzes network performance on an ongoing basis to determine whether packets scheduled to be played in the near future are likely to be received on time. Received packets are then stretched or compressed, as necessary, to ensure that the buffer is not empty before the next scheduled packet arrival time.
However, when a packet does not arrive by the scheduled time, it is declared to be a late loss, and error concealment is then used to hide that loss. Most modern schemes use some form of stretching and compression in combination with a windowing technique for merging boundaries of packets bordering missing packets declared to be late loss packets. In general, such schemes typically operate by decomposing input packets into overlapping segments of equal length. These overlapping segments are then realigned and superimposed via a conventional correlation process along with smoothing of the overlap regions to form an output segment having a degree of overlap which results in the desired output length. The result is that the composite segment is useful for hiding or concealing perceived packet delay or loss. Unfortunately, such schemes typically make packet-based decisions regarding whether a packet is to be declared as late loss. Consequently, such schemes often declare packets to be a late loss when they are actually received in sufficient time that they could have been played as a part of the signal playback.
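The realignment and smoothing of overlap regions described above can be sketched for the simplest case of two segments. The following Python example is a simplified illustration under stated assumptions, not the method of any specific conventional scheme: the function names, the normalized cross-correlation search, and the linear cross-fade window are all choices made for this example.

```python
import math


def best_offset(tail, seg, search):
    """Offset into seg (0..search) whose leading samples best correlate with tail."""
    n = len(tail)
    tail_energy = sum(x * x for x in tail)
    best, best_score = 0, float("-inf")
    for k in range(search + 1):
        cand = seg[k:k + n]
        if len(cand) < n:
            break  # not enough samples left to compare
        num = sum(x * y for x, y in zip(tail, cand))
        den = math.sqrt(tail_energy * sum(y * y for y in cand))
        score = num / den if den else 0.0  # normalized cross-correlation
        if score > best_score:
            best, best_score = k, score
    return best


def overlap_add(a, b, overlap, search=0):
    """Merge segments a and b, cross-fading over `overlap` samples."""
    if not 1 <= overlap <= len(a):
        raise ValueError("overlap must be between 1 and len(a)")
    k = best_offset(a[-overlap:], b, search)
    b = b[k:]  # realign b so that its start matches the end of a
    if len(b) < overlap:
        raise ValueError("segment b is too short for the requested overlap")
    start = len(a) - overlap
    out = list(a[:start])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # linear fade-in weight applied to b
        out.append((1.0 - w) * a[start + i] + w * b[i])
    out.extend(b[overlap:])
    return out
```

In this sketch the output length is `len(a) + len(b) - overlap - k`, where `k` is the alignment offset found by the search, so a larger overlap yields a shorter composite segment and a smaller overlap yields a longer one.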
Therefore, what is needed is a system and method that provides for both jitter control and packet loss concealment. This scheme should minimize buffer length, and thus delay, while also minimizing any artifacts resulting from either stretching or compression of audio segments. Further, rather than using a simple packet-based determination for deciding late loss for particular packets, the decision should be made as a function of buffer content for reducing overall buffer size and delay.