Real-time transmissions over packet switched networks, such as speech, audio, or video over Internet Protocol based networks (mainly the Internet or Intranet networks), has become increasingly attractive due to a number of features. These features include such things as relatively low operating costs, easy integration of new services, and one network for both non-real-time and real-time data. Real-time data, typically a speech, an audio, or a video signal, in packet switched systems is converted into a digital signal, i.e. into a bitstream, which is divided in portions of suitable size in order to be transmitted in data packets over the packet switched network from a transmitter end to a receiver end.
As packet switched networks originally were designed for transmission of non-real-time data, transmissions of real-time data over such networks causes some problems. Data packets can be lost during transmission, as they can be deliberately discarded by the network due to congestion problems or transmission errors. In non-real-time applications this is not a problem since a lost packet can be retransmitted. However, retransmission is not a possible solution for real-time applications that are delay sensitive. A packet that arrives too late to a real-time application cannot be used to reconstruct the corresponding signal since this signal already has been, or should have been, delivered to the receiving end, e.g. for playback by a speaker or for visualization on a display screen. Therefore, a packet that arrives too late is equivalent to a lost packet.
When transferring a real-time signal as packets, the main problem with lost or delayed data packets is the introduction of distortion in the reconstructed signal. The distortion results from the fact that signal segments conveyed by lost or delayed data packets cannot be reconstructed.
When transferring a signal it is most often desired to use as little bandwidth as possible. As is well known, many signals have patterns containing redundancies. Appropriate coding methods can avoid the transmission of the redundant information thereby enabling a more bandwidth effective transmission of the signal. Typical coding methods taking advantage of such redundancies are predictive coding methods. A predictive coding method encodes a signal pattern based on dependencies between the pattern representations. It encodes the signal for transmission with a fixed bit rate and with a tradeoff between the signal quality and the transmitted bit rate. Examples of predictive coding methods used for speech are Linear Predictive Coding (LPC) and Code Excited Linear Prediction (CELP), which both coding methods are well known to a person skilled in the art.
In a predictive coding scheme a coder state is dependent on previously encoded parts of the signal. When using predictive coding in combination with packetization of the encoded signal, a lost packet will lead to error propagation since information on which the predictive coder state at the receiving end is dependent upon will be lost together with the lost packet. This means that decoding of a subsequent packet will start with an incorrect coder state. Thus, the error due to the lost packet will propagate during decoding and reconstruction of the signal.
One way to solve this problem of error propagation is to reset the coder state at the beginning of the encoded signal part included by a packet. However, such a reset of the coder state will lead to a degradation of the quality of the reconstructed signal. Another way of reducing the effect of a lost packet is to use different schemes for including redundancy information when encoding the signal. In this way the coder state after a lost packet can be approximated. However, not only does such a scheme require more bandwidth for transferring the encoded signal, it furthermore only reduces the effect of the lost packet. Since the effect of a lost packet will not be completely eliminated, error propagation will still be present and result in a perceptually lower quality of the reconstructed signal.
Another problem with state of the art predictive coders is the encoding, and following reconstruction, of sudden signal transitions from a relatively very low to a much higher signal level, e.g. during a voicing onset of a speech signal. When coding such transitions it is difficult to make the coder states reflect the sudden transition, and more important, the beginning of the voiced period following the transition. This in turn will lead to a degraded quality of the reconstructed signal at a decoding end.