Wireless and voice-over-internet protocol (VoIP) communications are subject to frequent degradation of packets as a result of adverse connection conditions. The degraded packets may be lost or corrupted (that is, they may have an unacceptably high error rate). Such degraded packets result in clicks and pops or other artefacts being present in the output voice signal at the receiving end of the connection. This degrades the perceived speech quality at the receiving end and may render the speech unrecognizable if the packet degradation rate is sufficiently high.
Broadly speaking, two approaches are taken to combat the problem of degraded packets. The first approach is the use of transmitter-based recovery techniques. Such techniques include retransmission of degraded packets, interleaving the contents of several packets to disperse the effect of packet degradation, and addition of error correction coding bits to the transmitted packets such that degraded packets can be reconstructed at the receiver. In order to limit the increased bandwidth requirements and delays inherent in these techniques, they are often employed such that degraded packets can be recovered if the packet degradation rate is low, but not all degraded packets can be recovered if the packet degradation rate is high. Additionally, some transmitters may not have the capacity to implement transmitter-based recovery techniques.
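By way of illustration only, the interleaving technique mentioned above may be sketched as a simple block interleaver. The function names `interleave` and `deinterleave` and the `depth` parameter are hypothetical and are not taken from any particular system. The sketch assumes that the number of symbols is an exact multiple of the depth.

```python
def interleave(symbols, depth):
    """Write symbols row by row into a depth x width grid, read column by column.

    Hypothetical helper: symbols that were adjacent in the input end up
    `depth` positions apart in the transmitted order.
    """
    width = len(symbols) // depth
    if depth * width != len(symbols):
        raise ValueError("symbol count must be a multiple of depth")
    return [symbols[r * width + c] for c in range(width) for r in range(depth)]


def deinterleave(symbols, depth):
    """Invert interleave(): restore the original symbol order at the receiver."""
    width = len(symbols) // depth
    if depth * width != len(symbols):
        raise ValueError("symbol count must be a multiple of depth")
    restored = [None] * len(symbols)
    i = 0
    for c in range(width):
        for r in range(depth):
            restored[r * width + c] = symbols[i]
            i += 1
    return restored


# Twelve symbols, sent as four packets of three interleaved symbols each.
sent = interleave(list(range(12)), 3)

# Simulate loss of the first packet (three consecutive transmitted symbols).
damaged = [None] * 3 + sent[3:]
restored = deinterleave(damaged, 3)
lost_positions = [i for i, s in enumerate(restored) if s is None]
```

In this sketch the lost packet leaves three isolated single-symbol gaps (positions 0, 4 and 8) after de-interleaving rather than one burst of three consecutive symbols, which is the dispersal effect referred to above. The cost is added delay, since a full block must be collected before it can be de-interleaved.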
The second approach taken to combating the problem of degraded packets is the use of receiver-based concealment techniques. Such techniques are generally used in addition to transmitter-based recovery techniques to conceal any remaining degradation left after the transmitter-based recovery techniques have been employed. Additionally, they may be used in isolation if the transmitter is incapable of implementing transmitter-based recovery techniques. Low-complexity receiver-based concealment techniques, such as filling in a degraded packet with silence, noise, or a repetition of the previous packet, are used but result in a poor quality output voice signal. Regeneration-based schemes such as model-based recovery (in which speech on either side of the degraded packet is modeled to generate speech for the degraded packet) produce a very high quality output voice signal but are highly complex, consume high levels of power and are expensive to implement. In practical situations interpolation-based techniques are preferred. These techniques generate a replacement packet by interpolating parameters from the packets on one or both sides of the degraded packet. These techniques are relatively simple to implement and produce an output voice signal of reasonably high quality.
Pitch-based waveform substitution is a preferred interpolation-based packet degradation recovery technique. Voice signals appear to be composed of a repeating segment when viewed over short time intervals. This segment repeats periodically with a time period referred to as a pitch period. In pitch-based waveform substitution, the pitch period of the voiced packets on one or both sides of the degraded packet is estimated. A waveform of the estimated pitch period is then repeated and used as a substitute for the degraded packet. This technique is effective because the pitch period of the degraded voice packet will normally be substantially the same as the pitch period of the voice packets on either side of the degraded packet.
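A minimal sketch of pitch-based waveform substitution is given below for illustration. It assumes that only the samples preceding the degraded packet are used, and that the pitch period is estimated by maximizing a normalized autocorrelation over a range of candidate lags. That estimator is one common choice and is an assumption of this sketch, as are the names `estimate_pitch_period`, `conceal_gap`, `min_lag` and `max_lag`.

```python
import math


def estimate_pitch_period(history, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation.

    `history` holds decoded samples received before the degraded packet.
    The lag search range is an assumption supplied by the caller.
    """
    if not 0 < min_lag <= max_lag < len(history):
        raise ValueError("need 0 < min_lag <= max_lag < len(history)")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        later = history[lag:]       # signal shifted by `lag` samples
        earlier = history[:-lag]    # same-length unshifted window
        num = sum(x * y for x, y in zip(later, earlier))
        den = math.sqrt(sum(x * x for x in later) * sum(y * y for y in earlier))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def conceal_gap(history, gap_len, min_lag, max_lag):
    """Fill a gap of `gap_len` samples by repeating the last pitch period."""
    period = estimate_pitch_period(history, min_lag, max_lag)
    segment = history[-period:]     # most recent full pitch period
    return [segment[i % period] for i in range(gap_len)]


# A synthetic "voiced" signal with a pitch period of 40 samples.
history = [math.sin(2 * math.pi * n / 40) for n in range(200)]
substitute = conceal_gap(history, gap_len=30, min_lag=20, max_lag=60)
```

For a perfectly periodic input such as the synthetic tone above, the substitute samples continue the waveform without a phase jump. Real speech is only approximately periodic, so practical schemes typically also smooth the boundaries between the received and substituted samples; that step is omitted here.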
Waveform substitution can be a very effective packet degradation concealment method for simple coding schemes that do not require use of a memory in order to decode a data stream, for example pulse code modulation (PCM). However, waveform substitution as described above is unable to fully address packet degradation problems in some codecs that rely on properties of the decoder in addition to the received data stream in order to decode the data stream. In particular, it is unable to fully address packet degradation problems in codecs that use an internal state held by the decoder after it has decoded a packet of data in order to decode the next packet of data, in addition to using the encoded data in the next packet of data. Examples of such codecs are continuously variable slope delta modulation (CVSD) and adaptive differential pulse code modulation (ADPCM).
If the decoder is used to decode a degraded packet that has been encoded using such a codec, then the decoder generates an erroneous output that does not correspond to the packet prior to its being encoded at the transmitting end of the connection. Additionally, the decoder is left holding an internal state that is dependent on the degraded packet. This internal state is not the correct state for decoding the next packet of data. Consequently, the next packet, even if received in an adequate condition, is incorrectly decoded by the decoder. If a packet concealment method is used to generate a decoded output for the degraded packet, then the decoded output is not erroneous. However, if a packet concealment method is used, then the decoder may not need to be used. If the decoder is not used, then the internal state of the decoder is not updated to the state required to decode the next packet of data. Consequently, the next packet, even if received in an adequate condition, is incorrectly decoded by the decoder. The error in the decoder state propagates through subsequent decoding steps. Subsequent packets are therefore additionally incorrectly decoded as a result of the propagation of the error in the decoder state.
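The effect of a stale decoder state may be illustrated with a toy one-bit adaptive delta codec. The sketch below is a deliberately simplified stand-in and is not CVSD or ADPCM: the class `ToyDeltaState`, its step-doubling rule and its step limits are assumptions chosen only so that the decoder output depends on state carried over from earlier packets.

```python
class ToyDeltaState:
    """Internal state shared in form by the toy encoder and decoder."""

    def __init__(self):
        self.estimate = 0      # current reconstructed sample value
        self.step = 1          # current adaptive step size
        self.last_bit = None   # previous bit, used to adapt the step

    def update(self, bit):
        # Assumed adaptation rule: double the step on repeated bits,
        # halve it otherwise, and keep it within the range 1..8.
        if bit == self.last_bit:
            self.step = min(self.step * 2, 8)
        else:
            self.step = max(self.step // 2, 1)
        self.estimate += self.step if bit else -self.step
        self.last_bit = bit
        return self.estimate


def encode(samples, state):
    """Emit one bit per sample: 1 if the sample is above the running estimate."""
    bits = []
    for sample in samples:
        bit = 1 if sample > state.estimate else 0
        state.update(bit)
        bits.append(bit)
    return bits


def decode(bits, state):
    """Reconstruct samples from bits; the result depends on the incoming state."""
    return [state.update(bit) for bit in bits]


samples = [10 * n for n in range(12)]
bits = encode(samples, ToyDeltaState())
packets = [bits[0:4], bits[4:8], bits[8:12]]

# Reference decoder: every packet passes through the decoder.
reference = ToyDeltaState()
decode(packets[0], reference)
decode(packets[1], reference)
good = decode(packets[2], reference)

# Second decoder: packet 2 is concealed elsewhere and never decoded,
# so the state used for packet 3 is the state left after packet 1.
stale = ToyDeltaState()
decode(packets[0], stale)
bad = decode(packets[2], stale)
```

With these inputs the third packet decodes to 46, 54, 62, 70 when the decoder state is current, and to 14, 22, 30, 38 when the state is stale. The third packet is received intact in both cases, so the difference arises solely from the decoder state, and in this toy codec the offset persists for as long as decoding continues.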
If the decoder holds incorrect internal states when it decodes data packets, undesirable artefacts result in the output voice signal. These artefacts cannot easily be removed by conventional waveform domain packet loss concealment algorithms. Updating the decoder state to the correct decoder state for the next data packet to be decoded is therefore important for providing an acceptable quality output voice signal.
Several approaches have been taken to solve the problem of updating the internal state of the decoder when a degraded packet has been received.
U.S. Pat. No. 7,206,986 discloses a packet concealment method that inherently updates the state of the decoder. The apparatus of this patent is depicted in FIG. 1. Received encoded data on line 101 is checked for errors at block 102. If an error is indicated then the switch 103 connects input 104 to output 105. The switch output 105 is connected to CVSD decoder 106. The switch output 105 is also connected to buffer 107. The buffer 107 stores encoded data that is output by the switch to the decoder 106. If an error is detected by block 102 then the pitch period of the data decoded prior to the error is estimated at block 108. The encoded data in buffer 107 is looped to the switch input 104 with a delay that is set in dependence on the pitch period estimated by block 108. The switch 103 feeds the buffered data to the decoder 106 as a substitute for the corrupted packet comprising the error. The decoder decodes the buffered data and outputs a signal which is used as the decoded output for the corrupted packet. The decoder 106 is left holding an internal state suitable for decoding the next packet of encoded data. This method uses the signal directly output from the decoder as the decoded output for the degraded packet. A problem with this is that the direct output from the decoder often contains undesirable artefacts.
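The data flow described for FIG. 1 may be pictured with the much-simplified sketch below. It is not an implementation of the cited patent. The names `receive` and `RunningSumDecoder` are hypothetical, the pitch estimator is supplied by the caller, the decoder is a trivial stand-in for the CVSD decoder 106, and the pitch period is assumed to be expressed in encoded bits (one bit per sample).

```python
class RunningSumDecoder:
    """Stand-in for decoder 106: a one-bit decoder whose state is a running level."""

    def __init__(self):
        self.level = 0

    def decode_bit(self, bit):
        self.level += 1 if bit else -1
        return self.level


def receive(packets, decoder, estimate_pitch_period):
    """Decode (bits, has_error) packets, substituting buffered bits on error.

    `estimate_pitch_period` is a caller-supplied function that takes the
    decoded output so far and returns a period in bits (role of block 108).
    Assumes at least one pitch period of good data precedes any error.
    """
    buffer = []   # encoded bits actually fed to the decoder (role of buffer 107)
    output = []   # decoded samples
    for bits, has_error in packets:
        period = 0
        if has_error:
            period = estimate_pitch_period(output)
            if not 0 < period <= len(buffer):
                raise ValueError("not enough buffered data for this pitch period")
        for i in range(len(bits)):
            # On error, loop back the bit buffered one pitch period earlier
            # (role of switch input 104); otherwise pass the received bit.
            bit = buffer[-period] if has_error else bits[i]
            buffer.append(bit)
            output.append(decoder.decode_bit(bit))
    return output


packets = [
    ([1, 1, 0, 0, 1, 1, 0, 0], False),
    ([0, 0, 0, 0], True),            # corrupted packet: its contents are ignored
    ([1, 1, 0, 0], False),
]
decoded = receive(packets, RunningSumDecoder(), lambda out: 4)
```

Because the substitute bits pass through the decoder, the decoder state is updated as a side effect, which corresponds to the inherent state update described above. The decoded samples for the corrupted packet are likewise taken directly from the decoder output, which is the property identified above as a source of artefacts.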
There is thus a need for an improved method of decoding a data stream comprising degraded packets that efficiently updates the state of the decoder when processing a degraded packet without reducing the quality of the decoded output for the degraded packet.