The telecommunications industry in North America and Europe is currently preparing the launch of “3G” (third generation) wireless technologies from both the CDMA and GSM worlds. (CDMA and GSM are wireless communication standards fully familiar to those of ordinary skill in the art.) On the CDMA side, CDMA1xEvDO (also familiar to those skilled in the art) can provide wireless data connections that are ten times as fast as a regular modem. However, as the name EvDO (Evolution Data Only or Evolution Data Optimized) implies, voice traffic is still routed through 3G1xCS channels. Naturally, the next step is to move voice traffic over IP on wireless high-speed packet channels.
Achieving high-quality VoIP (Voice over IP) on wireless packet channels poses many challenges. IP overhead is typically quite large relative to the speech payload. In addition, the end-to-end delay across a typical communications network needs to be reduced. One way of reducing such end-to-end delay is to minimize the jitter buffer playback delay at the decoder. Unfortunately, one direct effect of minimizing the jitter buffer playback delay is an associated increase in the packet loss rate, because more packets arrive too late to be played out.
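The trade-off between playback delay and late-packet loss can be illustrated with a minimal Python sketch. It assumes, for simplicity, that the playout delay is the total time budget between transmission of a packet and its scheduled playout, and that a packet counts as lost whenever its network delay exceeds that budget. The function name `late_loss_rate` and the delay values are hypothetical and chosen only for illustration.

```python
def late_loss_rate(network_delays_ms, playout_delay_ms):
    """Return the fraction of packets that miss their playout time.

    Assumption: a packet is "late" when its one-way network delay is
    greater than the playout delay budget.
    """
    if not network_delays_ms:
        return 0.0
    late = sum(1 for delay in network_delays_ms if delay > playout_delay_ms)
    return late / len(network_delays_ms)


# Hypothetical one-way network delays, in milliseconds, for ten packets.
delays = [40, 42, 45, 41, 90, 43, 60, 44, 120, 46]

# Shrinking the playout delay lowers end-to-end delay but turns more
# packets into late (and therefore effectively lost) packets.
for playout_delay in (150, 100, 50):
    print(playout_delay, late_loss_rate(delays, playout_delay))
```

With these illustrative values, no packets are late at the largest budget, and the late fraction grows as the budget shrinks.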
When one or more packets arrive late at the receiving end for playout, a conventional decoder simply discards the late packets, since the decoder has already provided replacement material in accordance with a packet loss concealment (PLC) scheme. (As is well known to those of ordinary skill in the art, PLC schemes are used by most speech decoders in response to lost packets. These schemes use various techniques to attempt to minimize the deleterious effects of missing the speech signal encoded in the lost packet, but most commonly, they use some sort of packet repetition scheme in which the previous packet, possibly modified, is repeated in place of the lost packet.)
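The conventional behavior described above can be sketched as follows. This is a simplified illustration rather than any particular decoder: it assumes each packet has already been decoded into a frame of samples, that time is measured in playout slots, and that concealment repeats the previously played frame scaled by a hypothetical `attenuation` factor. The names `play_out`, `frames_by_seq`, and `arrival_slot` are likewise hypothetical.

```python
def play_out(frames_by_seq, arrival_slot, num_slots, frame_len, attenuation=0.5):
    """Play out frames slot by slot with repetition-based concealment.

    frames_by_seq: dict mapping sequence number -> list of samples.
    arrival_slot:  dict mapping sequence number -> slot at which the
                   packet becomes available (absent means never arrives).
    Returns (output_frames, discarded_late), where discarded_late lists
    the packets that arrived after their slot and were never played.
    """
    output = []
    discarded_late = []
    previous = None
    for seq in range(num_slots):
        arrived = arrival_slot.get(seq)
        if arrived is not None and arrived <= seq:
            # Packet is on time: play its frame.
            frame = list(frames_by_seq[seq])
        else:
            # Packet is missing at playout time: repeat the previous
            # frame, attenuated (silence if there is no previous frame).
            if previous is None:
                frame = [0.0] * frame_len
            else:
                frame = [attenuation * sample for sample in previous]
            if arrived is not None:
                # The packet does arrive, but too late; a conventional
                # decoder simply discards it.
                discarded_late.append(seq)
        output.append(frame)
        previous = frame
    return output, discarded_late


# Hypothetical example: packet 1 arrives at slot 3, after its playout slot.
frames = {0: [1.0, 1.0], 1: [2.0, 2.0], 2: [3.0, 3.0]}
arrivals = {0: 0, 1: 3, 2: 2}
played, discarded = play_out(frames, arrivals, num_slots=3, frame_len=2)
print(played, discarded)
```

In this example the content of packet 1 never reaches the listener; its slot is filled with an attenuated copy of frame 0, and the packet itself is dropped on arrival.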
In one prior art technique for use with prediction-based speech coders, however, some improvement over conventional decoders has been obtained by utilizing the late packets for purposes of re-synchronizing the decoder, so that the error resulting from the late packet (actually the error resulting from the replacement packet in accordance with the PLC scheme) does not adversely propagate. Such an approach can significantly improve the voice quality over conventional schemes. However, even with use of this re-synchronizing scheme, the late packets are never actually played out, which means that a part of the sound may be missing. This can lead to a potential intelligibility problem. For example, if packets carrying the phoneme “s” from the word “spy” are lost, the resultant speech may end up sounding like “pie” rather than “spy.” A PLC scheme alone, even with re-synchronization of the decoder using late packets, is unlikely to be able to rectify such a problem.