Transmission of voice over packet networks has emerged in recent years as a replacement for traditional PBX systems for telephone communications. A packetized voice transmission system comprises a transmitter and a receiver. The transmitter collects voice samples and groups them into packets for transmission across a network to the receiver. The data itself may be companded according to μ-law or A-law, as defined in ITU-T Recommendation G.711. Other companding/vocoding techniques, such as G.729 and G.723.1, can also be used.
When a packet-based network is used, packet losses due to congestion in the network can significantly degrade the performance of echo cancellers. The effects introduced by packet loss depend to a large extent on the techniques used to recover lost packets. Packet loss recovery techniques can be divided into two classes: sender-based repair and receiver-based repair [see C. Perkins, O. Hodson and V. Hardman, “A Survey of Packet Loss Recovery Techniques for Streaming Audio,” IEEE Network, Sep./Oct. 1998, pp. 40–48]. Receiver-based repair is also referred to in the art as error concealment.
Among known error concealment techniques, those based on packet insertion have found popularity due to ease of implementation. According to such insertion-based recovery techniques, a replacement packet is inserted to fill the gap left by a lost packet. The replacement packet can be silence, white noise, or a repetition of the previous packet. Silence substitution fills the gap left by a lost packet with silence in order to maintain the timing relationship between the surrounding packets. It is simple to implement but performs poorly: its performance degrades rapidly as packet sizes increase, and quality is unacceptably bad for the 40 ms packet size in common use in network audio conferencing tools. Some studies have shown that inserting white noise, instead of silence, can improve intelligibility [see G. A. Miller and J. C. R. Licklider, “The Intelligibility of Interrupted Speech,” J. Acoust. Soc. Amer., vol. 22, no. 2, 1950, pp. 167–73; and R. M. Warren, Auditory Perception, Pergamon Press, 1982].
Among the three methods of packet insertion, repetition of the previous packet gives the best voice quality because neighboring voice segments tend to be similar.
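By way of illustration only, the three insertion-based techniques described above may be sketched as follows. This is a minimal sketch and not an implementation drawn from the cited references: the function name `conceal`, the representation of a lost packet as `None`, the default packet length of 160 samples, and the noise amplitude are all assumptions made for the example.

```python
import random
from typing import List, Optional, Sequence

Packet = List[float]

def conceal(packets: Sequence[Optional[Packet]],
            method: str = "repeat",
            packet_len: int = 160,     # assumed packet length in samples
            noise_amp: float = 0.01,   # assumed peak amplitude of fill-in noise
            seed: int = 0) -> List[Packet]:
    """Fill each lost packet (marked None) with a replacement packet.

    method is one of "silence", "noise" or "repeat".
    """
    if method not in ("silence", "noise", "repeat"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    out: List[Packet] = []
    for pkt in packets:
        if pkt is not None:
            out.append(list(pkt))          # packet arrived: pass it through
        elif method == "repeat" and out:
            out.append(list(out[-1]))      # repeat the previous output packet
        elif method == "noise":
            # uniformly distributed samples have a flat (white) spectrum
            out.append([rng.uniform(-noise_amp, noise_amp)
                        for _ in range(packet_len)])
        else:
            # silence substitution; also the fallback when the very first
            # packet is lost and there is nothing to repeat
            out.append([0.0] * packet_len)
    return out
```

In this sketch, consecutive losses under the "repeat" method reuse the same replacement packet, since each fill-in copies the most recent output packet. In every case the replacement preserves the timing relationship between the surrounding packets.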
Although the use of white noise or of the previous packet may yield better speech quality than silence substitution, these techniques interfere with proper operation of network echo cancellers. The substitution of white noise results in a sudden change in the spectral characteristics of the signal, causing severe degradation of echo return loss enhancement (ERLE). When substituting a previous packet, the fill-in packet is the same as the previous packet, which means that the two packets are highly correlated. This reduces the convergence rate of the echo canceller and results in slow recovery from the packet loss.
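For reference, ERLE is conventionally expressed as the ratio, in decibels, of the echo power before cancellation to the residual echo power after cancellation. The following is a minimal sketch of that computation over a block of samples; the function name `erle_db` and the block-averaged power estimate are assumptions made for the example, and practical echo cancellers typically use running estimates instead.

```python
import math
from typing import Sequence

def erle_db(echo: Sequence[float], residual: Sequence[float]) -> float:
    """ERLE in dB: 10*log10(echo power / residual echo power)."""
    if not echo or not residual:
        raise ValueError("both signals must be non-empty")
    p_echo = sum(v * v for v in echo) / len(echo)
    p_res = sum(v * v for v in residual) / len(residual)
    if p_echo == 0.0:
        raise ValueError("echo power is zero; ERLE is undefined")
    if p_res == 0.0:
        return math.inf   # perfect cancellation over this block
    return 10.0 * math.log10(p_echo / p_res)
```

A higher value indicates better cancellation. For example, a residual whose amplitude is one tenth that of the echo corresponds to an ERLE of about 20 dB, so a drop in ERLE following a replacement packet reflects the degradation described above.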