Vocoders are widely used for speech compression in wireless communications systems. In addition, vocoders are used in voice over IP (VoIP) networks and other applications. Using speech analysis and synthesis with linear predictive coding (LPC) and quantization techniques based on a vocal model, vocoders can significantly reduce the bit rate of a voice channel. A typical low bit rate vocoder, such as ITU-T recommendation G.729, has a bit rate of eight kilobits per second (kbps), which is one-eighth of the 64 kbps rate needed to implement the ITU-T recommendation G.711 codec. The G.711 codec is normally used in the public switched telephone network (PSTN). Though most state-of-the-art vocoders introduce only acceptable impairments in perceptual voice quality, the nonlinear processing of speech coding changes the speech waveform so much that it becomes difficult to correlate an input speech waveform with an output speech waveform that has been processed by a vocoder. The waveform of the reproduced speech is changed to such a degree that the signal-to-noise ratio becomes almost useless as a measure of the difference between a speech waveform before and after speech coding.
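The limitation of waveform signal-to-noise ratio can be illustrated with a minimal Python sketch. It does not model a vocoder. As a stand-in, it assumes a simple polarity inversion, which is generally considered inaudible yet changes every sample of the waveform. The function name `snr_db`, the test tone, and the 8 kHz sampling rate are assumptions chosen for illustration only.

```python
import math

def snr_db(reference, processed):
    """Waveform SNR in dB: reference power over the power of the sample-wise difference."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, processed))
    if noise_power == 0:
        return float("inf")  # identical waveforms
    return 10 * math.log10(signal_power / noise_power)

# Hypothetical test signal: 100 ms of a 440 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]

# A 1% gain error barely changes the waveform, so the SNR is high (about 40 dB).
gain_error = [1.01 * x for x in tone]

# Polarity inversion (a stand-in, NOT a vocoder) leaves the sound essentially
# unchanged but doubles the sample-wise difference, so the SNR is about -6 dB.
inverted = [-x for x in tone]

print(round(snr_db(tone, gain_error), 2))
print(round(snr_db(tone, inverted), 2))
```

The second figure suggests a badly degraded signal even though a listener would hear no difference, which is the kind of mismatch between waveform similarity and perceived quality described above.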
Temporal clipping is one kind of impairment that can degrade voice quality of a speech communications system. As used herein, temporal clipping refers to any discontinuity of a speech signal caused by either loss of the signal sent or insertion of a disrupting signal. FIG. 2 shows several graphical plots of signals in the time domain to illustrate common temporal clipping events. A reference signal is shown in plot 200. Plots 202, 204, and 206 show the reference signal corrupted due to front-end, back-end, and center temporal clipping, respectively. Plots 208 and 210 show the reference signal corrupted by skipping and pausing, respectively.
In the case of Internet voice, also known as VoIP, temporal clipping becomes a critical voice quality issue because, without guaranteed quality of service, packet loss, large delay, and jitter are inevitable. For this reason, ITU-T recommendations G.116 and G.117 specify requirements on temporal clipping. In packet networks like the Internet, temporal clipping may result from dropped, added, skipped, or silence-suppressed packets.
With a speech transmission system using a conventional codec, such as ITU-T recommendation G.711, it is relatively easy to detect and measure temporal clipping. Commonly, temporal clipping is detected and measured by sending an input signal through a speech transmission system and comparing a delayed version of that input signal with the signal that is output from the speech transmission system, where the delay represents the time to travel through the transmission system. Indeed, there are several databases of speech signals commonly used to detect and measure temporal clipping in systems employing conventional codecs. However, because low bit rate vocoders change the waveform substantially, even though the change is perceptually acceptable, it is difficult to detect and measure temporal clipping in the same manner in speech transmission systems using such vocoders. Also, the silence suppression techniques used in speech transmission systems employing vocoders make a direct comparison between the input and the output more difficult.
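The conventional comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a method taken from any recommendation: the delay is estimated by a brute-force cross-correlation search, and a frame is flagged as clipped when its output energy falls below a fraction of the reference energy. The function names, the 80-sample frame size, the 0.1 energy ratio, and the synthetic noise signal are all hypothetical choices.

```python
import random

def estimate_delay(reference, degraded, max_delay):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        n = min(len(reference), len(degraded) - lag)
        score = sum(reference[i] * degraded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def frame_energy(signal, start, size):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in signal[start:start + size])

def find_clipped_frames(reference, degraded, delay, frame_size=80, ratio=0.1):
    """Flag frames where the reference is active but the aligned output is nearly silent."""
    clipped = []
    for start in range(0, len(reference) - frame_size + 1, frame_size):
        e_ref = frame_energy(reference, start, frame_size)
        e_out = frame_energy(degraded, start + delay, frame_size)
        if e_ref > 0 and e_out < ratio * e_ref:
            clipped.append(start // frame_size)
    return clipped

# Hypothetical input: 800 samples of seeded noise standing in for speech.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(800)]

# Simulated waveform-preserving channel: 25 samples of delay plus center
# clipping that silences frames 3 and 4 (samples 240 to 399).
true_delay = 25
channel_out = list(reference)
for i in range(240, 400):
    channel_out[i] = 0.0
degraded = [0.0] * true_delay + channel_out

delay = estimate_delay(reference, degraded, max_delay=100)
clipped_frames = find_clipped_frames(reference, degraded, delay)
print(delay, clipped_frames)
```

The sketch works only because the simulated channel preserves the waveform outside the clipped region. When a vocoder reshapes the waveform, both the correlation peak and the frame-by-frame comparison become much less reliable, which is the difficulty noted above.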
Therefore, a need exists for a method and apparatus to accurately detect and measure quality, including temporal clipping, delay and jitter, in speech transmission systems employing compression.