In the areas of telephony, data networking, and telecommunications there has been a shift from analog to digital and from wired to wireless, along with a continuing migration of some voice calls from conventional time division multiplexing (“TDM”) networks to packet-based internet protocol (“IP”) networks. In a typical communication application such as a wireless cellular system or a Voice over Internet Protocol (“VoIP”) system, the speech signal might be encoded and decoded several times.
Codec tandeming, in which a speech signal passes through two or more encode and decode stages in series, is a challenging problem in the field of voice quality assurance (“VQA”). The voice quality degradation due to codec tandeming has been a significant problem over the past few decades in the fields of speech coding, speech recognition, and voice enhancement.
If the same coder is used at each stage, the arrangement is generally referred to as self-tandeming. If different coders are used, it is generally referred to as cross-tandeming. In the case of self-tandeming of two G.729 coders, for example, the typical voice quality degradation for a clean speech file, measured as the Perceptual Evaluation of Speech Quality (“PESQ”) mean opinion score (“MOS”), is about 0.2 to 0.5, depending on the test speech files.
Due to the voice quality degradation caused by codec tandeming, the effect of codec tandeming on speech recognition accuracy has been widely studied. It is noted that for medium bit rate or low bit rate coders at or below 13 kbps, the speech recognition accuracy rate is significantly impacted by codec tandeming. For example, for an FS-1016 CELP coder with a 4.8 kbps bit rate, the speech recognition word accuracy rate decreases from 81.86% for one coder to 41.54% for five coders in tandem. The word error rate (“WER”) for clean speech using a GSM Full Rate (“GSM-FR”) coder with a bit rate of 13 kbps increases from 13.30% for one coder to 23.75% for three coders in tandem.
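To put the two recognition figures above on a common footing, the following minimal sketch computes the absolute change (in percentage points) and the relative change for each. The helper name `change` and the variable names are illustrative assumptions; only the four percentages come from the text.

```python
from typing import Tuple


def change(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent).

    Hypothetical helper for illustration; not part of any cited study.
    """
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


# Figures quoted in the text above.
fs1016_accuracy = change(81.86, 41.54)  # word accuracy, one coder -> five in tandem
gsm_fr_wer = change(13.30, 23.75)       # word error rate, one coder -> three in tandem

print(f"FS-1016 word accuracy: {fs1016_accuracy[0]:+.2f} points "
      f"({fs1016_accuracy[1]:+.1f}% relative)")
print(f"GSM-FR word error rate: {gsm_fr_wer[0]:+.2f} points "
      f"({gsm_fr_wer[1]:+.1f}% relative)")
```

Under these figures, the FS-1016 word accuracy falls by roughly half in relative terms, while the GSM-FR word error rate rises by a little over three quarters. The two cases use different metrics and different numbers of tandem stages, so the comparison is indicative only.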