Speech coders based on source-filter models are known to have quality problems processing generic audio input signals such as music, tones, background noise, and even reverberant speech. Such codecs include Linear Predictive Coding (LPC) processors like Code Excited Linear Prediction (CELP) coders. Speech coders tend to process speech signals well at low bit rates. Conversely, generic audio coding systems based on auditory models typically do not process speech signals very well, owing to the sensitivity of listeners to distortion in human speech coupled with bit rate limitations. One solution to this problem has been to provide a classifier to determine, on a frame-by-frame basis, whether an input signal is more or less speech-like, and then to select the appropriate coder, i.e., a speech or generic audio coder, based on the classification. An audio signal processor capable of processing different signal types is sometimes referred to as a hybrid core codec.
An example of a practical system using a speech-generic audio input discriminator is described in EVRC-WB (3GPP2 C.S0014-C). The problem with this approach is that, as a practical matter, it is often difficult to differentiate between speech and generic audio inputs, particularly where the input signal is near the switching threshold. For example, the discrimination of signals having a combination of speech and music, or of reverberant speech, may cause frequent switching between the speech and generic audio coders, resulting in a processed signal having inconsistent sound quality.
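The switching behavior described above can be illustrated with a minimal sketch. It is not the discriminator of EVRC-WB or of any other codec: the per-frame speech-likeness score in the range 0 to 1, the threshold of 0.5, and the function names select_coders and count_switches are all assumptions introduced only for illustration.

```python
def select_coders(scores, threshold=0.5):
    """Pick a coder per frame from a hypothetical speech-likeness score in [0, 1]."""
    return ["speech" if s >= threshold else "generic" for s in scores]


def count_switches(decisions):
    """Count how often the selected coder changes between adjacent frames."""
    return sum(1 for prev, cur in zip(decisions, decisions[1:]) if prev != cur)


# Hypothetical scores: one input that is clearly speech-like, and one
# (e.g., speech mixed with music) that hovers around the threshold.
clear_scores = [0.90, 0.85, 0.92, 0.88, 0.91, 0.87]
mixed_scores = [0.52, 0.48, 0.51, 0.49, 0.53, 0.47]

clear_decisions = select_coders(clear_scores)
mixed_decisions = select_coders(mixed_scores)

print(count_switches(clear_decisions))  # no coder changes
print(count_switches(mixed_decisions))  # a coder change at every frame boundary
```

In this sketch the clearly speech-like input never changes coder, while the input near the threshold changes coder at every frame boundary, which corresponds to the inconsistent sound quality noted above.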
Another approach to providing good speech and generic audio quality is to utilize an audio transform domain enhancement layer on top of a speech coder output. This method subtracts the speech coder output signal from the input signal, and then transforms the resulting error signal to the frequency domain, where it is coded further. This method is used in ITU-T Recommendation G.718. The problem with this solution is that when a generic audio signal is used as input to the speech coder, the output can be distorted, sometimes severely. A substantial portion of the enhancement layer coding effort then goes to reversing the effect of noise produced by signal model mismatch, which leads to limited overall quality for a given bit rate.
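The subtract-then-transform step can be sketched as follows. This is a simplified illustration and not the G.718 algorithm: a plain DCT-II is assumed here as a stand-in for whatever transform an actual codec uses, and the function names dct_ii and enhancement_layer_input, as well as the sample values, are hypothetical.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II, used here only as a stand-in frequency transform."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


def enhancement_layer_input(input_frame, core_output_frame):
    """Subtract the speech coder output from the input and transform the error."""
    error = [s - c for s, c in zip(input_frame, core_output_frame)]
    return dct_ii(error)


# Hypothetical frame: the core coder output differs from the input by a
# constant offset, so the error energy lands entirely in the first coefficient.
input_frame = [1.0, 2.0, 3.0, 4.0]
core_output_frame = [0.0, 1.0, 2.0, 3.0]
coeffs = enhancement_layer_input(input_frame, core_output_frame)
print([round(c, 6) for c in coeffs])
```

When the core output matches the input exactly, every coefficient is zero and the enhancement layer has nothing to code. The larger the mismatch introduced by the speech coder, the more of the enhancement layer's bits are spent on this error signal.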
The various aspects, features and advantages of the invention will become more fully apparent to those having ordinary skill in the art upon careful consideration of the following Detailed Description thereof with the accompanying drawings described below. The drawings may have been simplified for clarity and are not necessarily drawn to scale.