Base-band or residual coding techniques involve processing the original signal to derive therefrom a low frequency bandwidth signal and a few parameters characterizing the high frequency bandwidth signal components. Said low and high frequency components are then coded separately. At the other end of the process, the original voice signal is recovered by appropriately recombining the coded data. The first set of operations is generally referred to as analysis, as opposed to synthesis for the recombining operations.
Any processing involving coding and decoding inevitably degrades the voice signal and is said to generate noise. This invention is intended to lower said noise substantially. It is described below with reference to one example of a base-band coding technique, known as Residual-Excited Linear Prediction (RELP) vocoding, but it is valid for any base-band coding technique.
RELP analysis generates, in addition to the low frequency bandwidth signal, parameters relating to the high frequency bandwidth energy content and to the original voice signal spectral characteristics.
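The text does not say how the spectral parameters are computed. As a minimal sketch only, assuming they are linear prediction coefficients obtained by the autocorrelation method (a common choice in linear prediction coders, not confirmed here), the Levinson-Durbin recursion below derives the coefficients and the inverse filter produces the residual from which a base-band signal would then be extracted. The function names and the predictor order are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    r is the autocorrelation sequence; returns (a, prediction_error).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def residual(x, a):
    """Inverse-filter the frame: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


# Toy check with the autocorrelation of a first-order process (rho = 0.9):
# a second-order predictor should need only its first coefficient.
coeffs, pred_err = levinson_durbin([1.0, 0.9, 0.81], order=2)
frame = [1.0, 0.9, 0.81, 0.729]
excitation = residual(frame, coeffs)
```

In a complete analyzer the residual would additionally be low-pass filtered to give the base-band signal, and the energy of its upper band would be measured as a separate parameter; both steps are omitted from this sketch.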
RELP methods make it possible to reproduce speech signals with communications quality at rates as low as 7.2 kbps. For example, such a coder has been described in a paper by D. Esteban, C. Galand, J. Menez, and D. Mauduit, presented at the 1978 ICASSP in Tulsa: "7.2/9.6 kbps Voice Excited Predictive Coder". However, at this rate, some roughness remains in some synthesized speech segments, due to a non-ideal regeneration of the high-frequency signal. Indeed, this regeneration is implemented by a straight non-linear distortion of the analysis-generated base-band signal, which spreads the harmonic structure over the high-frequency band. As a result, only the amplitude spectrum of the high-frequency part of the signal is well regenerated, while the phase spectrum of the reconstructed signal does not match the phase spectrum of the original signal. Although this mismatch is not critical in stationary portions of speech, such as sustained vowels, it may produce audible distortions in transient portions of speech, such as consonants.
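The harmonic spreading performed by such a non-linear distortion can be illustrated with a small sketch. It assumes full-wave rectification as the non-linearity (the text does not name the distortion used), and the 8 kHz sampling rate, the 250 Hz pitch and the 1000 Hz base-band edge are hypothetical values chosen only so that the frame holds an integer number of pitch periods.

```python
import cmath
import math

FS = 8000        # sampling rate in Hz (assumed)
N = 320          # 40 ms frame, so DFT bins fall every 25 Hz
F0 = 250.0       # pitch of a hypothetical voiced frame
CUTOFF = 1000.0  # assumed upper edge of the base band


def baseband_frame():
    """Harmonics of F0 up to CUTOFF only: a stand-in for the base-band signal."""
    n_harm = int(CUTOFF // F0)
    return [sum(math.cos(2 * math.pi * k * F0 * n / FS) / k
                for k in range(1, n_harm + 1))
            for n in range(N)]


def rectify(x):
    """Full-wave rectification with the mean (DC term) removed."""
    y = [abs(v) for v in x]
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def band_energy(x, f_lo, f_hi):
    """Energy of x in [f_lo, f_hi) Hz, computed with a direct DFT."""
    total = 0.0
    for k in range(len(x) // 2):
        f = k * FS / len(x)
        if f_lo <= f < f_hi:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * n / len(x))
                        for n, v in enumerate(x))
            total += abs(coeff) ** 2
    return total


base = baseband_frame()
distorted = rectify(base)

# Energy above the base band before and after the distortion.
hb_before = band_energy(base, CUTOFF + 1, FS / 2)
hb_after = band_energy(distorted, CUTOFF + 1, FS / 2)
print("high-band energy before:", hb_before)
print("high-band energy after: ", hb_after)
```

The rectified signal keeps the pitch period of the base band, so the new high-band components fall on harmonics of the pitch. Their phases are set by the distortion itself rather than by the original high-band signal, which is the mismatch described above.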