(a) Field of the Invention
This invention relates to a method of speech vocoder decoding, and more particularly to a gain estimation scheme for vocoder coding.
(b) Description of the Prior Art
The linear predictive coding (LPC) vocoder technique has been widely used for speech coding and synthesizer applications (see, for example, U.S. Pat. No. 4,910,781 to Ketchum et al. and U.S. Pat. No. 4,697,261 to Wang et al., the entire disclosures of which are herein incorporated by reference). To date, LPC-10 vocoders have been widely employed for low bit rate speech compression.
FIG. 1 shows a block diagram of a conventional LPC vocoder. The vocoder generally includes an impulse train generator 11, a random noise generator 12, a voiced/unvoiced switch 13, a gain unit 14, an LPC filter 15, and an LPC parameter setting unit 16.
The input signal of the vocoder is generated by either the impulse train generator 11 or the random noise generator 12. The impulse train generator 11 generates a periodic impulse train, the so-called voiced signal. The random noise generator 12, on the other hand, generates a white noise signal, the so-called unvoiced signal. The voiced/unvoiced switch 13 selects either the periodic impulse train from the impulse train generator 11 or the white noise signal from the random noise generator 12 and passes it to the gain unit 14. The scaled signal then excites the all-pole LPC filter 15 to produce an output S(n) whose level matches that of the input speech.
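By way of illustration only, the signal path of FIG. 1 may be sketched as follows. The sketch is not taken from the cited references; it assumes the filter convention s(n) = G u(n) + sum over k of a_k s(n-k), a filter memory that is reset to zero at the start of each frame, and hypothetical function and parameter names (`impulse_train`, `white_noise`, `synthesize_frame`, `pitch_period`).

```python
import random


def impulse_train(num_samples, pitch_period):
    """Voiced excitation: a unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def white_noise(num_samples, seed=0):
    """Unvoiced excitation: zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_samples)]


def synthesize_frame(voiced, gain, lpc_coeffs, num_samples, pitch_period=80):
    """Select an excitation, scale it, and pass it through an all-pole filter.

    Assumed convention: s(n) = gain * u(n) + sum_{k=1..p} a_k * s(n - k),
    with zero filter memory at the start of the frame.
    """
    # Voiced/unvoiced switch (element 13 in FIG. 1).
    if voiced:
        excitation = impulse_train(num_samples, pitch_period)
    else:
        excitation = white_noise(num_samples)

    output = []
    for n in range(num_samples):
        # Gain unit (element 14) scales the excitation sample.
        sample = gain * excitation[n]
        # All-pole LPC filter (element 15) adds the weighted past outputs.
        for k, a_k in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                sample += a_k * output[n - k]
        output.append(sample)
    return output
```

For example, `synthesize_frame(True, 2.0, [0.5], 4, pitch_period=4)` yields `[2.0, 1.0, 0.5, 0.25]`: a single impulse scaled by the gain and then decaying through a first-order filter.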
The voicing decision, pitch period, filter coefficients, and gain are updated for every speech frame to track changes in the input speech. In practical vocoder applications, the overall gain of the synthetic speech must be set to match the level of the input speech. Currently, there are two widely used methods of determining the gain. In the first method, the gain is determined by matching the energy in the speech signal with the energy of the linearly predicted samples. This holds when appropriate assumptions are made about the excitation signal to the LPC system. The assumptions are that the predictive coefficients a_k in the actual model are equal to the predictive coefficients α_k in the real model, that the energy in the excitation signal Gu(n) for the actual model is equal to the energy in the error signal e(n) for the real model, that u(n)=δ(n) for voiced speech, and that u(n) for unvoiced speech is a zero-mean, unity-variance, white noise process. With these assumptions, the gain G can be estimated by: ##EQU1## where R(.) is the auto-correlation of the speech signal, α_k are the LPC coefficients, and p is the predictor order.
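By way of illustration only, the energy-matching estimate may be sketched as follows. The gain equation itself is not reproduced in this text, so the sketch assumes the form commonly given for the autocorrelation method, G^2 = R(0) - sum over k = 1 to p of α_k R(k), together with an unnormalized short-time autocorrelation. The function names are hypothetical.

```python
import math


def autocorrelation(frame, lag):
    """Unnormalized short-time autocorrelation R(lag) of one frame."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def lpc_gain(frame, lpc_coeffs):
    """Gain from the assumed relation G^2 = R(0) - sum_k alpha_k * R(k)."""
    residual_energy = autocorrelation(frame, 0) - sum(
        alpha_k * autocorrelation(frame, k)
        for k, alpha_k in enumerate(lpc_coeffs, start=1)
    )
    # Clamp at zero so that rounding error cannot produce a negative energy.
    return math.sqrt(max(residual_energy, 0.0))
```

For the frame `[1.0, 0.5, 0.25, 0.125]` with the single coefficient `0.5`, R(0) = 1.328125 and R(1) = 0.65625, so the residual energy is 1.0 and `lpc_gain` returns 1.0.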
The second method of gain computation is based on the root-mean-square (RMS) value of the samples over an entire frame of N input speech samples, which is defined as: ##EQU2## For unvoiced frames, the gain is simply estimated by the RMS. For voiced frames, the same RMS-based approach is used, but the gain is estimated more accurately using a rectangular window whose length is an integer multiple of the current pitch period. The gain computed by either of the two above-mentioned methods is then uniformly quantized on a logarithmic scale using 7 bits.
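By way of illustration only, the RMS-based estimate and the 7-bit logarithmic quantization may be sketched as follows. The RMS equation is not reproduced in this text, so the usual definition, the square root of the mean of the squared samples, is assumed. The window is assumed to start at the beginning of the frame, and the quantizer range (`min_gain`, `max_gain`) is a hypothetical choice that the text does not specify.

```python
import math


def rms_gain(frame):
    """RMS of the frame: sqrt((1/N) * sum of s(n)^2), the assumed definition."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def voiced_rms_gain(frame, pitch_period):
    """RMS over a rectangular window spanning a whole number of pitch periods."""
    num_periods = len(frame) // pitch_period
    if num_periods == 0:
        # Frame shorter than one pitch period: fall back to the whole frame.
        return rms_gain(frame)
    return rms_gain(frame[: num_periods * pitch_period])


def quantize_gain_log(gain, bits=7, min_gain=1.0, max_gain=32767.0):
    """Uniform quantization of log(gain); the range limits are assumptions."""
    levels = 2 ** bits
    clipped = min(max(gain, min_gain), max_gain)
    step = (math.log(max_gain) - math.log(min_gain)) / (levels - 1)
    return round((math.log(clipped) - math.log(min_gain)) / step)


def dequantize_gain_log(index, bits=7, min_gain=1.0, max_gain=32767.0):
    """Map a quantizer index back to a gain value on the same log grid."""
    levels = 2 ** bits
    step = (math.log(max_gain) - math.log(min_gain)) / (levels - 1)
    return min_gain * math.exp(index * step)
```

With 7 bits the index ranges from 0 to 127. Because the steps are uniform in the logarithm of the gain, the relative quantization error is roughly constant across the range rather than growing for small gains.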
Because the traditional LPC vocoder is an open-loop system, a simple gain estimation scheme is not sufficient to accurately determine the amplitude of the synthetic speech. Therefore, the present invention discloses a gain estimation scheme based on the outline of the speech waveform, which is called the envelope shape, to eliminate the above-described drawbacks.