The present invention relates to voice coding techniques for encoding voice signals with high quality at low bit rates, especially at 8 to 4.8 kb/s.
One known method for coding voice signals at low bit rates of about 8 to 4.8 kb/s is the CELP (Code Excited LPC Coding) method described in the paper titled "Code-excited linear prediction: High quality speech at very low bit rates" (Proc. ICASSP, pp. 937-940, 1985) by M. Schroeder and B. Atal (reference No. 1) and in the paper titled "Improved speech quality and efficient vector quantization in SELP" (Proc. ICASSP, pp. 155-158, 1988) by Kleijn et al. (reference No. 2).
In the method described in these papers, spectral parameters representing the spectral characteristics of the voice signal are extracted on the transmitting side for each frame (20 ms, for example). Each frame is then divided into subframes (5 ms, for example), and for each subframe the pitch parameters of an adaptive codebook, which represent long-term correlation (pitch correlation), are extracted so as to minimize the weighted squared error between the voice signal and a signal regenerated from the past excitation signal. Next, long-term prediction is applied to the voice signal of the subframe using these pitch parameters, and a residual signal is calculated. For this residual signal, one noise signal is selected from a codebook consisting of a pre-set set of noise signals so as to minimize the weighted squared error between the voice signal and the signal synthesized from the selected noise signal, and an optimal gain is calculated. Finally, an index identifying the selected noise signal, the gain, the spectral parameters and the pitch parameters are transmitted.
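The noise codebook search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of references No. 1 or No. 2: the names `synthesize` and `search_codebook` are hypothetical, the weighted synthesis filter is assumed to be represented by a truncated impulse response `h`, and `target` is assumed to be the weighted voice signal of the subframe after the adaptive codebook contribution has been removed. For each candidate the optimal gain has a closed form, so the search reduces to comparing the remaining error.

```python
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], h: Sequence[float]) -> List[float]:
    """Zero-state filter response: convolve the excitation with the
    (truncated) impulse response h, keeping len(excitation) samples."""
    n = len(excitation)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(min(i + 1, len(h))):
            acc += h[k] * excitation[i - k]
        out[i] = acc
    return out


def search_codebook(
    target: Sequence[float],
    codebook: Sequence[Sequence[float]],
    h: Sequence[float],
) -> Tuple[int, float, float]:
    """Return (index, gain, error) minimising the squared error between
    target and gain * synthesize(code) over all code vectors."""
    target_energy = sum(t * t for t in target)
    best_index, best_gain, best_error = -1, 0.0, target_energy
    for index, code in enumerate(codebook):
        y = synthesize(code, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                          # optimal gain for this candidate
        error = target_energy - corr * corr / energy  # error left at that gain
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error


if __name__ == "__main__":
    h = [1.0, 0.5]  # hypothetical impulse response
    codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
    target = [2.0 * v for v in synthesize(codebook[2], h)]
    print(search_codebook(target, codebook, h))
```

The same loop could in principle serve the adaptive codebook search if the candidates are taken to be delayed segments of the past excitation, although that is a simplification of what practical coders do.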
Another known method for coding voice signals at low bit rates of about 8 to 4.8 kb/s is the multi-pulse coding method described, for example, in the paper titled "A new model of LPC excitation for producing natural-sounding speech at low bit rates" (Proc. ICASSP, pp. 614-617, 1982) by B. Atal et al. (reference No. 3).
In the method of reference No. 3, the residual signal of the above-mentioned method is represented by a multi-pulse, that is, a pre-set number of pulses that differ from one another in amplitude and location, and the amplitude and location of each pulse are calculated. Then, the amplitudes and locations of the multi-pulse, the spectral parameters and the pitch parameters are transmitted.
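The calculation of pulse amplitudes and locations can be sketched with a greedy, one-pulse-at-a-time search. This is an illustrative simplification, not necessarily the exact procedure of reference No. 3: the function name `multipulse_search` is hypothetical, the weighted synthesis filter is again assumed to be given as a truncated impulse response `h`, and amplitudes of earlier pulses are not re-optimized once later pulses have been placed.

```python
from typing import List, Sequence, Tuple


def multipulse_search(
    target: Sequence[float], h: Sequence[float], num_pulses: int
) -> Tuple[List[Tuple[int, float]], List[float]]:
    """Place num_pulses pulses one at a time. Each step picks the location
    that removes the most squared error from the current residual and sets
    the amplitude that is optimal for that location."""
    n = len(target)
    residual = list(target)
    pulses: List[Tuple[int, float]] = []
    for _ in range(num_pulses):
        best_pos, best_amp, best_reduction = -1, 0.0, 0.0
        for pos in range(n):
            length = min(len(h), n - pos)  # response truncated at the subframe end
            corr = sum(residual[pos + k] * h[k] for k in range(length))
            energy = sum(h[k] * h[k] for k in range(length))
            if energy == 0.0:
                continue
            reduction = corr * corr / energy  # error removed by a pulse at pos
            if reduction > best_reduction:
                best_pos, best_amp, best_reduction = pos, corr / energy, reduction
        if best_pos < 0:
            break  # nothing left to model
        # Subtract the chosen pulse's contribution from the residual.
        for k in range(min(len(h), n - best_pos)):
            residual[best_pos + k] -= best_amp * h[k]
        pulses.append((best_pos, best_amp))
    return pulses, residual


if __name__ == "__main__":
    h = [1.0, 0.5]  # hypothetical impulse response
    target = [0.0, 2.0, 1.0, 0.0, 0.0, -1.0, -0.5, 0.0]
    print(multipulse_search(target, h, 2))
```

Because the number of pulses is fixed in advance, the bit cost per subframe is constant regardless of how much energy the signal carries in that subframe.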
In the prior art described in references No. 1, No. 2 and No. 3, the error evaluation criterion used when calculating the multi-pulse, or when searching the adaptive codebook and the codebook consisting of noise signals, is a weighted squared error between the supplied voice signal and the signal regenerated from the codebook or the multi-pulse.
The following equation shows the weighting used in such a criterion. ##EQU1##
where W(z) represents the transfer characteristic of a weighting filter, and a_i is a linear prediction coefficient calculated from a spectral parameter. The terms γ_1^i and γ_2^i are constants for controlling the amount of weighting; they are typically set such that 0 < γ_2 < γ_1 < 1.
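As an illustration of how such a weighting filter acts on a signal, the sketch below assumes the form commonly used in CELP coders, W(z) = (1 - sum_i a_i γ_1^i z^-i) / (1 - sum_i a_i γ_2^i z^-i). The equation itself is not reproduced in this text, so this form, the sign convention for a_i, and the function name `perceptual_weight` are assumptions rather than a transcription of the original equation.

```python
from typing import List, Sequence


def perceptual_weight(
    x: Sequence[float], a: Sequence[float], gamma1: float, gamma2: float
) -> List[float]:
    """Filter x through the ASSUMED weighting filter
    W(z) = (1 - sum_i a[i] gamma1^i z^-i) / (1 - sum_i a[i] gamma2^i z^-i),
    where a[0] holds a_1, a[1] holds a_2, and so on."""
    order = len(a)
    num = [a[i] * gamma1 ** (i + 1) for i in range(order)]  # numerator taps
    den = [a[i] * gamma2 ** (i + 1) for i in range(order)]  # denominator taps
    y: List[float] = []
    for n in range(len(x)):
        acc = x[n]
        for i in range(order):
            m = n - i - 1  # index of the delayed sample
            if m >= 0:
                acc -= num[i] * x[m]  # feed-forward part
                acc += den[i] * y[m]  # feedback part
        y.append(acc)
    return y


if __name__ == "__main__":
    # Hypothetical first-order predictor and weighting constants.
    print(perceptual_weight([1.0, 0.0, 0.0], [0.9], 0.9, 0.4))
```

Under this assumed form, setting γ_1 equal to γ_2 makes the numerator and denominator cancel, so the filter applies no weighting; the further apart the two constants are, the stronger the weighting.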
However, speech regenerated from code vectors selected with this criterion, or from multi-pulses calculated with it, does not always sound natural, because the evaluation criterion does not match human auditory perception.
Moreover, this problem becomes particularly noticeable when the bit rate is reduced and the codebook is reduced in size.
Furthermore, in the above-mentioned prior art, the number of codebook bits in each subframe is assumed to be constant when searching the codebook consisting of noise signals. Likewise, the number of pulses in a frame or a subframe is constant when calculating the multi-pulse.
However, the power of voice signals varies considerably over time, so it has been difficult to code voice with high quality using a constant number of bits. This problem becomes especially serious when the bit rate is reduced and the codebook size is made small.