The present invention relates to an efficient encoding/decoding system for speech signals and more specifically to a method of encoding/decoding LSF (line spectral frequency) parameters which are a type of speech parameter and which represent spectral envelope information of an input speech signal.
The spectral envelope of an input speech signal can be represented by LPC (linear predictive coding) coefficients, which are obtained by LPC analysis using autocorrelation coefficients computed from the input speech signal. For speech encoding, the LPC coefficients are transformed into line spectral frequency (LSF) parameters F(k) (k=1, 2, . . . , N), which carry information equivalent to the LPC coefficients. The LSF parameters are also referred to as LSP (line spectrum pair) parameters. The LSF parameters are defined on the frequency axis. For example, when the input speech signal is sampled at 8 kHz, the F(k) take values in the range of 0 to 4,000 Hz.
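The analysis chain described above (autocorrelation, LPC analysis, LSF computation) can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the text: the Levinson-Durbin recursion is used for the LPC analysis, the LSF parameters are located by a grid search with bisection on the unit circle, the order N is even, and the function names (autocorrelation, levinson_durbin, lpc_to_lsf) are hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """Return autocorrelation values r[0..max_lag] of the finite sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + aN z^-N (assumed method, not from the text)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # updated prediction error energy
    return a, err

def lpc_to_lsf(a, fs=8000.0, grid=1024, iters=50):
    """Return the N LSF parameters F(k) in Hz (ascending) for a = [1, a1..aN], N even."""
    n = len(a) - 1
    ext = list(a) + [0.0]                   # A(z) padded to degree N+1
    rev = ext[::-1]                         # z^-(N+1) * A(1/z)
    p = [u + v for u, v in zip(ext, rev)]   # symmetric polynomial P(z)
    q = [u - v for u, v in zip(ext, rev)]   # antisymmetric polynomial Q(z)
    m = (n + 1) / 2.0

    # Real-valued functions whose zeros in (0, pi) are the unit-circle roots.
    def fp(w):
        return sum(c * math.cos(w * (m - k)) for k, c in enumerate(p))

    def fq(w):
        return sum(c * math.sin(w * (m - k)) for k, c in enumerate(q))

    roots = []
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]  # open interval (0, pi)
    for f in (fp, fq):
        for lo, hi in zip(ws, ws[1:]):
            if f(lo) * f(hi) < 0.0:         # sign change brackets a root
                for _ in range(iters):      # refine by bisection
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                roots.append(0.5 * (lo + hi))
    return sorted(w * fs / (2.0 * math.pi) for w in roots)

# Example: a 2nd-order predictor A(z) = 1 - z^-1 + 0.5 z^-2 (illustrative values only).
example_lsf = lpc_to_lsf([1.0, -1.0, 0.5])
```

With fs set to 8000, every returned value lies between 0 and 4,000 Hz, matching the range given above. The grid search excludes the endpoints 0 and pi, so the trivial roots of P(z) and Q(z) at z = -1 and z = 1 are not reported.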
In a conventional LSF encoder, the LSF parameters F(k), obtained by subjecting an input speech signal to autocorrelation computation and LSF computation, are used as a target, and the code of the LSF parameters is selected from an LSF parameter codebook so as to minimize the error under a weighted square error criterion. The weights, which are computed in a weight computation section and used in a weighted vector quantizer, are set large for LSF parameters that lie close together on the frequency axis and small for LSF parameters that lie far apart. This is intended to attach importance to frequencies in the neighborhood of the peaks of the spectral envelope. The weighted vector quantizer outputs the quantized LSF parameters and the corresponding codes.
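The weighting and codebook search described above can be sketched as follows. The text says only that closely spaced LSF parameters receive large weights and widely spaced ones receive small weights; the specific inverse-spacing formula used here, the two-entry codebook, and the function names lsf_weights and weighted_vq_search are assumptions chosen for illustration.

```python
def lsf_weights(lsf, nyquist=4000.0):
    """Weight each F(k) by the inverse distance to its two neighbours.

    Assumed formula (not from the text): w(k) = 1/(F(k)-F(k-1)) + 1/(F(k+1)-F(k)),
    with F(0) = 0 and F(N+1) = nyquist. Requires strictly increasing LSFs.
    """
    padded = [0.0] + list(lsf) + [nyquist]
    return [1.0 / (padded[k] - padded[k - 1]) + 1.0 / (padded[k + 1] - padded[k])
            for k in range(1, len(padded) - 1)]

def weighted_vq_search(target, codebook, weights):
    """Return (index, error) of the codevector minimizing the weighted square error."""
    best_index, best_error = -1, float("inf")
    for index, codevector in enumerate(codebook):
        error = sum(w * (t - c) ** 2
                    for w, t, c in zip(weights, target, codevector))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error

# Illustrative target: F(1) and F(2) are close together, F(3) and F(4) are far apart.
target = [300.0, 400.0, 2000.0, 3000.0]
# Hypothetical codebook: each entry deviates from the target by 50 Hz in one component.
codebook = [
    [300.0, 400.0, 2050.0, 3000.0],   # error placed on a widely spaced LSF
    [300.0, 450.0, 2000.0, 3000.0],   # error placed on a closely spaced LSF
]
weights = lsf_weights(target)
index, error = weighted_vq_search(target, codebook, weights)
```

Both codevectors have the same unweighted square error, so the weighting alone decides the selection: the entry whose error falls on the closely spaced pair is penalized more heavily.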
The coded LSF parameters are transformed back into LPC coefficients, thereby generating coded LPC coefficients. The coded LPC coefficients are used as parameters of a synthesis filter to represent the spectral envelope characteristic of the input speech.
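The transformation from LSF parameters back to LPC coefficients can be sketched as follows. This is a minimal illustration, assuming an even order N, the convention A(z) = 1 + a1 z^-1 + ... + aN z^-N, and that the odd-numbered F(k) belong to the symmetric polynomial P(z) and the even-numbered ones to the antisymmetric polynomial Q(z); the function names poly_mul and lsf_to_lpc are hypothetical.

```python
import math

def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def lsf_to_lpc(lsf_hz, fs=8000.0):
    """Rebuild [1, a1..aN] from ascending LSFs in Hz (N assumed even)."""
    omegas = [2.0 * math.pi * f / fs for f in lsf_hz]
    p = [1.0, 1.0]     # trivial factor (1 + z^-1) of P(z)
    q = [1.0, -1.0]    # trivial factor (1 - z^-1) of Q(z)
    for i, w in enumerate(omegas):
        # Each LSF contributes a conjugate root pair on the unit circle.
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if i % 2 == 0:
            p = poly_mul(p, factor)    # F(1), F(3), ... belong to P(z)
        else:
            q = poly_mul(q, factor)    # F(2), F(4), ... belong to Q(z)
    a = [0.5 * (u + v) for u, v in zip(p, q)]   # A(z) = (P(z) + Q(z)) / 2
    return a[:-1]                               # the degree N+1 term cancels

# Illustrative 2nd-order example: LSFs at the angles acos(0.75) and acos(0.25).
fs = 8000.0
lsf = [math.acos(0.75) * fs / (2.0 * math.pi),
       math.acos(0.25) * fs / (2.0 * math.pi)]
coded_lpc = lsf_to_lpc(lsf, fs)
```

The returned coefficients define the all-pole synthesis filter 1/A(z). As long as the quantized LSF parameters remain strictly increasing within (0, fs/2), the resulting synthesis filter is stable.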
As can be seen from the foregoing, the conventional technique does not reflect, in the coding of the LSF parameters, the fact that perceptual sensitivity differs from frequency to frequency. Thus, unless the coding distortion of the LSF parameters is reduced to a sufficiently low level, distortion is easily perceived at frequencies to which the ear is sensitive, resulting in a degradation in speech quality. For this reason, the conventional technique has the problem that the coding bit rate of the LSF parameters cannot be reduced much.
As another conventional technique, an attempt to reflect in the coding of the LSF parameters the perceptual characteristics of the human ear, which is sensitive to low frequencies and relatively insensitive to high frequencies, i.e., the different perceptual sensitivities at different frequencies, is described in "The MEL LSF VECTOR QUANTIZATION SPEECH CODING METHOD" by SEKI et al., TECHNICAL REPORT OF IEICE, SP 86-14, June, 1986 (literature 1). This literature proposes a method that quantizes the LSP parameters (used here as a synonym for LSF parameters) using the Mel measurement or the log measurement, each of which is a type of nonlinear frequency measurement.
However, in the transformation to log measurement proposed in literature 1, the LSF parameters are directly transformed into the form log10(F(k)). The present inventors attempted to code 10th-order LSF parameters obtained from a speech signal sampled at 8 kHz using on the order of 20 bits. The result showed that the distortion of LSF parameters in the low frequency range is unnoticeable, but that the quantization distortion of LSF parameters in the high frequency range is easily perceived, and that speech quality degrades overall. Therefore, with a mere logarithmic transformation of the LSF parameters, it is difficult to reduce the bit rate of the LSF parameters.
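The following arithmetic sketch suggests one way to see why a direct log10(F(k)) transformation shifts distortion toward high frequencies: a quantization step of fixed size in the warped domain covers a span in Hz that grows with frequency. It is an illustration only, not a reproduction of the inventors' experiment. The step sizes, the probe frequencies, and the function names are assumptions, and the Mel formula shown is the commonly used 2595 log10(1 + f/700) form, which may differ from the definition used in literature 1.

```python
import math

def hz_span_of_log_step(f_hz, step):
    """Span in Hz covered by a step of size `step` in the log10(F) domain, starting at f_hz."""
    return f_hz * (10.0 ** step - 1.0)

def hz_to_mel(f_hz):
    """Commonly used Mel formula (an assumption; literature 1 may define it differently)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def hz_span_of_mel_step(f_hz, step):
    """Span in Hz covered by a step of size `step` in the Mel domain, starting at f_hz."""
    return mel_to_hz(hz_to_mel(f_hz) + step) - f_hz

low_hz, high_hz = 300.0, 3500.0   # illustrative probe frequencies
log_step, mel_step = 0.01, 10.0   # illustrative step sizes in each warped domain

# How much coarser (in Hz) the same warped-domain step is at high_hz than at low_hz.
log_ratio = hz_span_of_log_step(high_hz, log_step) / hz_span_of_log_step(low_hz, log_step)
mel_ratio = hz_span_of_mel_step(high_hz, mel_step) / hz_span_of_mel_step(low_hz, mel_step)
```

Under these assumptions, log_ratio equals high_hz / low_hz, while mel_ratio equals (700 + high_hz) / (700 + low_hz), which is smaller. The pure logarithmic warping therefore coarsens the high-frequency resolution more aggressively than the Mel warping does.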
As described above, the conventional LSF parameter coding methods have the problems that, unless the coding distortion of the LSF parameters is reduced to a sufficiently low level, the distortion is easily perceived at frequencies to which the ear is perceptually sensitive, and that the coding bit rate of these parameters therefore cannot be reduced much.