The present invention relates to a speech encoding method and apparatus for encoding speech at a low bit rate.
A speech encoding technique that compression-encodes a telephone-band speech signal at a low bit rate is indispensable to mobile communication, such as a handy-phone, in which the usable radio band is limited, and to storage media, such as voice mail, in which the memory must be used efficiently. At present, there is a strong demand for a scheme which realizes both a low bit rate and a small encoding delay. The CELP (Code Excited Linear Prediction) scheme is an effective scheme for encoding a telephone-band speech signal at a low bit rate of about 4 kbps. This scheme is roughly divided into a process of obtaining, from an input speech signal divided in units of frames, the characteristics of a speech synthesis filter that models the vocal tract, and a process of obtaining a drive signal that serves as the input signal of the speech synthesis filter.
Of these processes, the latter process of obtaining the drive signal is performed by calculating the distortion of a synthesized speech signal generated by passing a plurality of drive vectors stored in a drive vector codebook through the synthesis filter one by one, i.e., the error signal of the synthesized speech signal with respect to the input speech signal, and searching for a drive vector that minimizes the error signal. This process is called closed-loop search, which is a very effective method for realizing good sound quality at a bit rate of about 8 kbps.
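The closed-loop search described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular coder: the function names, the toy codebook, the first-order filter coefficient, and the per-vector optimal gain are assumptions introduced here, and perceptual weighting is omitted for brevity.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z) with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    The filter starts from a zero state (an assumption of this sketch).
    """
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                y -= a * out[n - k]
        out.append(y)
    return out


def closed_loop_search(target, codebook, lpc):
    """Try every drive vector and keep the one with the smallest error energy.

    Returns (index, gain, error_energy). The gain is the least-squares
    optimum for each candidate (an assumption, not taken from the text).
    """
    best = None
    for i, vec in enumerate(codebook):
        synth = synthesize(vec, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot be scaled to match the target
        gain = sum(t * s for t, s in zip(target, synth)) / energy
        err = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or err < best[2]:
            best = (i, gain, err)
    return best


# Hypothetical toy data: a 3-entry codebook and a first-order synthesis filter.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]]
lpc = [-0.5]
target = [2.0 * s for s in synthesize(codebook[1], lpc)]
print(closed_loop_search(target, codebook, lpc))
```

Because each candidate is evaluated through the synthesis filter before it is compared with the input, the selection criterion is the distortion of the synthesized speech itself rather than of the drive vector.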
The CELP scheme is described in detail in M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates", Proc. ICASSP, pp. 937-940, 1985, and W. B. Kleijn, D. J. Krasinski et al., "Improved Speech Quality and Efficient Vector Quantization in SELP", Proc. ICASSP, pp. 155-158, 1988.
On the other hand, I. A. Gerson and M. A. Jasiuk, "Techniques for Improving the Performance of CELP Type Speech Coders", Proc. ICASSP, pp. 205-208, 1991, discloses the arrangement of an improved perceptual weighting filter including a pitch weighting filter.
In this CELP scheme, a drive vector that minimizes the perceptually weighted distortion is searched for in a closed loop. According to this scheme, good sound quality can be obtained at a bit rate of about 8 kbps. In the CELP scheme, however, the speech signal buffering size necessary for encoding an input speech signal is large, and the processing delay in encoding, i.e., the time required to actually encode the input speech signal and output an encoding parameter, is long. More specifically, in the conventional CELP scheme, the input speech signal is divided into frames each having a length of 20 ms to 40 ms, and buffered. An LPC analysis is performed in units of frames, and the LPC coefficient obtained by this analysis is transmitted. Due to the buffering and the encoding calculation, a processing delay of at least twice the frame length, i.e., a delay of 40 ms to 80 ms, is generated.
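The delay figures above follow directly from the frame length. The short sketch below reproduces the arithmetic. The 8 kHz sampling rate is an assumption typical of telephone-band speech and is not stated in the text, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000  # assumed telephone-band sampling rate (not from the text)


def frame_buffer_samples(frame_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples that must be buffered before one frame can be analyzed."""
    return sample_rate_hz * frame_ms // 1000


def min_processing_delay_ms(frame_ms):
    """Lower bound on the processing delay: at least twice the frame length."""
    return 2 * frame_ms


for frame_ms in (20, 40):
    print(frame_ms, frame_buffer_samples(frame_ms), min_processing_delay_ms(frame_ms))
```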
If the delay between transmission and reception increases in a communication system such as a handy-phone, a channel echo, an audio echo, and the like are generated and interfere with telephone conversations. For this reason, a speech encoding scheme which attains a small processing delay is demanded. The processing delay in speech encoding can be decreased by decreasing the frame length. However, a shorter frame length means that LPC coefficients must be transmitted more frequently, so the number of quantization bits for the LPC coefficients and drive vectors must be reduced, and this degrades the sound quality of the reconstruction speech signal obtained on the decoding side.
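The trade-off between frame length and bit allocation can be made concrete with a small calculation. The sketch below assumes a total rate of 4 kbps and a hypothetical 24 bits per frame for the LPC coefficients. Both figures, and the function name, are illustrative assumptions rather than values from any specific coder.

```python
def frame_bit_budget(total_bps, frame_ms, lpc_bits_per_frame):
    """Return (bits available per frame, bits left for drive vectors)."""
    total_bits = total_bps * frame_ms / 1000.0
    return total_bits, total_bits - lpc_bits_per_frame


# Hypothetical figures: 4 kbps total, 24 bits of LPC information per frame.
for frame_ms in (20, 10, 5):
    total_bits, drive_bits = frame_bit_budget(4000, frame_ms, 24)
    print(frame_ms, total_bits, drive_bits)
```

Under these assumptions, a 5 ms frame leaves no bits at all for the drive vectors unless the LPC allocation is cut, which is the degradation the paragraph above describes.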
To solve the above-described problems of the conventional CELP scheme, a speech encoding scheme which does not transmit any LPC coefficient can be employed. More specifically, a code vector extracted from, e.g., a codebook is used to generate a reconstruction speech signal vector without passing it through a synthesis filter. Using an input speech signal as a target vector, an error vector representing the error of the reconstruction speech signal vector with respect to the target vector is generated. The codebook is searched for the code vector that minimizes the weighted error vector obtained by passing the error vector through a perceptual weighting filter. The transfer function of the perceptual weighting filter is set in accordance with an LPC coefficient obtained for the input speech signal.
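A minimal sketch of such a search is given below. Several details are assumptions made only for illustration: the weighting filter takes the common form W(z) = A(z) / A(z/gamma) with A(z) = 1 + sum_k a_k z^-k, the code vector is used directly as the reconstruction vector, the quantity minimized is the energy of the weighted error vector, and all names and values are hypothetical.

```python
def weighting_filter(signal, lpc, gamma=0.8):
    """Apply W(z) = A(z) / A(z/gamma), starting from a zero filter state.

    This form of W(z) is an assumption; the text only says that the transfer
    function is set in accordance with the LPC coefficients.
    """
    den = [a * gamma ** k for k, a in enumerate(lpc, start=1)]
    out = []
    for n, x in enumerate(signal):
        y = x
        for k, a in enumerate(lpc, start=1):  # numerator A(z)
            if n - k >= 0:
                y += a * signal[n - k]
        for k, d in enumerate(den, start=1):  # denominator A(z/gamma)
            if n - k >= 0:
                y -= d * out[n - k]
        out.append(y)
    return out


def search_without_synthesis_filter(target, codebook, lpc, gamma=0.8):
    """Return (index, cost) of the code vector with the least weighted error energy."""
    best_index, best_cost = -1, float("inf")
    for i, code in enumerate(codebook):
        error = [t - c for t, c in zip(target, code)]  # no synthesis filter
        weighted = weighting_filter(error, lpc, gamma)
        cost = sum(w * w for w in weighted)
        if cost < best_cost:
            best_index, best_cost = i, cost
    return best_index, best_cost


# Hypothetical toy data.
target = [1.0, 0.5, -0.2, 0.1]
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.5, -0.2, 0.1], [1.0, 0.0, 0.0, 0.0]]
print(search_without_synthesis_filter(target, codebook, lpc=[-0.9]))
```

In this sketch the LPC coefficients are used only inside the encoder to shape the error measure, so nothing derived from them needs to be transmitted.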
When no LPC coefficient is transmitted from the encoding side in this manner, how to control the transfer function of a post-filter arranged on the decoding side becomes important. That is, in the CELP scheme, since good sound quality cannot be obtained in encoding at a bit rate of 4 kbps or less, a post-filter which improves the subjective quality of the reconstruction speech signal, mainly by spectrum emphasis (formant emphasis), must be arranged on the decoding side. In spectrum emphasis, the transfer function of this post-filter is normally controlled by the LPC coefficient supplied from the encoding side. However, when no LPC coefficient is transmitted from the encoding side, as in the above case, the transfer function cannot be controlled.
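To illustrate why the post-filter depends on the LPC coefficient, the sketch below implements one widely used formant-emphasis form, H(z) = A(z/beta) / A(z/alpha) with 0 < beta < alpha < 1 and A(z) = 1 + sum_k a_k z^-k. The choice of this form, the parameter values, and the function names are assumptions for illustration. Tilt compensation and gain control, which practical post-filters often include, are omitted.

```python
def bandwidth_expand(lpc, g):
    """Coefficients of A(z/g): the k-th LPC coefficient is scaled by g**k."""
    return [a * g ** k for k, a in enumerate(lpc, start=1)]


def formant_postfilter(signal, lpc, alpha=0.8, beta=0.5):
    """Apply H(z) = A(z/beta) / A(z/alpha), starting from a zero filter state.

    The filter is fully determined by `lpc`, so a decoder that never
    receives LPC information has nothing to drive it with.
    """
    num = bandwidth_expand(lpc, beta)
    den = bandwidth_expand(lpc, alpha)
    out = []
    for n, x in enumerate(signal):
        y = x
        for k, b in enumerate(num, start=1):
            if n - k >= 0:
                y += b * signal[n - k]
        for k, d in enumerate(den, start=1):
            if n - k >= 0:
                y -= d * out[n - k]
        out.append(y)
    return out


# Hypothetical first-order example: impulse response of the post-filter.
print(formant_postfilter([1.0, 0.0, 0.0], lpc=[-0.9]))
```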
In the conventional CELP scheme, the LPC coefficient is quantized so as to minimize its own quantization error, in other words, in an open loop. For this reason, even if the quantization error of the LPC coefficient is minimized, the distortion of the reconstruction speech signal is not always minimized, and a decrease in bit rate degrades the quality of the reconstruction speech signal.
As described above, in the speech encoding apparatus of the conventional CELP scheme, a low bit rate and a small delay lead to degradation of the sound quality of the reconstruction speech. If, in order to attain a low bit rate and a small delay, no synthesis filter is used and no parameter representing the spectrum envelope of the input speech signal, such as an LPC coefficient, is transmitted, the transfer function of the post-filter that is necessary on the decoding side at a low bit rate cannot be controlled, and the sound quality cannot be improved by the post-filter.