This invention relates to a speech encoder operable with a short processing delay and, in particular, to a speech encoder for encoding a speech or voice signal with a high quality at a short frame period or length of 5 ms to 10 ms or shorter.
A conventional speech encoding system is disclosed, for example, in a paper contributed by K. Ozawa et al. to the IEICE Trans. Commun., Vol. E77-B, No. 9 (September 1994), pages 1114-1121, under the title of "M-LCELP Speech Coding at 4 kb/s with Multi-Mode and Multi-Codebook" (Reference 1).
According to the above-referenced conventional system, a speech signal is encoded on a transmitting side as follows. By the use of linear predictive coding (LPC), spectral parameters representative of spectral characteristics are extracted from the speech signal at every frame having a frame length of, for example, 40 ms. Feature quantities are calculated for the signal frames or for weighted signal frames obtained by perceptually weighting the signal frames. The feature quantities are used in deciding modes (for example, vowel and consonant segments) to produce mode decision results. With reference to the mode decision results, algorithms or codebooks are switched.
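The LPC analysis step described above can be illustrated by a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is a common way to obtain LPC coefficients but is not stated in this description to be the method of Reference 1. The function names `autocorrelation` and `levinson_durbin` are hypothetical, and windowing, bandwidth expansion, and quantization of the spectral parameters are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one signal frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1.0 and a[1..order] are the
    coefficients of the prediction error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order,
    and err is the remaining prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Silent or degenerate frame: stop with the coefficients found so far.
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

In use, one frame of samples (for example, 320 samples for a 40 ms frame if a sampling frequency of 8 kHz is assumed) would be passed to `autocorrelation`, and the result to `levinson_durbin` with the desired prediction order.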
In an encoding part, each frame is subdivided into speech subframes having a subframe length of, for example, 8 ms. Adaptive parameters (delay parameters corresponding to pitch periods and gain parameters) are extracted from an adaptive codebook for each speech subframe with reference to a previous excitation signal. By the use of the adaptive codebook, pitch prediction is carried out for the speech subframes. For a residual signal obtained by the pitch prediction, an optimal excitation code vector is selected from an excitation codebook (vector quantization codebook) composed of noise signals of a predetermined kind. Excitation signals are quantized by calculating an optimal gain.
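The extraction of the delay and gain parameters from the adaptive codebook can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of Reference 1: the candidate vectors are compared with the target directly, without passing them through a perceptually weighted synthesis filter, and the sign of the gain is not constrained. The function name `adaptive_codebook_search` and its parameters are hypothetical.

```python
def adaptive_codebook_search(target, past_excitation, min_delay, max_delay):
    """Pick the delay and gain that best predict `target` from past excitation.

    For each candidate delay, a vector is taken from the previous excitation
    signal starting `delay` samples back. When the delay is shorter than the
    subframe, the last `delay` samples are repeated periodically.
    The delay maximizing corr^2 / energy is selected, which is equivalent to
    minimizing the squared error with the optimal gain corr / energy.
    Returns (delay, gain); delay is None if no candidate is usable.
    """
    n = len(target)
    best_delay, best_gain, best_score = None, 0.0, -1.0
    for delay in range(min_delay, max_delay + 1):
        if delay > len(past_excitation):
            break  # not enough history for this or any longer delay
        start = len(past_excitation) - delay
        candidate = [past_excitation[start + (i % delay)] for i in range(n)]
        energy = sum(c * c for c in candidate)
        if energy <= 0.0:
            continue  # an all-zero candidate cannot predict anything
        corr = sum(t * c for t, c in zip(target, candidate))
        score = corr * corr / energy
        if score > best_score:
            best_delay, best_gain, best_score = delay, corr / energy, score
    return best_delay, best_gain
```

The residual passed on to the excitation codebook search would then be the target minus the selected candidate scaled by the returned gain.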
The excitation code vector is selected so as to minimize an error power between the residual signal and a signal composed of a selected noise signal. A multiplexer is used to produce a transmission signal composed of a combination of indexes indicative of the kind of the excitation code vector thus selected, gains, the spectral parameters, and the adaptive parameters of the adaptive codebook.
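The selection rule above, namely minimizing the error power between the residual signal and a gain-scaled code vector, can be sketched as an exhaustive search. The sketch is an illustration only: it assumes the error is measured directly between the residual and the scaled code vector, whereas a practical encoder would typically measure it after a weighted synthesis filter. The function name `search_excitation_codebook` is hypothetical.

```python
def search_excitation_codebook(residual, codebook):
    """Exhaustively search a codebook for the minimum error power.

    For each code vector c, the optimal gain is g = (r . c) / (c . c),
    and the error power is sum((r[n] - g * c[n])^2).
    Returns (index, gain, error) of the best code vector.
    """
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    for index, code in enumerate(codebook):
        energy = sum(c * c for c in code)
        if energy == 0.0:
            continue  # a zero vector has no defined optimal gain
        corr = sum(r * c for r, c in zip(residual, code))
        gain = corr / energy
        error = sum((r - gain * c) ** 2 for r, c in zip(residual, code))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error
```

The returned index and a quantized version of the gain correspond to the items that the multiplexer would combine with the spectral parameters and the adaptive parameters.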
However, the conventional speech encoding system is disadvantageous in that a sufficient speech quality cannot be obtained because of a restricted codebook size.