This invention relates to a speech decoder for decoding a speech signal and, in particular, to a speech decoder that can decode, with high quality, a background noise signal included in a speech signal coded at a low bit rate.
As a method for coding speech signals with high efficiency, CELP (Code Excited Linear Predictive Coding) is known in the art and is described, for example, in M. Schroeder and B. Atal, “Code-excited linear prediction: High quality speech at very low bit rates” (Proc. ICASSP, pp. 937–940, 1985: hereinafter referred to as Document 1), Kleijn et al., “Improved speech quality and efficient vector quantization in CELP” (Proc. ICASSP, pp. 155–158, 1988: hereinafter referred to as Document 2), and elsewhere. Documents 1 and 2 are incorporated herein by reference.
In the conventional method, on the transmission side, spectral parameters representing the spectral characteristics of the speech signal are extracted for each frame (e.g., 20 ms long) by linear predictive (LPC) analysis. Each frame is then divided into subframes (e.g., 5 ms long). For each subframe, adaptive codebook parameters (a delay parameter corresponding to the pitch period and a gain parameter) are extracted on the basis of the preceding excitation signal, and the speech signal of the subframe is pitch-predicted using the adaptive codebook. For the residual signal obtained by the pitch prediction, an optimum excitation code vector is selected from an excitation codebook (vector quantization codebook) consisting of predetermined kinds of noise signals, and an optimum gain is calculated. In this way, the excitation signal is quantized.
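The frame-level LPC analysis and the subframe-level adaptive codebook (pitch) search described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of Documents 1 or 2: the LPC coefficients are obtained with the autocorrelation method and the Levinson-Durbin recursion, and the adaptive codebook search compares the target directly with the delayed past excitation, omitting the perceptual weighting and synthesis filtering that practical CELP coders apply. The function names, the 8 kHz sampling rate implied by the 160-sample frame and 40-sample subframe, and the delay range are all hypothetical.

```python
FRAME_LEN = 160     # assumption: 20 ms frame at 8 kHz, used for LPC analysis
SUBFRAME_LEN = 40   # assumption: 5 ms subframe at 8 kHz


def autocorrelation(x, order):
    """Autocorrelation r[0..order] of one frame (no analysis window, for brevity)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p from autocorrelation r.

    Returns (a, err), where err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def adaptive_codebook_search(target, past_exc, t_min, t_max):
    """Pick the pitch delay T and gain that best predict `target`.

    The candidate for delay T is the past excitation delayed by T samples,
    repeated periodically when T is shorter than the subframe.
    Requires len(past_exc) >= t_max.
    """
    n = len(target)
    best_t, best_gain, best_score = t_min, 0.0, -1.0
    for t in range(t_min, t_max + 1):
        v = []
        for i in range(n):
            v.append(past_exc[len(past_exc) - t + i] if i < t else v[i - t])
        energy = sum(s * s for s in v)
        if energy == 0.0:
            continue
        corr = sum(x * s for x, s in zip(target, v))
        score = corr * corr / energy        # error-power reduction at the optimal gain
        if score > best_score:
            best_t, best_gain, best_score = t, corr / energy, score
    return best_t, best_gain
```

For example, if the past excitation is a pulse train with a period of 20 samples and the target continues that train scaled by 0.8, the search returns a delay of 20 with a gain of 0.8. Ties are resolved in favor of the shortest delay, which avoids selecting pitch multiples.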
The excitation code vector is selected so as to minimize the error power between the signal synthesized from the selected noise signal and the above-mentioned residual signal.
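The error-power criterion can be illustrated with an exhaustive codebook search. This is a minimal sketch under stated assumptions: the codebook is a hypothetical set of Gaussian noise vectors generated from a fixed seed, the synthesis filter 1/A(z) starts from zero state, and perceptual weighting is omitted. For each code vector c with synthesized output y, the optimal gain is g = ⟨x, y⟩ / ⟨y, y⟩, and the remaining error power is ‖x‖² − ⟨x, y⟩² / ⟨y, y⟩, so minimizing the error is the same as maximizing ⟨x, y⟩² / ⟨y, y⟩.

```python
import random


def synthesize(exc, a):
    """Zero-state LPC synthesis filter 1/A(z): y[n] = exc[n] - sum_j a[j] * y[n-j]."""
    y = []
    for n, e in enumerate(exc):
        acc = e
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


def make_noise_codebook(size, length, seed=0):
    """Hypothetical excitation codebook made of Gaussian noise vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_excitation_codebook(target, codebook, a):
    """Return (index, gain, error_power) minimizing ||target - g * synth(c)||^2."""
    target_energy = sum(x * x for x in target)
    best = (0, 0.0, float("inf"))
    for idx, c in enumerate(codebook):
        y = synthesize(c, a)
        energy = sum(s * s for s in y)
        if energy == 0.0:
            continue
        corr = sum(x * s for x, s in zip(target, y))
        gain = corr / energy                         # optimal gain for this vector
        err = target_energy - corr * corr / energy   # error power at that gain
        if err < best[2]:
            best = (idx, gain, err)
    return best
```

The cost of this search grows linearly with the number of code vectors, and each candidate requires a full filtering pass, which is the cost that the reduced-complexity methods discussed next aim to lower.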
An index representing the kind of the selected code vector, the gain, the spectral parameters, and the adaptive codebook parameters are combined by a multiplexer unit and transmitted.
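Multiplexing amounts to concatenating fixed-width parameter indices into a bitstream. The sketch below assumes MSB-first packing with zero padding of the last byte; the field widths are hypothetical, because the text above does not specify a bit allocation.

```python
def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the last byte."""
    acc, nbits = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_fields(data, widths):
    """Inverse of pack_fields (the demultiplexer), given the same field widths."""
    acc = int.from_bytes(data, "big")
    pos = len(data) * 8
    values = []
    for width in widths:
        pos -= width
        values.append((acc >> pos) & ((1 << width) - 1))
    return values
```

For instance, a hypothetical 3-bit gain index of 5, an 8-bit delay index of 200, and a 1-bit flag of 1 pack into the two bytes `0xB9 0x10`. The decoder must know the same field order and widths to demultiplex the stream.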
In addition, various methods have been proposed to reduce the amount of calculation required to search the excitation codebook.
For example, an ACELP (Algebraic Code Excited Linear Prediction) method has been proposed. This method is described, for example, in C. Laflamme et al., “16 kbps wideband speech coding technique based on algebraic CELP” (Proc. ICASSP, pp. 13–16, 1991: hereinafter referred to as Document 3). Document 3 is incorporated herein by reference.
According to the method described in Document 3, the excitation signal is expressed by a plurality of pulses, and the position of each pulse is represented by a predetermined number of bits and transmitted. The amplitude of each pulse is restricted to +1.0 or −1.0. Therefore, the amount of calculation required to search for the pulses can be considerably reduced.
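A pulse excitation of this kind can be sketched as follows. The track layout is a hypothetical assumption, not the structure given in Document 3: a 40-sample subframe is split into 5 interleaved tracks of 8 positions each, so every pulse costs 3 position bits plus 1 sign bit. The pulse selection is also a simplified heuristic rather than a full joint search: on each track it takes the position where a correlation signal `d` (for example, the target correlated with the synthesis filter's impulse response) has the largest magnitude, and uses the sign of `d` at that position.

```python
SUBFRAME = 40   # assumption: subframe length in samples
N_TRACKS = 5    # assumption: interleaved tracks, one pulse per track
POS_BITS = 3    # 8 positions per track -> 3 bits; plus 1 sign bit per pulse


def select_pulses(d):
    """Per track, pick the position of max |d[n]|; the pulse sign follows d."""
    pulses = []
    for track in range(N_TRACKS):
        positions = range(track, SUBFRAME, N_TRACKS)
        best = max(positions, key=lambda n: abs(d[n]))
        pulses.append((best, 1.0 if d[best] >= 0 else -1.0))
    return pulses


def encode_pulses(pulses):
    """Code each pulse as (sign bit << POS_BITS) | position index within its track."""
    codes = []
    for track, (pos, sign) in enumerate(pulses):
        idx = (pos - track) // N_TRACKS
        codes.append(((0 if sign > 0 else 1) << POS_BITS) | idx)
    return codes


def decode_pulses(codes, gain=1.0):
    """Rebuild the excitation: +/-1 pulses at the coded positions, scaled by one gain."""
    exc = [0.0] * SUBFRAME
    for track, code in enumerate(codes):
        idx = code & ((1 << POS_BITS) - 1)
        sign = -1.0 if code >> POS_BITS else 1.0
        exc[track + N_TRACKS * idx] = sign * gain
    return exc
```

Under these assumptions the excitation of one subframe costs 5 × (3 + 1) = 20 bits, and the search visits each of the 40 positions once instead of filtering every vector of a stored noise codebook.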
However, the above-mentioned conventional methods and techniques have the following problem: excellent sound quality is obtained at bit rates of 8 kb/s or more, but at lower bit rates the sound quality of the background noise portion of the coded speech deteriorates, particularly when background noise is superposed on the speech. This problem is significant, for example, when speech coding is carried out in a cellular phone.
In the coding approaches described in Document 1 and Document 2, reducing the bit rate reduces the number of bits allocated to the excitation codebook, which in turn degrades the accuracy with which waveforms are reproduced. This degradation is not apparent for signals with high waveform correlation, such as speech, but appears significantly for signals with low waveform correlation, such as background noise.
In the coding approach described in Document 3, the excitation signal is represented by a combination of pulses. A pulse combination is well suited to modeling a speech signal, so excellent sound quality is obtained. However, at lower bit rates the sound quality of the coded speech deteriorates significantly, because the number of pulses per subframe is too small to represent the excitation signal accurately.
The reason is as follows. Because the excitation signal is expressed by a combination of a plurality of pulses, in a vowel period of the speech the pulses are concentrated around the pitch pulse that marks the start of each pitch period. In this case, the speech signal can be represented efficiently by a small number of pulses. For a random signal such as background noise, on the other hand, the pulses must be spread out rather than concentrated, and it is difficult to represent the background noise appropriately with a small number of pulses. Therefore, if the bit rate is lowered and the number of pulses is decreased, the sound quality of the background noise deteriorates drastically.
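The difference between concentrated and non-concentrated excitation can be made concrete by measuring how much of a subframe's energy its k largest samples carry. The following illustration is an addition under stated assumptions, not part of the cited methods: a decaying pulse train with a 20-sample period stands in for voiced excitation, and seeded white Gaussian noise stands in for background noise. Because ACELP additionally restricts pulse amplitudes to ±1 with a common gain, this fraction is an optimistic upper bound on what k pulses can capture.

```python
import random


def top_k_energy_fraction(x, k):
    """Fraction of the total energy carried by the k largest-magnitude samples."""
    energies = sorted((s * s for s in x), reverse=True)
    total = sum(energies)
    return sum(energies[:k]) / total if total > 0.0 else 0.0


# Voiced-like excitation (assumption): pitch pulses every 20 samples, fast decay.
voiced = [0.3 ** (n % 20) for n in range(40)]

# Background-noise-like excitation (assumption): white Gaussian noise, fixed seed.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(40)]

print(f"voiced, 4 pulses: {top_k_energy_fraction(voiced, 4):.3f}")
print(f"noise,  4 pulses: {top_k_energy_fraction(noise, 4):.3f}")
```

With four pulses, the pulse-train signal keeps more than 99% of its energy, whereas the noise keeps only a small fraction of it, which is why lowering the number of pulses affects the background noise far more than the voiced speech.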
In view of the above-mentioned problems of the conventional methods and techniques, it is an object of this invention to remove those problems and to provide an improved speech decoder for decoding a speech signal on which a background noise signal is superposed and which has been coded by the above-mentioned methods and techniques. The improved speech decoder requires a relatively small amount of calculation, yet can decode the speech signal while suppressing deterioration of the sound quality even at a low bit rate.