The present invention relates to a speech signal coding/decoding system for coding/decoding a digital input speech signal at a low bit rate.
In a system with restricted frequency bandwidth and/or transmission power, such as a digital maritime satellite communication system or a digital business satellite communication system employing SCPC (single channel per carrier), a speech coding/decoding system is required that achieves high speech quality at a low bit rate and is hardly affected by transmission code errors.
Against this background, a variety of speech coding/decoding systems have already been proposed. Typical systems include: an adaptive predictive coding (APC) system, which codes an input signal on a frame basis by using a predictor to remove correlation from the input signal, thereby obtaining a residual signal, and an adaptive quantizer to quantize the residual signal (U.S. Pat. No. 4,811,396, and U.S. Ser. No. 265,639); a multi-pulse excited linear predictive coding (MPEC) system, which excites an LPC synthetic filter with a plurality of pulses as a sound source; and a CELP (code excited linear predictive coding) system, which excites an LPC synthetic filter with a residual signal pattern as the sound source.
The adaptive predictive coding (APC) system will be described below in detail as a typical example of a conventional speech coding/decoding system.
FIGS. 1(a) and 1(b) show the fundamental structure of a conventional adaptive predictive coding system (U.S. Ser. No. 265,639). In operation, a digital input signal is input to an LPC analyzer 2 and a short term predictor 6 via a coder input terminal 1. A short term spectral analysis (called "LPC analysis" hereinafter) is conducted on every frame by the LPC analyzer 2 based on the digital input signal. The LPC parameter obtained thereby is coded by an LPC parameter coder 3 and transmitted to a decoder on a receiving side via a multiplexer 30. The output of the LPC parameter coder 3 is decoded by an LPC parameter decoder 4. A short term prediction parameter is obtained from the output of the decoder 4 by an LPC parameter/short term prediction parameter converter 5. The short term prediction parameter is input to the short term predictor 6, a noise shaping filter 19 and a local decoding short term predictor 24.
The correlation between adjacent samples of a speech waveform is removed by subtracting the output of the short term predictor 6, which employs the short term prediction parameter, from the digital input signal by a subtracter 11 to obtain a short term prediction residual signal. This signal is input to a pitch analyzer 7 and a long term predictor 10. Pitch analysis is conducted on every frame by the pitch analyzer 7 based on the short term prediction residual signal. A pitch period and a pitch parameter obtained thereby are coded by a pitch parameter coder 8 and transmitted to the decoder on the receiving side via the multiplexer 30. The pitch period and the pitch parameter are decoded by a pitch parameter decoder 9 and set in the long term predictor 10, the noise shaping filter 19 and a local decoding long term predictor 23.
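The short term prediction step above can be illustrated with a minimal sketch. It assumes a predictor of the form "weighted sum of the previous N.sub.s input samples" with zero initial history; the function name `short_term_residual` and the test signal are hypothetical and are not taken from the cited system.

```python
def short_term_residual(x, a):
    """Return e[n] = x[n] - sum_{i=1..Ns} a[i-1] * x[n-i] (zero history assumed)."""
    residual = []
    for n in range(len(x)):
        # Prediction from past input samples only (open-loop predictor).
        pred = sum(a[i - 1] * x[n - i]
                   for i in range(1, len(a) + 1) if n - i >= 0)
        residual.append(x[n] - pred)
    return residual


# Hypothetical signal in which each sample is 0.9 times the previous one,
# so a single-tap predictor with a = [0.9] removes the correlation.
x = [1.0, 0.9, 0.81, 0.729]
e = short_term_residual(x, [0.9])
# Only the first sample survives; the rest are zero up to rounding error.
```

In the sketch the residual carries only the part of the signal that the predictor could not explain, which is what is passed on to the pitch analysis.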
The periodicity of the short term prediction residual signal is removed by subtracting the output of the long term predictor 10, which employs the pitch period and the pitch parameter, from the short term prediction residual signal by a subtracter 12 to obtain a long term prediction residual signal, which is ideally white noise. The output of the noise shaping filter 19 is subtracted from the long term prediction residual signal by a subtracter 17 to obtain a final prediction residual signal. This signal is quantized and coded by an adaptive quantizer 16 and transmitted to the decoder on the receiving side via the multiplexer 30. The coded final prediction residual signal is decoded and inversely quantized by an inverse quantizer 18 and input to a subtracter 20 and an adder 21. A quantization noise is obtained by the subtracter 20 by subtracting the final prediction residual signal, which is the input signal to the adaptive quantizer 16, from the inversely quantized final prediction residual signal. The quantization noise is input to the noise shaping filter 19.
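As a rough illustration of the long term prediction and of how the quantization noise is formed, the following sketch assumes a single-tap pitch predictor (one pitch period and one pitch parameter) and a uniform quantizer. The noise shaping feedback loop is omitted, and all function names are hypothetical.

```python
def long_term_residual(e_short, pitch_period, pitch_gain):
    """Subtract pitch_gain * e_short[n - pitch_period] from each sample."""
    out = []
    for n, value in enumerate(e_short):
        past = e_short[n - pitch_period] if n >= pitch_period else 0.0
        out.append(value - pitch_gain * past)
    return out


def quantize(value, step):
    """Uniform quantizer: return the integer code for a sample."""
    return round(value / step)


def dequantize(code, step):
    """Inverse quantizer: map an integer code back to an amplitude."""
    return code * step


# Hypothetical pulse train with a period of 4 samples.
e_short = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
e_long = long_term_residual(e_short, pitch_period=4, pitch_gain=1.0)

# Quantization noise = inversely quantized value minus quantizer input.
sample, step = 0.37, 0.25
noise = dequantize(quantize(sample, step), step) - sample
```

The second pulse is cancelled by the first, so the long term residual no longer shows the periodicity. The magnitude of `noise` is bounded by half the step size as long as the quantizer does not clip.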
In order to update the step size of the adaptive quantizer 16 for every subframe, an RMS (root mean square) value of the above-described long term prediction residual signal is calculated by an RMS value calculating circuit 13 and coded as a reference level by an RMS value coder 14. The RMS value coder 14 stores the reference level and its adjacent levels. The output signal of the RMS value coder 14 is decoded by an RMS value decoder 15, and the quantized RMS value corresponding to the reference level is used as a reference RMS value. The step size of the adaptive quantizer 16 is determined by multiplying the reference RMS value by a fundamental step size prepared in advance. The output of the local decoding long term predictor 23 is added to the quantized final prediction residual signal, which is the output signal of the inverse quantizer 18, by the adder 21. The resulting sum is input to the local decoding long term predictor 23 and is also added to the output of the local decoding short term predictor 24 by an adder 22, the output of which is input to the local decoding short term predictor 24. A locally decoded digital input signal is obtained by this procedure. The difference between the locally decoded digital input signal and the original digital input signal is obtained as an error signal by a subtracter 26. The power of the error signal is calculated by a minimum error power detector 27 over the subframes. A series of similar operations is performed for the other fundamental step sizes prepared in advance and for the stored levels adjacent to the reference level. The coded RMS level and the fundamental step size that yield the minimum error signal power are selected and transmitted to the decoder on the receiving side via the multiplexer 30. A step size coder 29 is used for coding the step size.
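The selection of the RMS level and the fundamental step size by minimum error power can be sketched as an exhaustive search. This is a simplified illustration under stated assumptions: local decoding is reduced to inverse quantization (the local decoding predictors 23 and 24 are left out), the quantizer is a clipped uniform quantizer, and the names `quantize_block` and `select_step` as well as the candidate values are hypothetical.

```python
def quantize_block(values, step, levels):
    """Quantize and reconstruct a block with codes clipped to [-levels, levels]."""
    out = []
    for v in values:
        code = max(-levels, min(levels, round(v / step)))
        out.append(code * step)
    return out


def select_step(residual, rms_candidates, base_steps, levels=1):
    """Try every (RMS level, fundamental step size) pair and keep the one
    whose reconstruction has the smallest error power."""
    best = None
    for rms_level in rms_candidates:
        for base in base_steps:
            step = rms_level * base  # step size = RMS value x fundamental step
            decoded = quantize_block(residual, step, levels)
            err = sum((d - v) ** 2 for d, v in zip(decoded, residual))
            if best is None or err < best[0]:
                best = (err, rms_level, base)
    return best


# Hypothetical subframe whose RMS value is 1.0; the candidates stand in for
# the reference level and its adjacent levels.
residual = [1.0, -1.0, 1.0, -1.0]
err, rms_level, base = select_step(residual, [0.5, 1.0, 2.0], [0.5, 1.0])
```

With too small a step the clipped quantizer cannot reach the signal amplitude, and with too large a step the samples round to zero, so the search settles on the combination in between.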
FIG. 1(b) is a block diagram showing the decoder used in a conventional adaptive predictive coding system.
Codes input via a decoder input terminal 32 are separated by a demultiplexer 33 into signals relating to the final residual signal, the RMS value, the step size, the LPC parameter, and the pitch period and pitch parameter, which are input to an adaptive inverse quantizer 36, an RMS value decoder 35, a step size decoder 34, an LPC parameter decoder 38 and a pitch parameter decoder 37, respectively.
The RMS value decoded by the RMS value decoder 35 and the fundamental step size obtained by the step size decoder 34 are set in the adaptive inverse quantizer 36. A series of codes relating to the received final prediction residual signal is inversely quantized by the adaptive inverse quantizer 36 to obtain a quantized final prediction residual signal. A short term prediction parameter, decoded by the LPC parameter decoder 38 and obtained by an LPC parameter/short term prediction parameter converter 39, is input to a short term predictor 43, one of the predictors that form the synthetic filter, and to a post noise shaping filter 44. The pitch period and the pitch parameter, which are decoded by the pitch parameter decoder 37, are input to a long term predictor 42, the other predictor that forms the synthetic filter.
The output of the long term predictor 42 is added to the output of the adaptive inverse quantizer 36 by an adder 40. The output thereof is input to the long term predictor 42. The output of the adder 40 is added to the output of the short term predictor 43 by an adder 41 to obtain a reproduced speech signal. This signal is input to the short term predictor 43 and the post noise shaping filter 44 for noise-shaping. The reproduced speech signal is input also to a level adjuster 45 and the level is adjusted by comparing the reproduced speech signal with the output of the post noise shaping filter 44.
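The two feedback loops of the decoder's synthetic filter can be sketched as follows. The sketch assumes a single-tap long term predictor and zero initial filter state, and it leaves out the post noise shaping filter 44 and the level adjuster 45. The function name `synthesize_speech` is hypothetical.

```python
def synthesize_speech(q, a, pitch_period, pitch_gain):
    """Rebuild speech from the quantized final prediction residual q.

    v[n] models the output of adder 40 (residual + long term prediction);
    s[n] models the output of adder 41 (v[n] + short term prediction).
    """
    v, s = [], []
    for n in range(len(q)):
        # Long term predictor 42 is fed by the output of adder 40.
        lt = pitch_gain * v[n - pitch_period] if n >= pitch_period else 0.0
        v.append(q[n] + lt)
        # Short term predictor 43 is fed by the reproduced speech signal.
        st = sum(a[i - 1] * s[n - i]
                 for i in range(1, len(a) + 1) if n - i >= 0)
        s.append(v[n] + st)
    return s


# Hypothetical impulse excitation with one short term tap and a pitch tap.
speech = synthesize_speech([1.0, 0.0, 0.0, 0.0], [0.5],
                           pitch_period=2, pitch_gain=0.5)
```

Because each predictor is driven by its own adder output, both loops are recursive, which is why the stability of the short term predictor matters for the reproduced speech.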
Specifically, a gain adjustment coefficient G.sub.0 is obtained by; ##EQU1## and the output of the post noise shaping filter 44 is multiplied by G.sub.0.
The short term predictors 6, 24 and 43 in the coder and the decoder will be described below. The transfer function P.sub.s (z) of the short term predictors 6, 24 and 43 is given by

$$P_s(z) = \sum_{i=1}^{N_s} a_i z^{-i} \qquad (2)$$

where a.sub.i is a short term prediction parameter and N.sub.s represents the number of taps of the short term predictor. The parameter a.sub.i is calculated in the LPC analyzer 2 and the LPC parameter/short term prediction parameter converter 5 for every frame and changes adaptively in response to a change in the spectrum of the input signal from frame to frame. The transfer function represented by expression (2) is also incorporated into the noise shaping filter 19 in the coder and the post noise shaping filter 44 in the decoder.
Generally, in order to keep the stability of the speech reproduction in the synthetic filters 24 and 43, a prediction obtained by the LPC analyzer 2 is intentionally reduced by introducing a coefficient, called a leakage. That is, generally the product of the leakage r.sub.s (0 < r.sub.s < 1) and the short term prediction parameter is used as a filter parameter for the short term predictors or the noise shaping filters. Specifically, the transfer function P.sub.s (z) of the short term predictors 6, 24 and 43 is given by; ##EQU3## where the leakage r.sub.s is fixed and the same value of the leakage r.sub.s is used on both the coder and decoder sides.
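The stabilizing effect of the leakage can be illustrated numerically. Expression (3) is not reproduced in this text, so the exact weighting is an assumption: the sketch below supports both the plain product r.sub.s times a.sub.i, as the sentence above states, and the weighting by the i-th power of r.sub.s that is common in LPC bandwidth expansion. The function names and the coefficient values are hypothetical.

```python
def apply_leakage(a, r, per_tap_power=False):
    """Scale prediction parameters by a leakage r with 0 < r < 1.

    per_tap_power=False: a_i -> r * a_i       (plain product, assumed form)
    per_tap_power=True:  a_i -> (r ** i) * a_i (bandwidth-expansion form)
    """
    if not 0.0 < r < 1.0:
        raise ValueError("leakage must satisfy 0 < r < 1")
    if per_tap_power:
        return [(r ** i) * a_i for i, a_i in enumerate(a, start=1)]
    return [r * a_i for a_i in a]


def impulse_response(a, length):
    """Impulse response of the synthesis filter 1 / (1 - P_s(z))."""
    h = []
    for n in range(length):
        pred = sum(a[i - 1] * h[n - i]
                   for i in range(1, len(a) + 1) if n - i >= 0)
        h.append((1.0 if n == 0 else 0.0) + pred)
    return h


# Hypothetical single-tap predictor that is close to instability.
a = [0.99]
leaky = apply_leakage(a, 0.9)
tail_plain = abs(impulse_response(a, 51)[50])
tail_leaky = abs(impulse_response(leaky, 51)[50])
```

For a single tap both forms coincide. In this example the impulse response with leakage decays much faster than without it, which corresponds to the reduced prediction and the improved stability described above.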
The same applies to other speech coding/decoding systems. As another example, the CELP system will be briefly described below.
On the transmitting side, a correlation between adjacent samples is first calculated from the digital input speech signal by LPC analysis, and the resulting short term prediction parameter is input to the synthetic filter. The synthetic filter is excited by a signal output from a vector quantizer to obtain the reproduced speech signal. That is, the short term predicted signal is formed by the short term predictor and added to the exciting signal to reproduce the digital input speech signal in the synthetic filter. The reproduced speech signal is input to the short term predictor in order to form the short term predicted signal for the next timing. An error signal between the reproduced speech signal and the digital input speech signal is calculated, and the exciting signal is selected so as to minimize the power of the error signal after audible weighting by a weighting filter. Information on the exciting signal and on the short term prediction is transmitted to the receiving side.
On the receiving side, an exciting signal is formed from the received information on the exciting signal by a vector quantizer. As on the transmitting side, the reproduced speech signal is then obtained by exciting the synthetic filter, which is set with the short term prediction parameter.
The short term predictors generally represented by expression (3) are included in the synthetic filters on the coder side and the decoder side. The leakages are fixed, and the same value is used on both the coder and decoder sides as described above.
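The selection of the exciting signal in a CELP coder can be sketched as an analysis-by-synthesis search over a codebook. This is a minimal illustration under stated assumptions: the audible weighting filter and any excitation gain are omitted, the filter state starts at zero, and the codebook contents and function names are hypothetical.

```python
def synthesize(excitation, a):
    """Excite the synthetic filter: s[n] = x[n] + sum_i a[i-1] * s[n-i]."""
    s = []
    for n, x in enumerate(excitation):
        pred = sum(a[i - 1] * s[n - i]
                   for i in range(1, len(a) + 1) if n - i >= 0)
        s.append(x + pred)
    return s


def search_codebook(target, codebook, a):
    """Return (index, error power) of the code vector whose synthesized
    output is closest to the target speech segment."""
    best_index, best_err = -1, float("inf")
    for index, code in enumerate(codebook):
        synth = synthesize(code, a)
        err = sum((t - y) ** 2 for t, y in zip(target, synth))
        if err < best_err:
            best_index, best_err = index, err
    return best_index, best_err


# Hypothetical three-entry codebook and single-tap predictor.
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [1.0, -1.0, 0.0, 0.0]]
a = [0.5]
target = synthesize(codebook[2], a)   # target built from entry 2
index, err = search_codebook(target, codebook, a)
```

Only the selected index and the prediction information need to be transmitted; the receiving side repeats the `synthesize` step with the same code vector and the same prediction parameter.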
As described above, a leakage such as the one in expression (3) is generally used in the short term predictors 6, 24 and 43, the noise shaping filter 19 and the post noise shaping filter 44. The purpose of the leakage is to stabilize the operation of the short term predictors 24 and 43, which are constituents of the synthetic filter. Conventionally, stability has been attained by intentionally reducing the prediction obtained by the LPC analyzer 2. Consequently, the use of a small leakage reproduces speech containing a large amount of quantization noise, especially in the vicinity of a consonant or unvoiced sound. Conversely, the use of a large leakage reproduces speech that appears to resonate, especially in the vicinity of a vowel (voiced sound).
In the conventional system, however, a leakage of constant value has been used irrespective of the nature of the speech. Therefore, the conventional speech coding/decoding system has had the problem that the quantization noise cannot be sufficiently decreased and good reproduced speech quality cannot be obtained for both voiced and unvoiced sounds.