The present invention relates to a speech signal coding/decoding system and, in particular, to such a system which codes or decodes a digital speech signal at a low bit rate.
A communication system with severe limitations on the frequency band and/or transmit power, such as digital marine satellite communication and digital business satellite communication using SCPC (single channel per carrier), requires a speech coding/decoding system with a low bit rate, excellent speech quality, and a low error rate.
There are a number of conventional coding/decoding systems. Among them, an adaptive prediction coding system (APC) has a predictor for calculating the prediction coefficients for every frame, and an adaptive quantizer for coding the predicted residual signal, which is free from correlation between sampled values. A multi-pulse drive linear prediction coding system (MPEC) excites an LPC synthesis filter with a plurality of pulse sources.
The prior adaptive prediction coding system (APC) is now described as an example.
FIG. 1A is a block diagram of a prior coder for an adaptive prediction coding system, which is shown in U.S. Pat. No. 4,811,396 and UK patent No. 2150377. A digital input speech signal S.sub.j is fed to the LPC analyzer 2 and the short term predictor 6 through the input terminal 1. The LPC analyzer 2 carries out short term spectrum analysis of the digital input speech signal for every frame. The resulting LPC parameters are coded in the LPC parameter coder 3. The coded LPC parameters are transmitted to a receiver side through a multiplex circuit 30. The LPC parameter decoder 4 decodes the output of the LPC parameter coder 3, and the LPC parameter/short term prediction parameter converter 5 provides the short term prediction parameter, which is applied to the short term predictor 6, the noise shaping filter 19, and the local decoding short term predictor 24.
The subtractor 11 subtracts the output of the short term predictor 6 from the digital input speech signal S.sub.j and provides the short term predicted residual signal .DELTA.S.sub.j, which is free from correlation between adjacent samples of the speech signal. The short term predicted residual signal .DELTA.S.sub.j is fed to the pitch analyzer 7 and the long term predictor 10. The pitch analyzer 7 carries out the pitch analysis on the short term predicted residual signal .DELTA.S.sub.j and provides the pitch period and the pitch parameter, which are coded by the pitch parameter coder 8 and are transmitted to a receiver side through the multiplex circuit 30. The pitch parameter decoder 9 decodes the pitch period and the pitch parameter which are the output of the coder 8. The output of the decoder 9 is sent to the long term predictor 10, the noise shaping filter 19 and the local decoding long term predictor 23.
The subtractor 12 subtracts the output of the long term predictor 10, which uses the pitch period and the pitch parameter, from the short term predicted residual signal .DELTA.S.sub.j, and provides the long term predicted residual signal, which is free from the correlation of waveforms repeating at the pitch of the speech signal and is ideally white noise. The subtractor 17 subtracts the output of the noise shaping filter 19 from the long term predicted residual signal which is the output of the subtractor 12, and provides the final predicted residual signal to the adaptive quantizer 16. The quantizer 16 performs the quantization and the coding of the final predicted residual signal and transmits the coded signal to the receiver side through the multiplex circuit 30.
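As a hedged illustration of the two prediction stages performed by the subtractors 11 and 12, the following Python sketch computes the short term and the long term predicted residual signals. It assumes the common forms in which the short term predictor is a weighted sum of the previous samples and the long term predictor is a single tap at the pitch period. The function names, and the treatment of samples before the start of the signal as zero, are illustrative choices rather than part of the described coder.

```python
def short_term_residual(s, a):
    """Subtract the short term prediction sum_i a[i-1] * s[j-i] from s[j].

    s: input samples; a: short term prediction parameters (assumed form).
    Samples before the start of the signal are treated as zero.
    """
    out = []
    for j in range(len(s)):
        pred = sum(a[i - 1] * s[j - i]
                   for i in range(1, len(a) + 1) if j - i >= 0)
        out.append(s[j] - pred)
    return out


def long_term_residual(d, b, period):
    """Subtract the one-tap pitch prediction b * d[j - period] from d[j]."""
    return [d[j] - (b * d[j - period] if j >= period else 0.0)
            for j in range(len(d))]
```

For example, a signal that decays by one half per sample is fully predicted after its first sample by a single coefficient of 0.5, so only the first residual sample is non-zero.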
The coded final predicted residual signal, which is the output of the quantizer 16, is fed to the inverse quantizer 18 for decoding and inverse quantizing. The output of the inverse quantizer 18 is fed to the subtractor 20 and the adder 21. The subtractor 20 subtracts the final predicted residual signal, which is the input of the adaptive quantizer 16, from said quantized final predicted residual signal which is the output of the inverse quantizer 18, and provides the quantization noise, which is fed to the noise shaping filter 19.
In order to update the quantization step size in every sub-frame, the RMS calculation circuit 13 calculates the RMS (root mean square) of said long term predicted residual signal. The RMS coder 14 codes the output of the RMS calculation circuit 13, and stores the coded output level as a reference level along with the adjacent levels derived from it. The output of the RMS coder 14 is decoded in the RMS decoder 15. The step size of the adaptive quantizer 16 is obtained by multiplying the quantized RMS value corresponding to the reference level, taken as the reference RMS value, by the predetermined fundamental step size.
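The step size rule above can be sketched in Python as follows. The RMS computation and the multiplication by the fundamental step size follow the description; the uniform rounding quantizer, its index limit, and all function names are assumptions made only for illustration, and the coding of the RMS value into stored levels is not modelled.

```python
import math


def sub_frame_rms(x):
    """Root mean square of one sub-frame of the residual signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def quantizer_step_size(decoded_rms, fundamental_step):
    """Step size = decoded (quantized) RMS value x fundamental step size."""
    return decoded_rms * fundamental_step


def quantize(value, step, max_index):
    """Hypothetical uniform quantizer: nearest index, clipped to +/-max_index."""
    idx = int(round(value / step))
    return max(-max_index, min(max_index, idx))


def dequantize(idx, step):
    """Inverse quantization: index back to amplitude."""
    return idx * step
```

With this rule a louder sub-frame (larger RMS) is quantized with a proportionally coarser step, so the same set of quantizer indices covers the signal range in every sub-frame.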
On the other hand, the adder 21 adds the quantized final predicted residual signal, which is the output of the inverse quantizer 18, to the output of the local decoding long term predictor 23. The output of the adder 21 is fed to the local decoding long term predictor 23 and the adder 22, which also receives the output of the local decoding short term predictor 24. The output of the adder 22 is fed to the local decoding short term predictor 24.
The locally decoded speech signal, which is the local reproduction of the digital input speech signal S.sub.j, is obtained through the above process on terminal 25.
The subtractor 26 provides the difference between the locally decoded speech signal and the original digital input speech signal S.sub.j. The minimum error power detector 27 calculates the power of the error, which is the output of the subtractor 26, over the sub-frame period. The same operation is carried out for all the stored fundamental step sizes and the adjacent levels. The RMS step size selector 28 selects the coded RMS level and the fundamental step size which provide the minimum error power. The selected step size is coded in the step size coder 29. The output of the step size coder 29 and the selected coded RMS level are transmitted to the receiver side through the multiplex circuit 30.
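The selection performed by the detector 27 and the selector 28 amounts to an exhaustive search, which the following Python sketch illustrates. The callable `local_decode` is hypothetical: it stands in for the whole local decoding loop and is assumed to return the locally decoded sub-frame for a given RMS level and fundamental step size. The function names and the choice to keep the first candidate on a tie are likewise assumptions.

```python
def error_power(original, decoded):
    """Sum of squared differences over one sub-frame."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def select_rms_and_step(original, rms_levels, fundamental_steps, local_decode):
    """Return the (rms_level, fundamental_step) pair with minimum error power.

    local_decode(level, step) is a hypothetical stand-in for the local
    decoder; it must return the decoded sub-frame for that candidate pair.
    On a tie, the first candidate tried is kept.
    """
    best = None
    for level in rms_levels:
        for step in fundamental_steps:
            power = error_power(original, local_decode(level, step))
            if best is None or power < best[0]:
                best = (power, level, step)
    return best[1], best[2]
```

The cost of this search grows with the product of the number of candidate levels and the number of fundamental step sizes, since each candidate requires one pass of local decoding over the sub-frame.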
FIG. 1B shows a block diagram of a decoder which is used in a prior adaptive prediction coding system on a receiver side.
The input signal at the decoder input terminal 32 is separated by the demultiplexer 33 into the final residual signal (a), an RMS value (b), a step size (c), an LPC parameter (d), and a pitch period/pitch parameter (e). They are fed to the adaptive inverse quantizer 36, the RMS decoder 35, the step size decoder 34, the LPC parameter decoder 38, and the pitch parameter decoder 37, respectively.
The RMS value decoded by the RMS decoder 35 and the fundamental step size obtained in the step size decoder 34 are set in the adaptive inverse quantizer 36. The inverse quantizer 36 inverse quantizes the received final predicted residual signal, and provides the quantized final predicted residual signal.
The short term prediction parameter obtained in the LPC parameter decoder 38 and the LPC parameter/short term prediction parameter converter 39 is sent to the short term predictor 43 which is one of the synthesis filters, and to the post noise shaping filter 44. Furthermore, the pitch period and the pitch parameter obtained in the pitch parameter decoder 37 are sent to the long term predictor 42, which is the other element of the synthesis filters.
The adder 40 adds the output of the adaptive inverse quantizer 36 to the output of the long term predictor 42, and the sum is fed to the long term predictor 42. The adder 41 adds the sum of the adder 40 to the output of the short term predictor 43, and provides the reproduced speech signal. The output of the adder 41 is fed to the short term predictor 43, and the post noise shaping filter 44 which shapes the quantization noise. The output of the adder 41 is further fed to the level adjuster 45, which adjusts the level of the output signal by comparing the level of the input with that of the output of the post noise shaping filter 44.
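The two synthesis loops formed by the adders 40 and 41 can be sketched in Python as below. The sketch assumes a one-tap long term predictor and a short term predictor that is a weighted sum of past output samples. The post noise shaping filter 44 and the level adjuster 45 are left out, and the function name and zero initial state are illustrative assumptions.

```python
def synthesize(q, a, b, period):
    """Rebuild speech from the inverse-quantized residual q.

    Adder 40: e[j] = q[j] + b * e[j - period]        (long term loop)
    Adder 41: s[j] = e[j] + sum_i a[i-1] * s[j - i]  (short term loop)
    Past samples before the start of the signal are treated as zero.
    """
    e, s = [], []
    for j in range(len(q)):
        long_term = b * e[j - period] if j >= period else 0.0
        e.append(q[j] + long_term)
        short_term = sum(a[i - 1] * s[j - i]
                         for i in range(1, len(a) + 1) if j - i >= 0)
        s.append(e[j] + short_term)
    return s
```

Both loops feed their own output back into the predictor, which is why the decoder is recursive while the corresponding prediction stages in the coder operate on the input signal.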
The noise shaping filter 19 in the coder, and the post noise shaping filter 44 in the decoder are now described.
FIG. 2 shows a block diagram of the prior noise shaping filter 19 in the coder. The output of the LPC parameter/short term prediction parameter converter 5 is sent to the short term predictor 49, and the pitch parameter and the pitch period which are the outputs of the pitch parameter decoder 9 are sent to the long term predictor 47. The quantization noise which is the output of the subtractor 20 is fed to the long term predictor 47. The subtractor 48 provides the difference between the input of the long term predictor 47 (quantization noise) and the output of the long term predictor 47. The output of the subtractor 48 is fed to the short term predictor 49. The adder 50 adds the output of the short term predictor 49 to the output of the long term predictor 47, and the output of the adder 50 is fed to the subtractor 17 as the output of the noise shaping filter 19.
The transfer function F'(z) of the noise shaping filter 19 is as follows.

$$F'(z) = r_{nl} P_l(z) + \left[ 1 - r_{nl} P_l(z) \right] P_s\!\left( \frac{z}{r_s r_{ns}} \right) \qquad (1)$$
where P.sub.s (z) and P.sub.l (z) are the transfer functions of the short term predictor 6 and the long term predictor 10, respectively, and are given for instance by the equations (2) and (3), respectively, described later. r.sub.s is the leakage, and r.sub.nl and r.sub.ns are the noise shaping factors of the long term predictor and the short term predictor, respectively, each satisfying 0.ltoreq.r.sub.s, r.sub.nl, r.sub.ns .ltoreq.1. The values of r.sub.nl and r.sub.ns are fixed in a prior noise shaping filter.
The transfer function P.sub.s (z) of the short term predictor 6 is given below. ##EQU1## where a.sub.i is a short term prediction parameter, and N.sub.s is the number of taps of the short term predictor. The value a.sub.i is calculated in every frame in the LPC analyzer 2 and the LPC parameter/short term prediction parameter converter 5. The value a.sub.i varies adaptively in every frame depending upon the change of the spectrum of the input signal.
The transfer function of the long term predictor 10 is defined by a similar equation, and the transfer function P.sub.l (z) for a one tap predictor is as follows.

$$P_l(z) = b_l \, z^{-P_p} \qquad (3)$$
where b.sub.l is the pitch parameter, and P.sub.p is the pitch period. The values b.sub.l and P.sub.p are calculated in every frame in the pitch analyzer 7, and adaptively follow the change of the periodicity of the input signal.
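A minimal Python sketch of the noise shaping filter of FIG. 2 follows. It applies the signal flow of the subtractor 48 and the adder 50 to a quantization noise sequence. It assumes that the short term predictor is a weighted sum of past samples with coefficients a.sub.i, so that replacing z by z/(r.sub.s r.sub.ns) multiplies the i-th coefficient by (r.sub.s r.sub.ns) raised to the power i; the function and argument names are hypothetical.

```python
def noise_shaping_output(n, a, b, period, r_s, r_ns, r_nl):
    """Filter quantization noise n through F'(z) of equation (1).

    Long term branch (47):  lt[j]   = r_nl * b * n[j - period]
    Subtractor 48:          diff[j] = n[j] - lt[j]
    Short term branch (49): weighted coefficients a[i-1] * (r_s * r_ns)**i
    Adder 50:               out[j]  = short_term(diff)[j] + lt[j]
    """
    g = r_s * r_ns
    a_w = [a[i] * g ** (i + 1) for i in range(len(a))]
    lt = [r_nl * b * n[j - period] if j >= period else 0.0
          for j in range(len(n))]
    diff = [n[j] - lt[j] for j in range(len(n))]
    out = []
    for j in range(len(n)):
        st = sum(a_w[i - 1] * diff[j - i]
                 for i in range(1, len(a_w) + 1) if j - i >= 0)
        out.append(st + lt[j])
    return out
```

Feeding a unit impulse into this function returns the coefficients of F'(z) directly, which gives a simple check of the structure against equation (1).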
FIGS. 3A and 3B show block diagrams of the prior post noise shaping filter 44 in the decoder.
In the prior art, only a short term post noise shaping filter, which is weighted by the short term prediction parameter in the equation (2), is used.
FIG. 3A shows a post noise shaping filter composed of only a pole filter. The short term prediction parameter obtained in the LPC parameter/short term prediction parameter converter 39 is set in the short term predictor 52. The adder 51 adds the reproduced speech signal from the adder 41 to the output of the short term predictor 52, and the sum of the adder 51 is fed to the short term predictor 52 and the level adjuster 45. The transfer function F'.sub.p (z) of the post noise shaping filter including the level adjuster 45 is shown below. ##EQU2## where G.sub.0 is a gain control parameter, and r.sub.ps is a shaping factor satisfying 0.ltoreq.r.sub.ps .ltoreq.1.
FIG. 3B shows another post noise shaping filter which has a zero filter together with the structure of FIG. 3A. The short term prediction parameter obtained in the LPC parameter/short term prediction parameter converter 39 is set to the pole filter 54 and the zero filter 55 of the short term predictor. The adder 53 adds the reproduced speech signal from the adder 41 to the output of the pole filter 54, and the sum is fed to the pole filter 54 and the zero filter 55. The subtractor 56 subtracts the output of the zero filter 55 from the output of the adder 53, and the difference is fed to the level adjuster 45.
The transfer function F'.sub.po (z) of the post noise shaping filter of FIG. 3B including the level adjuster 45 is shown below. ##EQU3## where G.sub.0 is a gain control parameter, and r.sub.psz and r.sub.psp are shaping factors of the zero and pole filters, respectively, satisfying 0.ltoreq.r.sub.psz .ltoreq.1, and 0.ltoreq.r.sub.psp .ltoreq.1.
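The signal flow of FIGS. 3A and 3B can be sketched in Python as follows. The sketch is inferred from the adders and subtractor described above rather than taken from the equations themselves: the feedback through the pole filter and the feed-forward subtraction of the zero filter would correspond to a transfer function of the form G.sub.0 [1-P.sub.s (z/r.sub.psz)]/[1-P.sub.s (z/r.sub.psp)], with the pole-only filter of FIG. 3A as the case r.sub.psz =0. The short term predictor is assumed to be a weighted sum of past samples, the gain `g0` is a fixed hypothetical argument, and the adaptive level comparison of the level adjuster 45 is not modelled.

```python
def post_noise_shaping(x, a, r_psp, r_psz=0.0, g0=1.0):
    """Pole-zero post filter applied to the reproduced speech x.

    Adder 53:       w[j] = x[j] + sum_i a[i-1] * r_psp**i * w[j - i]
    Subtractor 56:  y[j] = w[j] - sum_i a[i-1] * r_psz**i * w[j - i]
    Output is scaled by the fixed gain g0 (assumed constant here).
    """
    pole = [a[i] * r_psp ** (i + 1) for i in range(len(a))]
    zero = [a[i] * r_psz ** (i + 1) for i in range(len(a))]
    taps = range(1, len(a) + 1)
    w, y = [], []
    for j in range(len(x)):
        feedback = sum(pole[i - 1] * w[j - i] for i in taps if j - i >= 0)
        w.append(x[j] + feedback)
        feedforward = sum(zero[i - 1] * w[j - i] for i in taps if j - i >= 0)
        y.append(g0 * (w[j] - feedforward))
    return y
```

Leaving `r_psz` at its default of zero disables the zero filter, so the same function covers both figures.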
The noise shaping filter 19 in a prior coder is based upon a prediction filter which shapes the spectrum of the quantization noise to resemble that of the speech signal, so that the noise is masked by the speech signal and the audible speech quality is improved. It is particularly effective in reducing the influence of quantization noise which exists far from the formant frequencies (in the valleys of the spectrum).
However, it should be appreciated that the spectrum of a speech signal fluctuates in time, and that its features depend upon whether the sound is voiced or non-voiced. A prior noise shaping filter does not depend on the features of the speech signal, and merely applies fixed shaping factors. Therefore, when the shaping factors are the best for non-voiced sound, the voiced sound is distorted or not clear. On the other hand, when the shaping factors are the best for voiced sound, the filter does not noise-shape satisfactorily for non-voiced sound. Therefore, prior fixed shaping factors cannot provide excellent speech quality for both voiced sound and non-voiced sound.
Further, the post noise shaping filter 44 in a prior decoder consists of only a short term predictor which emphasizes the speech energy in the vicinity of the formant frequencies (at the peaks of the spectrum), that is, it widens the difference between the level of speech at the peaks and that of noise in the valleys. This is why speech quality is improved by the post noise shaping filter in the frequency domain. A prior post noise shaping filter also applies a fixed weight to the short term prediction filter without considering the features of the spectrum of the speech signal. Thus, a strong noise-shaping, which is suitable for non-voiced sound, would produce undesirable clicks or distortion for voiced sound. On the other hand, the noise-shaping suitable for voiced sound is not satisfactory for non-voiced sound. Therefore, the post noise shaping filter with fixed shaping factors cannot provide satisfactory speech quality for both voiced sound and non-voiced sound.
Also, on a transmitter side, a prior MPEC system has a weighting filter which determines the amplitude and location of an excitation pulse so that the power of the difference between the input speech signal and the reproduced speech signal from a synthesis filter becomes minimum. The weighting filter also has a fixed weighting coefficient. Therefore, for the same reason as above, it is not possible to obtain satisfactory speech quality for both voiced sound and non-voiced sound.