A method of encoding a speech signal by separating it into a linear prediction filter and an excitation signal (excitation vector) that drives the filter is widely used to encode speech efficiently at medium to low bit rates. A typical example is CELP (Code-Excited Linear Prediction). With CELP, a linear prediction filter, for which linear prediction coefficients representing the frequency characteristic of the input speech have been set, is driven by an excitation signal (excitation vector) to obtain a synthesized speech signal (reconstructed signal, reconstructed vector). The excitation signal is the sum of a pitch signal (pitch vector), which represents the pitch period of the speech, and a sound source signal (sound source vector) comprising a random number or a pulse train. The pitch signal and the sound source signal are multiplied by respective gains (pitch gain and sound source gain). For a discussion of CELP, see the paper (referred to as “Reference 1”) “Code excited linear prediction: High quality speech at very low bit rates” by M. Schroeder et al. (Proc. of IEEE Int. Conf. on Acoust., Speech and Signal Processing, pp. 937-940, 1985).
Mobile communication such as by cellular telephone requires good quality in a noisy environment typified by the congestion of busy streets and by the interior of a traveling automobile. A problem with CELP-based speech encoding is a marked decline in sound quality for speech on which noise has been superimposed (such speech will be referred to as “background-noise speech” below).
A method of smoothing the gain of a sound source in a decoder is an example of a known technique for improving the encoded speech quality of background-noise speech. In accordance with this method, a temporal change in short-term average power of a sound source signal that has been multiplied by the aforesaid sound source gain is smoothed by smoothing the sound source gain. As a result, a temporal change in short-term average power of the excitation signal also is smoothed. This method improves sound quality by reducing extreme fluctuation in short-term average power in decoded noise, which is one cause of degraded sound quality.
With regard to a method of smoothing the gain of a sound source signal, see Section 6.1 of “Digital Cellular Telecommunication System; Adaptive Multi-Rate Speech Transcoding” (ETSI Technical Report, GSM 06.90 version 2.0.0) (referred to as “Reference 2”).
FIG. 8 is a block diagram illustrating an example of the structure of a conventional speech signal decoder which improves the encoded quality of background-noise speech by smoothing the gain of a sound source signal. It is assumed here that input of a bit sequence occurs in a period (frame) of Tfr msec (e.g., 20 ms) and that computation of a reconstructed vector is performed in a period (subframe) of Tfr/Nsfr msec (e.g., 5 ms), where Nsfr is an integer (e.g., 4). Let the frame length be Lfr samples (e.g., 320 samples) and let the subframe length be Lsfr samples (e.g., 80 samples). These sample counts are determined by the sampling frequency (e.g., 16 kHz) of the input speech signal.
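The relationship between the example figures above can be checked with a few lines of arithmetic. The following is a minimal sketch; the function and constant names are hypothetical and are introduced only for illustration.

```python
def samples_per_period(sample_rate_hz: int, period_ms: float) -> int:
    """Number of samples contained in a period of period_ms milliseconds."""
    return round(sample_rate_hz * period_ms / 1000)


# Example values taken from the text: 16 kHz sampling, 20 ms frames, 4 subframes.
SAMPLE_RATE_HZ = 16_000
T_FR_MS = 20
N_SFR = 4

L_FR = samples_per_period(SAMPLE_RATE_HZ, T_FR_MS)           # frame length
L_SFR = samples_per_period(SAMPLE_RATE_HZ, T_FR_MS / N_SFR)  # subframe length
print(L_FR, L_SFR)  # prints "320 80"
```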
The components of the conventional speech signal decoder will be described with reference to FIG. 8.
The code of the bit sequence enters from an input terminal 10. A code input circuit 1010 splits the code of the bit sequence that has entered from the input terminal 10 and converts it to indices that correspond to a plurality of decode parameters. An index corresponding to a line spectrum pair (LSP) which represents the frequency characteristic of the input signal is output to an LSP decoding circuit 1020, an index corresponding to a delay Lpd that represents the pitch period of the input signal is output to a pitch signal decoding circuit 1210, an index corresponding to a sound source vector comprising a random number or a pulse train is output to a sound source signal decoding circuit 1110, an index corresponding to a first gain is output to a first gain decoding circuit 1220, and an index corresponding to a second gain is output to a second gain decoding circuit 1120.
The LSP decoding circuit 1020 has a table (not shown) in which multiple sets of LSPs have been stored. The LSP decoding circuit 1020 receives as an input the index that is output from the code input circuit 1010, reads the LSP that corresponds to this index out of the table and obtains LSP ^qj(Nsfr)(n) (where j=1, . . . , Np) in the Nsfrth subframe of the present frame (the nth frame), where Np represents the degree of linear prediction.
The LSPs of the first through (Nsfr−1)th subframes are obtained by linearly interpolating ^qj(Nsfr)(n) and ^qj(Nsfr)(n−1).
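The subframe interpolation can be sketched as follows, assuming that the second operand is the last-subframe LSP of the previous frame (as the encoder description states for the unquantized LSP). The text does not give the interpolation weights; the weights m/Nsfr used here are an assumption, and the function name is hypothetical.

```python
def interpolate_lsp(prev_last, curr_last, n_sfr):
    """Return LSP vectors for subframes m = 1..n_sfr of the present frame.

    prev_last: LSP of the last subframe of the previous frame.
    curr_last: decoded LSP of the last subframe of the present frame.
    The linear weights m / n_sfr are an assumed choice, not taken from the text.
    """
    subframes = []
    for m in range(1, n_sfr + 1):
        w = m / n_sfr  # w == 1 for the last subframe, which is returned unchanged
        subframes.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_last, curr_last)]
        )
    return subframes


lsp_per_subframe = interpolate_lsp([0.0, 1.0], [1.0, 2.0], 4)
```

With these weights the last subframe reproduces the decoded LSP exactly, and the earlier subframes move toward it in equal steps.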
LSP ^qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) is output to a linear prediction coefficient conversion circuit 1030 and to a smoothing coefficient calculation circuit 1310.
The linear prediction coefficient conversion circuit 1030 receives as an input the LSP ^qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) output from the LSP decoding circuit 1020.
The linear prediction coefficient conversion circuit 1030 converts the entered LSP ^qj(m)(n) to a linear prediction coefficient ^αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) and outputs ^αj(m)(n) to a synthesis filter 1040. A known method such as the one described in Section 5.2.4 of Reference 2 is used to convert the LSP to a linear prediction coefficient.
The sound source signal decoding circuit 1110 has a table (not shown) in which a plurality of sound source vectors have been stored. The sound source signal decoding circuit 1110 receives as an input the index that is output from the code input circuit 1010, reads the sound source vector that corresponds to this index out of the table and outputs this vector to a second gain circuit 1130.
The second gain decoding circuit 1120 has a table (not shown) in which a plurality of gains have been stored. The second gain decoding circuit 1120 receives as an input the index that is output from the code input circuit 1010, reads a second gain that corresponds to this index out of the table and outputs this gain to a smoothing circuit 1320.
The second gain circuit 1130, which receives as inputs the first sound source vector output from the sound source signal decoding circuit 1110 and the second gain output from the smoothing circuit 1320, multiplies the first sound source vector by the second gain to generate a second sound source vector and outputs the second sound source vector to an adder 1050.
A memory circuit 1240 holds an excitation vector input thereto from the adder 1050. The memory circuit 1240, which holds the excitation vector applied to it in the past, outputs the vector to a pitch signal decoding circuit 1210.
The pitch signal decoding circuit 1210 receives as inputs the past excitation vector held by the memory circuit 1240 and the index output from the code input circuit 1010. The index specifies a delay Lpd. From this past excitation vector, the pitch signal decoding circuit 1210 cuts out a vector of Lsfr samples, corresponding to the vector length, starting from a point Lpd samples previous to the starting point of the present frame, and generates a first pitch signal (vector). In the case of Lpd<Lsfr, the pitch signal decoding circuit 1210 cuts out a vector of Lpd samples and repeatedly connects these Lpd samples to generate a first pitch vector of vector length Lsfr. The pitch signal decoding circuit 1210 outputs the first pitch vector to a first gain circuit 1230.
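The cut-out operation, including the repetition used when the delay is shorter than the vector length, can be sketched as below. This is an illustrative sketch, not the circuit itself: it assumes that `past_excitation` ends immediately before the vector being generated, and all names are hypothetical.

```python
def decode_pitch_vector(past_excitation, delay, l_sfr):
    """Build a pitch vector of l_sfr samples from the stored past excitation.

    past_excitation: excitation samples, oldest first (assumed to end just
                     before the vector being generated).
    delay:           the decoded delay Lpd, in samples.
    """
    if not 0 < delay <= len(past_excitation):
        raise ValueError("delay must lie within the stored excitation history")
    start = len(past_excitation) - delay
    # Take l_sfr samples when enough are available, otherwise only `delay` samples.
    segment = past_excitation[start:start + min(delay, l_sfr)]
    # Repeat the segment cyclically; this is a no-op when it already has l_sfr samples.
    return [segment[i % len(segment)] for i in range(l_sfr)]


history = [1, 2, 3, 4, 5, 6]
long_delay = decode_pitch_vector(history, delay=4, l_sfr=3)   # [3, 4, 5]
short_delay = decode_pitch_vector(history, delay=2, l_sfr=5)  # [5, 6, 5, 6, 5]
```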
The first gain decoding circuit 1220 has a table (not shown) in which a plurality of gains have been stored. The first gain decoding circuit 1220 receives as an input the index that is output from the code input circuit 1010, reads a first gain that corresponds to this index out of the table and outputs this gain to the first gain circuit 1230.
The first gain circuit 1230, which receives as inputs the first pitch vector output from the pitch signal decoding circuit 1210 and the first gain output from the first gain decoding circuit 1220, multiplies the entered first pitch vector by the first gain to generate a second pitch vector and outputs the generated second pitch vector to the adder 1050.
The adder 1050, to which the second pitch vector output from the first gain circuit 1230 and the second sound source vector output from the second gain circuit 1130 are input, adds these inputs and outputs the sum to the synthesis filter 1040 as an excitation vector.
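The combined action of the first gain circuit 1230, the second gain circuit 1130 and the adder 1050 amounts to a gain-weighted sum of two vectors. A minimal sketch, with hypothetical function and argument names:

```python
def build_excitation(pitch_vector, first_gain, source_vector, second_gain):
    """Scale the pitch and sound source vectors by their gains and add them."""
    if len(pitch_vector) != len(source_vector):
        raise ValueError("both vectors must have the subframe length")
    return [
        first_gain * p + second_gain * c
        for p, c in zip(pitch_vector, source_vector)
    ]


excitation = build_excitation([1.0, 2.0], 0.5, [4.0, -2.0], 0.25)  # [1.5, 0.5]
```

In the decoder of FIG. 8 the second gain passed in here would be the smoothed value supplied by the smoothing circuit 1320, and the resulting excitation vector would also be stored in the memory circuit 1240 for later pitch decoding.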
The smoothing coefficient calculation circuit 1310, to which the LSP ^qj(m)(n) output from the LSP decoding circuit 1020 is input, calculates an average LSP $\bar{q}_{0j}(n)$ in the nth frame in accordance with Equation (1) below.
$$\bar{q}_{0j}(n) = 0.84 \cdot \bar{q}_{0j}(n-1) + 0.16 \cdot \hat{q}_{j}^{(N_{sfr})}(n) \tag{1}$$
Next, with respect to each subframe m, the smoothing coefficient calculation circuit 1310 calculates the amount of fluctuation d0(m) of the LSP in accordance with Equation (2) below.
$$d_0(m) = \sum_{j=1}^{N_p} \frac{\left| \bar{q}_{0j}(n) - \hat{q}_{j}^{(m)}(n) \right|}{\bar{q}_{0j}(n)} \tag{2}$$
A smoothing coefficient k0(m) in the subframe m is calculated in accordance with Equation (3) below.
$$k_0(m) = \min\bigl(0.25,\ \max(0,\ d_0(m) - 0.4)\bigr) / 0.25 \tag{3}$$
where min(x, y) is a function in which the smaller of x and y is taken as the value and max(x, y) is a function in which the larger of x and y is taken as the value. The smoothing coefficient calculation circuit 1310 finally outputs the smoothing coefficient k0(m) to the smoothing circuit 1320.
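The three steps of Equations (1) to (3) can be sketched as below. This is a sketch under stated assumptions: the fluctuation measure is read as a sum of absolute differences normalized by the average LSP, the averaged LSP elements are assumed to be nonzero, and the function names are hypothetical.

```python
def update_average_lsp(prev_average, last_subframe_lsp):
    """Equation (1): recursive average of the last-subframe LSP over frames."""
    return [0.84 * a + 0.16 * q for a, q in zip(prev_average, last_subframe_lsp)]


def lsp_fluctuation(average_lsp, subframe_lsp):
    """Equation (2), assuming absolute differences normalized by the average."""
    return sum(abs(a - q) / a for a, q in zip(average_lsp, subframe_lsp))


def smoothing_coefficient(fluctuation):
    """Equation (3): map the fluctuation d0(m) to a coefficient in [0, 1]."""
    return min(0.25, max(0.0, fluctuation - 0.4)) / 0.25


k_stationary = smoothing_coefficient(0.3)  # fluctuation below 0.4 gives 0.0
k_changing = smoothing_coefficient(1.0)    # fluctuation of 0.65 or more gives 1.0
```

The coefficient is 0 while the LSP stays close to its long-term average and saturates at 1 once the fluctuation reaches 0.65, with a linear ramp in between.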
The smoothing coefficient k0(m) output from the smoothing coefficient calculation circuit 1310 and the second gain output from the second gain decoding circuit 1120 are input to the smoothing circuit 1320. The latter then calculates an average gain $\bar{g}_0(m)$ in accordance with Equation (4) below from second gain ^g0(m) in subframe m.
$$\bar{g}_0(m) = \frac{1}{5} \sum_{i=0}^{4} \hat{g}_0(m-i) \tag{4}$$
Next, the second gain ^g0(m) is replaced in accordance with Equation (5) below.
$$\hat{g}_0(m) = \hat{g}_0(m) \cdot k_0(m) + \bar{g}_0(m) \cdot (1 - k_0(m)) \tag{5}$$
Finally the smoothing circuit 1320 outputs the second gain ^g0(m) to the second gain circuit 1130.
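The averaging and substitution of Equations (4) and (5) can be sketched as follows. The text does not say whether the earlier gains entering the average are the decoded or the already smoothed values; this sketch assumes the decoded values, and the names are hypothetical.

```python
def smooth_gain(decoded_gains, k0):
    """Blend the current second gain with its 5-subframe average.

    decoded_gains: decoded second gains, oldest first, current subframe last
                   (assumed to be the unsmoothed values).
    k0:            smoothing coefficient k0(m) in [0, 1].
    """
    if len(decoded_gains) < 5:
        raise ValueError("five subframes of gain history are required")
    current = decoded_gains[-1]
    average = sum(decoded_gains[-5:]) / 5.0            # Equation (4)
    return current * k0 + average * (1.0 - k0)         # Equation (5)


gains = [1.0, 1.0, 1.0, 1.0, 6.0]
unsmoothed = smooth_gain(gains, 1.0)      # 6.0: decoded gain passes through
fully_smoothed = smooth_gain(gains, 0.0)  # 2.0: replaced by the average
half_smoothed = smooth_gain(gains, 0.5)   # 4.0
```

A coefficient of 0, which arises when the LSP fluctuation is small, replaces the gain with its average, while a coefficient of 1 leaves the decoded gain untouched.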
The excitation vector output from the adder 1050 and the linear prediction coefficient ^αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) output from the linear prediction coefficient conversion circuit 1030 are input to the synthesis filter 1040. The latter drives a synthesis filter 1/A(z), for which the linear prediction coefficients have been set, by the excitation vector to thereby calculate the reconstructed vector, which is output from an output terminal 20. The transfer function 1/A(z) of the synthesis filter is represented by Equation (6) below, where it is assumed that the linear prediction coefficient is represented by αi (i=1, . . . , Np).
$$1/A(z) = 1 \Big/ \left( 1 - \sum_{i=1}^{N_p} \alpha_i z^{i} \right) \tag{6}$$
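Driving the synthesis filter by the excitation vector can be sketched as a direct-form recursion. The sketch assumes the conventional causal reading of the transfer function, in which the term with coefficient αi acts on the output sample i steps in the past; the function name and the state handling are hypothetical.

```python
def synthesize(excitation, lp_coeffs, state=None):
    """All-pole synthesis: s[n] = e[n] + sum_{i=1..Np} a_i * s[n-i].

    lp_coeffs: [a_1, ..., a_Np].
    state:     previous outputs [s[n-1], ..., s[n-Np]], or None for zeros.
    Returns the reconstructed samples and the updated state, so that the
    filter memory can be carried across subframes.
    """
    order = len(lp_coeffs)
    history = list(state) if state is not None else [0.0] * order
    output = []
    for e in excitation:
        s = e + sum(a * h for a, h in zip(lp_coeffs, history))
        output.append(s)
        history = ([s] + history)[:order]  # newest output first
    return output, history


reconstructed, filter_state = synthesize([1.0, 0.0, 0.0], [0.5])
# reconstructed == [1.0, 0.5, 0.25]: a decaying impulse response
```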
FIG. 9 is a block diagram illustrating the structure of a speech signal encoder in a conventional speech signal encoding/decoding apparatus. The speech signal encoder will be described with reference to FIG. 9. It should be noted that the first gain circuit 1230, the second gain circuit 1130, the adder 1050 and the memory circuit 1240 are the same as those described in connection with the speech signal decoding apparatus shown in FIG. 8 and need not be described again.
The encoder has an input terminal 30 to which an input signal (input vector) is applied, the input vector being generated by sampling a speech signal and combining a plurality of samples into one vector as one frame.
The input vector from the input terminal 30 is applied to a linear prediction coefficient calculation circuit 5510, which subjects the input vector to linear prediction analysis to obtain linear prediction coefficients. A known method of performing linear prediction analysis is described in Chapter 8 “Linear Predictive Coding of Speech” in L. R. Rabiner et al. “Digital Processing of Speech Signals” (Prentice-Hall, 1978) (referred to as “Reference 3”).
The linear prediction coefficient calculation circuit 5510 outputs the linear prediction coefficients to an LSP conversion/quantization circuit 5520.
Upon receiving the linear prediction coefficients output from the linear prediction coefficient calculation circuit 5510, the LSP conversion/quantization circuit 5520 converts the linear prediction coefficients to an LSP and quantizes the LSP to obtain a quantized LSP. An example of a well-known method of converting linear prediction coefficients to an LSP is that described in Section 5.2.3 of Reference 2. An example of a method of quantizing an LSP is that described in Section 5.2.5 of Reference 2.
As described in connection with the LSP decoding circuit of FIG. 8, the quantized LSP is assumed to be a quantized LSP ^qj(Nsfr)(n) in the Nsfrth subframe of the present frame (the nth frame) (where j=1, . . . Np).
The quantized LSPs of the first through (Nsfr−1)th subframes are obtained by linearly interpolating ^qj(Nsfr)(n) and ^qj(Nsfr)(n−1). Furthermore, the LSP before quantization is assumed to be LSP qj(Nsfr)(n) (j=1, . . . , Np) in the Nsfrth subframe of the present frame (the nth frame). The LSPs of the first through (Nsfr−1)th subframes are obtained by linearly interpolating qj(Nsfr)(n) and qj(Nsfr)(n−1).
The LSP conversion/quantization circuit 5520 outputs the LSP qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) and the quantized LSP ^qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) to a linear prediction coefficient conversion circuit 5030 and outputs an index corresponding to the quantized LSP ^qj(Nsfr)(n) (where j=1, . . . , Np) to a code output circuit 6010.
The LSP qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) and the quantized LSP ^qj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) output from the LSP conversion/quantization circuit 5520 are input to the linear prediction coefficient conversion circuit 5030. This circuit converts qj(m)(n) to a linear prediction (LP) coefficient αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) and converts ^qj(m)(n) to a quantized linear prediction coefficient ^αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr). It outputs the linear prediction coefficient αj(m)(n) to a weighting filter 5050 and to a weighting synthesis filter 5040, and outputs the linear prediction coefficient ^αj(m)(n) to the weighting synthesis filter 5040.
An example of a well-known method of converting an LSP to linear prediction (LP) coefficients and converting a quantized LSP to quantized linear prediction coefficients is that described in Section 5.2.4 of Reference 2.
The input vector from the input terminal 30 and the linear prediction coefficients from the linear prediction coefficient conversion circuit 5030 are input to the weighting filter 5050. The latter uses these linear prediction coefficients to produce a weighting filter W(z) corresponding to the characteristic of the human sense of hearing and drives this weighting filter by the input vector, whereby there is obtained a weighted input vector. The weighted input vector is output to a subtractor 5060. The transfer function W(z) of the weighting filter is represented by Equation (7) below.
$$W(z) = Q(z/r_1) / Q(z/r_2) \tag{7}$$
where the following holds.
$$Q(z/r_1) = 1 - \sum_{i=1}^{N_p} \alpha_i^{(m)} r_1^{i} z^{i}, \qquad Q(z/r_2) = 1 - \sum_{i=1}^{N_p} \alpha_i^{(m)} r_2^{i} z^{i} \tag{8}$$
Here r1 and r2 represent constants, e.g., r1=0.9, r2=0.6. Refer to Reference 1, etc., for the details of the weighting filter.
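The weighting filter of Equations (7) and (8) can be sketched as a numerator and a denominator that share the same linear prediction coefficients, each scaled by powers of its own constant. The sketch assumes the conventional causal reading, in which the ith term acts on the sample i steps in the past; the function names are hypothetical.

```python
def bandwidth_expand(lp_coeffs, r):
    """Scale coefficient a_i by r**i, giving the coefficients of Q(z/r)."""
    return [a * r ** i for i, a in enumerate(lp_coeffs, start=1)]


def weighting_filter(x, lp_coeffs, r1=0.9, r2=0.6):
    """Apply W(z) = Q(z/r1) / Q(z/r2) to the samples in x (zero initial state).

    y[n] = x[n] - sum_i a_i r1^i x[n-i] + sum_i a_i r2^i y[n-i]
    """
    num = bandwidth_expand(lp_coeffs, r1)
    den = bandwidth_expand(lp_coeffs, r2)
    order = len(lp_coeffs)
    x_hist = [0.0] * order  # past inputs, newest first
    y_hist = [0.0] * order  # past outputs, newest first
    out = []
    for sample in x:
        y = (sample
             - sum(b * h for b, h in zip(num, x_hist))
             + sum(a * h for a, h in zip(den, y_hist)))
        out.append(y)
        x_hist = ([sample] + x_hist)[:order]
        y_hist = ([y] + y_hist)[:order]
    return out


weighted = weighting_filter([1.0, 0.0], [1.0])  # approximately [1.0, -0.3]
```

When r1 equals r2 the numerator and denominator cancel and the input passes through unchanged, which gives a convenient sanity check.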
The excitation vector output from the adder 1050 and the linear prediction coefficient αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) and the linear prediction coefficient ^αj(m)(n) (where j=1, . . . , Np, m=1, . . . , Nsfr) output from the linear prediction coefficient conversion circuit 5030 are input to the weighting synthesis filter 5040.
The weighting synthesis filter 5040 drives the weighting synthesis filter for which αj(m)(n) and ^αj(m)(n) have been set, namely
$$H(z)W(z) = Q(z/r_1) / [A(z) Q(z/r_2)] \tag{9}$$
by the above-mentioned excitation vector, whereby a weighted reconstructed vector is obtained.
The transfer function H(z)=1/A(z) of the synthesis filter is represented by Equation (10) below.
$$1/A(z) = 1 \Big/ \left( 1 - \sum_{i=1}^{N_p} \hat{\alpha}_i^{(m)} z^{i} \right) \tag{10}$$
The weighted input vector output from the weighting filter 5050 and the weighted reconstructed vector output from the weighting synthesis filter 5040 are input to the subtractor 5060. The latter calculates the difference between these vectors and outputs the difference to a minimizing circuit 5070 as a difference vector.
The minimizing circuit 5070 successively outputs indices corresponding to all sound source vectors that have been stored in a sound source signal generating circuit 5110 to the sound source signal generating circuit 5110, successively outputs indices corresponding to all delays Lpd within a range stipulated in a pitch signal generating circuit 5210 to the pitch signal generating circuit 5210, successively outputs indices corresponding to all first gains that have been stored in a first gain generating circuit 6220 to the first gain generating circuit 6220, and successively outputs indices corresponding to all second gains that have been stored in a second gain generating circuit 6120 to the second gain generating circuit 6120.
Further, difference vectors output from the subtractor 5060 successively enter the minimizing circuit 5070. The latter calculates the norms of these vectors, selects a sound source vector, a delay Lpd, a first gain and a second gain that will minimize the norms and outputs indices corresponding to these to the code output circuit 6010. The indices output from the minimizing circuit 5070 successively enter the pitch signal generating circuit 5210, the sound source signal generating circuit 5110, the first gain generating circuit 6220 and the second gain generating circuit 6120.
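The search carried out by the minimizing circuit 5070 can be sketched as a loop over every combination of candidate indices, keeping the combination with the smallest difference norm. This is a toy sketch: it reads the text as a joint exhaustive search, represents each candidate delay by a precomputed pitch vector, and takes the weighting synthesis filter as a caller-supplied function. All names are hypothetical.

```python
from itertools import product


def exhaustive_search(weighted_input, source_vectors, pitch_vectors,
                      first_gains, second_gains, weighted_synthesis):
    """Return the index tuple (source, delay, gain1, gain2) of least error."""
    best_indices, best_error = None, float("inf")
    for c, p, g1, g2 in product(range(len(source_vectors)),
                                range(len(pitch_vectors)),
                                range(len(first_gains)),
                                range(len(second_gains))):
        # Candidate excitation: gain-scaled pitch vector plus source vector.
        excitation = [first_gains[g1] * pv + second_gains[g2] * sv
                      for pv, sv in zip(pitch_vectors[p], source_vectors[c])]
        reconstructed = weighted_synthesis(excitation)
        # Squared norm of the difference vector.
        error = sum((w - r) ** 2 for w, r in zip(weighted_input, reconstructed))
        if error < best_error:
            best_indices, best_error = (c, p, g1, g2), error
    return best_indices, best_error


best, err = exhaustive_search(
    weighted_input=[2.0, 2.0],
    source_vectors=[[1.0, 0.0], [0.0, 1.0]],
    pitch_vectors=[[1.0, 1.0], [2.0, 0.0]],
    first_gains=[0.0, 1.0],
    second_gains=[0.5, 2.0],
    weighted_synthesis=lambda e: e,  # identity stands in for the real filter
)
```

The cost of such a joint search grows with the product of the table sizes, so the identity filter and the two-entry tables here are chosen only to keep the example small.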
With the exception of wiring (connections) relating to input and output, the pitch signal generating circuit 5210, the sound source signal generating circuit 5110, the first gain generating circuit 6220 and the second gain generating circuit 6120 are identical with the pitch signal decoding circuit 1210, the sound source signal decoding circuit 1110, the first gain decoding circuit 1220 and the second gain decoding circuit 1120 shown in FIG. 8. Accordingly, these circuits need not be explained again.
The index corresponding to the quantized LSP output from the LSP conversion/quantization circuit 5520 is input to the code output circuit 6010, and so are the indices, which are output from the minimizing circuit 5070, corresponding to the sound source vector, the delay Lpd, the first gain and the second gain. The code output circuit 6010 converts these indices to the code of a bit sequence and outputs the code from an output terminal 40.