The present invention relates to a speech coder for efficiently coding speech information and a speech decoder for efficiently decoding the same.
Speech coding techniques for efficiently coding and decoding speech information have been developed in recent years. A CELP type speech coder based on such a technique is described in "Code Excited Linear Prediction: High Quality Speech at Low Bit Rate", M. R. Schroeder, Proc. ICASSP '85, pp. 937-940.
In this speech coder, the input speech is divided into frames of a fixed duration, and linear prediction is carried out for each frame. A prediction residual (excitation signal) is obtained by the linear prediction for each frame. Then, the prediction residual is coded using an adaptive codebook in which previous excitation signals are stored and a random codebook in which a plurality of random codevectors are stored.
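As a rough illustration of how a prediction residual arises, the following minimal sketch filters one frame with a given set of linear predictive coefficients. The function name, the sign convention A(z) = 1 + sum of a_k z^-k, the zero history before the frame, and the example values are all assumptions made for illustration; they are not taken from the cited coder.

```python
def prediction_residual(frame, lpc):
    """Return e[n] = s[n] + sum_k lpc[k-1] * s[n-k] for one frame.

    Assumption: samples before the start of the frame are treated as zero.
    """
    residual = []
    for n in range(len(frame)):
        acc = frame[n]
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a_k * frame[n - k]
        residual.append(acc)
    return residual


# Hypothetical first-order predictor applied to a decaying frame.
frame = [1.0, 0.5, 0.25, 0.125]
lpc = [-0.5]
residual = prediction_residual(frame, lpc)
```

In this toy case the frame is perfectly predictable after its first sample, so the residual collapses to a single pulse.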
FIG. 1 shows a functional block diagram of a conventional CELP type speech coder.
A speech signal 11 input to the CELP type speech coder is subjected to linear prediction analysis in a linear prediction analyzing section 12. Linear predictive coefficients are obtained by the linear prediction analysis. The linear predictive coefficients are parameters indicating the spectral envelope of the speech signal 11. The linear predictive coefficients obtained in the linear prediction analyzing section 12 are quantized by a linear predictive coefficient coding section 13, and the quantized linear predictive coefficients are sent to a linear predictive coefficient decoding section 14. Note that an index obtained by this quantization is output to a code outputting section 24 as a linear predictive code. The linear predictive coefficient decoding section 14 decodes the linear predictive coefficients quantized by the linear predictive coefficient coding section 13 so as to obtain coefficients of a synthetic filter. The linear predictive coefficient decoding section 14 outputs these coefficients to a synthetic filter 15.
An adaptive codebook 17 outputs a plurality of candidate adaptive codevectors and comprises a buffer for storing the excitation signals of the several previous frames. The adaptive codevectors are time series vectors that express periodic components in the input speech.
A random codebook 18 stores a plurality of candidate random codevectors. The random codevectors are time series vectors that express non-periodic components in the input speech.
In an adaptive code gain weighting section 19 and a random code gain weighting section 20, the candidate vectors output from the adaptive codebook 17 and the random codebook 18 are multiplied by an adaptive code gain and a random code gain, respectively, both read from a weight codebook 21, and the results are output to an adding section 22.
The weight codebook 21 stores a plurality of adaptive codebook gains by which the adaptive codevectors are multiplied and a plurality of random codebook gains by which the random codevectors are multiplied.
The adding section 22 adds the adaptive codevector candidates and the random codevector candidates, which are weighted in the adaptive code gain weighting section 19 and the random code gain weighting section 20, respectively. The adding section 22 thereby generates excitation vectors and outputs them to the synthetic filter 15.
The synthetic filter 15 is an all-pole filter. The coefficients of the synthetic filter are obtained by the linear predictive coefficient decoding section 14. The synthetic filter 15 filters the input excitation vector to produce synthetic speech and outputs that synthetic speech to a distortion calculator 16.
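The all-pole synthesis step can be sketched as a simple recursive filter. This is only an illustrative sketch: the function name, the convention that the filter is 1/A(z) with A(z) = 1 + sum of a_k z^-k, and the zero initial filter state are assumptions not stated in the text.

```python
def synthesize(excitation, lpc):
    """Filter an excitation vector through the all-pole filter 1/A(z).

    Assumptions: A(z) = 1 + sum_k lpc[k-1] * z^-k and zero initial state.
    """
    out = []
    for n, e_n in enumerate(excitation):
        acc = e_n
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a_k * out[n - k]
        out.append(acc)
    return out


# Feeding a unit impulse yields the impulse response h(0), h(1), ...
impulse = [1.0, 0.0, 0.0, 0.0]
speech = synthesize(impulse, [-0.5])
```

The impulse response obtained this way corresponds to the quantity h that appears in the convolution matrix of expression (1).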
The distortion calculator 16 calculates a distortion between the synthetic speech, which is the output of the synthetic filter 15, and the input speech 11, and outputs the obtained distortion value to a code index specifying section 23. The code index specifying section 23 specifies three kinds of codebook indices (the index of the adaptive codebook, the index of the random codebook, and the index of the weight codebook) so as to minimize the distortion calculated by the distortion calculator 16. The three kinds of codebook indices specified by the code index specifying section 23 are output to the code outputting section 24. The code outputting section 24 outputs together, to a transmission path, the index of the linear predictive codebook obtained by the linear predictive coefficient coding section 13 and the index of the adaptive codebook, the index of the random codebook, and the index of the weight codebook, which have been specified by the code index specifying section 23.
FIG. 2 shows a functional block diagram of a CELP speech decoder, which decodes the speech signal coded by the aforementioned coder. In this speech decoder, a code input section 31 receives the codes sent from the speech coder (FIG. 1). The received codes are separated into the index of the linear predictive codebook, the index of the adaptive codebook, the index of the random codebook, and the index of the weight codebook. Then, the indices obtained by this separation are output to a linear predictive coefficient decoding section 32, an adaptive codebook 33, a random codebook 34, and a weight codebook 35, respectively.
Next, the linear predictive coefficient decoding section 32 decodes the linear predictive code number obtained by the code input section 31 so as to obtain coefficients of the synthetic filter, and outputs those coefficients to a synthetic filter 39. Then, an adaptive codevector corresponding to the index of the adaptive codebook is read from the adaptive codebook 33, and a random codevector corresponding to the index of the random codebook is read from the random codebook 34. Moreover, an adaptive codebook gain and a random codebook gain corresponding to the index of the weight codebook are read from the weight codebook 35. Then, in an adaptive codevector weighting section 36, the adaptive codevector is multiplied by the adaptive codebook gain, and the result is sent to an adding section 38. Similarly, in a random codevector weighting section 37, the random codevector is multiplied by the random codebook gain, and the result is sent to the adding section 38.
The adding section 38 adds the above two codevectors and generates an excitation vector. Then, the generated excitation vector is sent to the adaptive codebook 33 to update the buffer and to the synthetic filter 39 to excite the filter. The synthetic filter 39, constructed from the linear predictive coefficients output from the linear predictive coefficient decoding section 32, is excited by the excitation vector obtained by the adding section 38 and reproduces synthetic speech.
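The decoder-side excitation reconstruction described above might look roughly like the sketch below. Several details are assumptions for illustration only: the adaptive codebook index is treated as a lag into the buffer of past excitation, a lag shorter than the frame is handled by periodic repetition, the weight codebook is modeled as a list of (adaptive gain, random gain) pairs, and all names are hypothetical.

```python
def decode_excitation(indices, past_excitation, random_codebook, weight_codebook):
    """Rebuild one frame of excitation and the updated adaptive-codebook buffer."""
    lag, random_index, weight_index = indices
    c = random_codebook[random_index]        # random codevector
    g_a, g_c = weight_codebook[weight_index]  # adaptive gain, random gain
    frame_len = len(c)

    # Adaptive codevector: past excitation starting `lag` samples back,
    # repeated periodically if the lag is shorter than the frame (assumption).
    start = len(past_excitation) - lag
    p = [past_excitation[start + (n % lag)] for n in range(frame_len)]

    # Weighted sum formed by the two weighting sections and the adding section.
    excitation = [g_a * p_n + g_c * c_n for p_n, c_n in zip(p, c)]

    # Update the buffer with the newest excitation, keeping its length fixed.
    new_past = (past_excitation + excitation)[-len(past_excitation):]
    return excitation, new_past


past = [0.0, 1.0, 0.0, 2.0]
random_codebook = [[1.0, -1.0, 0.0]]
weight_codebook = [(0.5, 2.0)]
excitation, new_past = decode_excitation((2, 0, 0), past, random_codebook, weight_codebook)
```

The returned excitation would then be passed to the synthetic filter, and the updated buffer would serve as the adaptive codebook for the next frame.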
Note that, in the distortion calculator 16 of the CELP speech coder, distortion E is generally calculated by the following expression (1):
$$E = \| v - (g_a H p + g_c H c) \|^2 \qquad (1)$$
where v: an input speech signal (vector),
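H: an impulse response convolution matrix for a synthetic filter,
$$
H = \begin{bmatrix}
h(0) & 0 & \cdots & \cdots & 0 & 0 \\
h(1) & h(0) & 0 & \cdots & 0 & 0 \\
h(2) & h(1) & h(0) & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & h(0) & 0 \\
h(L-1) & \cdots & \cdots & \cdots & h(1) & h(0)
\end{bmatrix}
$$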
wherein h is an impulse response of a synthetic filter, L is a frame length,
p: an adaptive codevector,
c: a random codevector,
ga: an adaptive codebook gain,
gc: a random codebook gain.
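To make expression (1) concrete, the following sketch builds the lower-triangular convolution matrix H from an impulse response and evaluates the distortion E for one combination of codevectors and gains. The helper names and the numeric values are hypothetical, and padding the impulse response with zeros beyond its given length is an assumption.

```python
def convolution_matrix(h, L):
    """L x L lower-triangular Toeplitz matrix with H[i][j] = h(i - j) for i >= j."""
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(L)]
            for i in range(L)]


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def distortion(v, H, p, c, g_a, g_c):
    """E = || v - (g_a * H p + g_c * H c) ||^2, as in expression (1)."""
    Hp = matvec(H, p)
    Hc = matvec(H, c)
    return sum((v_n - (g_a * hp + g_c * hc)) ** 2
               for v_n, hp, hc in zip(v, Hp, Hc))


H = convolution_matrix([1.0, 0.5], 3)
E = distortion(v=[1.0, 2.5, 2.0], H=H,
               p=[1.0, 0.0, 0.0], c=[0.0, 1.0, 0.0],
               g_a=1.0, g_c=2.0)
```

Here the synthesized vector is [1.0, 2.5, 1.0], so only the last sample differs from v.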
Here, in order to minimize distortion E of expression (1), it is necessary to calculate the distortion in a closed loop with respect to all combinations of the adaptive code number, the random code number, and the weight code number, and thereby to specify each code number.
However, if the closed loop search is performed with respect to expression (1), the amount of calculation processing becomes too large. For this reason, generally, the index of the adaptive codebook is first specified by vector quantization using the adaptive codebook. Next, the index of the random codebook is specified by vector quantization using the random codebook. Finally, the index of the weight codebook is specified by vector quantization using the weight codebook. The vector quantization processing using the random codebook is explained specifically below.
In a case where the index of the adaptive codebook and the adaptive codebook gain are previously or temporarily determined, the expression for evaluating distortion shown in expression (1) is changed to the following expression (2):
$$E_c = \| x - g_c H c \|^2 \qquad (2)$$
where vector x in expression (2) is a random excitation target vector for specifying a random code number, which is obtained by the following equation (3) using the previously or temporarily specified adaptive codevector and adaptive codebook gain.
$$x = v - g_a H p \qquad (3)$$
where ga: an adaptive codebook gain,
v: a speech signal (vector),
H: an impulse response convolution matrix for a synthetic filter,
p: an adaptive codevector.
When the random codebook gain gc is specified after the index of the random codebook, gc in expression (2) can be assumed to take an arbitrary value while the index is being searched. For this reason, it is known that the quantization processing for specifying the index of the random codebook that minimizes expression (2) can be replaced with determining the index of the random codevector that maximizes the following fractional expression (4):
$$\frac{(x^t H c)^2}{\| H c \|^2} \qquad (4)$$
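The step from expression (2) to expression (4) can be seen from a short derivation, sketched here under the assumption that gc is free to take its optimal, unquantized value for each candidate codevector c. Expanding expression (2) as a quadratic in g_c and setting its derivative to zero gives:

```latex
\begin{aligned}
E_c &= \| x \|^2 - 2 g_c \, x^t H c + g_c^2 \, \| H c \|^2, \\
\frac{\partial E_c}{\partial g_c} = 0
  &\;\Rightarrow\; g_c = \frac{x^t H c}{\| H c \|^2}, \\
E_c \Big|_{\text{optimal } g_c}
  &= \| x \|^2 - \frac{(x^t H c)^2}{\| H c \|^2}.
\end{aligned}
```

Since the term \| x \|^2 does not depend on c, minimizing E_c over the codebook is equivalent to maximizing the fraction in expression (4).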
In other words, in a case where the index of adaptive codebook and the adaptive codebook gain are previously or temporarily determined, vector quantization processing for random excitation becomes processing for specifying the index of the random codebook maximizing fractional expression (4) calculated by the distortion calculator 16.
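A random codebook search based on expression (4) could be sketched as an exhaustive loop over the candidate codevectors. The function names, the representation of H as a list of rows, the skipping of zero-energy candidates, and the example data are assumptions made for illustration.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def search_random_codebook(x, H, codebook):
    """Return (index, score) of the codevector maximizing (x^t H c)^2 / ||H c||^2."""
    best_index, best_score = -1, -1.0
    for index, c in enumerate(codebook):
        Hc = matvec(H, c)
        energy = sum(y * y for y in Hc)
        if energy == 0.0:
            continue  # skip candidates with no synthesized energy (assumption)
        corr = sum(x_n * y for x_n, y in zip(x, Hc))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score


H = [[1.0, 0.0], [0.5, 1.0]]
x = [1.0, 1.0]
codebook = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
best_index, best_score = search_random_codebook(x, H, codebook)
```

Each candidate costs one filtering operation (H c), one correlation, and one energy computation, which is why the size of the codebook drives the search complexity.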
In early CELP coders/decoders, the random codebook was one that stored in memory a number of kinds of random sequences corresponding to the number of allocated bits. However, there was a problem in that a massive amount of memory capacity was required and the amount of calculation processing for evaluating expression (4) with respect to each random codevector was greatly increased.
As one method for solving the above problem, there is a CELP speech coder/decoder using an algebraic excitation vector generator for generating an excitation vector algebraically, as described in "8 KBIT/S ACELP CODING OF SPEECH WITH 10 MS SPEECH-FRAME: A CANDIDATE FOR CCITT STANDARDIZATION": R. Salami, C. Laflamme, J-P. Adoul, ICASSP '94, pp. II-97 to II-100, 1994.
However, in the above CELP speech coder/decoder using an algebraic excitation vector generator, the random excitation (the target vector for specifying an index of the random codebook) obtained by equation (3) is only approximately expressed by a few signed pulses. For this reason, there is a limitation on the improvement of speech quality. This is apparent from an actual examination of the elements of the random excitation x of expression (3), which shows that a random excitation is rarely composed of only a few signed pulses.
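The idea of an algebraically generated codevector made of a few signed pulses can be illustrated with the minimal sketch below. The function name and the pulse positions are hypothetical; the position constraints used by the cited ACELP coder are not reproduced here.

```python
def algebraic_codevector(length, pulses):
    """Build a vector that is zero except for signed unit pulses.

    `pulses` is a list of (position, sign) pairs, with sign +1.0 or -1.0.
    """
    vec = [0.0] * length
    for position, sign in pulses:
        vec[position] += sign
    return vec


# Two signed pulses in an 8-sample vector (illustrative values only).
c = algebraic_codevector(8, [(1, 1.0), (5, -1.0)])
```

No table of codevectors needs to be stored, because each vector is fully described by its pulse positions and signs; the price is that every candidate is very sparse.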
An object of the present invention is to provide an excitation vector generator, which is capable of generating an excitation vector whose shape has a statistically high similarity to the shape of a random excitation obtained by analyzing an input speech signal.
Another object of the present invention is to provide a CELP speech coder/decoder, a speech signal communication system, and a speech signal recording system that use the above excitation vector generator as a random codebook so as to obtain synthetic speech of higher quality than in the case in which an algebraic excitation vector generator is used as the random codebook.
A first aspect of the present invention is to provide an excitation vector generator comprising: a pulse vector generating section having N channels (N ≥ 1) for generating pulse vectors, each having a signed unit pulse at one element on a vector axis; a dispersion pattern storing and selecting section having a function of storing M (M ≥ 1) kinds of dispersion patterns for every channel and a function of selecting a certain kind of dispersion pattern from the M kinds of dispersion patterns stored; a pulse vector dispersion section having a function of convolving the dispersion pattern selected by the dispersion pattern storing and selecting section with the signed pulse vector output from the pulse vector generating section so as to generate N dispersed vectors; and a dispersed vector adding section having a function of adding the N dispersed vectors generated by the pulse vector dispersion section so as to generate an excitation vector. The function of algebraically generating the N (N ≥ 1) pulse vectors is provided to the pulse vector generating section, and the dispersion pattern storing and selecting section stores dispersion patterns obtained by pre-training on the shape (characteristic) of the actual vector, thereby making it possible to generate an excitation vector whose shape is more similar to that of the actual excitation vector than is possible with the conventional algebraic excitation generator.
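The four sections named in the first aspect can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the claimed implementation: the function names, the data layout (patterns[channel][kind]), the truncation of each convolution to the vector length, and the example patterns are all hypothetical, and the pre-training of the dispersion patterns is not shown.

```python
def unit_pulse(length, position, sign):
    """Pulse vector: a single signed unit pulse on the vector axis."""
    vec = [0.0] * length
    vec[position] = sign
    return vec


def disperse(pulse_vec, pattern):
    """Convolve a pulse vector with a dispersion pattern.

    Assumption: the result is truncated to the length of the pulse vector.
    """
    length = len(pulse_vec)
    out = [0.0] * length
    for n, s in enumerate(pulse_vec):
        if s == 0.0:
            continue
        for k, w in enumerate(pattern):
            if n + k < length:
                out[n + k] += s * w
    return out


def generate_excitation(length, pulses, patterns, selection):
    """Sum the N dispersed vectors, one per channel.

    pulses[ch] = (position, sign); patterns[ch] holds the M stored patterns
    of channel ch; selection[ch] picks one of them.
    """
    excitation = [0.0] * length
    for ch, (position, sign) in enumerate(pulses):
        pattern = patterns[ch][selection[ch]]
        dispersed = disperse(unit_pulse(length, position, sign), pattern)
        excitation = [e + d for e, d in zip(excitation, dispersed)]
    return excitation


# N = 2 channels, M = 2 patterns per channel (illustrative values only).
patterns = [[[1.0, 0.5], [1.0]],
            [[1.0, -0.5, 0.25], [0.5]]]
excitation = generate_excitation(6, [(0, 1.0), (3, -1.0)], patterns, [0, 0])
```

With a one-sample pattern equal to [1.0] in every channel, the generator reduces to a plain sum of signed pulses, which shows how the dispersion patterns add shape to an otherwise algebraic excitation.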
Moreover, a second aspect of the present invention is to provide a CELP speech coder/decoder using the above excitation vector generator as the random codebook, which is capable of generating an excitation vector closer to the actual shape than is the case with the conventional speech coder/decoder using the algebraic excitation generator as the random codebook. Therefore, a speech coder/decoder, a speech signal communication system, and a speech signal recording system that can output synthetic speech of higher quality can be obtained.