When a speech signal is transmitted in a packet communication system, typified by Internet communication, or in a mobile communication system, a compression/coding technology is often used to enhance the transmission efficiency of the speech signal. Many speech coding systems have been developed so far, and many low bit rate speech coding systems developed in recent years separate a speech signal into spectral envelope information and sound source information and compress/code the separated pieces of information. One example is the CELP system described in Document 1 (M. R. Schroeder, B. S. Atal, “Code Excited Linear Prediction: High Quality Speech at Low Bit Rate”, IEEE proc., ICASSP'85 pp.937-940).
Here, an overview of a CELP-based speech coder will be explained using FIG. 1. Suppose an input speech signal is input to a speech coder successively every processing frame delimited by a time interval of approximately 20 ms.
The input speech signal input to the speech coder for every processing frame is supplied to an LPC analysis section 11 first. The LPC analysis section 11 carries out an LPC (Linear Predictive Coding) analysis on the input speech signal, obtains an LPC vector having LPC coefficients as vector components, vector-quantizes the LPC vector obtained to obtain an LPC code, and decodes this LPC code to obtain a decoded LPC vector having decoded LPC coefficients as vector components.
An excitation vector generation section 14 reads an adaptive codevector and fixed codevector from an adaptive codebook 12 and a fixed codebook 13 respectively and sends those codevectors to an LPC synthesis filter 15. The LPC synthesis filter 15 performs synthesis filtering on the adaptive codevector and the fixed codevector supplied from the excitation vector generation section 14 using an all-pole model synthesis filter having the decoded LPC coefficients given from the LPC analysis section 11 as filter coefficients and obtains a synthesized adaptive codevector and a synthesized fixed codevector, respectively.
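The all-pole synthesis filtering described above can be illustrated by the following minimal Python sketch. It assumes the sign convention A(z) = 1 + a[1]z^-1 + ... + a[N]z^-N for the decoded LPC coefficients, which the text does not specify. The function name lpc_synthesis and the zero initial filter state are likewise assumptions made for illustration only.

```python
def lpc_synthesis(excitation, lpc, state=None):
    """Filter `excitation` through the all-pole filter 1/A(z).

    Assumed convention: A(z) = 1 + sum_{i=1..N} lpc[i-1] * z^-i, so that
    y[n] = x[n] - sum_{i=1..N} lpc[i-1] * y[n-i].
    """
    order = len(lpc)
    # history[0] holds y[n-1], history[1] holds y[n-2], and so on.
    history = list(state) if state is not None else [0.0] * order
    output = []
    for x in excitation:
        y = x - sum(a * h for a, h in zip(lpc, history))
        output.append(y)
        # Shift the newest output sample into the filter memory.
        history = ([y] + history)[:order]
    return output


# A first-order filter with a[1] = -0.5 turns a unit impulse into a decaying tail.
impulse_response = lpc_synthesis([1.0, 0.0, 0.0], [-0.5])
```

In this sketch the same routine would be applied once to the adaptive codevector and once to the fixed codevector to obtain the two synthesized codevectors.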
A comparison section 16 analyzes the relationship between the synthesized adaptive codevector and the synthesized fixed codevector output from the LPC synthesis filter 15 and the input speech signal, and calculates an adaptive codebook optimum gain by which the synthesized adaptive codevector is to be multiplied and a fixed codebook optimum gain by which the synthesized fixed codevector is to be multiplied. Furthermore, the comparison section 16 adds up the vector obtained by multiplying the synthesized adaptive codevector by the adaptive codebook optimum gain and the vector obtained by multiplying the synthesized fixed codevector by the fixed codebook optimum gain to obtain a synthesized speech vector, and calculates a distortion between the synthesized speech vector obtained and the input speech signal.
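The text does not state how the comparison section 16 derives the optimum gains. One common approach, assumed here purely for illustration, is to choose the two gains jointly so that the squared error between the input speech vector and the gain-scaled sum of the two synthesized codevectors is minimized, which reduces to solving a 2x2 system of normal equations. The function names below are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def optimum_gains(target, syn_adaptive, syn_fixed):
    """Least-squares gains (ga, gf) minimizing |target - ga*p - gf*c|^2.

    Assumption: squared error is the distortion measure; the source text
    only says that a distortion is calculated.
    """
    pp, cc, pc = (dot(syn_adaptive, syn_adaptive),
                  dot(syn_fixed, syn_fixed),
                  dot(syn_adaptive, syn_fixed))
    xp, xc = dot(target, syn_adaptive), dot(target, syn_fixed)
    det = pp * cc - pc * pc
    if abs(det) < 1e-12:
        raise ValueError("synthesized codevectors are (nearly) collinear")
    gain_adaptive = (xp * cc - xc * pc) / det
    gain_fixed = (xc * pp - xp * pc) / det
    return gain_adaptive, gain_fixed


def synthesis_distortion(target, syn_adaptive, syn_fixed, ga, gf):
    """Squared error between the target and the synthesized speech vector."""
    return sum((x - ga * p - gf * c) ** 2
               for x, p, c in zip(target, syn_adaptive, syn_fixed))


ga, gf = optimum_gains([2.0, 3.0], [1.0, 0.0], [0.0, 1.0])
err = synthesis_distortion([2.0, 3.0], [1.0, 0.0], [0.0, 1.0], ga, gf)
```

Repeating this calculation for each candidate pair of codevectors and keeping the pair with the smallest distortion corresponds to the exhaustive search carried out by the comparison section 16.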
The comparison section 16 further operates the excitation vector generation section 14 and the LPC synthesis filter 15 on all possible combinations of adaptive codevectors stored in the adaptive codebook 12 and fixed codevectors stored in the fixed codebook 13, and calculates the distortion between each of the many synthesized speech vectors thus obtained and the input speech signal. It then determines the index of the adaptive codevector and the index of the fixed codevector that minimize the distortion, and sends to a parameter coding section 17 the indices of the codevectors output from the respective codebooks, the codevectors corresponding to the indices, and the adaptive codebook optimum gain and fixed codebook optimum gain corresponding to the indices.
The parameter coding section 17 codes the adaptive codebook optimum gain and fixed codebook optimum gain to obtain gain codes, and outputs the gain codes obtained, the LPC code given from the LPC analysis section 11 and the indices of the respective codebooks together for each processing frame.
The parameter coding section 17 further adds up two vectors: a vector obtained by multiplying the adaptive codevector corresponding to the index of the adaptive codebook by an adaptive codebook gain corresponding to the gain code, and a vector obtained by multiplying the fixed codevector corresponding to the index of the fixed codebook by a fixed codebook gain corresponding to the gain code. It thereby obtains an excitation vector and updates the old adaptive codevector in the adaptive codebook 12 with the excitation vector obtained.
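The construction of the excitation vector and the update of the adaptive codebook can be sketched as follows. The sketch assumes that the adaptive codebook is held as a fixed-length buffer of past excitation samples, a common arrangement in CELP coders that the text does not spell out. Both function names are hypothetical.

```python
def build_excitation(adaptive_cv, fixed_cv, gain_adaptive, gain_fixed):
    """Gain-scaled sum of the adaptive and fixed codevectors."""
    return [gain_adaptive * a + gain_fixed * f
            for a, f in zip(adaptive_cv, fixed_cv)]


def update_adaptive_codebook(history, excitation):
    """Append the new excitation and drop the oldest samples.

    Assumption: the adaptive codebook is a buffer of past excitation whose
    length stays constant from one update to the next.
    """
    combined = list(history) + list(excitation)
    return combined[len(excitation):]


excitation = build_excitation([1.0, 2.0], [0.5, -0.5], 2.0, 4.0)
new_history = update_adaptive_codebook([0.0, 0.0, 1.0], excitation)
```

Because the decoder can form the same excitation vector from the transmitted indices and gain codes, it can keep its own adaptive codebook identical to that of the coder.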
For synthesis filtering by the LPC synthesis filter 15, it is a general practice that linear predictive coefficients, a high-pass filter and a perceptual weighting filter using a long-term predictive coefficient obtained by carrying out a long-term predictive analysis on the input speech are used together. It is also a general practice that the search for optimum indices of the adaptive codebook and fixed codebook, the calculation of optimum gains and the coding of the optimum gains are carried out in units of a subframe obtained by subdividing a frame.
Next, the processing of “vector quantization of LPC vector” carried out by the LPC analysis section 11 will be explained in more detail using FIG. 2. Suppose that an LPC codebook 22 stores a plurality of entries of typical LPC vectors acquired beforehand by applying the LBG algorithm to many LPC vectors obtained by actually carrying out an LPC analysis on input speech signals of many processing frames. The details of the LBG algorithm are disclosed in Document 2 (Y. Linde, A. Buzo, R. M. Gray, “An Algorithm for Vector Quantizer Design,” IEEE trans. Comm., Vol. COM-28, No. 1, pp. 84-95, January, 1980).
A quantization target vector input to the vector quantizer in FIG. 2 (an LPC vector obtained by carrying out an LPC analysis on a speech signal in a processing frame section corresponds to the quantization target) is supplied to a distortion calculation section 21. Next, the distortion calculation section 21 calculates a Euclidean distortion between an LPC codevector stored in the LPC codebook 22 and the quantization target vector according to the following Expression (1):
$$d_m = \sum_{i=1}^{N} \left( X_T(i) - C_m(i) \right)^2 \qquad \text{Expression (1)}$$

where in Expression (1), $X_T$ is a quantization target vector, $C_m$ is an m-th ($1 \le m \le M$) LPC codevector in the LPC codebook, $i$ is a component number of a vector, $N$ is the order of a vector (corresponds to an LPC analysis order) and $d_m$ is a Euclidean distortion between $X_T$ and $C_m$.
The distortion calculation section 21 successively calculates Euclidean distortions between all LPC codevectors stored in the LPC codebook 22 and the quantization target vector, then successively outputs the calculation results (the respective Euclidean distortions) to an LPC codebook search section 23. The LPC codebook search section 23 compares the respective Euclidean distortions supplied from the distortion calculation section 21 and outputs the index of the LPC codevector that minimizes the Euclidean distortion as an LPC code (a code expressing spectral envelope information on the processing frame).
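The exhaustive search performed by the distortion calculation section 21 and the LPC codebook search section 23 can be sketched in Python as follows. The function names and the toy codebook are hypothetical, and the returned index is 0-based, whereas the text numbers codevectors from m=1.

```python
def euclidean_distortion(target, codevector):
    """Expression (1): sum over i of (X_T(i) - C_m(i))^2."""
    return sum((x - c) ** 2 for x, c in zip(target, codevector))


def search_lpc_codebook(target, codebook):
    """Return (index, codevector) of the entry with minimum distortion."""
    distortions = [euclidean_distortion(target, c) for c in codebook]
    best = min(range(len(codebook)), key=distortions.__getitem__)
    return best, codebook[best]


# Toy three-entry codebook of order N = 2 (illustrative values only).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
lpc_code, decoded_vector = search_lpc_codebook([0.9, 1.2], toy_codebook)
```

The index returned plays the role of the LPC code, and the codevector read out with that index plays the role of the decoded LPC vector.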
On the other hand, it is possible to obtain decoded LPC coefficients by reading out the LPC codevector corresponding to the index indicated by the LPC code from the LPC codebook. Note that the processing of generating decoded LPC coefficients, which are used for constituting an all-pole model LPC synthesis filter, from the LPC code is generally carried out by both the speech coder and the speech decoder.
In many speech coders/decoders developed in recent years, the LPC vector is not quantized as it is; instead, it is a general practice to convert the LPC vector to an LSF (Line Spectral Frequency) vector having LSF parameters as vector components or an LSP (Line Spectral Pairs) vector having LSP parameters as vector components, which are frequency domain vectors mutually convertible with the LPC vector on a one-to-one basis, and then to vector-quantize the converted vector. This is because vector-quantizing the LPC vector after converting it to a vector in the frequency domain, rather than directly vector-quantizing the LPC vector in the time domain, provides higher quantization efficiency and better interpolation characteristics. Features of the LSF (or LSP) vector and a method for mutual conversion with the LPC vector are disclosed in Document 3 (F. Itakura, “Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals,” J. Acoust. Soc. Amer., vol 57, p.S35, April 1975) or Document 4 (K. K. Paliwal and B. S. Atal, “Efficient Vector Quantization of LPC Parameter at 24 Bits/Frame,” IEEE trans. on Speech and Audio Processing, vol. 1, pp. 3-14, January 1993).
For example, when the LSF vector is quantized, an LSF vector LSFT[i] (i=1, . . . , N) in the frequency domain, obtained by converting the LPC vector, is input to the vector quantizer as the quantization target vector. In this case, if the LPC codebook stores candidate LSF codevectors LSFm[i] (i=1, . . . , N), each having LSF parameters as vector components, it is possible to vector-quantize the target LSF vector using the same procedure as that used when the target LPC vector is vector-quantized. However, when an LSF (or LSP) vector is quantized, the weighted Euclidean distortion dm in Expression (2) below is often used instead of Expression (1) above as a measure for the LPC codebook search.
$$d_m = \sum_{i=1}^{N} \left[ w(i) \left( \mathrm{LSF}_T(i) - \mathrm{LSF}_m(i) \right) \right]^2 \qquad \text{Expression (2)}$$
The weighted Euclidean distortion is disclosed in detail, for example, in Document 4 or Document 5 (A. Kataoka, T. Moriya and S. Hayashi, “An 8-kb/s Conjugate Structure CELP (CS-CELP) Speech Coder,” IEEE trans. Speech and Audio Processing, vol. 4, No. 6, pp.401-411, November 1996) or Document 6 (R. Hagen, E. Paksoy, and A. Gersho, “Voicing-Specific LPC Quantization for Variable-Rate Speech Coding,” IEEE trans. Speech and Audio Processing, vol. 7, no. 5, pp.485-494, September, 1999).
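A codebook search using the weighted Euclidean distortion of Expression (2) can be sketched as follows. The weights, the two-entry codebook and the function names are hypothetical toy values chosen only to show that the weighting can change which codevector is selected; they are not realistic LSF values, and no particular weighting rule from Documents 4 to 6 is implied.

```python
def weighted_distortion(target, codevector, weights):
    """Expression (2): sum over i of [w(i) * (LSF_T(i) - LSF_m(i))]^2."""
    return sum((w * (t - c)) ** 2
               for w, t, c in zip(weights, target, codevector))


def search_lsf_codebook(target, codebook, weights):
    """Return the 0-based index of the minimum weighted distortion entry."""
    return min(range(len(codebook)),
               key=lambda m: weighted_distortion(target, codebook[m], weights))


toy_codebook = [[1.0, 0.0], [0.0, 2.0]]
toy_target = [0.0, 0.0]
# Uniform weights: distortions are 1 and 4, so entry 0 is selected.
uniform_choice = search_lsf_codebook(toy_target, toy_codebook, [1.0, 1.0])
# Heavier weight on the first component: distortions are 9 and 4, so entry 1 wins.
weighted_choice = search_lsf_codebook(toy_target, toy_codebook, [3.0, 1.0])
```

With all weights equal to 1, the measure reduces to the plain Euclidean distortion of Expression (1).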
Decoded LSF parameters can be obtained by reading out the LSF codevector corresponding to the index indicated by the LPC code from the LPC codebook, in the same manner as that for obtaining decoded LPC coefficients from an LPC code. In this case, however, the decoded LSF parameters read based on the LPC code are parameters in the frequency domain. Thus, additional processing is required for converting the decoded LSF parameters in the frequency domain to decoded LPC coefficients in the time domain in order to construct an all-pole model LPC synthesis filter.
In a speech coder/decoder according to a CELP system, etc., LPC parameters representing short-time spectral envelope information of a speech signal (hereinafter LPC coefficients and parameters such as LSF which are mutually convertible with LPC coefficients will be generically referred to as “LPC parameters”) are generally compressed/coded by a vector quantizer. However, when a vector quantizer in a simple configuration as shown in FIG. 2 is applied as is, the quantization distortion generated in each processing frame increases, and preferable synthesized speech cannot be obtained. For this reason, much research on technologies such as “predictive vector quantization technology”, “multistage vector quantization technology” and “split vector quantization technology” has been conducted so far to improve vector quantizer performance. In order to design a high performance vector quantizer, it is indispensable to use many of these technologies in combination.
When a vector quantizer of an LPC vector is newly designed (or improved), an evaluation measure to compare/evaluate the performance of the quantizer is required. When evaluating the performance, it is preferable to use an evaluation measure that takes into account the fact that the LPC parameters are originally parameters expressing short-time spectral envelope information of a speech signal. Thus, the CD (Cepstral Distortion) measure in Expression (3) below, which evaluates distortion in the LPC cepstrum domain corresponding to an LPC spectrum model, or the SD (Spectral Distortion) measure in Expression (4) below, which evaluates distortion in an FFT (Fast Fourier Transform) spectral domain, is often used as a performance evaluation measure:
$$CD = \frac{1}{L} \left\{ \sum_{l=1}^{L} CD^{(l)} \right\} = \frac{1}{L} \left\{ \sum_{l=1}^{L} \frac{10}{\log_e 10} \sqrt{ 2 \sum_{i=1}^{N_c} \left( CEP_t^{(l)}[i] - CEP_q^{(l)}[i] \right)^2 } \right\} \qquad \text{Expression (3)}$$

where in Expression (3), $L$ is the number of data frames used for evaluation, $l$ is a frame number, $N_c$ is the order of an LPC cepstrum (when the LPC analysis order $N$ is the 10th order, $N_c$ is often about the 16th order), $CEP_t^{(l)}[i]$ is a target LPC cepstrum obtained by converting the quantization target of the l-th processing frame and $CEP_q^{(l)}[i]$ is an LPC cepstrum obtained by converting the decoded LPC vector of the l-th processing frame. The technological details of the features of the LPC cepstrum and the method of mutual conversion between an LPC vector and an LPC cepstrum are disclosed, for example, in Document 7 (M. R. Schroeder, “Direct (Nonrecursive) Relations Between Cepstrum and Predictor Coefficients,” IEEE trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-29, No. 2, pp.297-301, April, 1981).
$$SD = \frac{1}{L} \left\{ \sum_{l=1}^{L} SD^{(l)} \right\} = \frac{1}{L} \left\{ \sum_{l=1}^{L} \frac{10}{\log_e 10} \sqrt{ \frac{4}{K} \sum_{j=1}^{K/2} \left( \log_{10}\left[ SP_t^{(l)}(\omega_j) \right] - \log_{10}\left[ SP_q^{(l)}(\omega_j) \right] \right)^2 } \right\} \qquad \text{Expression (4)}$$

where in Expression (4), $L$ is the number of data frames used for evaluation, $l$ is a frame number, $K$ is the number of FFT points, $SP_t^{(l)}(\omega_j)$ is an FFT power spectrum of the quantization target of the l-th processing frame, $SP_q^{(l)}(\omega_j)$ is an FFT power spectrum of the decoded LPC vector of the l-th processing frame and $\omega_j = 2\pi j / K$. The technological details of the features of SD are disclosed, for example, in Document 4 above.
Both CD in Expression (3) and SD in Expression (4) are obtained by adding up quantization distortion generated in each processing frame throughout the evaluation data and then averaging the addition result by the number of data frames in the evaluation data, which means that the smaller the CD or SD, the higher the performance of the vector quantizer.
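The per-frame calculation and averaging described above can be sketched for the CD measure as follows. The sketch assumes the widely used form of cepstral distortion in decibels, in which the scale factor is 10 divided by the natural logarithm of 10, and it takes the LPC cepstra of each frame as given inputs rather than deriving them from LPC vectors. The function names are hypothetical.

```python
import math


def cd_frame(cep_target, cep_decoded):
    """Cepstral distortion of one frame in dB.

    Assumed form: (10 / ln 10) * sqrt(2 * sum_i (CEP_t[i] - CEP_q[i])^2).
    """
    squared = sum((t - q) ** 2 for t, q in zip(cep_target, cep_decoded))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


def cd_average(target_frames, decoded_frames):
    """Average the per-frame CD over all L evaluation frames."""
    frames = list(zip(target_frames, decoded_frames))
    if not frames:
        raise ValueError("at least one frame is required")
    return sum(cd_frame(t, q) for t, q in frames) / len(frames)


# Two toy frames: one with a cepstral error, one reproduced exactly.
average_cd = cd_average([[1.0], [0.0]], [[0.0], [0.0]])
```

A quantizer that reproduces every cepstrum exactly yields an average of zero, and larger values indicate poorer quantizer performance.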
When an LPC vector is vector-quantized, the Euclidean distortion of Expression (1) or the weighted Euclidean distortion of Expression (2) is used as a reference measure for an LPC codebook search. On the other hand, the performance of the LPC vector quantizer is generally evaluated using CD described in Expression (3) or SD described in Expression (4) as a performance evaluation measure. That is, in LPC vector quantizers developed so far, the reference measure used for the LPC codebook search is different from the reference measure used for evaluating the vector quantizer performance. For this reason, the LPC code selected by the LPC codebook search is not always the index that minimizes the CD or SD measure. This causes a problem in designing a high performance vector quantizer.
As the simplest method for solving the problem above, it may seem reasonable to convert candidate LPC vectors to mutually convertible LPC cepstrums (or FFT power spectrums) and store them in a codebook beforehand, then convert a target LPC vector input in every frame to a target LPC cepstrum (or a target FFT power spectrum), and select an LPC cepstrum codevector (or FFT power spectrum codevector) using CD (or SD) as a distortion measure. However, this solution causes a drastic increase in the memory capacity required for storing candidate codevectors. Moreover, when a vector quantizer is conceived which uses the “predictive vector quantization technology” or “multistage vector quantization technology” frequently used in low bit rate speech coding systems, it is necessary to store vectors with no mutual convertibility with an LPC cepstrum (for example, a predictive residual vector or a quantization error vector) in a codebook beforehand, and therefore the above solution cannot be employed.