The present invention relates to speech coders and, more particularly, to speech coders for high quality coding of speech signals at low bit rates.
A speech coder is used together with a speech decoder such that the speech is coded by the coder and decoded by the decoder. A well known method of high efficiency speech coding is CELP (Code Excited Linear Prediction coding) as disclosed in, for instance, M. Schroeder, B. Atal et al, “Code-Excited Linear Prediction: High Quality Speech at very low bit rates”, IEEE Proc. ICASSP-85, 1985, pp. 937-940 (Reference 1) and Kleijn et al, “Improved Speech Quality and Efficient Vector Quantization in SELP”, IEEE Proc. ICASSP-88, 1988, pp. 155-158 (Reference 2). In this method, on the transmission side, a spectral parameter representing the spectral energy distribution of a speech signal is extracted from the speech signal for each frame (of 20 ms, for instance) by using linear prediction (LPC) analysis. The frame is further divided into a plurality of sub-frames (of 5 ms, for instance), and adaptive codebook parameters (i.e., a delay parameter corresponding to the pitch period and a gain parameter) are extracted for each sub-frame on the basis of the past excitation signals. Then, pitch prediction of the pertinent sub-frame speech signal is executed by using the adaptive codebook. For the error signal obtained as a result of the pitch prediction, an optimal excitation codevector is selected from an excitation codebook (or vector quantization codebook) constituted by predetermined kinds of noise signals, and an optimal gain is calculated, whereby the excitation signal is quantized. The optimal excitation codevector is selected so as to minimize the error power between a signal synthesized from the selected noise signal and the error signal noted above. An index representing the kind of the selected codevector and the gain are supplied, together with the spectral parameter and the adaptive codebook parameters, to a multiplexer for transmission. Description of the receiving side is omitted.
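The excitation codebook selection described above can be sketched as follows. This is only an illustrative sketch, not the procedure of Reference 1 or 2: the function names `filter_codevector` and `search_codebook`, the truncated impulse response `h`, and the use of the ratio (correlation squared)/(energy) as the selection measure are assumptions. Maximizing that ratio is equivalent to minimizing the error power when the gain is chosen optimally for each codevector.

```python
def filter_codevector(codevector, h):
    """Convolve a codevector with a truncated impulse response h (hypothetical helper)."""
    n = len(codevector)
    y = [0.0] * n
    for i in range(n):
        for k in range(min(len(h), i + 1)):
            y[i] += h[k] * codevector[i - k]
    return y


def search_codebook(target, codebook, h):
    """Exhaustive search: every codevector is filtered once and scored."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for index, codevector in enumerate(codebook):
        y = filter_codevector(codevector, h)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero synthesized signal cannot reduce the error
        score = corr * corr / energy  # error power = |target|^2 - score
        if score > best_score:
            best_index, best_gain, best_score = index, corr / energy, score
    return best_index, best_gain


# Toy usage with made-up data: the target is codevector 1 filtered and scaled by 2.
codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
h = [1.0, 0.5]
target = [2.0 * v for v in filter_codevector(codebook[1], h)]
print(search_codebook(target, codebook, h))
```

The inner filtering loop runs once per codevector, which is the source of the computational burden discussed next.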
In the above prior art speech coder, enormous computational effort is required for the selection of the optimal excitation codevector from the excitation codebook. This is so because in the method according to References 1 and 2 described above, the excitation codevector selection is executed by performing filtering or convolution once for each codevector, i.e., a number of times corresponding to the number of codevectors stored in the codebook. For example, where the bit number of the codebook is B and the dimension number is N, denoting the filter or impulse response length in the filtering or convolution by K, a computational effort of N × K × 2^B × 8,000/N per second is required. By way of example, assuming B=10, N=40 and K=10, it is necessary to execute the computation 81,920,000 times per second. The computational effort is thus enormous and economically unfeasible.
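The figure quoted above can be checked with a few lines of arithmetic. In this sketch the function name is hypothetical, and reading the factor 8,000 as the number of samples per second (so that 8,000/N is the number of sub-frames per second) is an assumption made for illustration.

```python
def codebook_search_ops_per_second(bits, dim, resp_len, samples_per_second=8000):
    """Evaluate N x K x 2^B x 8,000/N for an exhaustive codebook search."""
    codevectors = 2 ** bits  # 2^B codevectors in a B-bit codebook
    # dim * resp_len operations per codevector, repeated for every sub-frame
    return dim * resp_len * codevectors * samples_per_second // dim


print(codebook_search_ops_per_second(bits=10, dim=40, resp_len=10))  # 81920000
```

Note that N cancels out, so the cost grows with K and doubles with every bit added to the codebook.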
Heretofore, various methods of reducing the computational effort necessary for the excitation codebook retrieval have been proposed. For example, an ACELP (Algebraic Code-Excited Linear Prediction) system has been proposed. The system is specifically treated in C. Laflamme et al, “16 kbps Wideband Speech Coding Technique based on Algebraic CELP”, IEEE Proc. ICASSP-91, 1991, pp. 13-16 (Reference 3). According to Reference 3, the excitation signal is expressed with a plurality of pulses, and transmitted with the position of each pulse represented with a predetermined number of bits. The amplitude of each pulse is limited to +1.0 or −1.0, and it is thus possible to greatly reduce the computational effort of the pulse retrieval.
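A pulse excitation of the kind just described can be sketched as follows. The function names, the sub-frame length and the number of candidate positions are made up for illustration and are not taken from Reference 3; the sketch only shows that each pulse carries a position and a sign, with the absolute amplitude fixed at 1.0.

```python
import math


def build_pulse_excitation(length, positions, signs):
    """Place unit-amplitude pulses (+1.0 or -1.0) at the given sample positions."""
    excitation = [0.0] * length
    for position, sign in zip(positions, signs):
        excitation[position] += 1.0 if sign >= 0 else -1.0
    return excitation


def position_bits(num_candidate_positions):
    """Bits needed to transmit one pulse position chosen among the candidates."""
    return math.ceil(math.log2(num_candidate_positions))


# Toy usage: three pulses in a 10-sample sub-frame (illustrative values only).
print(build_pulse_excitation(10, positions=[1, 4, 8], signs=[+1, -1, +1]))
print(position_bits(8))  # 3
```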
The method according to Reference 3, however, has a problem that the speech quality is insufficient, although great reduction of computational effort is attainable. The problem stems from the fact that each pulse can take only either positive or negative polarity and that its absolute amplitude is always 1.0 irrespective of its position. This results in very coarse amplitude quantization, thus deteriorating the speech quality.
An object of the present invention is to provide a speech coder capable of preventing speech quality deterioration with relatively little computational effort where the bit rate is low.
According to the present invention, there is provided a speech coder comprising a spectral parameter calculation unit for obtaining a spectral parameter (i.e., spectral energy distribution) from an input speech signal and quantizing the obtained spectral parameter, and an excitation quantization unit for quantizing an excitation signal of the speech signal by using the spectral parameter and outputting the quantized excitation signal, the excitation being constituted by a plurality of non-zero pulses. The speech coder further comprises a codebook for simultaneously quantizing one of the two parameters, i.e., amplitude and position, of the non-zero pulses, the excitation quantization unit having a function of quantizing the non-zero pulses by obtaining the other parameter by retrieval of the codebook.
The excitation quantization unit has at least one specific pulse position for taking a pulse thereat.
The excitation quantization unit preliminarily selects a plurality of codevectors from the codebook and executes the quantization by obtaining the other parameter by retrieval of the preliminarily selected codevectors.
According to another embodiment of the present invention, there is provided a speech coder comprising a spectral parameter calculation unit for obtaining a spectral parameter from an input speech signal for every frame and quantizing the obtained spectral parameter, and an excitation quantization unit for quantizing an excitation signal of the speech signal by using the spectral parameter and outputting the quantized excitation signal. The excitation signal is constituted by a plurality of non-zero pulses. The speech coder further comprises a codebook for simultaneously quantizing the amplitudes of the non-zero pulses and a mode judgment circuit for executing mode judgment by extracting a feature quantity from the speech signal. When a predetermined mode is determined as a result of the mode judgment in the mode judgment circuit, the excitation quantization unit provides the functions of calculating positions of non-zero pulses for a plurality of sets, executing retrieval of the codebook with respect to the pulse positions in the plurality of sets, and executing excitation signal quantization by selecting the combination of codevector and pulse position set at which a predetermined equation has a maximum or a minimum value.
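The joint selection of a pulse position set and an amplitude codevector can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed procedure: the "predetermined equation" is not given in this section, so the ratio (correlation squared)/(energy), to be maximized, is used here as a stand-in; the function names, the impulse response `h` and all data are hypothetical.

```python
def synthesize(positions, amplitudes, h, length):
    """Sum scaled copies of impulse response h placed at each pulse position."""
    y = [0.0] * length
    for position, amplitude in zip(positions, amplitudes):
        for k, hk in enumerate(h):
            if position + k < length:
                y[position + k] += amplitude * hk
    return y


def search_positions_and_amplitudes(target, position_sets, amplitude_codebook, h):
    """Try every (position set, amplitude codevector) pair and keep the best one."""
    best_pair, best_score = None, -1.0
    for set_index, positions in enumerate(position_sets):
        for code_index, amplitudes in enumerate(amplitude_codebook):
            y = synthesize(positions, amplitudes, h, len(target))
            corr = sum(t * v for t, v in zip(target, y))
            energy = sum(v * v for v in y)
            # Assumed criterion: maximize corr^2 / energy
            if energy > 0.0 and corr * corr / energy > best_score:
                best_pair, best_score = (set_index, code_index), corr * corr / energy
    return best_pair


# Toy usage: the target is built from position set 1 and amplitude codevector 1.
h = [1.0, 0.5]
position_sets = [[0, 3], [1, 4]]
amplitude_codebook = [[1.0, 1.0], [1.0, -0.5], [0.5, 1.0]]
target = synthesize(position_sets[1], amplitude_codebook[1], h, 6)
print(search_positions_and_amplitudes(target, position_sets, amplitude_codebook, h))
```

Because each amplitude codevector holds the amplitudes of all pulses together, one codebook index quantizes them simultaneously, and the pulses are no longer restricted to unit magnitude.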
According to another embodiment of the present invention, there is provided a speech coder comprising a spectral parameter calculation unit for obtaining a spectral parameter from an input speech signal for every frame and quantizing the obtained spectral parameter, and an excitation quantization unit for quantizing an excitation signal of the speech signal by using the spectral parameter and outputting the quantized excitation signal. The excitation signal is constituted by a plurality of non-zero pulses. The speech coder further comprises a codebook for simultaneously quantizing the amplitudes of the non-zero pulses and a mode judgment circuit for making a mode judgment by extracting a feature quantity from the speech signal. When a predetermined mode is recognized, the excitation quantization unit functions to calculate positions of non-zero pulses for at least one set, executes retrieval of the codebook with respect to the pulse positions of a set having a pulse position at which a predetermined equation has a maximum or a minimum value, and effects excitation signal quantization by selecting the optimal combination of pulse position set and codevector. When a different predetermined mode is recognized, the excitation quantization unit functions to represent the excitation in the form of a linear combination of a plurality of pulses and excitation codevectors selected from the excitation codebook, and executes excitation signal quantization by retrieval of the pulses and the excitation codevectors.
According to a further embodiment of the present invention, there is provided a speech coder comprising a frame divider for dividing an input speech signal into frames having a predetermined time length, a sub-frame divider for dividing each frame speech signal into sub-frames having a time length shorter than the frame, and a spectral parameter calculator which receives a series of frame speech signals outputted from the frame divider, truncates the speech signal by using a window longer than the sub-frame time and calculates spectral parameters up to a predetermined degree. The speech coder further comprises a spectral parameter quantizer which vector quantizes an LSP (line spectrum pair) parameter of a predetermined sub-frame, calculated in the spectral parameter calculator, by using a line spectrum pair parameter codebook, and a perceptual weight multiplier which receives linear prediction coefficients of a plurality of sub-frames, calculated in the spectral parameter calculator, and performs perceptual weight multiplication of each sub-frame speech signal to output a perceptual weight multiplied signal. The speech coder also includes a response signal calculator which receives, for each sub-frame, linear prediction coefficients of a plurality of sub-frames calculated in the spectral parameter calculator and linear prediction coefficients restored in the spectral parameter quantizer, calculates a response signal for one sub-frame and outputs the calculated response signal to a subtractor. The speech coder further includes an impulse response calculator which receives the restored linear prediction coefficients from the spectral parameter quantizer and calculates an impulse response of a perceptual weight multiplier filter for a predetermined number of points.
An adaptive codebook circuit receives past excitation signals fed back from the output side, the output signal of the subtractor and the impulse response of the perceptual weight multiplier filter, obtains a delay corresponding to the pitch and outputs an index representing the obtained delay. An excitation quantizer calculates and quantizes one of the parameters of a plurality of non-zero pulses constituting an excitation by using an amplitude codebook for collectively quantizing the other parameter, i.e., the amplitude parameter, of the excitation pulses. A gain quantizer reads out gain codevectors from a gain codebook, selects a gain codevector on the basis of the amplitude codevector and pulse position data, and outputs an index representing the selected gain codevector to a multiplexer. A weight signal calculator receives the output of the gain quantizer, reads out a codevector corresponding to the index and obtains a drive excitation signal.
Other objects and features will be clarified from the following description with reference to attached drawings.