The present invention relates to a high-efficiency speech coding method which employs a random codebook and is applied to Code-Excited Linear Prediction (CELP) coding or Vector Sum Excited Linear Prediction (VSELP) coding to encode a speech signal into digital codes with a small amount of information. The invention also pertains to a decoding method for such digital codes.
At present, there has been proposed a high-efficiency speech coding method wherein the original speech is divided into equal intervals of 5 to 50 msec called frames, the speech of one frame is separated into two pieces of information, one being the envelope configuration of its frequency spectrum and the other an excitation signal for driving a linear filter corresponding to the envelope configuration, and these pieces of information are encoded. A known method for coding the excitation signal is to separate it into a periodic component, considered to correspond to the fundamental frequency (or pitch period) of the speech, and the remaining component (in other words, an aperiodic component), and to encode each of them. Conventional excitation signal coding methods are known under the names of Code-Excited Linear Prediction (CELP) coding and Vector Sum Excited Linear Prediction (VSELP) coding. Their techniques are described in M. R. Schroeder and B. S. Atal: "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. ICASSP '85, 25.1.1, pp. 937-940, 1985, and I. A. Gerson and M. A. Jasiuk: "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps," Proc. ICASSP '90, S9.3, pp. 461-464, 1990.
According to these coding methods, as shown in FIG. 1, the original speech X input to an input terminal 11 is provided to a speech analysis part 12, wherein parameters representing the envelope configuration of its frequency spectrum are calculated. A linear predictive coding (LPC) method is usually employed for the analysis. The LPC parameters thus obtained are encoded by an LPC parameter encoding part 13, the encoded output A of which is decoded by an LPC parameter decoding part 14, and the decoded LPC parameters a' are set as the filter coefficients of an LPC synthesis filter 15. By applying an excitation signal (an excitation vector) E to the LPC synthesis filter 15, a reconstructed speech X' is obtained.
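By way of illustration only, the operation of the LPC synthesis filter 15 may be sketched as an all-pole recursion driven by the excitation vector E. The function name lpc_synthesize, the sign convention y[n] = e[n] + a[0]*y[n-1] + a[1]*y[n-2] + ..., and the zero initial filter state are assumptions made for this sketch; they are not taken from the cited references.

```python
def lpc_synthesize(excitation, coeffs, state=None):
    """All-pole synthesis filter (illustrative sketch).

    Assumed convention: y[n] = e[n] + sum_k coeffs[k] * y[n-1-k].
    `state` holds past outputs, most recent first; zeros if omitted.
    """
    p = len(coeffs)
    history = list(state) if state is not None else [0.0] * p
    output = []
    for e in excitation:
        # Predict from past outputs, then add the excitation sample.
        y = e + sum(a * h for a, h in zip(coeffs, history))
        output.append(y)
        if p:
            # Shift the newest output into the filter memory.
            history = [y] + history[:p - 1]
    return output


# A unit impulse through a first-order filter decays geometrically.
impulse_response = lpc_synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
```

With the first-order coefficient 0.5 assumed above, each output sample is half of the preceding one.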
An adaptive codebook 16 always holds the excitation vector determined for the immediately preceding frame. A segment of a length L corresponding to a certain period (a pitch period) is cut out from the excitation vector and the vector segment thus cut out is repeatedly concatenated until the length T of one frame is reached, by which a codevector corresponding to the periodic component of the speech is output. By changing the cut-out length L, which is provided as a period code (indicated by the same reference character L as that for the cut-out length) to the adaptive codebook 16, it is possible to output a codevector corresponding to a different period. In the following description the codevector which is output from the adaptive codebook will be referred to as an adaptive codevector.
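The cut-out and concatenation performed by the adaptive codebook 16 may be sketched as follows. The function name adaptive_codevector is hypothetical, and the sketch assumes that the segment is taken from the most recent L samples of the stored excitation vector, which the description above does not state explicitly.

```python
def adaptive_codevector(prev_excitation, L, T):
    """Repeat the last L samples of the stored excitation up to length T.

    Assumption: the cut-out segment is the most recent L samples.
    """
    if not 1 <= L <= len(prev_excitation):
        raise ValueError("cut-out length L must be within the stored vector")
    segment = list(prev_excitation[-L:])
    repeats = -(-T // L)  # ceiling division: enough copies to cover T
    return (segment * repeats)[:T]


# Cut-out length 2, frame length 5: the segment [4, 5] is repeated.
codevector = adaptive_codevector([1, 2, 3, 4, 5], L=2, T=5)
# codevector == [4, 5, 4, 5, 4]
```

Changing L alone yields a codevector of a different period from the same stored excitation vector, which is why only the period code L needs to be transmitted.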
Although one or any desired number of random codebooks may be provided, the following description will be given of the case where two random codebooks 17.sub.1 and 17.sub.2 are provided. As indicated by reference numeral 17 in FIG. 2 as a representative of either random codebook 17.sub.1 or 17.sub.2, there are prestored in the random codebooks 17.sub.1 and 17.sub.2, independently of the input speech, various vectors usually based on white Gaussian noise and having the length T of one frame. From the random codebooks the stored vectors specified by given random codes C (C.sub.1, C.sub.2) are read out and output as codevectors corresponding to aperiodic components of the speech. In the following description the codevectors output from the random codebooks will be referred to as random codevectors.
The codevectors from the adaptive codebook 16 and the random codebooks 17.sub.1 and 17.sub.2 are provided to a weighted accumulation part 20, wherein they are multiplied, in multiplication parts 21.sub.0, 21.sub.1 and 21.sub.2, by weights (i.e., gains) g.sub.0, g.sub.1 and g.sub.2 from a weight generation part 23, respectively, and the multiplied outputs are added together in an addition part 22. The weight generation part 23 generates the weights g.sub.0, g.sub.1 and g.sub.2 in accordance with a weight code G provided thereto. The added output from the addition part 22 is supplied as an excitation vector candidate to the LPC synthesis filter 15, from which the synthesized speech X' is output. A distortion d of the synthesized speech X' with respect to the original speech X from the input terminal 11 is calculated in a distance calculation part 18.
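The weighted accumulation and the distortion calculation described above may be sketched as follows. The names excitation_candidate and frame_distortion are hypothetical, the distortion d is assumed here to be a plain sum of squared errors (the description does not specify the measure), and the synthesis filter is passed in as an arbitrary callable so that the sketch stays independent of any particular filter implementation.

```python
def excitation_candidate(codevectors, gains):
    """Weighted sum g0*v0 + g1*v1 + g2*v2 of equal-length codevectors."""
    frame_length = len(codevectors[0])
    return [
        sum(g * v[n] for g, v in zip(gains, codevectors))
        for n in range(frame_length)
    ]


def frame_distortion(original, codevectors, gains, synthesis_filter):
    """Assumed measure: squared error between X and synthesized X'."""
    synthesized = synthesis_filter(excitation_candidate(codevectors, gains))
    return sum((x - y) ** 2 for x, y in zip(original, synthesized))


# Placeholder filter for the example only: passes the excitation through.
identity = lambda e: e
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # adaptive, random 1, random 2
d = frame_distortion([2.5, 3.0], vectors, [2.0, 3.0, 0.5], identity)
# The candidate is [2.5, 3.5], so d == 0.25.
```

A practical coder would substitute the actual LPC synthesis filter for the identity placeholder.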
Based on a criterion for minimizing the distortion d, a codebook search control part 19 first searches for the most suitable cut-out length L in the adaptive codebook 16 to determine an optimal codevector of the adaptive codebook 16. Then, the codebook search control part 19 sequentially determines optimal codevectors of the random codebooks 17.sub.1 and 17.sub.2 and optimal weights g.sub.0, g.sub.1 and g.sub.2 of the weighted accumulation part 20. In this way, a combination of codes that minimizes the distortion d is searched for, and the excitation vector candidate at that time is determined as the excitation vector E for the current frame and is written into the adaptive codebook 16. When the distortion is minimized, the period code L representative of the cut-out length of the adaptive codebook 16, the random codes C.sub.1 and C.sub.2 representative of the codevectors of the random codebooks 17.sub.1 and 17.sub.2, the weight code G representative of the weights g.sub.0, g.sub.1 and g.sub.2, and the LPC parameter code A are provided as coded outputs and transmitted or stored.
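One possible form of such a sequential search is sketched below. It is not asserted to be the procedure of the cited references. The sketch assumes that each stage (the adaptive codebook first, then each random codebook) picks the codevector and the unquantized gain that minimize the squared error against the target remaining after the preceding stages, and that the synthesis filter is linear and starts from a zero state. The name sequential_search and the argument layout are hypothetical.

```python
def sequential_search(target, stages, synthesis_filter):
    """Greedy stage-by-stage codebook search (illustrative sketch).

    `stages` is a list of candidate lists, e.g. [adaptive candidates for
    each L, random codebook 1, random codebook 2]. Returns one
    (index, gain) pair per stage.
    """
    residual = list(target)
    chosen = []
    for candidates in stages:
        best = None  # (error, index, gain, scaled synthesized vector)
        for index, codevector in enumerate(candidates):
            y = synthesis_filter(codevector)
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue  # a silent candidate cannot reduce the error
            gain = sum(r * v for r, v in zip(residual, y)) / energy
            error = sum((r - gain * v) ** 2 for r, v in zip(residual, y))
            if best is None or error < best[0]:
                best = (error, index, gain, [gain * v for v in y])
        if best is None:
            raise ValueError("stage has no codevector with non-zero energy")
        _, index, gain, contribution = best
        chosen.append((index, gain))
        # Later stages only need to model what is still unexplained.
        residual = [r - c for r, c in zip(residual, contribution)]
    return chosen


identity = lambda e: e  # placeholder filter for the example only
stages = [[[1, 0, 0], [0, 1, 0]], [[1, 0, 0], [0, 0, 1]]]
codes = sequential_search([2.0, 0.0, 3.0], stages, identity)
# codes == [(0, 2.0), (1, 3.0)]
```

Searching the stages one after another in this way is far cheaper than testing every combination of codes, at the cost of not guaranteeing the jointly optimal combination.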
FIG. 3 shows a decoding method. The input LPC parameter code A is decoded in an LPC parameter decoding part 26 and the decoded LPC parameters a' are set as filter coefficients in an LPC synthesis filter 27. A vector segment of the period length L specified by the input period code L is cut out of the excitation vector of the immediately preceding frame stored in an adaptive codebook 28, and the vector segment thus cut out is repeatedly concatenated until the frame length T is reached, whereby a codevector is produced. On the other hand, codevectors corresponding to the input random codes C.sub.1 and C.sub.2 are read out of random codebooks 29.sub.1 and 29.sub.2, respectively, and a weight generation part 32 of a weighted accumulation part 30 generates the weights g.sub.0, g.sub.1 and g.sub.2 in accordance with the input weight code G. These output codevectors are provided to multiplication parts 31.sub.0, 31.sub.1 and 31.sub.2, wherein they are multiplied by the weights g.sub.0, g.sub.1 and g.sub.2 from the weight generation part 32, and are then added together in an addition part 33. The added output is supplied as a new excitation vector E to the LPC synthesis filter 27, from which a reconstructed speech X' is obtained.
The random codebooks 29.sub.1 and 29.sub.2 are identical to the random codebooks 17.sub.1 and 17.sub.2 used for encoding. As mentioned previously, a single random codebook or any other desired number of random codebooks may be employed in some cases. In the CELP speech coding, the codevectors to be selected as optimal codevectors are directly prestored in the random codebooks 17.sub.1, 17.sub.2 and 29.sub.1, 29.sub.2 in FIGS. 1 and 3. That is, when the number of codevectors to be selected as optimal codevectors is N, the number of vectors stored in each random codebook is also N.
In the VSELP speech coding, the random codebooks 17.sub.1 and 17.sub.2 in FIG. 1 are replaced by a random codebook 27 shown in FIG. 4, in which M vectors (referred to as basis vectors in the case of VSELP coding) stored in a basis vector table 25 are simultaneously read out and provided to multiplication parts 34.sub.1 to 34.sub.M, wherein each is multiplied by +1 or -1 in accordance with the output of a random codebook decoder 24, and the multiplied outputs are added together in an addition part 35 and output as a codevector. Accordingly, the number of different codevectors obtainable with all combinations of the sign values +1 and -1, by which the respective basis vectors are multiplied, is 2.sup.M. One of the 2.sup.M codevectors is chosen so that the distortion d is minimized, and the code C (M bits) indicating the combination of signs which provides the chosen codevector is determined.
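The construction of a VSELP codevector from M basis vectors and an M-bit code may be sketched as follows. The function names, the bit mapping (bit m of the code set to 1 means +1 for basis vector m, otherwise -1), and the exhaustive search over all 2.sup.M codes are assumptions made for illustration. For brevity the search measures the squared error directly on the codevector and omits the synthesis filter and the gain.

```python
def vselp_codevector(basis_vectors, code):
    """Sum the basis vectors, each multiplied by +1 or -1.

    Assumed mapping: bit m of `code` set -> +1, cleared -> -1.
    """
    signs = [1 if (code >> m) & 1 else -1 for m in range(len(basis_vectors))]
    frame_length = len(basis_vectors[0])
    return [
        sum(s * v[n] for s, v in zip(signs, basis_vectors))
        for n in range(frame_length)
    ]


def search_vselp_code(basis_vectors, target):
    """Try all 2**M sign combinations and return the best code."""
    def squared_error(code):
        candidate = vselp_codevector(basis_vectors, code)
        return sum((t - c) ** 2 for t, c in zip(target, candidate))
    return min(range(2 ** len(basis_vectors)), key=squared_error)


basis = [[1, 0], [0, 1]]           # M = 2 basis vectors, frame length 2
vector = vselp_codevector(basis, 0b01)      # signs (+1, -1) -> [1, -1]
best_code = search_vselp_code(basis, [1, -1])  # -> 1
```

Only M vectors are stored, yet 2.sup.M distinct codevectors can be addressed, which is the storage advantage of this arrangement over a directly stored codebook of the same size.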
There are two methods for determining the weights g.sub.0, g.sub.1 and g.sub.2 which are used in the weighted accumulation part 20 in FIG. 1. In one method, the weights that are theoretically optimal, i.e., that minimize the distortion, are calculated during the search for the period (i.e., the search for the optimal cut-out length L of the adaptive codebook 16) and during the search for the random codevectors (i.e., the search of the random codebooks 17.sub.1 and 17.sub.2), and these weights are scalar quantized. In the other method, a weight codebook which has prestored therein, as weight vectors, a plurality of sets of weights g.sub.0, g.sub.1 and g.sub.2 is searched, and the weight vector (g.sub.0, g.sub.1, g.sub.2) which minimizes the distortion is determined.
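The two approaches may be contrasted in a short sketch. It assumes a squared-error distortion, for which the optimal weight of a single synthesized codevector y against a target x is the ratio of their inner product to the energy of y. The quantization levels, the weight codebook contents, and all function names are hypothetical.

```python
def optimal_gain(target, synthesized):
    """Gain minimizing the squared error ||target - g * synthesized||^2."""
    energy = sum(y * y for y in synthesized)
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(target, synthesized)) / energy


def scalar_quantize(gain, levels):
    """First method: index of the quantization level nearest the gain."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - gain))


def search_weight_codebook(target, synthesized_vectors, weight_codebook):
    """Second method: index of the weight vector with least distortion."""
    def distortion(weights):
        return sum(
            (x - sum(g * y[n] for g, y in zip(weights, synthesized_vectors))) ** 2
            for n, x in enumerate(target)
        )
    return min(range(len(weight_codebook)),
               key=lambda i: distortion(weight_codebook[i]))


g = optimal_gain([2.0, 4.0], [1.0, 2.0])           # -> 2.0
level = scalar_quantize(g, [0.5, 1.0, 2.5])        # -> 2 (level 2.5)
index = search_weight_codebook(
    [3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [(1.0, 1.0), (3.0, 1.0), (0.0, 0.0)]
)                                                   # -> 1
```

In the first approach each weight is quantized on its own after it has been computed, whereas in the second the weights are chosen jointly as one stored vector, so the distortion is evaluated on the combination actually used for synthesis.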
With the conventional methods described above, since the periodicity of the excitation signal is limited only to the component of the preceding frame, the periodicity is not clearly expressed and hence the reconstructed speech is hoarse and lacks smoothness.