The invention relates to a method of speech coding which is based on the ITU-T recommendation G.729 for 8-kbit/s speech coding scheme CS-ACELP (hereinafter referred to in the specification and claims as "G.729") and which allows speech coding at a lower rate.
Various efficient coding schemes have been attempted in the field of digital mobile communications in order to make efficient use of radio waves. Known schemes for speech coding at an information rate on the order of 8 kbit/s include CELP (code excited linear prediction), VSELP (vector sum excited linear prediction), CS-ACELP and the like.
For details of these coding schemes, refer to "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates" by M. R. Schroeder and B. S. Atal in Proc. ICASSP '85, 25.1.1, pp 937-940, 1985 (literature 1), "Vector Sum Excited Linear Prediction (VSELP) Speech Coding at 8 kbps" by I. A. Gerson and M. A. Jasiuk in Proc. ICASSP '90, S9.3, pp 461-464, 1990 (literature 2), and "ITU-T 8 kbit/s Standard Speech Codec for Personal Communication Services" by A. Kataoka et al in Int. Conf. on Universal Personal Communication, pp 818-822, 1995 (literature 3). For details of the 8 kbit/s International Standard G.729 (CS-ACELP), refer to ITU-T Recommendation: G.729 Coding of speech at 8 kbit/s using conjugate-structure algebraic code excited linear prediction (hereinafter referred to in the specification and claims as "CS-ACELP") COM 15-152-E, July 1995 (literature 4).
FIG. 1 shows an example of a coder used in such schemes, including an input terminal 11, an adder 12, a subtractor 13, a filter coefficient determination part 14, a filter coefficient quantizer 15, a synthesis filter 16, a perceptual weighting filter 17, a distortion power calculator 18, a code output part 19, an adaptive codebook 21, a random codebook 22, an estimated gain part 23, a gain part 24, a gain estimation part 25, a codebook search part 26, a gain codebook 27 and an LSP codebook 28.
Referring to FIG. 1, an input speech signal waveform is applied to the input terminal 11, and in every frame of 10 ms a given number of samples (hereafter referred to as a speech waveform vector) is extracted from the sample train of the waveform and fed to the filter coefficient determination part 14, where linear prediction coefficients (or LPC coefficients) are calculated. The LPC coefficients are converted into LSP coefficients in the filter coefficient quantizer 15, where they are quantized by reference to the LSP codebook 28. The quantized code I.sub.sp of the LSP coefficients is delivered, and the quantized LSP coefficients are also converted back to LPC coefficients to be set up in the synthesis filter 16 as filter coefficients.
The adaptive codebook 21 stores exciting vectors over a plurality of past frames as pitch component vectors which change adaptively. A pitch component vector candidate P is chosen from the plurality of pitch component vectors, and a random component vector candidate C is chosen from a plurality of fixed random component vectors (or random number vectors) contained in the random codebook 22. Gains g.sub.P, g.sub.N chosen from the gain codebook 27 and forming a gain vector candidate g=(g.sub.P, g.sub.N) are applied to the candidates P, C in multipliers 24P, 24N, respectively, of the gain part 24, and the resulting products are added together in the adder 12 and fed to the synthesis filter 16 as an exciting vector, thus synthesizing speech. The gain estimation part 25 predicts an approximate gain from past random component vectors, and this gain is then set up in the estimated gain part 23.
The synthesized speech is subtracted from the input speech waveform vector X, and the resulting error vector is perceptually weighted in the perceptual weighting filter 17 and then fed to the distortion power calculator 18. The distortion power calculator 18 calculates the power of the perceptually weighted error (or distortion), and the codebook search part 26 selects the respective candidate vectors from the adaptive codebook 21, the random codebook 22 and the gain codebook 27 so that the power of the distortion is minimized. The code output part 19 delivers indices I.sub.P, I.sub.N, I.sub.G representing these selected vectors, together with the code I.sub.sp which represents the quantized LSP coefficients, as coded outputs.
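The search loop described above can be illustrated with a minimal sketch. It is not the G.729 procedure: it assumes an exhaustive search over all candidate combinations, omits the perceptual weighting filter 17 and the estimated gain part 23, and assumes the sign convention A(z) = 1 + sum a.sub.k z^-k for the LPC coefficients. The function names `synthesize` and `search` are hypothetical.

```python
from itertools import product

def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k].

    Assumes zero filter memory at the start of the vector.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

def search(target, lpc, pitch_cands, random_cands, gain_cands):
    """Return ((I_P, I_N, I_G), power) for the combination with least error power.

    Simplification: unweighted squared error, exhaustive joint search.
    """
    best = None
    for (i_p, p), (i_n, c), (i_g, (g_p, g_n)) in product(
            enumerate(pitch_cands), enumerate(random_cands), enumerate(gain_cands)):
        # Exciting vector: gain-scaled pitch component plus gain-scaled random component.
        excitation = [g_p * pv + g_n * cv for pv, cv in zip(p, c)]
        synth = synthesize(excitation, lpc)
        power = sum((x - y) ** 2 for x, y in zip(target, synth))
        if best is None or power < best[0]:
            best = (power, (i_p, i_n, i_g))
    return best[1], best[0]

# Toy data (hypothetical): the target is built from known candidates.
pitch_cands = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
random_cands = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
gain_cands = [(0.5, 1.0), (1.0, 0.5)]
lpc = [-0.5]
target = synthesize([0.0, 1.0, 0.5, 0.0], lpc)
indices, power = search(target, lpc, pitch_cands, random_cands, gain_cands)
```

In this toy case the target was generated from pitch candidate 1, random candidate 0 and gain candidate 1, so the search recovers exactly those indices with zero error power. A practical coder would search the codebooks sequentially rather than jointly to keep the computation manageable.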
FIG. 2 shows an example of a decoder corresponding to the coder shown in FIG. 1, including an input terminal 31, an adder 32, a filter coefficient decoder 33, a synthesis filter 34, an adaptive codebook 35, a random codebook 36, an estimated gain part 37, a gain part 38, a gain estimation part 39, and a gain codebook 41. In the arrangement of FIG. 2, the received code I.sub.sp is fed to the filter coefficient decoder 33, where LSP coefficients are decoded and then converted into LPC coefficients, which are in turn fed to the synthesis filter 34 to be used as filter coefficients therein. The received code I.sub.G is decoded into a gain vector (g.sub.P, g.sub.N) in the gain codebook 41 for use as gains g.sub.P, g.sub.N in the multipliers 38P, 38N of the gain part 38.
On the other hand, a pitch component vector P and a random component vector C are read out from the adaptive codebook 35 and the random codebook 36, respectively, in accordance with the received codes I.sub.P and I.sub.N. The pitch component vector P is multiplied by the gain g.sub.P in the gain part 38, while the random component vector C is first multiplied in the estimated gain part 37 by the estimated gain from the gain estimation part 39, so as to be adaptively gain adjusted, and is then multiplied by the gain g.sub.N in the gain part 38. The gain controlled pitch component vector and random component vector from the gain part 38 are added together in the adder 32 and fed to the synthesis filter 34 as an exciting vector, whereby decoded speech is delivered.
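The decoder-side gain chain described above can be sketched as follows. This is an illustrative sketch only: the function names `decode_excitation` and `synthesis_filter` are hypothetical, the estimated gain is passed in as a plain number rather than predicted from past vectors, the filter starts from zero memory, and the sign convention A(z) = 1 + sum a.sub.k z^-k is an assumption.

```python
def decode_excitation(p, c, g_p, g_n, g_est):
    """Exciting vector = g_P * P + g_N * (g_est * C), sample by sample.

    g_est stands in for the output of the gain estimation part; here it is
    simply supplied by the caller (assumption).
    """
    return [g_p * pv + g_n * (g_est * cv) for pv, cv in zip(p, c)]

def synthesis_filter(excitation, lpc):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k a[k] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out

# Toy values (hypothetical) chosen so that the arithmetic is exact.
excitation = decode_excitation([1.0, 2.0], [3.0, 4.0], g_p=0.5, g_n=2.0, g_est=0.25)
decoded = synthesis_filter(excitation, lpc=[-0.5])
```

With these toy values the exciting vector is [2.0, 3.0] and the filter output is [2.0, 4.0]. In an actual decoder the exciting vector would also be written back into the adaptive codebook for use in later frames, which this sketch leaves out.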
FIG. 3 shows a bit allocation for coding individual parameters used in G.729. In G.729, the frame length is equal to 10 ms, and 80 bits are used per frame. Of these, 18 bits are allocated to coding LSP coefficients. The coding of LSP coefficients takes place by way of a vector quantization in two stages as illustrated in FIG. 4. In the first stage, a 10-th order vector quantization is effected using a first stage LSP codebook having 128 candidates (7 bits). In the second stage, the 10-th order vector is quantized using a pair of LSP codebooks, one for the higher order and one for the lower order coefficients, each having 32 candidates (5 bits) and each effecting a 5-th order vector quantization. One bit is allocated for selection of prediction coefficients.
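The two-stage structure described above can be illustrated with a toy sketch. It is not the G.729 quantizer: the codebooks are filled with seeded random numbers instead of trained entries, a plain squared Euclidean distance is assumed, and the prediction step selected by the remaining bit is omitted. The names `nearest` and `two_stage_vq` are hypothetical.

```python
import random

def nearest(codebook, vector):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))

def two_stage_vq(lsp, stage1, stage2_low, stage2_high):
    """Quantize a 10-th order vector: one full search, then two 5-th order searches."""
    i1 = nearest(stage1, lsp)                                # first stage, 7 bits
    residual = [v - c for v, c in zip(lsp, stage1[i1])]      # first-stage error
    i2_low = nearest(stage2_low, residual[:5])               # lower 5 orders, 5 bits
    i2_high = nearest(stage2_high, residual[5:])             # higher 5 orders, 5 bits
    return i1, i2_low, i2_high

# Placeholder codebooks with the sizes given in the text (contents are random).
rng = random.Random(0)
stage1 = [[rng.uniform(0.0, 3.14) for _ in range(10)] for _ in range(128)]
stage2_low = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(32)]
stage2_high = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(32)]
lsp = sorted(rng.uniform(0.0, 3.14) for _ in range(10))

indices = two_stage_vq(lsp, stage1, stage2_low, stage2_high)

# Bits needed to index each codebook, plus 1 bit for the prediction coefficients.
index_bits = [(len(cb) - 1).bit_length() for cb in (stage1, stage2_low, stage2_high)]
LSP_BITS = 1 + sum(index_bits)
```

The index widths come out as 7, 5 and 5 bits, which together with the one selection bit gives the 18 bits allocated to the LSP coefficients.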
For coding a pitch component vector using the adaptive codebook 21, the frame is divided into a first 5 ms subframe and a second 5 ms subframe. 8 bits and one parity bit are allocated to the first subframe while 5 bits are allocated to the second subframe. For coding a random component vector using the random codebook 22, 17 bits, inclusive of 4 bits for the polarities of four pulses, are allocated to each subframe.
FIG. 5 shows the predetermined positions which the four pulses can assume when the random exciting pulse structure used in coding the random component vector with the random codebook according to G.729 is realized by four pulses in each subframe. Specifically, positions No. 0 to No. 39 are defined in the 40-sample (5 ms) subframe at a spacing of one sample (0.125 ms), and these 40 positions are allocated to pulses #0 to #3 as shown in the chart of FIG. 5, which conforms to G.729. As will be evident from the chart, eight positions are available for each of the pulses #0, #1 and #2 in tracks 0, 1 and 2, respectively, and thus each position can be specified by three bits. For pulse #3, sixteen positions are available in the two tracks 3 and 4, and thus the position can be specified by four bits. Hence, information representing the positions of the four pulses in each subframe can be given by 13 bits. In addition to the 13 bits, the sign (polarity) of each of the four pulses is given by one bit, thus using a total of 17 bits for each subframe.
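The position chart and the resulting bit count can be reproduced with a short sketch. Since FIG. 5 is not reproduced here, the sketch assumes the interleaved layout of G.729, in which track t holds the positions t, t+5, t+10, ..., t+35; the function name `pulse_positions` is hypothetical.

```python
def pulse_positions():
    """Candidate positions for pulses #0 to #3 in a 40-position subframe.

    Assumes track t = {t, t+5, ..., t+35}; pulses #0-#2 use tracks 0-2,
    and pulse #3 uses tracks 3 and 4 together.
    """
    tracks = [list(range(t, 40, 5)) for t in range(5)]
    return [tracks[0], tracks[1], tracks[2], sorted(tracks[3] + tracks[4])]

positions = pulse_positions()

# Bits needed to address one position out of each pulse's candidate set.
position_bits = [(len(p) - 1).bit_length() for p in positions]

SIGN_BITS = 4                                  # one polarity bit per pulse
total_bits = sum(position_bits) + SIGN_BITS    # bits per subframe
```

The candidate sets hold 8, 8, 8 and 16 positions, so the position indices take 3 + 3 + 3 + 4 = 13 bits, and the four sign bits bring the total to 17 bits per subframe. The four sets are disjoint and together cover all 40 positions.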
For coding a gain vector with the gain codebook 27, 7 bits are allocated to each subframe as indicated in FIG. 3, thus using a total of 14 bits per frame.
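As a check on the allocation described above, the per-frame budget can be added up in a few lines. The dictionary keys are illustrative labels, not names taken from G.729.

```python
FRAME_MS = 10  # frame length in milliseconds

# Bits per 10 ms frame, as stated in the text (two 5 ms subframes per frame).
bits_per_frame = {
    "lsp": 18,                      # 1 + 7 + 5 + 5
    "adaptive_codebook": 8 + 1 + 5, # first subframe (with parity) + second subframe
    "random_codebook": 2 * 17,      # 13 position bits + 4 sign bits per subframe
    "gain_codebook": 2 * 7,
}

total_bits = sum(bits_per_frame.values())
bit_rate = total_bits * 1000 // FRAME_MS  # bits per second
```

The four parameter groups account for 18 + 14 + 34 + 14 = 80 bits per frame, which at one frame every 10 ms corresponds to 8000 bit/s.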
It is to be noted that when communicating with a codec according to the ITU International Standard G.729, a sufficient transmission capacity may not be secured, depending on the condition of the transmission path, with the result that the communication may be disabled. While it may be contemplated to achieve the communication by using a coding scheme which requires less transmission capacity, this presents another problem in that an entirely distinct coder and decoder combination becomes necessary. Accordingly, it is desirable in such an instance to reduce the bit rate of the signal without a significant degradation in the speech quality while retaining a code structure similar to that of the International Standard G.729. However, it has not been known how the bit allocation to a particular part of the code structure can be reduced effectively without an accompanying degradation in the speech quality.