This invention relates to a method and an apparatus for low bit rate speech signal coding.
One effective approach to speech signal coding at a transmission rate of 10 kbps or less is a known method that searches for an excitation sequence of the speech signal at short time intervals such that the error between the signal reproduced from that sequence and the input signal is minimal. The A-b-S (Analysis-by-Synthesis) method (prior art 1) proposed by B. S. Atal at Bell Telephone Laboratories of the United States is noteworthy in that the excitation sequence is represented by a plurality of pulses whose amplitudes and phases are obtained on the coder side at short time intervals. A detailed description of the method is omitted here, as it appears in the proceedings of ICASSP, 1982, pp. 614-617 (reference 1): "A new model of LPC excitation for producing natural-sounding speech at low bit rates". The disadvantage of the conventional method referred to as prior art 1 is that the amount of calculation becomes large, since the A-b-S method is used to obtain the pulse sequence. On the other hand, another method (prior art 2) has been proposed that uses correlation functions to obtain the pulse sequence, with the aim of reducing the amount of calculation (U.S. patent application Ser. No. 565,804 and Canadian Application No. 444,239). Excellent reproduced sound quality is obtained at transmission rates of 10 kbps or less.
The conventional method using the correlation functions will be described briefly. The excitation sequence, comprising a sequence of k pulses within a frame, is represented by the following: ##EQU1## where $\delta(\cdot)$ is the Kronecker delta, $N$ is the frame length, and $g_k$ is the pulse amplitude at location $l_k$. If the predictive coefficients are denoted $\alpha_i$ ($i = 1, \ldots, M$, where $M$ is the order of the synthesis filter), the reproduced signal $\hat{x}(n)$ obtained by inputting $d(n)$ to the synthesis filter can be written as: ##EQU2##
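The excitation and synthesis steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `build_excitation` and `synthesize` are hypothetical, pulse locations are taken as 0-based indices, the filter state at the start of the frame is assumed to be zero, and the all-pole recursion assumes the sign convention $\hat{x}(n) = d(n) + \sum_i \alpha_i \hat{x}(n-i)$.

```python
def build_excitation(locations, amplitudes, frame_len):
    """Build d(n): zero everywhere except amplitude g_k at location l_k.

    Assumption: locations are 0-based sample indices inside the frame.
    """
    d = [0.0] * frame_len
    for l_k, g_k in zip(locations, amplitudes):
        d[l_k] += g_k
    return d


def synthesize(d, alpha):
    """Feed d(n) through an all-pole synthesis filter of order M = len(alpha).

    Assumption: x_hat(n) = d(n) + sum_{i=1..M} alpha_i * x_hat(n - i),
    with zero filter state before the frame starts.
    """
    x_hat = []
    for n in range(len(d)):
        acc = d[n]
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc += a * x_hat[n - i]
        x_hat.append(acc)
    return x_hat
```

For example, a single unit pulse at the start of the frame fed through a first-order filter yields that filter's impulse response, which is the sequence h(n) used later in the correlation-based search.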
The weighted mean squared error between the input speech signal $x(n)$ and the reproduced signal $\hat{x}(n)$ within one frame is given by: ##EQU3## where $*$ represents convolution and $w(n)$ is the weighting function. The weighting function is introduced to minimize the perceived error in the reproduced speech. According to the auditory masking effect, noise tends to be masked in regions where the speech energy is greater. The weighting function is determined based on these auditory characteristics. As the weighting function, a Z-transform function $W(z)$ has been proposed that uses a real constant $\gamma$ and the predictive parameters $\alpha_i$ of the synthesis filter, under the condition $0 \le \gamma \le 1$ (see reference 1). ##EQU4## If the Z-transforms of $x(n)$ and $\hat{x}(n)$ are defined as $X(z)$ and $\hat{X}(z)$ respectively, equation (3) can be represented as follows:
$$J = \left| X(z)W(z) - \hat{X}(z)W(z) \right|^2 \qquad (4)$$
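As an illustration of how such a weighting might be applied in the time domain, the sketch below filters a frame through a weighting filter. The exact form of $W(z)$ is not reproduced in this text, so the code assumes the form commonly used in the multi-pulse literature, $W(z) = \left(1 - \sum_i \alpha_i z^{-i}\right) / \left(1 - \sum_i \alpha_i \gamma^i z^{-i}\right)$; both that form and the function name `perceptual_weight` are assumptions made for illustration only.

```python
def perceptual_weight(x, alpha, gamma):
    """Filter x(n) through an assumed weighting filter W(z).

    Assumed form (not taken from this text):
        W(z) = (1 - sum_i alpha_i z^-i) / (1 - sum_i alpha_i gamma^i z^-i)
    which gives the difference equation
        y(n) = x(n) - sum_i alpha_i x(n-i) + sum_i alpha_i gamma^i y(n-i)
    with zero filter state before the frame starts.
    """
    y = []
    for n in range(len(x)):
        acc = x[n]
        for i, a in enumerate(alpha, start=1):
            if n - i >= 0:
                acc -= a * x[n - i]                 # numerator (zeros)
                acc += a * (gamma ** i) * y[n - i]  # denominator (poles)
        y.append(acc)
    return y
```

Under this assumed form, $\gamma = 1$ makes the numerator and denominator cancel so the signal passes through unchanged, while $\gamma = 0$ reduces the filter to the inverse of the synthesis filter; intermediate values trade off between the two.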
With reference to equation (2), $\hat{X}(z)$ is given by:
$$\hat{X}(z) = H(z)D(z) \qquad (5)$$
where ##EQU5## $H(z)$ is the Z-transform of the synthesis filter, and $D(z)$ is the Z-transform of the excitation sequence.
Substituting equation (5) into equation (4), equation (6) is obtained:
$$J = \left| X(z)W(z) - H(z)W(z)D(z) \right|^2 \qquad (6)$$
Accordingly, if the inverse Z-transforms of $X(z)W(z)$ and $H(z)W(z)$ are written as $x_w(n) = x(n) * w(n)$ and $h_w(n) = h(n) * w(n)$, equation (6) becomes: ##EQU6## By partially differentiating equation (7) with respect to $g_k$ and setting the result to 0, the following equation (8) is obtained. ##EQU7## where $\psi_{xh}(\cdot)$ denotes the cross-correlation function between $x_w(n)$ and $h_w(n)$, and $\phi_{hh}(\cdot)$ denotes the autocorrelation function of $h_w(n)$. They are written as follows: ##EQU8##
The conventional method 2 (prior art 2) determines the k-th pulse amplitude and location by treating $g_k$ in equation (8) as a function of $l_k$ only. In other words, the $l_k$ that maximizes $|g_k|$ in equation (8) is taken as the k-th pulse location, and the value of $g_k$ at that $l_k$ is taken as the k-th pulse amplitude. However, since $g_k$ is, in general, a function of $l_1, l_2, \ldots, l_k$, such a method is not optimal.
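The sequential search of prior art 2 can be sketched as below. Equation (8) is not reproduced in this text, so the sketch assumes the form usually given for this kind of search, $g_k(l_k) = \left[\psi_{xh}(l_k) - \sum_{i<k} g_i\,\phi_{hh}(|l_i - l_k|)\right] / \phi_{hh}(0)$, with correlations computed over a single frame. The function names, the 0-based indexing, and the rule that a location is not reused are all assumptions made for illustration.

```python
def cross_corr(xw, hw, m):
    """psi_xh(m) = sum_n xw(n) * hw(n - m), restricted to one frame."""
    upper = min(len(xw), m + len(hw))
    return sum(xw[n] * hw[n - m] for n in range(m, upper))


def auto_corr(hw, m):
    """phi_hh(m) = sum_n hw(n) * hw(n - |m|)."""
    m = abs(m)
    return sum(hw[n] * hw[n - m] for n in range(m, len(hw)))


def search_pulses(xw, hw, num_pulses):
    """Pick pulses one at a time, each maximizing |g_k| over the location l_k.

    Earlier pulses are held fixed while the k-th one is chosen, which is
    why the result is generally not the joint optimum.
    """
    n_frame = len(xw)
    psi = [cross_corr(xw, hw, m) for m in range(n_frame)]
    phi = [auto_corr(hw, m) for m in range(n_frame)]
    locations, amplitudes = [], []
    for _ in range(num_pulses):
        best_loc, best_g = None, 0.0
        for l in range(n_frame):
            if l in locations:
                continue  # assumption: each location is used at most once
            # Remove the contribution of the pulses already placed.
            resid = psi[l] - sum(
                g * phi[abs(li - l)] for li, g in zip(locations, amplitudes)
            )
            g = resid / phi[0]
            if best_loc is None or abs(g) > abs(best_g):
                best_loc, best_g = l, g
        locations.append(best_loc)
        amplitudes.append(best_g)
    return locations, amplitudes
```

Because the amplitudes of earlier pulses are never revisited once a new pulse is added, the error in equation (7) is reduced greedily rather than minimized jointly over all pulses.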
As described above, the excitation pulse sequence determined by the conventional method does not truly minimize $J$ in equation (7), which means that a more suitable excitation pulse sequence exists. It is therefore necessary to obtain the amplitudes and locations of a more appropriate excitation pulse sequence.
The present inventor has consequently proposed a method (prior art 3) (U.S. patent application Ser. No. 626,949 and Canadian Application No. 458,282) for obtaining the optimum pulse location and amplitude that minimize $J_w$, using the locations and amplitudes of the first through (k-1)-th pulses when the k-th pulse location and amplitude are obtained. However, obtaining the k-th pulse location and amplitude by this method is tantamount to solving a system of equations with a $k \times k$ symmetric matrix, which increases the amount of calculation.