This invention relates to a method and an apparatus for low-bit-rate speech-band signal coding.
One effective way of coding a speech signal at a transmission rate of 16 kbps or less is a known method that searches for an excitation sequence of the speech signal at short time intervals such that the error between the input signal and the signal reproduced using the sequence is minimal. The multi-pulse excitation method (Prior Art 1) proposed by B. S. Atal et al. at Bell Telephone Laboratories of the United States is noteworthy in that the excitation sequence is represented by a sequence of pulses whose amplitudes and phases are obtained on the coder side at short time intervals through an A-b-S (Analysis-by-Synthesis) based pulse search method. A detailed description of the method is omitted herein, as it appears in the proceedings of ICASSP, 1982, pp. 614 to 617 (Reference 1): "A new model of LPC excitation for producing natural-sounding speech at low bit rates". The disadvantage of Prior Art 1 is that the amount of calculation becomes large, since the A-b-S method is employed to obtain the pulse sequence. On the other hand, another method (Prior Art 2) has been proposed that uses correlation functions to obtain the pulse sequence, with the aim of decreasing the amount of calculation (refer to U.S. patent application Ser. No. 565,804, now U.S. Pat. No. 4,716,592, and Canadian application No. 444,239, together called Reference 2). Excellent reproduced sound quality is available at transmission rates of 16 kbps or less.
The conventional method using the correlation functions will be described briefly. The excitation sequence comprising $K$ pulses in a frame is represented by the following:

$$d(n) = \sum_{k=1}^{K} g_k\,\delta(n - m_k), \qquad n = 1, \ldots, N \qquad (1)$$

where $\delta(\cdot)$ is the Kronecker delta; $N$ is the frame length; and $g_k$ is the pulse amplitude at location $m_k$. If the predictive coefficients are assumed to be $a_i$ ($i = 1, \ldots, M$, $M$ being the order of a synthesis filter), the reproduced signal $\hat{x}(n)$ obtained by inputting $d(n)$ to the synthesis filter can be written as:

$$\hat{x}(n) = d(n) + \sum_{i=1}^{M} a_i\,\hat{x}(n - i) \qquad (2)$$
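The excitation model and the synthesis recursion described above can be illustrated with a minimal Python sketch. The function names `build_excitation` and `synthesize` are hypothetical, pulse locations are 0-based, and the sketch assumes the sign convention in which the reproduced sample equals the excitation plus the weighted sum of past reproduced samples, with zero filter memory at the start of the frame. It is an illustration under those assumptions, not the implementation of the referenced method.

```python
from typing import List, Sequence, Tuple


def build_excitation(pulses: Sequence[Tuple[int, float]], frame_len: int) -> List[float]:
    """Place each pulse amplitude g_k at its 0-based location m_k in a frame of zeros."""
    d = [0.0] * frame_len
    for location, amplitude in pulses:
        d[location] += amplitude
    return d


def synthesize(d: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis: y(n) = d(n) + sum_{i=1..M} a_i * y(n - i).

    Assumption: the filter memory is zero at the start of the frame.
    """
    y: List[float] = []
    for n, sample in enumerate(d):
        acc = sample
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc += coeff * y[n - i]
        y.append(acc)
    return y


# Two pulses in an 8-sample frame, passed through a first-order synthesis filter.
d = build_excitation([(1, 1.0), (4, -0.5)], frame_len=8)
x_hat = synthesize(d, a=[0.5])
```

Each pulse excites a decaying response of the synthesis filter, and the reproduced frame is the sum of those responses.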
The weighted mean-squared error between the input speech signal $x(n)$ and the reproduced signal $\hat{x}(n)$, calculated over one frame, is given by:

$$J = \sum_{n=1}^{N} \left[ \left( x(n) - \hat{x}(n) \right) * w(n) \right]^2 \qquad (3)$$

where $*$ represents convolution and $w(n)$ is the weighting function. The weighting function is introduced to reduce perceptual distortion in the reproduced speech. According to the speech masking effect, noise in a formant region, where the speech energy is larger, tends to be effectively masked by the original speech. The weighting function is determined based on short-time speech characteristics. As the weighting function, the following Z-transform function $W(z)$ has been proposed, using the real constant $\gamma$ and the predictive coefficients $a_i$ of the synthesis filter under the condition $0 \le \gamma \le 1$ (see Reference 1):

$$W(z) = \frac{1 - \sum_{i=1}^{M} a_i z^{-i}}{1 - \sum_{i=1}^{M} a_i \gamma^{i} z^{-i}} \qquad (4)$$

If the Z-transforms of $x(n)$ and $\hat{x}(n)$ are defined as $X(z)$ and $\hat{X}(z)$, respectively, Equation (3) is represented by the following:

$$J = \left| X(z)W(z) - \hat{X}(z)W(z) \right|^2 \qquad (5)$$
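As an illustration of the perceptual weighting described above, the following Python sketch filters a signal through a weighting filter whose numerator uses the predictive coefficients and whose denominator uses the same coefficients scaled by powers of the constant gamma. The function name `perceptual_weight` is hypothetical, and the sketch assumes zero filter memory at the start of the frame and the same sign convention as the synthesis filter (denominator of the form one minus a weighted sum of delays).

```python
from typing import List, Sequence


def perceptual_weight(x: Sequence[float], a: Sequence[float], gamma: float) -> List[float]:
    """Filter x through W(z) = (1 - sum a_i z^-i) / (1 - sum a_i gamma^i z^-i).

    Difference equation:
        y(n) = x(n) - sum_i a_i * x(n - i) + sum_i a_i * gamma^i * y(n - i)
    Assumption: zero filter memory at the start of the frame.
    """
    # Bandwidth-expanded coefficients a_i * gamma^i for the denominator.
    a_scaled = [coeff * gamma ** i for i, coeff in enumerate(a, start=1)]
    y: List[float] = []
    for n in range(len(x)):
        acc = x[n]
        for i, (coeff, scaled) in enumerate(zip(a, a_scaled), start=1):
            if n - i >= 0:
                acc += -coeff * x[n - i] + scaled * y[n - i]
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
no_weighting = perceptual_weight(impulse, a=[0.5], gamma=1.0)    # W(z) = 1
full_inverse = perceptual_weight(impulse, a=[0.5], gamma=0.0)    # W(z) = 1 - sum a_i z^-i
intermediate = perceptual_weight(impulse, a=[0.5], gamma=0.5)
```

With gamma equal to 1 the filter leaves the signal unchanged, and with gamma equal to 0 it reduces to the inverse of the synthesis filter; intermediate values trade off between the two, which is how the error is de-emphasized in formant regions.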
With reference to Equation (2), $\hat{X}(z)$ will be:

$$\hat{X}(z) = H(z)D(z) \qquad (6)$$

where:

$$H(z) = \frac{1}{1 - \sum_{i=1}^{M} a_i z^{-i}}$$

$H(z)$ is the Z-transform of the synthesis filter; and $D(z)$ is the Z-transform of the excitation sequence.
Substituting Equation (6) into Equation (5), the following Equation (7) is obtained:

$$J = \left| X(z)W(z) - H(z)W(z)D(z) \right|^2 \qquad (7)$$
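Accordingly, if the inverse Z-transforms of $X(z)W(z)$ and $H(z)W(z)$ are written as $x_w(n) = x(n) * w(n)$ and $h_w(n) = h(n) * w(n)$, respectively, Equation (7) will be:

$$J = \sum_{n=1}^{N} \left[ x_w(n) - \sum_{i=1}^{K} g_i\, h_w(n - m_i) \right]^2 \qquad (8)$$

By partially differentiating Equation (8) with respect to $g_i$ and setting the result to 0, the following Equation (9) is obtained:

$$g_i(m_i) = \frac{\phi_{xh}(m_i) - \sum_{j=1}^{i-1} g_j\, R_{hh}(m_i, m_j)}{R_{hh}(m_i, m_i)} \qquad (9)$$

where $\phi_{xh}(\cdot)$ expresses the cross-correlation function between $x_w(n)$ and $h_w(n)$, and $R_{hh}(\cdot)$ the covariance function of $h_w(n)$. They are written as follows:

$$\phi_{xh}(m_i) = \sum_{n=1}^{N} x_w(n)\, h_w(n - m_i)$$

$$R_{hh}(m_i, m_j) = \sum_{n=1}^{N} h_w(n - m_i)\, h_w(n - m_j)$$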
By properly processing the frame edges, the covariance function $R_{hh}(m_i, m_j)$ can be replaced by the auto-correlation function $R_{hh}(|m_i - m_j|)$.
The conventional method (Prior Art 2) determines the i-th pulse amplitude and location by regarding $g_i$ in Equation (9) as a function of $m_i$ only. In other words, the location $m_i$ maximizing $|g_i|$ in Equation (9) is taken as the i-th pulse location, and the value of $g_i$ obtained at that location is taken as the i-th pulse amplitude. In this method, the excitation pulse sequence minimizing $J$ of Equation (8) can be calculated with a reduced amount of computation.
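The sequential, correlation-based pulse search described above can be sketched in Python as follows. The names `cross_corr`, `auto_corr` and `search_pulses` are hypothetical; locations are 0-based; the sketch assumes the auto-correlation form (frame-edge effects ignored) and a weighted impulse response that is truncated to a finite length. It is a simplified illustration under those assumptions, not the implementation of Reference 2.

```python
from typing import List, Sequence, Tuple


def cross_corr(xw: Sequence[float], hw: Sequence[float], m: int) -> float:
    """phi_xh(m) = sum_n xw(n) * hw(n - m), with hw treated as zero outside its support."""
    return sum(xw[n] * hw[n - m] for n in range(m, len(xw)) if n - m < len(hw))


def auto_corr(hw: Sequence[float], lag: int) -> float:
    """R_hh(lag) = sum_n hw(n) * hw(n + lag); zero once lag exceeds the support of hw."""
    return sum(hw[n] * hw[n + lag] for n in range(len(hw) - lag))


def search_pulses(xw: Sequence[float], hw: Sequence[float],
                  num_pulses: int) -> List[Tuple[int, float]]:
    """Pick pulses one at a time: the location with the largest |g|, then its amplitude g."""
    frame_len = len(xw)
    r = [auto_corr(hw, lag) for lag in range(frame_len)]
    if r[0] == 0.0:
        raise ValueError("weighted impulse response has zero energy")
    # Numerator of the amplitude formula, updated after each pulse is chosen.
    residual = [cross_corr(xw, hw, m) for m in range(frame_len)]
    pulses: List[Tuple[int, float]] = []
    for _ in range(num_pulses):
        # r[0] is constant, so maximizing |g| is the same as maximizing |residual|.
        location = max(range(frame_len), key=lambda m: abs(residual[m]))
        amplitude = residual[location] / r[0]
        pulses.append((location, amplitude))
        # Remove the contribution of the chosen pulse from every candidate location.
        for m in range(frame_len):
            residual[m] -= amplitude * r[abs(m - location)]
    return pulses


hw = [1.0, 0.5]                          # truncated weighted impulse response
xw = [0.0, 0.0, 2.0, 1.0, 0.0, 0.0]      # weighted target: one pulse of amplitude 2 at n = 2
found = search_pulses(xw, hw, num_pulses=1)
```

Because the correlations are computed once per frame and only updated by subtraction for each new pulse, no synthesis filtering is repeated inside the search loop, which is the source of the reduced computation compared with an A-b-S search.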
Since the coding mode at the transmitting side is fixed, none of the conventional methods described so far has been able to code the input signals adequately, and thus none has been able to produce high-quality speech-band signals.