The present invention relates to a method for multi-pulse coding a speech signal and an apparatus for performing the encoding.
A multi-pulse speech coding method (hereinafter referred to simply as a "multi-pulse method") is available for coding a speech signal at a bit rate lower than 16 kilobits per second. This method, which offers high speech quality, was proposed by Atal et al. of Bell Telephone Laboratories of the United States in "A NEW MODEL OF LPC EXCITATION FOR PRODUCING NATURAL-SOUNDING SPEECH AT LOW BIT RATES," Proc. ICASSP, pp. 614-617, 1982. Specifically, in a multi-pulse method a synthetic filter is excited by an excitation pulse sequence made up of a plurality of pulses that differ from each other in amplitude and location, thereby synthesizing speech. The principle of multi-pulse coding will be described with reference to FIG. 1.
In FIG. 1, an excitation generator 101 generates multi-pulse excitation v(n). A synthetic filter 102 is excited by the multi-pulse excitation v(n) to produce a synthetic speech x̂(n). To perceptually correct an error e(n) between the original speech x(n) and the synthetic speech x̂(n), the error e(n) is fed through a weighting filter 103. Then, the output of the weighting filter 103, i.e., the weighted error signal e_w(n), is fed back to the excitation generator 101 to minimize the power of the signal e_w(n). This provides optimum multi-pulse excitation v(n).
In the multi-pulse method outlined above, the result of the excitation pulse search determines the characteristics of the entire system. In the previously mentioned paper, Atal et al. propose an A-b-S (Analysis-by-Synthesis) procedure as a pulse search method. However, a problem with the A-b-S procedure is that, because the excitation pulse train is determined one pulse at a time so as to minimize the error power between the original and the synthetic speech signals as stated earlier, the procedure requires an amount of calculation which is too great to be implemented with a signal processor.
To reduce the amount of calculation, Ozawa et al. have proposed a method which performs a pulse search in a correlation domain ("MULTI-PULSE EXCITED SPEECH CODER BASED ON MAXIMUM CROSSCORRELATION SEARCH ALGORITHM," IEEE Global Telecommunications Conference, 23.3, December 1983). The proposed method allows pulse search to be implemented with a signal processor, as described hereinafter.
Assuming that the frame whose pulse sequence is to be determined has a length of N samples, and that K pulses are to be determined, the excitation signal v(n) may be expressed as:
v(n) = Σ_{i=1}^{K} g_i δ(n - m_i)    Eq. (1)
where g_i is the amplitude of the i-th pulse, m_i is the location of the i-th pulse, and δ(n) is the Kronecker delta.
The synthetic speech x̂(n) is produced by exciting a synthetic filter with the excitation signal v(n) as represented by Eq. (1). Therefore, it may be expressed as:
x̂(n) = Σ_{i=1}^{K} g_i h(n - m_i)    Eq. (2)
where h(n) is the impulse response of the synthetic filter.
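The excitation and synthesis steps can be sketched as follows. This is a minimal illustration, assuming zero-based sample indices, a finite impulse response h(n), and output truncated to the frame length; the function names `excitation` and `synthesize` and the sample values are hypothetical and do not come from the cited papers.

```python
def excitation(pulses, n_samples):
    """Build v(n) as a sum of scaled unit pulses g_i * delta(n - m_i)."""
    v = [0.0] * n_samples
    for g, m in pulses:  # each pulse is (amplitude g_i, location m_i)
        v[m] += g
    return v


def synthesize(pulses, h, n_samples):
    """Build x_hat(n) as a sum of scaled, shifted impulse responses g_i * h(n - m_i)."""
    x_hat = [0.0] * n_samples
    for g, m in pulses:
        for j, hj in enumerate(h):
            if m + j < n_samples:  # drop the tail that falls outside the frame
                x_hat[m + j] += g * hj
    return x_hat


# Illustrative values only: two pulses in an 8-sample frame.
pulses = [(1.0, 1), (-0.5, 4)]
h = [1.0, 0.5, 0.25]
v = excitation(pulses, 8)
x_hat = synthesize(pulses, h, 8)
```

Because the excitation is sparse, the synthetic speech is obtained by adding one scaled copy of the impulse response per pulse rather than by running a full convolution.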
The weighted error e_w(n), obtained by perceptually weighting the error between the original and synthetic speeches, is given by:
e_w(n) = {x(n) - x̂(n)} * w(n)    Eq. (3)
where w(n) is the perceptual weighting function, and * stands for convolution.
The weighted error power E is obtained by summing the squared weighted error e_w(n) over the frame, and may be expressed as: ##EQU4##
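As a rough numerical illustration of the weighted error and its power, the sketch below assumes that w(n) is available as a short finite impulse response and that the sum runs over one frame only. The helper names `convolve` and `weighted_error_power` and all sample values are hypothetical.

```python
def convolve(a, b, n_samples):
    """Causal convolution (a * b)(n), truncated to n_samples."""
    out = [0.0] * n_samples
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n_samples:
                out[i + j] += ai * bj
    return out


def weighted_error_power(x, x_hat, w):
    """Sum of squared weighted error over the frame."""
    e = [xi - yi for xi, yi in zip(x, x_hat)]  # e(n) = x(n) - x_hat(n)
    e_w = convolve(e, w, len(e))               # e_w(n) = e(n) * w(n)
    return sum(s * s for s in e_w)


# Illustrative values only.
x = [1.0, 0.0, -1.0, 0.5]
x_hat = [0.5, 0.0, -1.0, 0.0]
w = [1.0, -0.5]
E = weighted_error_power(x, x_hat, w)
```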
Because the excitation pulse sequence is determined so as to minimize the weighted error power E, the location m_k and the amplitude g_k of the k-th pulse are obtained from an equation which is produced by setting the partial derivative of Eq. (4-2) with respect to the k-th amplitude g_k to zero. The resultant pulse location m_k and pulse amplitude g_k are given by the following Eq. (5): ##EQU5## Here, x_w(n) is the weighted speech produced by applying perceptual weighting to the original speech x(n), h_w(n) is the weighted impulse response of the synthetic filter, and L is the length in samples of the weighted impulse response.
The weighted speech and the weighted impulse response may be expressed by using the impulse response of the weighting filter as follows:
x_w(n) = x(n) * w(n)    Eq. (6-3)
h_w(n) = h(n) * w(n)    Eq. (6-4)
In Eq. (5), φhx(m) represents the crosscorrelation between x_w(n) and h_w(n), and Rhh(m) represents the autocorrelation of h_w(n).
It is to be noted that crosscorrelation is a function which represents the correlation between two signal sequences. Autocorrelation is a function which represents how closely a signal waveform shifted by a certain time τ resembles (correlates with) the original waveform.
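The two correlation functions can be sketched as below. The definitions used here, a sum of x_w(n) h_w(n - m) for the crosscorrelation and a sum of h_w(n) h_w(n + m) for the autocorrelation, are the commonly used forms and are an assumption, as are the zero-based indexing, the truncation at the frame boundary, and the function names.

```python
def cross_correlation(x_w, h_w):
    """phi_hx(m): correlate x_w with h_w shifted to start at sample m."""
    n_samples = len(x_w)
    phi = []
    for m in range(n_samples):
        acc = 0.0
        for j, hj in enumerate(h_w):
            if m + j < n_samples:  # ignore samples beyond the frame
                acc += x_w[m + j] * hj
        phi.append(acc)
    return phi


def autocorrelation(h_w):
    """R_hh(m): correlate h_w with itself at lag m, for m = 0 .. L-1."""
    length = len(h_w)
    return [
        sum(h_w[n] * h_w[n + m] for n in range(length - m))
        for m in range(length)
    ]


# Illustrative values only: x_w contains one copy of 2 * h_w starting at n = 1.
h_w = [1.0, 0.5]
x_w = [0.0, 2.0, 1.0, 0.0]
phi = cross_correlation(x_w, h_w)
R = autocorrelation(h_w)
```

In this example the crosscorrelation peaks at m = 1, the position at which the scaled copy of h_w was placed.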
The pulse search procedure which is based on Eq. (5) will be described next. At the beginning, the crosscorrelation φhx(m) and the autocorrelation Rhh(m) are determined. The numerator of Eq. (5) is selected as the error criterion function R(m_k).
The location of the first pulse is the location m_1 at which the absolute value of the criterion function R(m_1) is maximum. The amplitude g_1 of the first pulse is obtained by substituting the pulse location m_1 into Eq. (5). Then, by substituting 2 for k in Eq. (5), a criterion function R(m_2) which is free from the influence of the first pulse is determined. Subsequently, based on the criterion function R(m_2), the location m_2 and the amplitude g_2 of the second pulse are determined in the same manner as the location and amplitude of the first pulse. This procedure is repeated as many times as the number of pulses required to define the excitation pulse sequence.
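The search loop described above can be sketched as follows. This is a simplified illustration, not the exact procedure of the cited paper: it assumes the criterion is the crosscorrelation minus the contributions of the pulses already found, that the amplitude is the criterion value divided by the zero-lag autocorrelation, and that frame edge effects are ignored. The function name `pulse_search` and the input values are hypothetical.

```python
def pulse_search(phi, R, num_pulses):
    """Pick pulses one at a time from the correlation-domain criterion."""

    def r(lag):
        # Autocorrelation is symmetric and taken as zero beyond its length.
        lag = abs(lag)
        return R[lag] if lag < len(R) else 0.0

    criterion = list(phi)  # criterion for the first pulse is phi itself
    pulses = []
    for _ in range(num_pulses):
        # Location: where the absolute value of the criterion is maximum.
        m_k = max(range(len(criterion)), key=lambda m: abs(criterion[m]))
        # Amplitude: criterion at that location normalized by R(0).
        g_k = criterion[m_k] / r(0)
        pulses.append((g_k, m_k))
        # Remove the influence of this pulse before the next search.
        for m in range(len(criterion)):
            criterion[m] -= g_k * r(m - m_k)
    return pulses


# Illustrative values only.
phi = [1.0, 2.5, 0.5, -1.25, -0.5]
R = [1.25, 0.5]
found = pulse_search(phi, R, 2)
```

Each pass costs one scan of the criterion plus one update, which is why working in the correlation domain is cheaper than resynthesizing speech for every candidate pulse.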
FIG. 2 shows a specific construction of a pulse search circuit. A speech signal x(n) is applied to a weighting filter 201 to produce a weighted speech signal x_w(n). Meanwhile, LPC (linear prediction coding) parameters are fed to a weighted impulse-response calculator 202 to determine a weighted impulse response h_w(n). Next, the weighted speech signal x_w(n) and the weighted impulse response h_w(n) are routed to a crosscorrelation calculator 203 to produce their crosscorrelation φhx(m). At the same time, the weighted impulse response h_w(n) is fed to an autocorrelation calculator 204 to determine its autocorrelation Rhh(m). Finally, the crosscorrelation φhx(m) and the autocorrelation Rhh(m) are delivered to a pulse search block 205, which performs the pulse search to determine the pulse locations m_k and pulse amplitudes g_k that define the excitation pulse sequence.
In multi-pulse speech coding, a decrease in the bit rate leads to a decrease in the number of pulses, which in turn impairs the sound quality. In light of this, Ozawa et al. have proposed a method of lowering the bit rate with minimum deterioration of quality. The lowering of the bit rate is aided by the pitch periodicity in a voiced section of a speech signal ("HIGH QUALITY MULTI-PULSE SPEECH CODER WITH PITCH PREDICTION," Proc. ICASSP, 33.3, April 1986).
In accordance with this method, the synthetic filter is represented by a cascaded connection of a pitch prediction filter, which reproduces a pulse sequence by use of the pulse sequence which occurred one pitch period before, and a spectrum envelope synthetic filter, which reproduces a speech waveform. That is, pitch information is included in the impulse response of the synthetic filter. In the previously discussed multi-pulse coding which does not use pitch information, only the spectrum envelope synthetic filter is used as the synthetic filter. The method mentioned above successfully reduces the number of excitation pulses needed to excite the pitch prediction filter. Therefore, the method also reduces the number of pulses to be transmitted, as compared to a circuit in which a pitch prediction filter is not used.
Nevertheless, the above-described method using pitch information needs a synthetic filter whose impulse response is several times longer than in the method which does not perform pitch prediction, so that the pitch periodicity can be represented by the impulse response of the synthetic filter.
This brings about a problem: pulse search with pitch prediction requires a considerably greater amount of calculation than pulse search without pitch prediction, namely in calculating the autocorrelation of the impulse response of the synthetic filter and the crosscorrelation between that impulse response and the input speech signal.
The principle of multi-pulse coding which uses pitch information will be described with reference to FIG. 3. As shown in FIG. 3, an excitation generator 301 generates multi-pulse excitation v(n). A pitch prediction filter 302 is excited by the multi-pulse excitation v(n) to output an excitation pulse sequence v'(n). The excitation pulse sequence v'(n) excites a spectrum envelope synthetic filter 303 to produce a synthetic speech x̂(n). The error e(n) between the original speech x(n) and the synthetic speech x̂(n) is applied to a weighting filter 304 which is adapted for perceptual correction. The resultant weighted error signal e_w(n) is fed back to the excitation generator 301 to minimize the power of the signal e_w(n), whereby an optimum multi-pulse excitation v(n) is determined.
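The cascade of FIG. 3 can be illustrated with the sketch below. The filter structures are assumptions made for illustration only: a one-tap recursive pitch filter, v'(n) = v(n) + gain * v'(n - pitch_period), and an all-pole envelope filter with a hypothetical coefficient sign convention. The function names and all numerical values are likewise hypothetical.

```python
def pitch_prediction_filter(v, pitch_period, gain):
    """One-tap recursive pitch filter: repeats past output one pitch period later."""
    out = []
    for n, vn in enumerate(v):
        past = out[n - pitch_period] if n >= pitch_period else 0.0
        out.append(vn + gain * past)
    return out


def envelope_synthesis_filter(excitation, a):
    """All-pole filter: x_hat(n) = excitation(n) + sum_j a[j] * x_hat(n - 1 - j)."""
    out = []
    for n, en in enumerate(excitation):
        acc = en
        for j, aj in enumerate(a):
            if n - 1 - j >= 0:
                acc += aj * out[n - 1 - j]
        out.append(acc)
    return out


# A single unit pulse, so x_hat is the impulse response of the whole cascade.
v = [1.0] + [0.0] * 7
v_prime = pitch_prediction_filter(v, pitch_period=3, gain=0.5)
x_hat = envelope_synthesis_filter(v_prime, a=[0.5])
```

With a unit pulse as input, the output keeps receiving new energy at every pitch period, which illustrates why the impulse response of the cascaded filter spans several pitch periods and why the correlation calculations grow accordingly.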
An object of the present invention is to provide multi-pulse speech coding which uses pitch information. The inventive method and apparatus for multi-pulse speech coding enable pitch prediction to be performed with a minimum amount of calculation for the autocorrelation of the impulse response of a synthetic filter and for the crosscorrelation between the impulse response of the synthetic filter and an input speech signal. The invention utilizes the periodicity of pitch.