1. Field of the Invention
The present invention relates to a method of processing a speech signal in a speech processing system, and more particularly to a method of searching for the pitch period of a speech signal by using an autocorrelation in a CELP (code excited linear prediction) vocoder (voice coder) embodied in the speech processing system, so as to reduce the pitch period search time.
2. Description of the Prior Art
In a digital portable communication system, several vocoder (voice coder) techniques are applied in order to use the bandwidth of a transmission channel efficiently and to obtain high tonal quality. Implementing such a vocoder requires a large amount of computation; pitch searching in particular accounts for more than about 50% of the overall computation required for a typical vocoder implementation.
Vocoder techniques can be broadly classified into the following three types: a waveform coding method; a source coding method; and a hybrid coding method. In consideration of the quality of a synthesized speech and a recent coding technique, the hybrid coding method is regarded as the most desirable.
The hybrid coding method combines the memory efficiency of source coding with the naturalness and intelligibility of waveform coding. In the hybrid method, the formant information is generally coded by the linear predictive coding (LPC) method. Depending on how the residual signal of the LPC analysis is coded, hybrid methods can be classified as RELP (residual excited linear prediction), VELP (voice excited linear prediction), CELP (code excited linear prediction) and the like. Among these methods, CELP is the most popular and has been adopted for mobile communications.
In a vocoder using the CELP method, several parameters are extracted from an input speech signal and used to analyze the speech signal.
In the CELP vocoder, an analysis-by-synthesis approach is used to calculate the codebook parameters and the coefficients of the pitch filter. This requires a great deal of computation, because the approach is to form combinations of possible values for the various parameters and then select the combination that produces the synthesized speech most similar to the original speech. An improvement in the computation of the pitch filter coefficients is therefore needed to improve the operation of a CELP vocoder.
In a speech signal, if the interval of pitch synthesis is increased beyond a specific range, the quality of the synthesized speech drops rapidly. For this reason, the interval of pitch synthesis must be kept in the range of approximately 5 to 10 ms to minimize the amount of computation while preventing the quality of the synthesized speech from being degraded.
Additionally, for a speech signal sampled at 8 kHz, a closed loop structure, which gives excellent speech quality, is used to obtain the pitch lag [L] and the pitch gain [b] as parameters of a pitch filter. In this closed loop structure, however, the pitch lag [L] is limited to the range from 20 to 147. A synthesized speech signal is produced for each of the 128 pitch lag values, and the squared error of the difference between each synthesized speech signal and the original speech is obtained. The values of the pitch lag and pitch gain that generate the least error are then selected as the pitch parameters.
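The closed loop selection described above can be sketched as follows. This is only an illustrative outline, not the procedure of FIG. 3 itself: the function names `best_pitch_parameters` and `toy_candidate` are hypothetical, the synthesized candidates are supplied by a caller-provided function, and the gain for each lag is assumed to be the least-squares gain, which the text above does not state.

```python
import math


def best_pitch_parameters(target, synthesize, lags=range(20, 148)):
    """Return (lag, gain, error) giving the least squared error.

    target     -- original (weighted) speech samples for one search interval
    synthesize -- hypothetical callable: lag -> synthesized samples for that lag
    lags       -- candidate pitch lags; 20..147 inclusive gives 128 values
    """
    best = None
    for lag in lags:
        y = synthesize(lag)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent candidate cannot be scaled to match the target
        corr = sum(a * b for a, b in zip(target, y))
        gain = corr / energy  # assumption: least-squares gain for this lag
        err = sum((a - gain * b) ** 2 for a, b in zip(target, y))
        if best is None or err < best[2]:
            best = (lag, gain, err)
    return best


def toy_candidate(lag, length=40):
    """Toy stand-in for a synthesized signal: a sinusoid with period `lag`."""
    return [math.sin(2.0 * math.pi * n / lag) for n in range(length)]


if __name__ == "__main__":
    target = [0.5 * v for v in toy_candidate(57)]
    print(best_pitch_parameters(target, toy_candidate))
```

In the toy usage, the target is a scaled copy of the candidate for lag 57, so the search should settle on that lag with a gain close to 0.5.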
Generally, a CELP vocoder is broadly divided into two portions, an encoding portion and a decoding portion. A speech signal is sampled at a rate of 8000 samples/sec to produce a sampled signal as the input signal to the CELP vocoder. The sampled signal is processed by the vocoder in groups of 160 samples, each group corresponding to a 20 ms frame.
In a CELP vocoder, ten LPC (linear predictive coding) coefficients, indicating the formant components of the speech signal, are obtained from the sampled signal of one frame and converted into LSP (line spectrum pair) frequencies. Then, pitch searching and codebook searching are performed so as to obtain optimal pitch and codebook parameters. The pitch searching is performed once for each 5 ms of the speech signal so as to prevent the quality of the synthesized signal from being lowered. Therefore, the pitch searching is repeated four times per 20 ms frame.
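The frame arithmetic above can be checked with a short sketch. The constant and function names are illustrative only; the numeric values (8000 samples/sec, 20 ms frames, 5 ms pitch search intervals) are the ones stated in the text.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated in the text
FRAME_MS = 20          # frame duration stated in the text
SUBFRAME_MS = 5        # pitch search interval stated in the text


def samples_for(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples covering duration_ms at the given sampling rate."""
    return sample_rate_hz * duration_ms // 1000


def split_into_subframes(frame, subframe_len):
    """Split one frame into consecutive pitch search intervals."""
    return [frame[i:i + subframe_len] for i in range(0, len(frame), subframe_len)]


frame_len = samples_for(FRAME_MS)                # 160 samples per frame
subframe_len = samples_for(SUBFRAME_MS)          # 40 samples per search interval
searches_per_frame = frame_len // subframe_len   # 4 pitch searches per frame

if __name__ == "__main__":
    frame = list(range(frame_len))
    print(frame_len, subframe_len, searches_per_frame)
    print([len(sub) for sub in split_into_subframes(frame, subframe_len)])
```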
Also, in the pitch searching process, the synthesized speech signals are compared with the original speech signal to produce optimal pitch lag and pitch gain, as described above.
FIG. 3 shows the procedure of pitch searching as a prior art speech signal processing method.
In FIG. 3, s(n) denotes the input speech signal. The ZIR (zero input response) $a_{zir}(n)$ of the formant synthesizing filter 1/A(z), obtained in step 202, is subtracted from s(n). Suppose that the result is e(n) and that the signal obtained by passing it through a perceptual weighting filter W(z) is x(n). In step 204, the value e(n) is given by the equation
$$e(n) = s(n) - a_{zir}(n). \qquad (1)$$
Also, the weighting and formant filters are respectively expressed in equations (2) and (3) as follows: ##EQU1## where $\alpha$ is the weighting factor (usually equal to 0.8); and $a_i$ is an LPC coefficient.
s(k) indicates a valley of the residual signal; n=0 indicates the vertex of the peak; and k=0 indicates the vertex of the valley.
On the other hand, a residual component of the input speech signal in the present frame and an output of a pitch filter in the prior frame pass through a synthesis filter H(z) in step 206, and thereby a synthesized speech signal $y_L(n)$ can be obtained in step 210. The synthesis filter H(z) is expressed as follows: ##EQU2## where $\alpha = 0.8$.
Also, the synthesized speech signal $y_L(n)$ is obtained by the convolution of h(n) and $P_L(n)$ in step 210, and can be expressed by the following equation: ##EQU3## where $20 < L < 147$ and $0 \le n < L_p$; and where h(n) is an impulse response of H(z).
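As a rough illustration of the convolution in step 210, the sketch below computes a truncated convolution of an impulse response with a delayed excitation. Several points are assumptions rather than statements of the source: the function names are hypothetical, the convolution is truncated to the length of the search interval, and the delayed excitation is taken from a buffer of past excitation samples and repeated with period L when the lag is shorter than the interval.

```python
def delayed_excitation(past, lag, length):
    """Assumed form of P_L(n): past excitation delayed by `lag` samples.

    When lag < length, the last `lag` samples are repeated periodically
    (an assumption; the source equation is not reproduced here).
    Requires len(past) >= lag.
    """
    start = len(past) - lag
    return [past[start + (n % lag)] for n in range(length)]


def truncated_convolution(h, p):
    """y(n) = sum over i of h(i) * p(n - i), for 0 <= n < len(p)."""
    y = []
    for n in range(len(p)):
        last_tap = min(n, len(h) - 1)
        y.append(sum(h[i] * p[n - i] for i in range(last_tap + 1)))
    return y


if __name__ == "__main__":
    past = [float(v) for v in range(10)]   # toy excitation history
    p = delayed_excitation(past, lag=3, length=5)
    h = [1.0, 0.5]                         # toy impulse response
    print(p)
    print(truncated_convolution(h, p))
```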
From the synthesized speech signal $y_L(n)$ and the original speech signal x(n) thus obtained, the squared error of the difference between them is given by the following equation: ##EQU4## where b is a pitch gain.
Finding the minimum value of the above expression is equivalent to carrying out the search on the following expression: ##EQU5##
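One common way to arrive at such a reduced search expression is sketched below. It assumes that the squared error has the usual least-squares form shown in the first line. The source equations are not reproduced here, so the resulting expression may differ in form from the one referenced above.

```latex
% Assumed form of the squared error for a candidate lag L and gain b
\begin{align}
E_L(b) &= \sum_{n=0}^{L_p-1} \bigl( x(n) - b\, y_L(n) \bigr)^2 \\
% Setting dE_L/db = 0 gives the least-squares gain for that lag
b_L^{*} &= \frac{\sum_{n=0}^{L_p-1} x(n)\, y_L(n)}{\sum_{n=0}^{L_p-1} y_L(n)^2} \\
% Substituting b_L^{*} back into E_L(b)
E_L(b_L^{*}) &= \sum_{n=0}^{L_p-1} x(n)^2
  - \frac{\Bigl( \sum_{n=0}^{L_p-1} x(n)\, y_L(n) \Bigr)^2}{\sum_{n=0}^{L_p-1} y_L(n)^2}
\end{align}
```

Under this assumption the first term does not depend on L, so the search over lags only needs the cross-correlation of x(n) with $y_L(n)$ and the energy of $y_L(n)$ for each candidate.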
As shown in FIG. 3, a large amount of computation is required to search for just one pitch parameter, since the repetitive computation (from step 210 to step 216) is performed 128 times in the closed loop in order to obtain the optimal pitch gain and pitch lag.
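To give a feel for the scale of this repetition, the sketch below counts multiply-accumulate operations under simplifying assumptions that are not taken from the source: each candidate lag is assumed to need one truncated convolution over a 40-sample interval plus one cross-correlation and one energy sum, and everything else in the loop is ignored. The figures are rough illustrations, not measurements.

```python
SUBFRAME_LEN = 40          # 5 ms at 8000 samples/sec
NUM_LAGS = 128             # candidate pitch lags, as stated in the text
SEARCHES_PER_FRAME = 4     # pitch searches per 20 ms frame
FRAMES_PER_SECOND = 50     # 1000 ms / 20 ms


def macs_per_lag(n=SUBFRAME_LEN):
    """Assumed cost per lag: truncated convolution plus two length-n sums."""
    convolution = n * (n + 1) // 2   # sample k needs k + 1 products
    correlation = n                  # sum of x(n) * y_L(n)
    energy = n                       # sum of y_L(n) ** 2
    return convolution + correlation + energy


per_search = macs_per_lag() * NUM_LAGS
per_frame = per_search * SEARCHES_PER_FRAME
per_second = per_frame * FRAMES_PER_SECOND

if __name__ == "__main__":
    print(macs_per_lag(), per_search, per_frame, per_second)
```

Even under these simplified assumptions, the per-lag cost is multiplied by 128 lags and four searches per frame, which illustrates why the closed loop dominates the workload.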