This invention relates to vocoder devices for encoding and decoding speech signals for the purpose of digital signal transmission or storage, and more particularly to code-book driven vocoder devices provided with a voice source generator which are suitable to be used as component parts of on-board telephone equipment for automobiles.
A vocoder device provided with a voice source generator using a waveform model is disclosed, for example, in an article by Mats Ljungqvist and Hiroya Fujisaki: "A Method for Estimating ARMA Parameters of Speech Using a Waveform Model of the Voice Source," Journal of Institute of Electronics and Communication Engineers of Japan, Vol. 86, No. 195, SP 86-49, pp. 39-45, 1986, where AR and MA parameters are used as spectral parameters of the speech signal and a waveform model of the voice source is defined as the derivative of a glottal flow waveform.
This article uses the ARMA (auto-regressive moving-average) model of the vocal tract, according to which the speech signal $s(n)$, the voice source waveform (glottal flow derivative) $g(n)$, and the error $e(n)$ are related to each other by means of AR parameters $a_i$ and MA parameters $b_j$: ##EQU1##
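As an illustration of how an ARMA relation of this kind turns a source waveform into speech samples, the following is a minimal sketch only. The equation itself is not reproduced in this text, so the sign convention and index ranges used below, s(n) = -Σ a_i s(n-i) + Σ b_j g(n-j) + e(n) with i starting at 1 and j starting at 0, are assumptions and may differ from those of the cited article. The function name `arma_synthesize` is hypothetical.

```python
def arma_synthesize(g, a, b, e=None):
    """Pass a source waveform g(n) through an ARMA filter.

    ASSUMED convention (not taken from the cited article):
        s(n) = -sum_{i=1..p} a[i-1] * s(n-i)
               + sum_{j=0..q} b[j] * g(n-j)
               + e(n)
    Samples before n = 0 are treated as zero.
    """
    if e is None:
        e = [0.0] * len(g)  # no error term unless one is supplied
    s = []
    for n in range(len(g)):
        # Auto-regressive part: weighted sum of past output samples.
        ar = sum(a[i - 1] * s[n - i]
                 for i in range(1, len(a) + 1) if n - i >= 0)
        # Moving-average part: weighted sum of current and past source samples.
        ma = sum(b[j] * g[n - j]
                 for j in range(len(b)) if n - j >= 0)
        s.append(-ar + ma + e[n])
    return s
```

With a single AR coefficient `a = [-0.5]`, `b = [1.0]`, and a unit impulse as the source, this sketch yields a geometrically decaying response, which is the behavior expected of a one-pole filter under the assumed convention.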
The model waveform of the voice source $g(n)$ (glottal flow derivative) is shown in FIG. 9, where $A$ is the slope at glottal opening; $B$ is the slope prior to closure; $C$ is the slope following closure; $D$ is the glottal closure timing; $W$ $(= R + F)$ is the pulse width; and $T$ is the fundamental period (pitch period). The voice source waveform $g(n)$ is expressed using these voice source parameters as follows: ##EQU2## where $n$ represents the time and $\alpha$ and $\beta$ are:

$$\alpha = \frac{4AR - 6FB}{F^{2} - 2R^{2}}$$

$$\beta = \frac{CD}{D - 3(T - W)}$$
FIG. 8a is a block diagram showing the structure of a speech analyzer unit of a conventional vocoder which operates in accordance with the method disclosed in the above article. A voice source generator 12 generates voice source waveforms 13 corresponding to the glottal flow derivative g(n), the first instance of which is selected arbitrarily. The instances of the voice source waveforms 13 are successively modified with a small perturbation as described below. In response to the input speech signal 1 corresponding to s(n) and the voice source waveforms 13 corresponding to g(n), an ARMA analyzer 44 determines the AR parameters 45 and the MA parameters 46, which correspond to the coefficients a_i and b_j, respectively. Further, in response to the voice source waveforms 13, the AR parameters 45 and the MA parameters 46, a speech synthesizer 19 produces synthesized speech waveforms 20. Then a distance evaluator 47 evaluates the distance E1 between the input speech signal 1 and the synthesized speech waveforms 20 by calculating the squared error: ##EQU3##
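The distance evaluation can be sketched as follows. The exact expression for E1 is not reproduced in this text, so the sketch assumes a plain sum of squared sample differences over one analysis interval, with no weighting or normalization. The function name `squared_error` is hypothetical.

```python
def squared_error(s, s_hat):
    """Return the sum of squared differences between two waveforms.

    ASSUMPTION: E1 is an unweighted, unnormalized sum over all samples
    of the interval; the cited method may define it differently.
    """
    if len(s) != len(s_hat):
        raise ValueError("waveforms must have the same number of samples")
    return sum((x - y) ** 2 for x, y in zip(s, s_hat))
```

In this sketch, `s` plays the role of the input speech signal 1 and `s_hat` the role of the synthesized speech waveforms 20.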
When the distance E1 is greater than a predetermined threshold value E0, one of the voice source parameters is given a small perturbation and the voice source parameters 48 are fed back to the voice source generator 12. In response thereto, the voice source generator 12 generates a new instance of the voice source waveform 13 in accordance with the perturbed voice source parameters, and the ARMA analyzer 44 generates new sets of AR parameters 45 and MA parameters 46 on the basis thereof, such that the speech synthesizer 19 produces slightly modified synthesized speech waveforms 20.
The above operations are repeated while the magnitude of the perturbation given to the voice source parameters is successively reduced. When the distance or error E1 finally becomes less than the threshold level E0, the voice source parameters 48, the AR parameters 49 and the MA parameters 50 encoding the input speech signal 1 are output from the distance evaluator 47.
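The iterative procedure described above can be sketched as a simple perturbation search. This is an illustrative sketch under several assumptions that the text does not state: a perturbation is kept only if it lowers the error, the perturbation magnitude is halved whenever a full pass over the parameters brings no improvement, and an iteration cap guards against non-termination. The names `fit_source_params` and `evaluate` are hypothetical; `evaluate` stands for the whole chain of waveform generation, ARMA analysis, synthesis, and distance evaluation for one candidate parameter set.

```python
def fit_source_params(params, evaluate, threshold,
                      step=1.0, shrink=0.5, max_evals=1000):
    """Perturb one parameter at a time until the error drops below threshold.

    params    -- dict of initial voice source parameters (chosen arbitrarily)
    evaluate  -- callable returning the error E1 for a parameter dict
    threshold -- the level E0 below which the search stops
    ASSUMPTIONS: accept-if-better rule, step shrinking, and evaluation cap
    are illustrative choices, not taken from the cited method.
    """
    params = dict(params)
    error = evaluate(params)
    names = list(params)
    evals = 0
    while error >= threshold and evals < max_evals:
        improved = False
        for name in names:
            for delta in (step, -step):
                trial = dict(params)
                trial[name] += delta      # small perturbation of one parameter
                trial_error = evaluate(trial)
                evals += 1
                if trial_error < error:   # keep only perturbations that help
                    params, error = trial, trial_error
                    improved = True
                    break
        if not improved:
            step *= shrink                # successively reduce the perturbation
    return params, error
```

Because every call to `evaluate` implies a fresh spectral analysis and a fresh synthesis, the cost of such a loop grows with the number of perturbations tried, and nothing in the sketch guarantees that the threshold is ever reached.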
FIG. 8b is a block diagram showing the structure of a speech synthesizer unit of a conventional vocoder which synthesizes speech from the voice source parameters 48, the AR parameters 49 and the MA parameters 50 output from the analyzer of FIG. 8a. In response to the voice source parameters 48, a voice source generator 40 generates a voice source waveform 41. Further, a speech synthesizer 42 generates synthesized speech 43 on the basis of the voice source waveform 41, the AR parameters 49 and the MA parameters 50.
The above conventional vocoder device, however, has the following disadvantage. For each set of voice source parameters, the spectral parameters (i.e., the AR and the MA parameters) are calculated to produce synthesized speech waveforms 20, and the distance or squared error E1 between the input speech signal 1 and the synthesized speech waveforms 20 is then determined. The voice source parameters are perturbed, and the synthesis of the speech and the determination of the error E1 between the original and the synthesized speech are repeated until the error E1 finally becomes less than the threshold level E0. Since the spectral parameters and the voice source parameters are determined successively by the method of "analysis by synthesis," the calculation is quite complex. Further, the procedure for determining the parameters may become unstable.
Furthermore, since the speech signal is processed in synchronism with the pitch period, fixed-rate or low-bit-rate encoding of the speech signal is difficult to realize.