This invention relates to a low bit rate encoding system for a voice signal and, more particularly, to an encoding system in which the transmission rate is less than 10k bits/second.
As an effective method of encoding a voice signal at a transmission information rate of less than 10k bits/second, a method is known in which the excitation signal of the voice signal is searched for at each short interval so as to minimize the error between a synthesized signal and the input signal. Depending on the search method used, this approach is called a tree coding method or a vector quantization method. In addition to these methods, a system has recently been proposed in which a plurality of pulse series, or pulse trains, representing the excitation signal series are sequentially obtained at each short interval by an analysis-by-synthesis (A-b-S) method at the encoder. The present invention uses this A-b-S method, the details of which are described in the paper by B. S. Atal et al. entitled "A New Model of LPC Excitation for Producing Natural-sounding Speech at Low Bit Rates," on pages 614 to 617 of the advanced manuscripts published by I.C.A.S.S.P., 1982 (hereinafter called paper No. 1). An outline of this paper will be given later.
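The sequential A-b-S pulse search mentioned above can be sketched as follows. This is a simplified greedy variant in the spirit of paper No. 1, not the exact procedure of that paper or of this invention: it omits the perceptual weighting filter, fixes each pulse amplitude once chosen, and assumes an all-pole synthesis filter with hypothetical coefficient list `a`. The function names `impulse_response` and `multipulse_search` are likewise hypothetical.

```python
def impulse_response(a, n):
    """First n samples of the impulse response of the all-pole synthesis
    filter 1 / (1 - sum_k a[k-1] * z**-k).  This LPC sign convention is an
    assumption of the sketch."""
    h = [0.0] * n
    for i in range(n):
        acc = 1.0 if i == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if i - k >= 0:
                acc += ak * h[i - k]
        h[i] = acc
    return h


def multipulse_search(target, a, num_pulses):
    """Greedy multipulse A-b-S search over one frame (simplified sketch:
    no perceptual weighting, no joint re-optimization of amplitudes)."""
    n = len(target)
    h = impulse_response(a, n)
    error = list(target)          # error between target and synthesized signal
    pulses = []                   # list of (position, amplitude)
    for _ in range(num_pulses):
        best = None               # (error-power reduction, position, amplitude)
        for m in range(n):
            # A unit pulse at m synthesizes h shifted to m, truncated to the frame.
            energy = sum(h[i] * h[i] for i in range(n - m))  # > 0 because h[0] == 1
            corr = sum(error[m + i] * h[i] for i in range(n - m))
            reduction = corr * corr / energy   # drop in error power for optimal g
            if best is None or reduction > best[0]:
                best = (reduction, m, corr / energy)
        _, m, g = best
        pulses.append((m, g))
        # Feed back: subtract this pulse's synthesized contribution from the error.
        for i in range(n - m):
            error[m + i] -= g * h[i]
    return pulses, error
```

For example, with `a = [0.5]` and a 20-sample target produced by a single pulse of amplitude 0.7 at position 5, `multipulse_search(target, a, 1)` returns position 5 with an amplitude of approximately 0.7 and a residual error of essentially zero. Each added pulse can only lower the error power, which is why the loop is simply repeated until the desired number of pulses is reached.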
This prior art system, however, has the defect that the amount of computation is extremely large. This is because, when the position and amplitude of each pulse in the excitation pulse series are calculated, it is necessary to compute the error, and the error power, between the signal synthesized from the pulses and the original signal, and to feed them back to adjust the position and amplitude of the pulse. Moreover, this series of processing steps must be repeated until the number of pulses reaches a predetermined number.
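A rough operation count illustrates why this feedback loop is expensive. The sketch assumes a naive search in which, for every candidate position of every pulse, the whole frame is re-synthesized through an order-p all-pole filter and the error power is recomputed; amplitude adjustment is not counted. The function name `naive_abs_cost` and the parameter values in the usage note are illustrative assumptions, not figures taken from paper No. 1.

```python
def naive_abs_cost(frame_len, lpc_order, num_pulses, frames_per_second):
    """Rough multiply-add count for a naive A-b-S pulse search (assumed model).

    Returns (operations per frame, operations per second)."""
    # Re-synthesize the frame (frame_len * lpc_order) and recompute the error
    # power (frame_len) for a single candidate pulse position.
    per_candidate = frame_len * lpc_order + frame_len
    # Try every position in the frame for one pulse.
    per_pulse = frame_len * per_candidate
    # Repeat until the predetermined number of pulses is reached.
    per_frame = num_pulses * per_pulse
    return per_frame, per_frame * frames_per_second
```

Assuming 8 kHz sampling, 80-sample frames (100 frames per second), a 10th-order filter and 8 pulses per frame, `naive_abs_cost(80, 10, 8, 100)` gives 563,200 multiply-adds per frame, or about 56 million per second, even before any amplitude optimization.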
Furthermore, since the analysis frame length in this prior art system is constant, when the frame is switched at a portion where the power of the input voice signal series is large, the waveform of the reproduced signal series becomes discontinuous near the frame boundary. The resulting degradation greatly impairs the quality of the reproduced voice.
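To make the condition concrete, the sketch below places fixed-length frame boundaries on a signal and measures the mean power in a short window around each one; boundaries that land in high-power stretches are where the discontinuity described above is most damaging. The window size, the threshold rule (local power above a multiple of the overall mean power) and the function names `boundary_powers` and `high_power_boundaries` are hypothetical choices for illustration only.

```python
def boundary_powers(signal, frame_len, half_window):
    """(boundary, mean power) for each fixed frame boundary, measured over
    the samples within +/- half_window of the boundary."""
    result = []
    for b in range(frame_len, len(signal), frame_len):
        lo, hi = max(0, b - half_window), min(len(signal), b + half_window)
        seg = signal[lo:hi]
        result.append((b, sum(v * v for v in seg) / len(seg)))
    return result


def high_power_boundaries(signal, frame_len, half_window, ratio=1.0):
    """Fixed frame boundaries whose local power exceeds ratio times the
    overall mean power of the signal (hypothetical criterion)."""
    overall = sum(v * v for v in signal) / len(signal)
    return [b for b, p in boundary_powers(signal, frame_len, half_window)
            if p > ratio * overall]
```

For a 200-sample signal that is silent except for a burst at samples 100 to 139, 50-sample frames put a boundary at sample 100, inside the burst onset, so `high_power_boundaries(signal, 50, 5)` flags `[100]`, while the boundaries at 50 and 150 fall in silence. With a constant frame length, such placements cannot be avoided.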