This invention relates to a low bit-rate pattern encoding method and a device therefor. The low bit-rate pattern encoding method or technique is for encoding an original pattern signal into an output code sequence of an information transmission rate of less than about 16 kbit/sec. The pattern signal may be a speech or voice signal. The output code sequence is either for transmission through a transmission channel or for storage in a storing medium.
This invention relates also to a method of decoding the output code sequence into a reproduced pattern signal, namely, into a reproduction of the original pattern signal, and to a decoder for use in carrying out the decoding method. The output code sequence is supplied to the decoder as an input code sequence and is decoded into the reproduced pattern signal by synthesis. The pattern encoding is useful in, among others, speech synthesis.
Speech encoding based on a multi-pulse excitation method is proposed as a low bit-rate speech encoding method in an article contributed by Bishnu S. Atal et al of Bell Laboratories to Proc. ICASSP, 1982, pages 614-617, under the title of "A New Model of LPC Excitation for Producing Natural-sounding Speech at Low Bit Rates." According to the Atal et al article, a discrete speech signal, namely, a digital signal sequence, is divided into a succession of segments each of which has a spectral interval, such as a frame. Each segment is converted into a sequence or train of excitation or exciting pulses by the use of a linear predictive coding (LPC) synthesizer. Instants or locations of the excitation pulses and amplitudes thereof are determined by the so-called analysis-by-synthesis (A-b-S) method. In this method, a spectral parameter should be calculated for every segment to specify a short-time spectral envelope of the speech signal and to control the LPC synthesizer. It is believed that the model of Atal et al is promising as a model for encoding the discrete speech signal sequence, which is derived from an original speech signal, at a bit rate between about 8 and 16 kbit/sec. The model, however, requires a great amount of calculation in determining the pulse instants and the pulse amplitudes. A great deal of calculation is also required in decoding the excitation pulses into the digital signal sequence. For simplicity of description, the above-mentioned encoding and decoding will collectively be called conversion hereinafter.
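The amount of calculation involved in an analysis-by-synthesis search can be appreciated from the following minimal sketch, given in Python merely for illustration. It is not the procedure of the Atal et al article: the function names synthesize and abs_search, the greedy placement of one pulse at a time, and the omission of any perceptual weighting are all assumptions. Every candidate pulse instant requires a trial synthesis through the LPC synthesizer, which is where the calculation accumulates.

```python
def synthesize(excitation, lpc):
    """All-pole LPC synthesis: s[n] = e[n] + sum_k lpc[k-1] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def abs_search(target, lpc, num_pulses):
    """Hypothetical greedy analysis-by-synthesis search (illustrative only).

    For each new pulse, every instant in the segment is tried: a unit pulse
    is synthesized, its best amplitude is found by least squares, and the
    instant giving the largest reduction in squared error is kept.
    """
    n = len(target)
    residual = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for pos in range(n):
            unit = [0.0] * n
            unit[pos] = 1.0
            h = synthesize(unit, lpc)           # trial synthesis for this instant
            energy = sum(v * v for v in h)      # at least 1.0, since h[pos] == 1.0
            corr = sum(r * v for r, v in zip(residual, h))
            amp = corr / energy                 # least-squares amplitude
            reduction = corr * amp              # decrease in squared error
            if best is None or reduction > best[0]:
                best = (reduction, pos, amp, h)
        _, pos, amp, h = best
        pulses.append((pos, amp))
        residual = [r - amp * v for r, v in zip(residual, h)]
    return pulses, residual


# Toy segment: one pulse of amplitude 2.0 at instant 5 through a one-pole filter.
lpc = [0.9]
excitation = [0.0] * 20
excitation[5] = 2.0
target = synthesize(excitation, lpc)
pulses, residual = abs_search(target, lpc, num_pulses=1)
```

In this toy case the single pulse is recovered at its original instant. For a segment of N samples and P pulses the sketch performs N times P trial syntheses, each of which runs through the whole segment.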
Meanwhile, a "voice coding system" is disclosed in U.S. Pat. No. 4,716,592 by Kazunori Ozawa et al, the instant applicants, and assigned to the present assignee ("the Ozawa et al patent"). The voice or speech encoding system of the Ozawa et al patent is for encoding a discrete speech signal sequence of the type described into an output code sequence, which is for use in a decoder in exciting either a synthesizing filter or its equivalent of the type of the LPC synthesizer in producing a reproduction of the original speech signal as a reproduced speech signal.
More specifically, the speech encoding system of the Ozawa et al patent comprises a parameter calculator responsive to each segment of the discrete speech signal sequence for calculating a parameter sequence representative of a spectral envelope of the segment. Responsive to the parameter sequence, an impulse response calculator calculates an impulse response sequence which the synthesizing filter has for the segment. In other words, the impulse response calculator calculates an impulse response sequence related to the parameter sequence. An autocorrelator or covariance calculator calculates an autocorrelation or covariance function of the impulse response sequence. Responsive to the segment and the impulse response sequence, a cross-correlator calculates a cross-correlation function between the segment and the impulse response sequence. Responsive to the autocorrelation and the cross-correlation functions, an excitation pulse sequence producing circuit produces a sequence of excitation pulses by successively determining instants and amplitudes of the excitation pulses. A first coder codes the parameter sequence into a parameter code sequence. A second coder codes the excitation pulse sequence into an excitation pulse code sequence. A multiplexer multiplexes or combines the parameter code sequence and the excitation pulse code sequence into the output code sequence.
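The chain of calculators described above may be sketched as follows, in Python and merely for illustration. The function names, the rule of placing each pulse at the instant of greatest absolute cross-correlation, the amplitude taken as the cross-correlation divided by the zero-lag autocorrelation, and the update of the cross-correlation after each pulse are assumptions representing one common arrangement of such a search. They are not taken from the Ozawa et al patent, and the coders and the multiplexer are omitted.

```python
def impulse_response(lpc, length):
    """Impulse response of an all-pole synthesizing filter (assumed form)."""
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * h[n - k]
        h.append(acc)
    return h


def autocorrelation(h):
    """Autocorrelation of the impulse response for lags 0 .. len(h) - 1."""
    n = len(h)
    return [sum(h[i] * h[i + lag] for i in range(n - lag)) for lag in range(n)]


def cross_correlation(segment, h):
    """Cross-correlation between the segment and the impulse response."""
    n = len(segment)
    return [sum(segment[m + i] * h[i] for i in range(n - m)) for m in range(n)]


def correlation_pulse_search(segment, lpc, num_pulses):
    """Hypothetical successive pulse search using only the two correlations."""
    n = len(segment)
    h = impulse_response(lpc, n)
    rhh = autocorrelation(h)
    phi = cross_correlation(segment, h)
    pulses = []
    for _ in range(num_pulses):
        # Instant: where the remaining cross-correlation is largest in magnitude.
        pos = max(range(n), key=lambda m: abs(phi[m]))
        # Amplitude: depends only on the chosen instant.
        amp = phi[pos] / rhh[0]
        pulses.append((pos, amp))
        # Remove the contribution of the new pulse from the cross-correlation.
        phi = [phi[m] - amp * rhh[abs(m - pos)] for m in range(n)]
    return pulses


# Toy segment: response of a one-pole filter to a pulse of 2.0 at instant 5.
lpc = [0.9]
segment = [0.0] * 5 + [2.0 * v for v in impulse_response(lpc, 35)]
pulses = correlation_pulse_search(segment, lpc, num_pulses=1)
```

Because the correlations are calculated once per segment and each additional pulse costs only one pass over the cross-correlation, no trial synthesis is needed inside the search loop. The amplitude in the toy case comes out slightly below 2.0, because the zero-lag autocorrelation ignores truncation of the response at the end of the segment.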
With the system according to the Ozawa et al patent, instants of the respective excitation pulses and amplitudes thereof are determined or calculated with a drastically reduced amount of calculation. It is to be noted in this connection that the pulse instants and the pulse amplitudes are calculated assuming that the pulse amplitudes are dependent solely on the respective pulse instants. The assumption is, however, not applicable in general to actual original speech signals, from each of which the discrete speech signal sequence is derived.
It is well known that a female voice has a higher pitch than a male voice. This means that a greater number of pitch pulses appear within each segment in the female voice than in the male voice. Inasmuch as the excitation pulses are determined in relation to the pitch pulses, a high-pitch voice is encoded into a greater number of excitation pulses than a low-pitch voice. Therefore, when the excitation pulses are transmitted at the low bit rate, the high-pitch voice cannot be encoded as faithfully as the low-pitch voice. In any event, the original speech signal is specified not only by a short-time spectral envelope but also by its pitch.
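The relation between the pitch and the number of pitch pulses in a segment can be illustrated by simple arithmetic, sketched below in Python. The segment length of 20 msec, the pitch frequencies of 100 Hz and 250 Hz, and the budget of ten excitation pulses per segment are hypothetical values chosen only for illustration. They do not appear in the description above.

```python
def pitch_pulses_per_segment(pitch_hz, segment_ms):
    """Average number of pitch pulses falling within one segment."""
    return pitch_hz * segment_ms / 1000.0


def excitation_pulses_per_pitch_period(budget, pitch_hz, segment_ms):
    """Excitation pulses available per pitch period for a fixed budget."""
    return budget / pitch_pulses_per_segment(pitch_hz, segment_ms)


SEGMENT_MS = 20   # hypothetical segment (frame) length
BUDGET = 10       # hypothetical number of excitation pulses per segment

low = pitch_pulses_per_segment(100, SEGMENT_MS)    # low-pitch voice
high = pitch_pulses_per_segment(250, SEGMENT_MS)   # high-pitch voice

low_share = excitation_pulses_per_pitch_period(BUDGET, 100, SEGMENT_MS)
high_share = excitation_pulses_per_pitch_period(BUDGET, 250, SEGMENT_MS)
```

Under these assumed values the low-pitch voice has two pitch pulses per segment and the high-pitch voice has five, so that a fixed budget of ten excitation pulses leaves five excitation pulses per pitch period in the former case and only two in the latter.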