This invention relates to a low bit-rate pattern encoding method and a device therefor. The low bit-rate pattern encoding method or technique is for encoding an original pattern signal into an output code sequence of an information transmission rate of less than about 8 kbit/sec. The pattern signal may be a speech or voice signal. The output code sequence is either for transmission through a transmission channel or for storage in a storing medium.
This invention relates also to a method of decoding the output code sequence into a reproduced pattern signal, namely, into a reproduction of the original pattern signal, and to a decoder for use in carrying out the decoding method. The output code sequence is supplied to the decoder as an input code sequence and is decoded into the reproduced pattern signal by synthesis. The pattern encoding is useful in, among others, speech synthesis.
Speech encoding based on a multi-pulse excitation method is proposed as a low bit-rate speech encoding method in an article which is contributed by Bishnu S. Atal et al of Bell Laboratories to Proc. ICASSP, 1982, pages 614-617, under the title of "A New Model of LPC Excitation for Producing Natural-sounding Speech at Low Bit Rates." According to the Atal et al article, a discrete speech signal, namely, a digital signal sequence is derived from an original speech signal and divided into a succession of segments each of which lasts a special interval, such as a frame. Each segment is converted into a sequence or train of excitation or exciting pulses by the use of a linear predictive coding (LPC) synthesizer. Instants or locations of the excitation pulses and amplitudes thereof are determined by the so-called analysis-by-synthesis (A-b-S) method. At any rate, the model requires a great amount of calculation in determining the pulse instants and the pulse amplitudes. A great deal of calculation is also required in decoding the excitation pulses into the digital signal sequence. For simplicity of description, the above-mentioned encoding and decoding will collectively be called conversion hereinafter.
In the meanwhile, a "voice coding system" is disclosed in U.S. Pat. No. 4,716,592, by Kazunori Ozawa et al, the instant applicants, for assignment to the present assignee. The voice or speech encoding and decoding system of the Ozawa et al patent application comprises an encoder for encoding a discrete speech signal sequence of the type described into an output code sequence. The system further comprises a decoder for producing a reproduction of the original speech signal as a reproduced speech signal by exciting either a synthesizing filter or its equivalent of the type of the LPC synthesizer.
More specifically, the encoder disclosed in the Ozawa et al patent application comprises a parameter calculator responsive to each segment of the discrete speech signal sequence for calculating a sequence of parameters representative of a spectral envelope. Each of the parameters may be referred to as a spectral parameter and is extracted from each spectral interval. Responsive to the parameter sequence, an impulse response calculator calculates an impulse response sequence which the synthesizing filter has for the segment. In other words, the impulse response calculator calculates an impulse response sequence related to the parameter sequence. An autocorrelator or covariance calculator calculates an autocorrelation or covariance function of the impulse response sequence. Responsive to the segment and the impulse response sequence, a cross-correlator calculates a cross-correlation function between the segment and the impulse response sequence. Responsive to the autocorrelation and the cross-correlation functions, an excitation pulse sequence producing circuit produces a sequence of excitation pulses by successively determining instants and amplitudes of the excitation pulses. A first coder codes the parameter sequence into a parameter code sequence. A second coder codes the excitation pulse sequence into an excitation pulse code sequence. A multiplexer multiplexes or combines the parameter code sequence and the excitation pulse code sequence into the output code sequence.
With the system according to the Ozawa et al patent application, instants of the respective excitation pulses and amplitudes thereof are determined or calculated with a drastically reduced amount of calculation. It is to be noted in this connection that the pulse instants and the pulse amplitudes are calculated assuming that the pulse amplitudes are dependent solely on the respective pulse instants. The assumption is, however, not applicable in general to actual original speech signals, from each of which the discrete speech signal sequence is derived.
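The correlation-based search described above can be illustrated by the following minimal sketch. It is not taken from the Ozawa et al patent; it is one commonly used form of such a search, written under stated assumptions: the synthesizing filter is taken to be an all-pole filter with predictor coefficients lpc (so that s[n] = lpc[0]*s[n-1] + lpc[1]*s[n-2] + ... + e[n]), the autocorrelation form rather than the covariance form is used, no perceptual weighting is applied, and each pulse amplitude is computed from the cross-correlation at its own instant alone. All function names (impulse_response, autocorrelation, cross_correlation, search_pulses, synthesize) are hypothetical.

```python
def impulse_response(lpc, length):
    """Impulse response of the assumed all-pole filter 1 / (1 - sum_k lpc[k-1] z^-k)."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                value += a * h[n - k]
        h.append(value)
    return h


def autocorrelation(h, num_lags):
    """R[k] = sum_n h[n] * h[n + k] for lags k = 0 .. num_lags - 1."""
    return [
        sum(h[n] * h[n + k] for n in range(len(h) - k)) if k < len(h) else 0.0
        for k in range(num_lags)
    ]


def cross_correlation(x, h):
    """phi[m] = sum_n x[n] * h[n - m] for candidate pulse instants m = 0 .. len(x) - 1."""
    size = len(x)
    return [
        sum(x[n] * h[n - m] for n in range(m, min(size, m + len(h))))
        for m in range(size)
    ]


def search_pulses(x, lpc, num_pulses):
    """Successively pick (instant, amplitude) pairs from the correlation functions."""
    size = len(x)
    h = impulse_response(lpc, size)
    r = autocorrelation(h, size)
    phi = cross_correlation(x, h)
    pulses = []
    for _ in range(num_pulses):
        # The instant with the largest remaining cross-correlation magnitude.
        m = max(range(size), key=lambda i: abs(phi[i]))
        # Amplitude depends only on the cross-correlation at that instant.
        g = phi[m] / r[0]
        pulses.append((m, g))
        # Remove the contribution of the new pulse from the cross-correlation.
        for i in range(size):
            phi[i] -= g * r[abs(i - m)]
    return pulses


def synthesize(pulses, lpc, length):
    """Excite the assumed synthesizing filter with the given pulses."""
    h = impulse_response(lpc, length)
    y = [0.0] * length
    for m, g in pulses:
        for n in range(m, length):
            y[n] += g * h[n - m]
    return y


if __name__ == "__main__":
    # Toy segment: two pulses passed through a one-pole filter (illustrative values only).
    segment = synthesize([(5, 1.0), (20, -0.6)], [0.5], 40)
    print(search_pulses(segment, [0.5], 2))
```

Because the correlation functions are computed once per segment and only updated after each pulse, no resynthesis of the segment is needed for each candidate instant, which is where the reduction in calculation comes from in this kind of search.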
It is well known that a female voice has a high pitch as compared with a male voice. This means that a greater number of pitch pulses appear in the female voice than in the male voice within each segment. Inasmuch as the excitation pulses are determined in relation to the pitch pulses, a high-pitch voice is encoded into the excitation pulses greater in number than a low-pitch voice. Therefore, the high-pitch voice cannot be encoded as faithfully as the low-pitch voice when the excitation pulses are transmitted at the low bit rate.
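The effect of pitch on the available bit budget can be illustrated by simple arithmetic. The figures used below are assumptions chosen only for illustration and do not appear in the cited references: a rate of 8 kbit/sec, a segment of 20 msec, and pitch frequencies of 100 Hz and 250 Hz for the low-pitch and the high-pitch voice, respectively. The bits spent on the spectral parameters are ignored, so the result is an upper bound on what is left for the excitation pulses. The function name pitch_budget is hypothetical.

```python
def pitch_budget(bit_rate_bps, frame_ms, pitch_hz):
    """Return (bits per segment, pitch pulses per segment, bits per pitch period)."""
    bits_per_frame = bit_rate_bps * frame_ms / 1000.0
    pitch_pulses = pitch_hz * frame_ms / 1000.0
    return bits_per_frame, pitch_pulses, bits_per_frame / pitch_pulses


if __name__ == "__main__":
    # Assumed illustrative values: 8 kbit/sec, 20 msec segment.
    for label, pitch_hz in (("low-pitch voice", 100.0), ("high-pitch voice", 250.0)):
        bits, pulses, per_period = pitch_budget(8000, 20, pitch_hz)
        print(label, bits, pulses, per_period)
```

Under these assumptions the same segment budget must be shared among more pitch periods for the high-pitch voice, so fewer bits, and hence fewer excitation pulses, are available per pitch period.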
The instant applicants further have proposed an improved encoding and decoding system in U.S. patent application Ser. No. 751,818 filed July 5, 1985, for assignment to the present assignee. In the improved system, each spectral interval is divided into a succession of subframes with reference to the pitch pulses. A sequence of excitation pulses is produced for the respective subframes and is partially selected in consideration of signal to noise ratios which are calculated in two adjacent ones of the subframes. With this system, the excitation pulses are located in every other subframe and are not always located in the remaining subframes of each spectral interval. As a result, the excitation pulses can be reduced in number in the improved system and can be transmitted at a low transmission bit rate or information transmission rate.
However, the reduction of the excitation pulses has its limit because the excitation pulses must always be placed in every other subframe even when each subframe is not significant. This makes it difficult to transmit the excitation pulses at a transmission bit rate lower than 8 kbit/sec.
In addition, the reduction of the excitation pulses brings about an undesired or unnatural reproduction of the original pattern signal. Such an undesired reproduction becomes serious at a transition time instant between voiced speech and unvoiced speech because desired excitation pulses cannot be produced at the transition time instant. Thus, speech quality is degraded at the transition time instant.