1. Field of the Invention
The present invention generally relates to digital voice transmission systems and, more particularly, to a simple method of combining stochastic excitation and pulse excitation for a low-rate multi-pulse speech coder.
2. Description of the Prior Art
Code excited linear prediction (CELP) and multi-pulse linear predictive coding (MPLPC) are two of the most promising techniques for low rate speech coding. While CELP holds the most promise for high quality, its computational requirements can be too great for some systems. MPLPC can be implemented with much less complexity, but it is generally considered to provide lower quality than CELP.
Multi-pulse coding is believed to have been first described by B. S. Atal and J. R. Remde in "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. of 1982 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, May 1982, pp. 614-617, which is incorporated herein by reference. The technique was proposed to improve on the rather synthetic quality of the speech produced by the standard U.S. Department of Defense LPC-10 vocoder. The basic method is to employ the linear predictive coding (LPC) speech synthesis filter of the standard vocoder, but to excite the filter with multiple pulses per pitch period instead of the single pulse used in the Department of Defense standard system. The basic multi-pulse technique is illustrated in FIG. 1.
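As an informal illustration only, and not a description of any particular coder, the difference between single-pulse and multi-pulse excitation of an LPC synthesis filter can be sketched as follows. The function names, the first-order predictor coefficient, and the pulse positions and amplitudes are hypothetical values chosen for readability; the sign convention s[n] = e[n] + sum of a[k]*s[n-1-k] is likewise an assumption.

```python
def lpc_synthesize(excitation, predictor_coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k a[k] * s[n-1-k] (assumed convention)."""
    output = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(predictor_coeffs):
            if n - 1 - k >= 0:
                s += a * output[n - 1 - k]
        output.append(s)
    return output


def pulse_excitation(length, pulses):
    """Build a sparse excitation from (position, amplitude) pairs."""
    exc = [0.0] * length
    for pos, amp in pulses:
        exc[pos] += amp
    return exc


# Hypothetical first-order filter and pulse sets, for illustration only.
coeffs = [0.5]
single = pulse_excitation(8, [(0, 1.0)])                          # one pulse per period
multi = pulse_excitation(8, [(0, 1.0), (3, -0.5), (5, 0.25)])     # several pulses per period

single_out = lpc_synthesize(single, coeffs)
multi_out = lpc_synthesize(multi, coeffs)
```

With a single pulse the output is simply the impulse response of the filter; the additional pulses allow the synthesized waveform to be reshaped within the pitch period.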
At low transmission rates (e.g., 4800 bits/second), multi-pulse speech coders do not reproduce unvoiced speech correctly. They exhibit two perceptually annoying flaws: 1) amplitude of the unvoiced sounds is too low, making sibilant sounds difficult to understand, and 2) unvoiced sounds that are reproduced with sufficient amplitude tend to be buzzy, due to the pulsed nature of the excitation.
To see how these problems arise, the cause of the second of these two flaws is first considered. In a multi-pulse coder, as the transmission rate is lowered, fewer pulses can be coded per unit time. This makes the "excitation coverage" sparse; i.e., the second trace ("Exc Signal") in FIG. 2 contains few pulses. During voiced speech, as shown in FIG. 2, this sparseness does not become a significant problem unless the transmission rate is so low that a single pulse per pitch period cannot be transmitted. As seen in FIG. 2, the coverage is about three pulses per pitch period. At 4800 bits/second, there is usually enough rate available so that several pulses can be used per pitch period (at least for male speakers), so that coding of voiced speech may readily be accomplished. However, for unvoiced speech, the impulse response of the LPC synthesis filter is much shorter than for voiced speech, and consequently, a sparse pulse excitation signal will produce a "splotchy", semi-periodic output that is buzzy sounding.
A simple way to improve unvoiced excitation would be to add a random noise generator and a voiced/unvoiced decision algorithm, as in the standard LPC-10 algorithm. This would correct for the lack of excitation during unvoiced periods and remove the buzzy artifacts. Unfortunately, by adding the voiced/unvoiced decision and noise generator, the waveform-preserving properties of multi-pulse coding would be compromised and its intrinsic robustness would be reduced. In addition, errors introduced into the voiced/unvoiced decision during operation in noisy environments would significantly degrade the speech quality.
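By way of a rough sketch, a switched excitation of this general kind might look as follows. This is not the LPC-10 algorithm itself: the function name, the unit pulse amplitude, the unit-variance Gaussian noise, and the frame and pitch values are all assumptions made for illustration.

```python
import random


def switched_excitation(length, voiced, pitch_period, rng):
    """Return a pulse train for voiced frames, random noise for unvoiced frames."""
    if voiced:
        # One unit pulse at the start of each pitch period.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    # Unvoiced: dense Gaussian noise in place of pulses.
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
voiced_frame = switched_excitation(10, True, 4, rng)
unvoiced_frame = switched_excitation(10, False, 4, rng)
```

The sketch makes the drawback visible: the excitation depends entirely on a single binary decision, so one wrong voiced/unvoiced decision changes the character of the whole frame.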
As an alternative, one could employ simultaneous pulse excitation and random codebook excitation similar to CELP. Such a system is described by T. V. Sreenivas in "Modeling LPC-Residue by Components for Good Quality Speech Coding", Proc. of 1988 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, April 1988, pp. 171-174, which is incorporated herein by reference. By simultaneously obtaining the pulse amplitudes and searching for the codeword index and gain, a robust system that would give good performance during both voiced and unvoiced speech could be provided. While this technique appears feasible at first look, it can become overly complex in implementation. If an analysis-by-synthesis codebook technique is desired for the multi-pulse positions and/or amplitudes, then the two codebooks must be searched together; i.e., if each codebook has N entries, then N^2 combinations must be run through the synthesis filter and compared to the input signal. ("Codebook" as used herein refers to a collection of vectors filled with random Gaussian noise samples, and each codebook contains information as to the number of vectors therein and the lengths of the vectors.) With typical codebook sizes of 128 vector entries, the joint search is equivalent to searching a single codebook of 128^2, or 16,384, vector entries, which is too complex for practical implementation.
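The growth in search effort can be illustrated with a minimal sketch of an exhaustive joint search over two small codebooks. This is a simplified illustration and not the method of the cited reference: gains are omitted, the pulse codebook is a hypothetical set of single-pulse vectors, the codebook size of 8 and the first-order filter coefficient are arbitrary, and all function and variable names are assumptions.

```python
import itertools
import random


def synthesize(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k a[k] * s[n-1-k] (assumed convention)."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(coeffs):
            if n - 1 - k >= 0:
                s += a * out[n - 1 - k]
        out.append(s)
    return out


def joint_search(target, pulse_book, noise_book, coeffs):
    """Try every (pulse, noise) pair; return (error, i, j) of the best pair and the pair count."""
    best = None
    evaluations = 0
    pairs = itertools.product(enumerate(pulse_book), enumerate(noise_book))
    for (i, p), (j, c) in pairs:
        excitation = [x + y for x, y in zip(p, c)]
        synth = synthesize(excitation, coeffs)          # one filter pass per pair
        err = sum((t - s) ** 2 for t, s in zip(target, synth))
        evaluations += 1
        if best is None or err < best[0]:
            best = (err, i, j)
    return best, evaluations


rng = random.Random(0)                                  # fixed seed for repeatability
N, length = 8, 16                                       # hypothetical sizes
pulse_book = [[1.0 if n == 2 * i else 0.0 for n in range(length)] for i in range(N)]
noise_book = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(N)]
coeffs = [0.5]

# Build a target from a known pair so the search has an exact match to find.
target = synthesize([x + y for x, y in zip(pulse_book[3], noise_book[5])], coeffs)
(best_err, best_i, best_j), evaluations = joint_search(target, pulse_book, noise_book, coeffs)
```

Even with only 8 entries per codebook the search makes 64 passes through the synthesis filter; the count grows as the product of the two codebook sizes.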