1. Field of the Invention
This invention generally relates to digital voice transmission systems and, more particularly, to a low complexity method for improving performance of autocorrelation-based pitch detectors for digital voice transmission systems.
2. Description of the Prior Art
Code Excited Linear Prediction (CELP) and Multi-pulse Linear Predictive Coding (MPLPC) are two of the most promising techniques for low rate speech coding. The current Department of Defense (DoD) standard vocoder is the LPC-10 which employs linear predictive coding (LPC). A description of the standard LPC vocoder is provided by J. D. Markel and A. H. Gray in "A Linear Prediction Vocoder Simulation Based upon the Autocorrelation Method", IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-22, No. 2, April 1974, pp. 124-134. While CELP holds the most promise for high quality, its computational requirements can be too great for some systems. MPLPC can be implemented with much less complexity, but it is generally considered to provide lower quality than CELP.
An early CELP speech coder was first described by M. R. Schroeder and B. S. Atal in "Stochastic Coding of Speech Signals at Very Low Bit Rates", Proc. of 1984 IEEE Int. Conf. on Communications, May 1984, pp. 1610-1613, although a better description can be found in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates", Proc. of 1985 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, March 1985, pp. 937-940. The basic technique comprises searching a codebook of randomly distributed excitation vectors for that vector that produces an output sequence (when filtered through pitch and linear predictive coding (LPC) short-term synthesis filters) that is closest to the input sequence. To accomplish this task, all of the candidate excitation vectors in the codebook must be filtered with both the pitch and LPC synthesis filters to produce a candidate output sequence that can then be compared to the input sequence. This makes CELP a very computationally-intensive algorithm, with typical codebooks consisting of 1024 entries, each 40 samples long. In addition, a perceptual error weighting filter is usually employed, which adds to the computational load. A block diagram of an implementation of the CELP algorithm is shown in FIG. 1, and FIG. 2 shows some example waveforms illustrating operation of the CELP method. These figures are described below to better illustrate the CELP system.
Multi-pulse coding was first described by B. S. Atal and J. R. Remde in "A New Model of LPC Excitation for Producing Natural Sounding Speech at Low Bit Rates", Proc. of 1982 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, May 1982, pp. 614-617. It was described as improving on the rather synthetic quality of the speech produced by the standard DOD LPC-10 vocoder. The basic method is to employ the LPC speech synthesis filter of the standard vocoder, but to excite the filter with multiple pulses per pitch period, instead of the single pulse used in the DoD standard system. The basic multi-pulse technique is illustrated in FIG. 3, and FIG. 4 shows some example waveforms illustrating the operation of the MPLPC method. These figures are described below to better illustrate the MPLPC system.
Currently, and in the past few years, much attention in speech coding research has been focused on achieving high quality speech at rates down to 4.8 Kbit/sec. The CELP algorithm has probably been the most favored algorithm; however, the CELP algorithm is very complex in terms of computational requirements and would be too expensive to implement in a commercial product any time in the near future. The LPC-10 vocoder is the government standard for speech coding at 2.4 Kbit/sec. This algorithm is relatively simple, but speech quality is only fair, and it does not adapt well to 4.8 Kbit/sec use. There was a need, therefore, for a speech coder which performs significantly better than the LPC-10, and for other, significantly less complex alternatives to CELP, at 4.8 Kbit/sec, rates. This need was met by the linear predictive codeword excited speech synthesizer (LPCES) described and claimed in the aforementioned copending application Ser. No. 07/612,056.
The LPCES vocoder is a close relative of the standard LPC-10 vocoder. The principal difference between the LPC-10 and LPCES vocoders lies in the synthesizer excitation used for voiced speech. The LPCES employs a stored "residual" waveform that is selected from a codebook and used to excite the synthesis filter, instead of the single impulse used in the LPC-10.
In the LPCES vocoder, the voiced excitation codeword exciting the synthesis filter is updated once every frame in synchronism with the output pitch period. This makes determination of the pitch period very important for proper operation of this coder. During development of the LPCES, artifacts in the synthesized speech were traced to errors by the pitch detector. The most bothersome artifacts were found to result from the pitch detector reporting a period that is twice or three times as long as it should be. In general, in pitch-synchronous LPC vocoders, quality of the synthesized speech is highly correlated with accuracy of pitch detection.
Many pitch detection algorithms have been described in the literature, but none have provided 100% accuracy. The problem, like many in speech coding, is a difficult one that does not have a closed-form mathematical solution. Many algorithms which are intended to deliver highly reliable pitch information introduce a level of complexity which it is desirable to avoid. Discussions of recently developed algorithms for pitch detection can be found in J. Picone et al., "Robust Pitch Detection in a Noise Telephone Environment", IEEE Proc. of 1987 Int. Conf. on Acoustics, Speech and Signal Processing, pp. 1442-1445, and H. Fujisaki et al., "A New System for Reliable Pitch Extraction of Speech", IEEE Proc. of 1987 Int. Conf. on Acoustics. Speech and Signal Processing, pp. 2422-2424.