1. Field of the Invention
This invention relates to methods and means for determining the pitch of an acoustic signal within a vocoder analyzer.
2. Description of Related Art
Relevant publications include:
1. Yang et al., "Pitch Synchronous Multi-Band (PSMB) Speech Coding," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'95, pp. 516-519, 1995 (describes a pitch-period-based speech coder);
2. Daniel W. Griffin and Jae S. Lim, "Multiband Excitation Vocoder," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 36, No. 8, August 1988, pp. 1223-1235 (describes a multiband excitation model for speech where the model includes an excitation spectrum and spectral envelope);
3. John C. Hardwick and Jae S. Lim, "A 4.8 Kbps Multi-Band Excitation Speech Coder," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'88, pp. 374-377, New York 1988 (describes a speech coder that uses redundancies to quantize the speech parameters more efficiently);
4. Daniel W. Griffin and Jae S. Lim, "A New Pitch Detection Algorithm," Digital Signal Processing '84, Elsevier Science Publishers, 1984, pp. 395-399 (describes an approach to pitch detection in which the pitch period and spectral envelope are estimated by minimizing a least squares error criterion between the synthetic spectrum and the original spectrum);
5. Daniel W. Griffin and Jae S. Lim, "A New Model-Based Speech Analysis/Synthesis System," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'85, 1985, pp. 513-516 (describes the implementation of a model-based speech analysis/synthesis system where the short time spectrum of speech is modeled as an excitation spectrum and a spectral envelope);
6. Robert J. McAulay and Thomas F. Quatieri, "Mid-Rate Coding Based On A Sinusoidal Representation of Speech," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'85, 1985, pp. 945-948 (describes a sinusoidal model to describe the speech waveform using the amplitudes, frequencies, and phases of the component sine waves);
7. Robert J. McAulay and Thomas F. Quatieri, "Computationally Efficient Sine Wave Synthesis And Its Application to Sinusoidal Transform Coding," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'88, 1988, pp. 370-373 (describes a technique to synthesize speech using sinusoidal descriptions of the speech signal while relieving the computational complexity inherent in the technique);
8. Xiaoshu Qian and Ramdas Kumaresan, "A Variable Frame Pitch Estimator and Test Results," Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing ICASSP'96, 1996, pp. 228-231 (describes a new algorithm to identify voiced sections in a speech waveform and determine their pitch contours); and
9. Ma Wei, "Multiband Excitation Based Vocoders and Their Real-Time Implementation," Dissertation, University of Surrey, Guildford, Surrey, U.K., May 1994, pp. 145-150 (describes vocoder analysis and implementations).
In vocoder applications, the prior art has demonstrated complicated methods to estimate the pitch of an acoustic input signal. One method of improving pitch estimation has been to increase the resolution by using half samples, quarter samples, or even finer sampling. The finer sampling significantly increases the complexity of implementing the pitch estimation.
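The growth in search effort with finer resolution can be illustrated with a minimal sketch. The pitch-period range of 20 to 120 samples and the function name `candidate_lags` are assumptions chosen for illustration only; they are not taken from the cited references. The sketch only counts candidate pitch periods; in practice each fractional candidate typically also requires interpolating the signal, which adds further cost.

```python
from fractions import Fraction


def candidate_lags(min_lag, max_lag, steps_per_sample):
    """Return candidate pitch periods (in samples) from min_lag to max_lag
    on a grid of 1/steps_per_sample of a sample (hypothetical helper)."""
    step = Fraction(1, steps_per_sample)
    count = (max_lag - min_lag) * steps_per_sample + 1
    return [min_lag + k * step for k in range(count)]


# Assumed search range: 20 to 120 samples (illustrative values only).
for steps in (1, 2, 4):
    lags = candidate_lags(20, 120, steps)
    print(f"1/{steps}-sample resolution: {len(lags)} candidates")
```

Under these assumptions, halving the step roughly doubles the number of candidates that an error function must be evaluated for.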
Pitch estimation at fractional-sample intervals has been successful in waveform and hybrid coding schemes, since it improves the speech quality in the sense of waveform similarity. However, vocoders do not necessarily need an accurate pitch, since a waveform-based distortion measure is not valid in a vocoder. Instead, high-resolution pitch estimation is used within a vocoder to remove the effects of pitch doubling. Pitch doubling is an error condition in which the estimation technique selects a pitch that is twice the correct pitch.
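Why pitch doubling arises can be seen in a minimal sketch using a normalized autocorrelation at integer lags. This is a generic illustration and not the method of any reference cited here; the synthetic test signal, the 40-sample period, the lag range, and all function names are assumptions. A signal that repeats every 40 samples also repeats every 80 samples, so both lags score almost identically and a plain maximum search may select either one.

```python
import math


def normalized_autocorrelation(signal, lag):
    """Normalized autocorrelation of `signal` at an integer `lag`."""
    n = len(signal) - lag
    num = sum(signal[i] * signal[i + lag] for i in range(n))
    e0 = sum(signal[i] ** 2 for i in range(n))
    e1 = sum(signal[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(e0 * e1)
    return num / denom if denom > 0 else 0.0


def make_periodic_signal(period, length):
    """Synthetic two-harmonic test signal (hypothetical stand-in for voiced speech)."""
    return [
        math.sin(2 * math.pi * i / period) + 0.5 * math.sin(4 * math.pi * i / period)
        for i in range(length)
    ]


def pick_shortest_strong_lag(scores, tolerance=0.01):
    """Illustrative heuristic: among lags scoring within `tolerance` of the
    best score, prefer the shortest, which avoids choosing the doubled period."""
    best = max(scores.values())
    return min(lag for lag, score in scores.items() if score >= best - tolerance)


signal = make_periodic_signal(period=40, length=400)
scores = {lag: normalized_autocorrelation(signal, lag) for lag in range(20, 101)}

# The true period (40) and its double (80) both score close to 1.
print(round(scores[40], 6), round(scores[80], 6))
print(pick_shortest_strong_lag(scores))
```

The shortest-lag preference shown here is only one simple way to break the tie; the tracking methods discussed in the prior art address the same ambiguity by other means.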
U.S. Pat. No. 5,226,108 (Hardwick et al.) discloses a pitch estimation method in which sub-integer resolution pitch values are estimated when making the initial pitch estimate. An error function is minimized in the pitch selection, and forward tracking and backward tracking are employed to prevent the pitch doubling phenomenon. The text explaining the background of that invention details the state of the prior art in the analysis and synthesis of acoustical signals. The content of U.S. Pat. No. 5,226,108 is incorporated herein by reference.
U.S. Pat. No. 5,495,555 (Swaminathan) discloses a technique for high-quality, low-bit-rate speech coding and decoding that employs a codebook excited linear prediction technique.