The distance between two successive amplitude peaks corresponds to the fundamental frequency of the speech signal.
Estimating the fundamental frequency is an important task in many applications of speech signal processing, for instance, automatic speech recognition or speech synthesis. The fundamental frequency may be estimated, for example, for an impaired speech signal. Based on the fundamental frequency estimate, an undisturbed speech signal may be synthesized. In another example, the fundamental frequency estimate may be used to improve the recognition accuracy of a system for automatic speech recognition.
Several methods for estimating the fundamental frequency of a speech signal are known. One method, for example, is based on a harmonic product spectrum (see, e.g., M. R. Schroeder, “Period Histogram and Product Spectrum: New methods for fundamental frequency measurements”, in Journal of the Acoustical Society of America, vol. 43, no. 4, 1968, pages 829 to 834).
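By way of illustration only, the harmonic product spectrum approach may be sketched as follows. The sketch is not taken from the cited reference. The function names `dft_magnitude` and `estimate_f0_hps`, the Hann window, the number of harmonics, and the search range of 50 Hz to 500 Hz are all assumptions chosen for the example. The magnitude spectrum of a frame is multiplied with copies of itself compressed by integer factors, so that the harmonics reinforce one another at the bin of the fundamental frequency.

```python
import cmath
import math


def dft_magnitude(frame, k):
    """Magnitude of DFT bin k of the frame (naive evaluation, for clarity)."""
    n = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(frame)))


def estimate_f0_hps(frame, sample_rate, num_harmonics=3,
                    f_min=50.0, f_max=500.0):
    """Hypothetical helper: F0 estimate in Hz via a harmonic product spectrum."""
    n = len(frame)
    # Assumed pre-processing: a periodic Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    # Candidate bins: inside the search range, with every harmonic below Nyquist.
    k_min = max(1, math.ceil(f_min * n / sample_rate))
    k_max = min(int(f_max * n / sample_rate), (n // 2) // num_harmonics)
    best_k, best_product = k_min, -1.0
    for k in range(k_min, k_max + 1):
        # Product of the magnitudes at bins k, 2k, ..., num_harmonics * k.
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= dft_magnitude(windowed, h * k)
        if product > best_product:
            best_k, best_product = k, product
    return best_k * sample_rate / n


# Usage: a synthetic 200 Hz tone with four harmonics, sampled at 8 kHz.
sample_rate = 8000
frame = [sum(math.sin(2 * math.pi * 200 * h * i / sample_rate) / h
             for h in range(1, 5))
         for i in range(400)]
f0_hps = estimate_f0_hps(frame, sample_rate)
```

In this sketch the frequency resolution is limited to the bin spacing, i.e. the sampling rate divided by the frame length, which is 20 Hz for the example values.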
Another class of methods is based on an analysis of the auto-correlation function of the speech signal (see, e.g., A. de Cheveigne, H. Kawahara, “Yin, a Fundamental Frequency Estimator for Speech and Music”, JASA, 2002, 111(4), pages 1917-1930). The auto-correlation function has a maximum at a lag equal to the fundamental period, that is, the reciprocal of the fundamental frequency.
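A minimal sketch of an estimate based on the auto-correlation function is given below for illustration. It is a plain peak search and not the method of the cited reference. The function name `estimate_f0_autocorr` and the search range of 50 Hz to 500 Hz are assumptions. The lag with the largest auto-correlation value within the search range is taken as the fundamental period, and the sampling rate divided by that lag gives the fundamental frequency estimate.

```python
import math


def estimate_f0_autocorr(signal, sample_rate, f_min=50.0, f_max=500.0):
    """Hypothetical helper: F0 estimate in Hz from the auto-correlation peak."""
    n = len(signal)
    # Lags are searched only within the assumed range of plausible periods.
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized (biased) auto-correlation: fewer terms at longer lags,
        # so multiples of the true period score lower than the period itself.
        value = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Usage: a synthetic 200 Hz tone with one overtone, 100 ms at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 200 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 400 * i / sample_rate)
          for i in range(800)]
f0_acf = estimate_f0_autocorr(signal, sample_rate)
```

Note that the longest lag searched is the sampling rate divided by the lowest frequency of interest, so the analyzed segment must be longer than the longest period to be detected.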
Methods based on the auto-correlation function, however, often have difficulty estimating low fundamental frequencies, such as those that can occur for male speakers. Known methods for overcoming this problem are either computationally inefficient or introduce a significant delay.