Pitch detection can be used for various applications such as voice modification, text-to-speech transformation, speech coding, music information retrieval, musical performance systems, biometric measurements and astrophysical measurements. For pitch detection, both time-domain and frequency-domain approaches are well known. The time-domain approaches can be implemented cheaply and easily, e.g. by measuring the zero-crossing rate as described by C. H. Chen, Signal Processing Handbook, New York: Dekker, p. 531, 1988, or by a variation of autocorrelation which exploits the similarity of successive pitch periods as described by R. Bracewell, The Autocorrelation Function, in The Fourier Transform and Its Applications, New York: McGraw-Hill, pp. 40-45, 1965. The frequency-domain approaches are usually more complex and include the steps of applying a Fast Fourier Transform (FFT) to transform the time-domain signal into a frequency-domain signal, removing the influence of the phase by considering only the power of the frequency components, compressing the values to reduce the influence of the spectral envelope, producing pitch candidates by correlating the underlying harmonics, e.g. by subharmonic summation, and finding the pitch by selecting the candidate with the highest peak. Such methods are known e.g. from D. J. Hermes, Measurement of pitch by subharmonic summation, in Journal of the Acoustical Society of America, 83, pp. 257-264, 1988. Another possibility to obtain the pitch candidates is to transform the frequency-domain signal back to the time domain by an Inverse Fast Fourier Transform (IFFT). For example, the pitch detection algorithm known from B. P. Bogert et al., The Quefrency Alanysis of Time Series for Echoes: Cepstrum, Pseudo-Autocovariance, Cross-Cepstrum and Saphe Cracking, in Proceedings of the Symposium on Time Series Analysis, Chapter 15, pp. 209-243, New York: Wiley, 1963, is based upon spectral analysis and uses a log function for compression.
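
The frequency-domain steps listed above (transform, discard phase, compress, sum harmonics, pick the highest peak) can be illustrated by the following minimal sketch. It is not the method of Hermes, which works on a logarithmic frequency axis; here the summation is simplified to linear DFT bins. The function names, the test tone, the search range of 60 to 500 Hz, the number of harmonics and the decay factor of 0.84 are all assumptions made for illustration, and a naive DFT stands in for the FFT so that only the standard library is needed.

```python
import cmath
import math

def power_spectrum(samples):
    """Return |X[k]|^2 for k = 0..N/2 using a naive DFT (phase is discarded)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def shs_pitch(samples, sample_rate, f_min=60.0, f_max=500.0,
              n_harmonics=5, decay=0.84):
    """Estimate the pitch in Hz by a simplified subharmonic summation.

    All default parameter values are illustrative assumptions.
    """
    n = len(samples)
    # Compression step: power ** 0.5 is the magnitude of each component.
    compressed = [p ** 0.5 for p in power_spectrum(samples)]
    bin_hz = sample_rate / n
    k_min = max(1, math.ceil(f_min / bin_hz))
    k_max = min(len(compressed) - 1, math.floor(f_max / bin_hz))
    best_bin, best_score = k_min, float("-inf")
    for k in range(k_min, k_max + 1):
        # Sum the compressed values at the candidate and its harmonics,
        # weighting higher harmonics less.
        score = 0.0
        for h in range(1, n_harmonics + 1):
            if h * k < len(compressed):
                score += decay ** (h - 1) * compressed[h * k]
        if score > best_score:
            best_bin, best_score = k, score
    return best_bin * bin_hz

def harmonic_tone(f0, sample_rate, n, amplitudes=(1.0, 0.8, 0.6, 0.4)):
    """Synthetic test signal: a sum of harmonics of f0 (hypothetical data)."""
    return [sum(a * math.sin(2 * math.pi * (i + 1) * f0 * t / sample_rate)
                for i, a in enumerate(amplitudes))
            for t in range(n)]

estimate = shs_pitch(harmonic_tone(125.0, 8000, 512), 8000)
print(f"estimated pitch: {estimate} Hz")
```

The test tone is chosen so that its fundamental falls exactly on a DFT bin; for real speech a window function and interpolation between bins would typically be needed.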
If the magnitude operation is used for compression, the signal resulting from the backward transformation is a zero-phase signal. If no compression is applied to the power spectrum, the backward transformation yields the autocorrelation of the signal, so that autocorrelation can be used in this respect.
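
The relation between the uncompressed power spectrum and autocorrelation can be checked numerically. The sketch below assumes a circular (periodic) autocorrelation and uses a naive DFT in place of the FFT; the function names, the test signal and the lag search range are hypothetical and chosen only for illustration.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive forward or inverse DFT (stand-in for FFT / IFFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
               for t, v in enumerate(values))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def autocorr_via_spectrum(samples):
    """Inverse transform of the uncompressed power spectrum."""
    power = [abs(c) ** 2 for c in dft(samples)]
    return [c.real for c in dft(power, inverse=True)]

def autocorr_direct(samples):
    """Circular autocorrelation computed directly in the time domain."""
    n = len(samples)
    return [sum(samples[t] * samples[(t + lag) % n] for t in range(n))
            for lag in range(n)]

# Hypothetical test signal with a period of 16 samples.
n = 64
x = [math.sin(2 * math.pi * t / 16) + 0.5 * math.sin(2 * math.pi * 2 * t / 16)
     for t in range(n)]

r_spec = autocorr_via_spectrum(x)
r_time = autocorr_direct(x)

# The pitch period is taken as the lag of the highest peak in an assumed
# search range of 4..24 samples.
pitch_lag = max(range(4, 25), key=lambda lag: r_spec[lag])
print("pitch period in samples:", pitch_lag)
```

Both routes give the same values up to rounding, and the result is symmetric around lag zero, which is what makes it a zero-phase signal.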
A strong compression such as a log function amplifies the influence of noise and thereby produces wrong pitch candidates. A weak compression such as the magnitude operation is not sufficient to suppress the influence of the spectral envelope and therefore produces wrong candidates from higher harmonics. A compromise is to apply a square-root operation to the magnitude values, as used in a harmony speech coder which is known from R. Taori et al., Harmony-1: A Versatile Low Bit Rate Speech Coding System, Nat. Lab. Technical Note 157/97. The pitch detection methods are intended to determine the right candidate out of multiple candidates; however, if the candidates are close to each other, a wrong candidate may be chosen. Further, if higher and/or lower octaves of a pitch are strongly represented, false candidates may be selected by the pitch detection methods known from the prior art.
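
The trade-off between the three compression operations can be made concrete with a small numerical illustration. The power values below are hypothetical and not taken from any of the cited works: two harmonics that differ by a factor of 100 in power, and two noise-floor values that also differ by a factor of 100. The sketch compares how much of the compressed range is taken up by noise variation relative to the difference between the harmonics.

```python
import math

# Compression operations applied to a power value p (illustrative names).
COMPRESSIONS = {
    "log": lambda p: math.log(p),
    "magnitude": lambda p: p ** 0.5,
    "sqrt_magnitude": lambda p: p ** 0.25,
}

def spread(power_a, power_b, compress):
    """Absolute difference between two compressed power values."""
    return abs(compress(power_a) - compress(power_b))

def noise_to_harmonic_ratio(compress, strong=1e4, weak=1e2,
                            noise_hi=1e-2, noise_lo=1e-4):
    """Noise variation after compression, relative to the harmonic difference.

    All power values are hypothetical and serve only as an illustration.
    """
    return spread(noise_hi, noise_lo, compress) / spread(strong, weak, compress)

def harmonic_ratio(compress, strong=1e4, weak=1e2):
    """How strongly the spectral envelope still separates two harmonics."""
    return compress(strong) / compress(weak)

ratios = {name: noise_to_harmonic_ratio(fn) for name, fn in COMPRESSIONS.items()}
for name, value in ratios.items():
    print(f"{name}: noise/harmonic spread = {value:.4f}")
```

With these assumed values, the log function lets noise variation weigh as much as the difference between the harmonics, the magnitude operation leaves the envelope difference between the harmonics largest, and the square root of the magnitude lies between the two.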