Pitch detection algorithms fall into two general categories. Time domain algorithms rely on the periodic shape of the speech waveform and evaluate its periodicity with measures such as the autocorrelation function or the Average Magnitude Difference Function (AMDF). These methods are often computationally expensive and are prone to insertion errors in the presence of correlated noise, because they cannot discriminate between the tonal periodicity of a correlated noise and the rich, harmonically structured periodicity of speech. Frequency domain methods, by contrast, directly evaluate whether the harmonic frequency structure of speech is present, using one of the many available spectral representations, such as the short-time Fourier transform, the wavelet transform, or the cepstrum, among others. The success of frequency domain methods depends on their ability to resolve the frequency components of speech, especially in the presence of noise. Resolving these components in noise usually requires a relatively large analysis window (as large as 100 ms), which is unsuitable for real-time applications that require the lowest possible processing delay. Moreover, a large analysis window compromises the time resolution of the pitch estimates.
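The two time domain periodicity measures named above can be illustrated with a minimal sketch. The function names, the 60 to 400 Hz search range, and the synthetic test tone are assumptions chosen for illustration; they are not taken from any particular published detector.

```python
import math

def autocorrelation(frame, lag):
    """Sum of products of the frame with a copy of itself shifted by `lag` samples."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

def amdf(frame, lag):
    """Average magnitude difference between the frame and its shifted copy."""
    count = len(frame) - lag
    return sum(abs(frame[n] - frame[n + lag]) for n in range(count)) / count

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the lag with the largest autocorrelation and convert it to Hz.

    The search range f_min..f_max is an illustrative assumption.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda k: autocorrelation(frame, k))
    return sample_rate / best_lag

# Synthetic, noise-free 200 Hz tone: 40 ms at 8 kHz (hypothetical test signal).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]
pitch = estimate_pitch(frame, sample_rate)
```

For this clean tone the autocorrelation peaks at a lag of 40 samples, one full period, so the estimate is 200 Hz, and the AMDF drops to nearly zero at that same lag. Both measures respond to any periodic input, including a tonal noise component, which is the source of the insertion errors described above.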
Combining a time domain energy operator, the Teager Energy Operator (TEO), with the Pseudo Wigner-Ville Transformation (PWVT) helps to recover speech from noise and to recover low-frequency information of the speech signal. This is useful for detecting human pitch in noisy speech recordings where the noise contains strong and stable low-frequency activity, or where the recording conditions have caused a loss of low-frequency content in the speech signal. Such a PWVT-TEO algorithm relies on the information contained in the frequency range below 1 kHz. The algorithm performs well in most noise cases, but it may fail for the special case of noise that corrupts the frequency range the algorithm relies on, even if the speech information is well preserved in the higher frequency ranges.
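For orientation, the Teager Energy Operator is commonly written in discrete time as psi[n] = x[n]^2 - x[n-1] * x[n+1]. The sketch below implements only that standard form; the function name and the test tone are assumptions. It does not show the PWVT stage or how the two are combined in a PWVT-TEO algorithm, which the text above does not specify.

```python
import math

def teager_energy(x):
    """Discrete TEO for interior samples: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Hypothetical test signal: a pure tone of amplitude 2 at 100 Hz, sampled at 8 kHz.
amplitude = 2.0
omega = 2 * math.pi * 100 / 8000  # digital frequency in radians per sample
tone = [amplitude * math.cos(omega * n) for n in range(200)]
energy = teager_energy(tone)
```

For a pure sinusoid the operator output is constant and equals amplitude^2 * sin(omega)^2, so it tracks both the amplitude and the frequency of the component using only three neighbouring samples.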