Pitch detection algorithms fall into two general categories. Time-domain algorithms rely on the periodic shape of the speech waveform over time and use measures of periodicity, such as the autocorrelation function or the Average Magnitude Difference Function (AMDF), to evaluate it. These methods are often computationally expensive and are prone to insertion errors when dealing with correlated types of noise, because they cannot discriminate between the tonal periodicity of a correlated noise and the rich, harmonically structured periodicity of speech.
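As an illustration of the time-domain measures named above, the following is a minimal sketch of a normalized autocorrelation estimator and an AMDF estimator. It is not an implementation taken from this document: the function names, the 8 kHz sampling rate, the 60 to 400 Hz search range, and the synthetic test signals are all assumptions chosen for the example. The pure 100 Hz tone stands in for a correlated noise, and both estimators report it as confidently as the harmonic "voiced" frame, which is the insertion-error behavior described above.

```python
import math

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return (pitch_hz, peak) from a normalized autocorrelation.

    Assumes the frame is longer than the longest candidate lag (fs / fmin).
    """
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    width = len(frame) - lag_max          # same sample count compared at every lag
    head = frame[:width]
    head_energy = sum(x * x for x in head)
    best_lag, best_peak = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        tail = frame[lag:lag + width]
        tail_energy = sum(x * x for x in tail)
        denom = math.sqrt(head_energy * tail_energy) or 1.0  # guard against silence
        peak = sum(a * b for a, b in zip(head, tail)) / denom
        if peak > best_peak:
            best_lag, best_peak = lag, peak
    return fs / best_lag, best_peak

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return the pitch whose lag minimizes the average magnitude difference."""
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    width = len(frame) - lag_max

    def amdf(lag):
        return sum(abs(frame[n] - frame[n + lag]) for n in range(width)) / width

    return fs / min(range(lag_min, lag_max + 1), key=amdf)

fs = 8000      # assumed sampling rate in Hz
f0 = 100.0     # assumed pitch of both synthetic signals
n_samples = 400  # 50 ms frame

# Hypothetical "voiced" frame: five harmonics of f0 with 1/k amplitudes.
voiced = [sum(math.sin(2 * math.pi * f0 * k * n / fs) / k for k in range(1, 6))
          for n in range(n_samples)]
# Hypothetical correlated noise: a single stable tone at the same frequency.
tone = [math.sin(2 * math.pi * f0 * n / fs) for n in range(n_samples)]

voiced_pitch, voiced_peak = autocorr_pitch(voiced, fs)
tone_pitch, tone_peak = autocorr_pitch(tone, fs)
print("voiced:", voiced_pitch, voiced_peak, amdf_pitch(voiced, fs))
print("tone:  ", tone_pitch, tone_peak, amdf_pitch(tone, fs))
```

Each estimator evaluates every candidate lag against the whole comparison window, so the cost per frame grows with the product of the lag range and the window length. This is one reason such methods can be computationally expensive.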
Frequency-domain methods, in contrast, directly evaluate whether the harmonic frequency structure of speech is present, using one of the many available spectral representation techniques, such as the short-time Fourier transform, the wavelet transform, the cepstrum, and others. The success of frequency-domain methods depends on their ability to resolve the frequency components of speech, especially in the presence of noise. Resolving those components in noise usually requires a relatively large analysis window (as large as 100 msec), which is not suitable for real-time applications that require the lowest possible processing delay. Moreover, a large analysis window compromises the time resolution of the pitch estimates.
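The trade-off between window length and frequency resolution can be made concrete with a short calculation. The sketch below assumes a Fourier-based analysis with a Hann window, whose main lobe is four bins wide from null to null. The function name and the choice of window are illustrative assumptions, not details given in this document.

```python
def resolution_tradeoff(window_ms, main_lobe_bins=4.0):
    """Return (bin_spacing_hz, main_lobe_hz) for an analysis window.

    bin_spacing_hz is the reciprocal of the window duration.
    main_lobe_bins=4 is the null-to-null main-lobe width of a Hann window
    (an assumed window choice; a rectangular window would use 2).
    """
    bin_spacing_hz = 1000.0 / window_ms
    return bin_spacing_hz, main_lobe_bins * bin_spacing_hz

for window_ms in (20, 50, 100):
    spacing, lobe = resolution_tradeoff(window_ms)
    print(f"{window_ms} ms window: bin spacing {spacing} Hz, "
          f"Hann main lobe {lobe} Hz")
```

Under these assumptions a 100 msec window gives 10 Hz bin spacing, whereas a 20 msec window gives only 50 Hz, with a main lobe wide enough to smear adjacent harmonics of a low-pitched voice together. The longer window also has to be captured in full before it can be analyzed, which is the source of the processing delay and the loss of time resolution noted above.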
One technical problem in detecting human pitch from noisy speech recordings is coping with correlated types of noise, such as car engine noise, that contain strong and stable low-frequency activity. In such cases, the noise waveform has a periodic shape and is therefore difficult to distinguish from the periodic voiced segments of the speech signal. A second technical problem arises with speech recordings that have lost their low-frequency information for various reasons, such as imperfect recording conditions, telephony microphone filtering (a high-pass filtering effect with a cut-off frequency around a few hundred Hz), and the like.
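The second problem can be illustrated with a synthetic frame. The sketch below is only a demonstration under stated assumptions: the 8 kHz sampling rate, the 100 Hz pitch, the 300 Hz cut-off, and the helper `dft_magnitude` are all hypothetical, and simply dropping the lowest harmonics is a crude stand-in for a real high-pass filter. It shows that after the low-frequency content is lost there is no energy left at the pitch frequency itself, although the spacing between the surviving harmonics still equals the pitch.

```python
import math

def dft_magnitude(frame, fs, freq_hz):
    """Magnitude of the frame's DFT at one frequency, scaled by frame length."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * n / fs)
             for n, x in enumerate(frame))
    im = sum(x * math.sin(2 * math.pi * freq_hz * n / fs)
             for n, x in enumerate(frame))
    return math.hypot(re, im) / len(frame)

fs = 8000        # assumed sampling rate in Hz
f0 = 100.0       # assumed pitch of the synthetic voiced frame
n_samples = 800  # 100 ms, so every harmonic completes whole cycles

# Keep only harmonics at or above 300 Hz (assumed telephone-style cut-off).
kept = [k for k in range(1, 11) if k * f0 >= 300.0]
frame = [sum(math.sin(2 * math.pi * k * f0 * n / fs) for k in kept)
         for n in range(n_samples)]

# Measure the first ten harmonic positions of the original pitch.
mags = {k: dft_magnitude(frame, fs, k * f0) for k in range(1, 11)}
present = [k * f0 for k in range(1, 11) if mags[k] > 0.1]
spacing = present[1] - present[0]

print("magnitude at the pitch frequency:", mags[1])
print("surviving harmonics (Hz):", present)
print("spacing between surviving harmonics (Hz):", spacing)
```

A detector that looks for energy at the fundamental alone would find nothing in such a frame, so any estimate has to be inferred from the higher harmonics that remain.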