Pitch detection algorithms fall into two general categories. Time domain algorithms rely on the periodic shape of the speech waveform over time and use a measure of periodicity, such as the autocorrelation function or the Average Magnitude Difference Function (AMDF), to evaluate it. These methods are often computationally expensive and are also prone to insertion errors when dealing with correlated types of noise, because they cannot discriminate between the tonal periodicity of correlated noise and the rich, harmonically structured periodicity of speech.
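As an illustration of the two periodicity measures named above, the following is a minimal sketch, not a reference implementation. The function names, the 60 to 400 Hz search range, and the use of NumPy are assumptions made for the example. The autocorrelation estimate picks the lag with the highest correlation, while the AMDF dips toward zero at lags that match the period.

```python
import numpy as np

def lag_range(fs, fmin=60.0, fmax=400.0):
    """Lags (in samples) covering the assumed pitch search range."""
    return int(fs / fmax), int(fs / fmin)

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate in Hz from the autocorrelation peak."""
    frame = frame - np.mean(frame)
    lag_min, lag_max = lag_range(fs, fmin, fmax)
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return fs / lag

def amdf_curve(frame, fs, fmin=60.0, fmax=400.0):
    """Return (lags, AMDF values) over the search range.

    The AMDF is the mean absolute difference between the frame and a
    copy of itself delayed by `lag` samples.
    """
    lag_min, lag_max = lag_range(fs, fmin, fmax)
    lags = np.arange(lag_min, lag_max + 1)
    values = np.array([np.mean(np.abs(frame[k:] - frame[:-k])) for k in lags])
    return lags, values
```

Both measures evaluate every candidate lag against the whole frame, which is the source of the computational cost noted above. The AMDF also dips at every multiple of the period, so a practical detector needs an additional rule to choose among the dips. Neither measure by itself distinguishes a pure tone from harmonically rich speech.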
Frequency domain methods, however, are based on direct evaluation of the existence of the harmonic frequency structure of speech, using one of the many available spectral representation techniques, such as the short-time Fourier transform, the wavelet transform, the cepstrum and others. The success of frequency domain methods depends on their ability to resolve the frequency components of the speech, especially in the presence of noise. Resolving these components usually requires a relatively large analysis window (as large as 100 msec), which is not suitable for real-time applications that require the lowest possible processing delay. Moreover, a large analysis window compromises the time resolution of the pitch estimates.
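The window-length trade-off can be made concrete with a short sketch. It assumes NumPy, a periodic Hann window, and a cepstrum-based estimator chosen purely for illustration; the function names, the 60 to 400 Hz search range, and the `floor` constant are hypothetical and not taken from any particular method described here.

```python
import numpy as np

def frequency_resolution_hz(fs, n_samples):
    """Spacing between DFT bins for a window of n_samples at rate fs."""
    return fs / n_samples

def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0, floor=1e-6):
    """Pitch estimate in Hz from the peak of the real cepstrum."""
    n = len(frame)
    # Periodic Hann window (an assumption; other windows are possible).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n)
    # Log magnitude spectrum; `floor` avoids taking log(0).
    log_mag = np.log(np.abs(np.fft.rfft(frame * window)) + floor)
    # Harmonics spaced at the pitch frequency appear as a ripple in the
    # log spectrum, which becomes a peak at the pitch period (quefrency).
    cepstrum = np.fft.irfft(log_mag, n=n)
    q_min = int(fs / fmax)
    q_max = int(fs / fmin)
    q = q_min + np.argmax(cepstrum[q_min:q_max + 1])
    return fs / q
```

At a sampling rate of 8 kHz, a 100 msec window holds 800 samples and gives a bin spacing of 10 Hz, whereas a 20 msec window holds 160 samples and gives 50 Hz, which is coarse relative to the spacing of low-pitched harmonics. The longer window resolves the harmonics better, but each estimate then depends on 100 msec of signal.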
There is a large body of prior art for pitch detection. These methods use many different criteria, in the time domain or the frequency domain, for estimating the pitch. They differ, however, in their ability to be implemented in real time with low latency and low computational cost.