A widely used technique in digital signal analysis is the application of the fast Fourier transform (FFT) to transform a signal from the time domain to the frequency domain. Often the signal is windowed before the FFT is applied. The resulting spectrum represents the windowed signal as projected onto a basis of complex sinusoids. The complex coefficient of each projection can be interpreted as the amplitude and phase of a particular stationary frequency component in the windowed signal. However, representing a signal as a collection of stationary sinusoids is not an accurate model for many audio signals. In many instances, a more useful model would contain fewer sinusoidal components whose frequencies and amplitudes are not stationary. For instance, an accurate model of the underlying sound sources is vital in applications such as computational auditory scene analysis, where the goal is to separate a mixed signal into its individual sound sources. For such applications, it is desirable to have as much information as possible about how each sinusoidal component changes continuously in frequency and amplitude. Obtaining this information requires further processing of the spectra produced by the FFT.
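The windowing step and the amplitude/phase interpretation of a spectral coefficient can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: the recursive radix-2 FFT (input length must be a power of two), the periodic Hann window, and the amplitude scaling by 2/Σw are all choices made for this sketch. The scaling recovers the true amplitude only when the sinusoid lies exactly on a bin centre, and the phase is measured relative to the start of the frame.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def hann(n):
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def windowed_spectrum(signal):
    """Apply a Hann window, then transform the frame with the FFT."""
    w = hann(len(signal))
    return fft([s * wi for s, wi in zip(signal, w)])


def peak_amplitude_phase(spectrum, window_sum):
    """Return (bin, amplitude, phase) of the largest positive-frequency bin.

    Amplitude is scaled by 2 / sum(window), which reads back the true
    amplitude of a real sinusoid centred exactly on a bin.
    """
    half = spectrum[: len(spectrum) // 2]
    k = max(range(1, len(half)), key=lambda i: abs(half[i]))
    return k, 2 * abs(half[k]) / window_sum, cmath.phase(half[k])
```

For example, a 64-sample frame of `0.8 * cos(2*pi*5*i/64 + 0.3)` yields a peak at bin 5 with an amplitude of about 0.8 and a phase of about 0.3 rad. Each such coefficient describes a single stationary sinusoid, which is exactly the limitation the paragraph above points out.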
Peak tracking is one approach to estimating changes in frequency and amplitude. An example of this approach is found in J. O. Smith and X. Serra, “PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation”, Proceedings of Int. Computer Music Conf., 1987, pp. 1-22. However, accurate peak tracking often requires a short step size between successive FFT frames, which increases the number of FFTs that must be computed and therefore the computational cost. In addition, it is difficult to track peaks whose frequency trajectories cross each other.
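The crossing problem can be illustrated with a deliberately simplified tracker. The sketch below is an assumption for illustration only and is not PARSHL's matching procedure: the hypothetical functions `match_peaks` and `track` link peaks in consecutive frames by greedy nearest-frequency matching within a maximum jump `max_jump` (in Hz), and the per-frame peak lists are idealized and sorted by frequency, as a peak picker would report them.

```python
def match_peaks(prev_freqs, curr_freqs, max_jump):
    """Greedy nearest-frequency matching of peaks between two frames.

    Returns a dict mapping an index in prev_freqs to an index in
    curr_freqs. Peaks with no partner within max_jump stay unmatched.
    """
    candidates = sorted(
        (abs(p - c), i, j)
        for i, p in enumerate(prev_freqs)
        for j, c in enumerate(curr_freqs)
        if abs(p - c) <= max_jump
    )
    links, used_prev, used_curr = {}, set(), set()
    for _, i, j in candidates:  # closest pairs are linked first
        if i not in used_prev and j not in used_curr:
            links[i] = j
            used_prev.add(i)
            used_curr.add(j)
    return links


def track(frames, max_jump):
    """Chain per-frame peak frequency lists into tracks.

    A current peak with no partner starts a new track; a track whose last
    peak finds no partner simply stops growing.
    """
    tracks = [[f] for f in frames[0]]
    owner = list(range(len(frames[0])))  # track index of each previous peak
    for prev, curr in zip(frames, frames[1:]):
        links = match_peaks(prev, curr, max_jump)
        new_owner = [None] * len(curr)
        for i, j in links.items():
            tracks[owner[i]].append(curr[j])
            new_owner[j] = owner[i]
        for j, f in enumerate(curr):
            if new_owner[j] is None:
                tracks.append([f])
                new_owner[j] = len(tracks) - 1
        owner = new_owner
    return tracks


frames = [[1000, 1200], [1050, 1150], [1100, 1100], [1050, 1150], [1000, 1200]]
tracks = track(frames, max_jump=100)
```

Suppose the true partials cross, one rising from 1000 Hz to 1200 Hz while the other falls from 1200 Hz to 1000 Hz. The tracker instead returns `[1000, 1050, 1100, 1050, 1000]` and `[1200, 1150, 1100, 1150, 1200]`, that is, two partials that meet at 1100 Hz and turn back. Frequency proximity alone cannot distinguish these two interpretations.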
Another approach to estimating changes in frequency and amplitude is found in A. S. Master and Y. Liu, “Robust Chirp Parameter Estimation for Hann Windowed Signals”, Proceedings of IEEE Int. Conf. on Multimedia and Exposition 2003, pp. 717-720. This approach relies on the fact that the FFT bins near an estimated peak contain further information that is useful in estimating the amplitude and pitch trajectory of the sinusoid, without the additional spectral frames that peak tracking requires. More specifically, the approach in Master solves analytically for the trajectory information by estimating a chirp (linear frequency ramp) parameter, using a Fresnel integral approximation for large parameter values and Taylor series expansions for small ones.
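The premise that bins adjacent to a peak carry trajectory information can be checked numerically. The sketch below is not Master and Liu's estimator, which solves for the chirp parameter analytically. It only synthesizes a Hann-windowed complex linear chirp whose frequency sweeps `d` bins across the frame, centred on bin `k0`, and it measures the ratio |X[k0+1]| / |X[k0]|. For a stationary sinusoid on a bin centre (`d = 0`), this ratio is exactly 0.5. A nonzero sweep spreads energy into the neighbouring bins and raises the ratio. The function names, the complex-exponential test signal, and the frame length are illustrative assumptions. A direct single-bin DFT sum is used for brevity, and an FFT would give the same bin values.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_bin(x, k):
    """Single DFT coefficient X[k] computed by direct summation."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))


def chirp(n, k0, d):
    """Complex linear chirp: instantaneous frequency sweeps from k0 - d/2
    to k0 + d/2 bins across the n-sample frame (hypothetical test signal)."""
    return [
        cmath.exp(1j * (2 * math.pi * k0 * i / n + math.pi * d * ((i - n / 2) / n) ** 2))
        for i in range(n)
    ]


def neighbour_ratio(n, k0, d):
    """|X[k0+1]| / |X[k0]| for a Hann-windowed chirp centred on bin k0."""
    w = hann(n)
    xw = [a * b for a, b in zip(chirp(n, k0, d), w)]
    return abs(dft_bin(xw, k0 + 1)) / abs(dft_bin(xw, k0))


for d in (0, 1, 2, 4):
    print(d, round(neighbour_ratio(64, 16, d), 3))
```

Because the ratio departs from its stationary value of 0.5 as `d` grows, a single frame already contains evidence of frequency change. An estimator such as the one cited above turns that evidence into a chirp parameter.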