Signal processing is a tool that can be used to gather and display information about audio events. Information about an event may include the frequency of the audio event (i.e., the number of occurrences of a repeating event per unit time), its onset time, its duration, and its source.
Developments in audio signal analysis have resulted in a variety of computer-based systems to process and analyze audio events generated by musical instruments or by human speech, or those occurring underwater as a result of natural or man-made activities. However, past audio signal processing systems have had difficulty analyzing sounds having certain qualities, such as:
(A) multiple distinct fundamental frequency components (“FFCs”) in the frequency spectrum; and/or
(B) one or more integral multiples, or harmonic components (“HCs”), of a fundamental frequency in the frequency spectrum.
An audio signal having multiple FFCs is difficult to process, and the difficulty is heightened when the HCs related to the multiple FFCs interfere with each other as well as with the FFCs. In the past, systems analyzing multiple-FFC signals have suffered from problems such as:
erroneous results and false frequency detections;
an inability to handle sources with different spectral profiles, or sources where the FFC(s) of a sound are not significantly stronger in amplitude than the associated HC(s);
and also, particularly in the context of music audio signals:
mischaracterizing the missing fundamental: where the pitch of an FFC is heard through its HC(s), even though the FFC itself is absent;
mischaracterizing the octave problem: where an FFC and its associated HC(s), or octaves, cannot be separately identified; and
spectral masking: where louder musical sounds mask other musical sounds from being heard.
Prior systems that have attempted to identify the FFCs of a signal based on the distance between zero-crossing points of the signal have been shown to deal inadequately with complex waveforms composed of multiple sine waves with differing periods. More sophisticated approaches have compared segments of a signal with other segments offset by a predetermined period to find a match: the average magnitude difference function (“AMDF”), the average squared mean difference function (“ASMDF”), and similar autocorrelation algorithms work this way. While these algorithms can provide reasonably accurate results for highly periodic signals, they have false-detection problems (e.g., the “octave errors” referred to above), trouble with noisy signals, and may not handle signals having multiple simultaneous FFCs (and HCs).
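The lag-comparison idea behind AMDF-style detectors can be sketched as follows. This is a minimal illustration, not any particular prior system: the function name, the 200 Hz test tone, and the frequency search range are illustrative assumptions, and the sketch handles only a clean, single-pitch signal.

```python
import numpy as np

def amdf_pitch(signal, sample_rate, min_freq=150.0, max_freq=400.0):
    """Estimate a single fundamental frequency with the average
    magnitude difference function (AMDF): for each candidate lag,
    average the magnitude of signal[n] - signal[n - lag] and keep
    the lag with the smallest average difference. Real signals need
    windowing and safeguards against the octave errors noted above."""
    min_lag = int(sample_rate / max_freq)
    max_lag = int(sample_rate / min_freq)
    best_lag, best_diff = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        diff = np.mean(np.abs(signal[lag:] - signal[:-lag]))
        if diff < best_diff:
            best_lag, best_diff = lag, diff
    return sample_rate / best_lag  # best-matching period -> frequency

# A 200 Hz sine at an 8 kHz sample rate has an exact 40-sample period.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200.0 * t)
estimate = amdf_pitch(x, sr)
print(estimate)  # 200.0
```

Note that widening the search range to include lags near twice the true period invites exactly the octave errors described above, since the difference is also near zero at multiples of the period.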
Brief Description of Audio Signal Terminology
Before an audio event is processed, an audio signal representing the audio event (typically an electrical voltage) is generated. An audio signal is commonly modeled as a sinusoid (or sine wave), which is a mathematical curve having features including an amplitude (or signal strength), often represented by the symbol A (being the peak deviation of the curve from zero), a repeating structure having a frequency, f (being the number of complete cycles of the curve per unit time), and a phase, φ (which specifies where in its cycle the curve commences).
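These three quantities can be illustrated by sampling a sinusoid directly; the particular values of A, f, and φ below, and the sample rate, are arbitrary choices for illustration.

```python
import math

# Sample one second of x(t) = A * sin(2*pi*f*t + phi); the names
# A, f, and phi match the amplitude, frequency, and phase described above.
A, f, phi = 2.0, 5.0, math.pi / 2   # illustrative values
sample_rate = 100                   # samples per second (illustrative)
samples = [A * math.sin(2 * math.pi * f * n / sample_rate + phi)
           for n in range(sample_rate)]
print(max(samples))  # peak deviation from zero reaches A = 2.0
```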
A sinusoid with a single frequency is a pure tone, which is rare; in nature and music, complex tones generally prevail. These are combinations of various sinusoids with different amplitudes, frequencies and phases. Although not purely sinusoidal, complex tones often exhibit quasi-periodic characteristics in the time domain. Musical instruments that produce complex tones often achieve their sounds by plucking a string or by modal excitation in cylindrical tubes. In speech, a person with a “bass” or “deep” voice has lower-range fundamental frequencies, while a person with a “high” or “shrill” voice has higher-range fundamental frequencies. Likewise, an audio event occurring underwater can be classified depending on its FFCs.
A “harmonic” corresponds to an integer multiple of the fundamental frequency of a complex tone. The first harmonic is synonymous with the fundamental frequency of a complex tone. An “overtone” refers to any frequency higher than the fundamental frequency. The term “inharmonicity” refers to how much a partial of a quasi-periodic tone departs from an ideal harmonic, i.e., from an exact integer multiple of the fundamental frequency.
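As a concrete example, taking an illustrative fundamental of 110 Hz (chosen arbitrarily here), the first few harmonics are its integer multiples, with the first harmonic being the fundamental itself:

```python
f0 = 110.0  # illustrative fundamental frequency, in Hz
harmonics = [k * f0 for k in range(1, 5)]  # k-th harmonic = k * f0
print(harmonics)  # [110.0, 220.0, 330.0, 440.0]
```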
Computer and Mathematical Terminology: The discrete Fourier transform (“DFT”) converts a finite list of equally spaced samples of a function into a list of coefficients of a finite combination of complex sinusoids, which have those same sample values. By use of the DFT, and the inverse DFT, a time-domain representation of an audio signal can be converted into a frequency-domain representation, and vice versa. The fast Fourier transform (“FFT”) is a DFT algorithm that reduces the number of computations needed to perform the DFT and is generally regarded as an efficient tool to convert a time-domain signal into a frequency-domain signal.
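The time-domain-to-frequency-domain conversion can be sketched with NumPy's FFT routines; the 200 Hz test tone and one-second window are illustrative choices, not from the source.

```python
import numpy as np

# Convert a time-domain signal to a frequency-domain magnitude
# spectrum via the FFT and locate its strongest component.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate    # one second of samples
x = np.sin(2 * np.pi * 200.0 * t)           # 200 Hz pure tone
spectrum = np.abs(np.fft.rfft(x))           # magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / sample_rate)
peak = freqs[np.argmax(spectrum)]
print(peak)  # 200.0
```

With a one-second window the frequency bins are spaced 1 Hz apart, so the 200 Hz tone falls exactly on a bin; shorter windows coarsen this resolution, which is one practical trade-off in FFT-based analysis.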