Audio coding or audio compression algorithms are used to obtain compact digital representations of high-fidelity (wideband) audio signals for the purpose of efficient transmission or storage. The central objective in audio coding is to represent the signal with a minimum number of bits while achieving transparent signal reproduction, i.e., generating output audio that cannot be distinguished from the original input, even by a sensitive listener.
Perceptual audio coders have been developed which achieve coding gain by exploiting both perceptual irrelevancies and statistical redundancies. Perceptual irrelevancies arise, for example, where a certain level of distortion is inaudible (and therefore irrelevant) because it is masked by audio signal components of sufficient level. Psychoacoustic signal analysis is often utilized to estimate this masking power of the audio signal. Such a psychoacoustic model delivers masked thresholds that quantify the maximum amount of allowable distortion at each point in the time-frequency plane, such that quantization of the time-frequency parameters does not introduce audible artifacts. The quantization performed during encoding can thereby exploit perceptual irrelevancies and provide an improved coding gain.
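As an illustrative sketch only, and not a method taken from any particular coder, the following Python fragment shows how a masked threshold might translate into a quantizer step size. It assumes a uniform scalar quantizer whose error power is approximately the square of the step size divided by 12; the function names, the threshold value, and the synthetic coefficients are all hypothetical.

```python
import math
import random

def step_size_for_threshold(masked_threshold_power):
    """Step whose modelled noise power (step**2 / 12) equals the threshold.

    Assumes a uniform scalar quantizer with uniformly distributed error.
    """
    return math.sqrt(12.0 * masked_threshold_power)

def quantize(values, step):
    """Uniform scalar quantization: round each value to the nearest multiple of step."""
    return [step * round(v / step) for v in values]

# Hypothetical masked threshold (allowable distortion power) for one band.
threshold = 0.5
step = step_size_for_threshold(threshold)

# Synthetic stand-in for the time-frequency coefficients of that band.
rng = random.Random(1)
coeffs = [rng.uniform(-100.0, 100.0) for _ in range(10000)]

decoded = quantize(coeffs, step)
noise_power = sum((c - d) ** 2 for c, d in zip(coeffs, decoded)) / len(coeffs)
print(f"step={step:.3f}  noise power={noise_power:.3f}  threshold={threshold}")
```

Under these assumptions, a band with a higher masked threshold receives a coarser step and therefore needs fewer bits, which is the source of the coding gain attributed to perceptual irrelevancies.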
A wide variety of methods have been utilized to determine the nature of an input audio signal in order to estimate the masked threshold. Among other techniques, most known methods distinguish between tone-like and noise-like components of the audio signal, a property referred to herein as “tonality”. The masked threshold level differs significantly depending on this classification, and thus the allowable distortion level depends on the tonality of the audio signal components. Known methods to estimate the tonality include spectral flatness measures, the use of complex spectral coefficients, loudness uncertainty measures, and envelope fluctuation measures.
In a spectral flatness measure, the input audio spectrum is examined to determine whether it contains distinct peaks. If so, the input audio signal is considered most likely tonal, while if the input audio spectrum is generally flat, the input audio signal is considered largely noise-like. Complex spectral coefficients may also be utilized, in which the spectral coefficients are predicted and/or examined from one frame to the next to determine whether the variation is primarily in the nature of phase shifts, and if so, the input audio signal is considered tone-like. Loudness uncertainty measures determine loudness variations over time, with fluctuations in loudness being indicative of a noise-like input signal. Similarly, envelope fluctuation measures examine the energy levels in sub-bands, where significant fluctuation is again indicative of a noise-like signal.
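By way of a minimal sketch, the spectral flatness measure described above can be illustrated with one common formulation: the ratio of the geometric mean to the arithmetic mean of the power spectrum, which approaches 0 for a peaky (tone-like) spectrum and approaches 1 for a flat (noise-like) spectrum. The frame length, the test signals, the small `eps` guard, and the function names below are assumptions chosen for illustration; a naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Naive DFT power spectrum over bins 1 .. N/2 - 1 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectral_flatness(power, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum."""
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean

N = 256  # hypothetical frame length

# Tone-like input: a sinusoid with exactly 20 cycles per frame (a single spectral peak).
tone = [math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

# Noise-like input: white Gaussian noise (a generally flat spectrum).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

sfm_tone = spectral_flatness(power_spectrum(tone))
sfm_noise = spectral_flatness(power_spectrum(noise))
print(f"flatness (tone):  {sfm_tone:.3e}")
print(f"flatness (noise): {sfm_noise:.3f}")
```

In this sketch the sinusoid yields a flatness value close to 0 and the noise frame yields a markedly higher value, which corresponds to the tonal versus noise-like decision described above. Any decision threshold between the two would be a further design choice not specified here.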
Such prior art methods, however, have proved unreliable when the input spectrum is largely harmonic, i.e., having fundamental frequencies with overtones, as in music and speech. They have proved especially unreliable when different instruments have different fundamental frequencies, or when the fundamental frequencies vary over time, e.g., vibrato in singing or instrumental sounds.