Digital audio coding (also called “digital audio compression”) encompasses a variety of techniques for minimizing the size of audio data prior to storage (to reduce storage requirements) or transmission (to reduce bandwidth requirements). Perceptual audio coding techniques take into consideration how humans actually perceive sound, giving more attention to the frequencies of an audio signal that humans hear most clearly and less attention to the frequencies at which humans are unlikely to notice any difference.
One class of digital audio coding is known as transform-based coding. Transform-based audio coding transforms a time-domain signal into a frequency-domain vector of coefficients prior to quantization and encoding. One common type of transform is the modified discrete cosine transform (MDCT). The MDCT is a lapped transform, meaning that the transform is performed on blocks that overlap. This overlap mitigates the audible artifacts that would otherwise occur at block boundaries. The MDCT is used in several lossy audio codecs and techniques.
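The lapped structure described above can be illustrated with a minimal sketch. It assumes a sine window, a hop size of N samples, an even N, and a direct (slow) evaluation of the transform sums; the function names `sine_window`, `mdct`, `imdct`, `analyze`, and `synthesize` are hypothetical and are not taken from any particular codec.

```python
import math

def sine_window(N):
    # Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1,
    # which lets the overlapping halves of adjacent blocks sum correctly.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def _basis(n, k, N):
    # MDCT cosine basis shared by the forward and inverse transforms.
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(block, N):
    # 2N windowed time samples -> N frequency-domain coefficients.
    return [sum(block[n] * _basis(n, k, N) for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    # N coefficients -> 2N time-aliased samples.
    # The 2/N scaling is the normalization assumed for the windowed case.
    return [(2.0 / N) * sum(coeffs[k] * _basis(n, k, N) for k in range(N))
            for n in range(2 * N)]

def analyze(signal, N):
    # Zero-pad so every real sample is covered by two overlapping blocks.
    w = sine_window(N)
    pad = (-len(signal)) % N
    padded = [0.0] * N + list(signal) + [0.0] * (pad + N)
    frames = []
    for start in range(0, len(padded) - N, N):  # blocks overlap by N samples
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        frames.append(mdct(block, N))
    return frames

def synthesize(frames, N, length):
    # Window each inverse block again and overlap-add; the aliasing
    # introduced by one block is cancelled by its neighbor.
    w = sine_window(N)
    out = [0.0] * ((len(frames) + 1) * N)
    for i, coeffs in enumerate(frames):
        y = imdct(coeffs, N)
        for n in range(2 * N):
            out[i * N + n] += y[n] * w[n]
    return out[N:N + length]

# Illustrative input: a short two-component test signal.
signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(37)]
frames = analyze(signal, 8)
restored = synthesize(frames, 8, len(signal))
```

Without quantization, the overlap-add of adjacent blocks reproduces the input up to floating-point rounding. In a codec, the coefficients in `frames` would be quantized between the analysis and synthesis steps.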
The MDCT coefficients representing a given subband are typically quantized using a vector quantization (VQ) technique. The VQ technique uses a minimum mean square error (MMSE) approach to capture as many of the coefficients as possible given the number of available bits. The MMSE approach is an estimation method that seeks to minimize the mean square error. In the upper frequency spectrum of a typical audio signal, the subbands are noise-like, and each upper subband contains a large number of non-zero transform coefficients.
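As a rough illustration of MMSE-driven vector quantization, the sketch below picks the codebook entry with the smallest squared error to a subband's coefficient vector. The codebook contents, the four-coefficient subband, and the function names are assumptions made for the example; practical codecs use much larger, structured codebooks.

```python
def squared_error(a, b):
    # Sum of squared differences; minimizing it also minimizes the mean
    # square error, since the vector length is fixed.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_encode(vector, codebook):
    # Return the index of the codebook entry closest to the input vector.
    return min(range(len(codebook)),
               key=lambda i: squared_error(vector, codebook[i]))

def vq_decode(index, codebook):
    # The decoder only needs the index to look up the reconstruction.
    return list(codebook[index])

# Hypothetical 4-entry codebook: each index costs 2 bits to transmit.
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
]

subband = [0.9, -0.1, 0.2, 1.1]          # illustrative MDCT coefficients
index = vq_encode(subband, CODEBOOK)     # selects entry 1
approximation = vq_decode(index, CODEBOOK)
```

Fewer available bits means a smaller codebook, so the closest entry is a coarser approximation of the subband.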
Problems arise, however, when transform coefficients in a subband are quantized in a coarse manner. In particular, the upper subbands of an audio signal are typically allocated fewer bits than the lower subbands. If the VQ technique does not have enough available bits to vector-quantize a given subband, then often only a single coefficient will be quantized, effectively creating a single-coefficient subband. At the decoder, instead of recreating a noise-like signal in this subband, the single-coefficient subband will have a “tonal” sound. Because the single coefficient moves in time and frequency, it creates a “musical noise” or “birdie” artifact. This artifact reveals itself to a listener as metallic tones that randomly appear and disappear in the played-back audio content.
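The effect can be mimicked with a small simulation. The sketch below is a simplified stand-in for a bit-starved quantizer, not an actual VQ implementation: it assumes a 16-coefficient noise-like subband, Gaussian test data, and a hypothetical `keep_largest` helper that retains only the k largest-magnitude coefficients per frame (the choice that minimizes squared error when only k coefficients can survive).

```python
import random

def keep_largest(coeffs, k):
    # Keep the k largest-magnitude coefficients and zero out the rest.
    keep = sorted(range(len(coeffs)),
                  key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

random.seed(0)  # fixed seed so the illustration is repeatable

# 20 frames of a noise-like upper subband, 16 coefficients each.
frames = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(20)]

# Budget of one coefficient per frame: each decoded frame is a single tone.
decoded = [keep_largest(frame, 1) for frame in frames]

# Frequency index of the surviving coefficient in each frame.
positions = [next(i for i, c in enumerate(d) if c != 0.0) for d in decoded]
```

Each decoded frame holds one isolated spectral line instead of a dense, noise-like spectrum, and the entries of `positions` change from frame to frame. That wandering line corresponds to the tonal component described above.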