With the introduction of portable digital media players, the compact disk for music storage and audio delivery over the Internet, it is now common to store, buy and distribute music and other audio content in digital audio formats. The digital audio formats empower people to enjoy having hundreds or thousands of music songs available on their personal computers (PCs) or portable media players.
One benefit of digital audio formats is that a proper bit-rate (compression ratio) can be selected according to given constraints, e.g., file size and audio quality. On the other hand, one particular bit-rate is not able to cover all scenarios of audio applications. For instance, higher bit-rates may not be suitable for portable devices due to limited storage capacity. By contrast, higher bit-rates are better suited for high quality sound reproduction desired by audiophiles.
To cover a wide range of scenarios, scalable coding techniques are often useful. Typical scalable coding techniques produce a base bitstream with a high compression ratio, which is embedded within a low compression ratio bitstream. With such scalable coding bitstream, conversion from one compression ratio to another can be done quickly by extracting a subset of the compressed bitstream with a desired compression ratio.
Perceptual Transform Coding
The coding of audio utilizes coding techniques that exploit various perceptual models of human hearing. For example, many weaker tones near strong ones are masked so they do not need to be coded. In traditional perceptual audio coding, this is exploited as adaptive quantization of different frequency data. Perceptually important frequency data are allocated more bits and thus finer quantization and vice versa.
For example, transform coding is conventionally known as an efficient scheme for the compression of audio signals. In transform coding, a block of the input audio samples is transformed (e.g., via the Modified Discrete Cosine Transform or MDCT, which is the most widely used), processed, and quantized. The quantization of the transformed coefficients is performed based on the perceptual importance (e.g. masking effects and frequency sensitivity of human hearing), such as via a scalar quantizer.
When a scalar quantizer is used, the importance is mapped to relative weighting, and the quantizer resolution (step size) for each coefficient is derived from its weight and the global resolution. The global resolution can be determined from target quality, bit rate, etc. For a given step size, each coefficient is quantized into a level which is zero or non-zero integer value.
At lower bitrates, there are typically a lot more zero level coefficients than non-zero level coefficients. They can be coded with great efficiency using run-length coding, which may be combined with an entropy coding scheme such as Huffman coding.