Digital audio transmission typically requires a considerable amount of memory and bandwidth. To achieve efficient transmission, signal compression is generally employed. Efficient coding systems are those that can optimally eliminate the irrelevant and redundant parts of an audio stream. Irrelevancy is reduced through psychoacoustic analysis, which identifies the parts of the signal that are not perceptible; redundancy is reduced by tools such as prediction and noiseless coding. The phrase “perceptual audio coder” refers to those compression schemes that exploit the properties of human auditory perception.
FIG. 1 illustrates the basic structure of a perceptual encoder 100. Typically, a perceptual encoder 100 includes a filter bank 110, a quantization unit 120, and a psychoacoustics module 130. The psychoacoustics module 130 can include spectral analysis 132 and masking threshold calculation 134. The quantization unit 120 can feed an entropy coding unit 140. In a more advanced encoder, extra spectral processing is performed before the quantization unit 120. This spectral processing block is used to reduce redundant components and consists mostly of prediction tools. The various perceptual audio encoders differ mainly in how these basic building blocks are realized.
The filter bank 110 is responsible for the time-to-frequency transformation. The frequency domain is used because the encoding exploits the masking property of the human ear, which is calculated in the frequency domain. The window size and the transform size determine the time resolution and the frequency resolution, respectively. Most encoders are able to adapt to fast-changing signals by switching to a finer time resolution. This block switching strategy is important for avoiding pre-echo artifacts, which arise when quantization noise spreads across the entire window, including the portion that precedes a transient.
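As a rough illustration of this trade-off, the following Python sketch computes the time span and the frequency-bin spacing that result from a given window length. It assumes an MDCT-style transform in which a window of L samples yields L/2 coefficients covering 0 Hz to half the sampling rate. The function name `resolution` and the example values (2048-sample and 256-sample windows at 48 kHz, which correspond to the long and short windows of AAC) are illustrative assumptions and are not taken from the description above.

```python
def resolution(window_len, sample_rate):
    """Return (time span in ms, frequency-bin spacing in Hz) for one window.

    Assumes an MDCT-style transform: window_len samples in,
    window_len / 2 coefficients out, covering 0 .. sample_rate / 2.
    """
    time_span_ms = 1000.0 * window_len / sample_rate
    num_coeffs = window_len // 2
    bin_spacing_hz = (sample_rate / 2.0) / num_coeffs
    return time_span_ms, bin_spacing_hz


# Illustrative values only: long and short windows at 48 kHz.
long_ms, long_hz = resolution(2048, 48000)
short_ms, short_hz = resolution(256, 48000)
print(f"long : {long_ms:.2f} ms per window, {long_hz:.2f} Hz per bin")
print(f"short: {short_ms:.2f} ms per window, {short_hz:.2f} Hz per bin")
```

Under these assumptions the short window confines quantization noise to a time span eight times shorter than the long window, at the cost of frequency bins that are eight times wider.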
Earlier encoders, such as MPEG Layer 1 and Layer 2 encoders, use a subband filter as their transform engine. MPEG Layer 3 uses a hybrid filter, which augments the subband filter with a Modified Discrete Cosine Transform (MDCT). Advanced Audio Coding (AAC) dropped backward compatibility with the previous encoders and uses only the MDCT. A similar transform is also used in Dolby AC-3. The advantage of the MDCT lies in its Time Domain Aliasing Cancellation (TDAC) property: the aliasing introduced in each block is cancelled when adjacent overlapping blocks are added together, which removes blocking artifacts.
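The TDAC property can be checked numerically with a minimal sketch such as the one below. It is a direct (slow) evaluation of the standard MDCT definition with a sine window and 50% overlap, written in plain Python for clarity; it is not the filter bank of any particular encoder. The function names, the block size N = 16, and the test signal are assumptions chosen for illustration.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, window):
    """Windowed MDCT: 2N time samples -> N coefficients."""
    N = len(frame) // 2
    n0 = 0.5 + N / 2.0
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, window):
    """Windowed inverse MDCT: N coefficients -> 2N time samples.

    The output of a single block still contains time-domain aliasing.
    """
    N = len(coeffs)
    n0 = 0.5 + N / 2.0
    return [
        window[n] * (2.0 / N)
        * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]


def analyze_and_resynthesize(signal, N):
    """Run 50%-overlapped MDCT/IMDCT and overlap-add the blocks."""
    if len(signal) % N != 0:
        raise ValueError("len(signal) must be a multiple of N in this sketch")
    window = sine_window(N)
    # Pad with N zeros on each side so every sample is covered by two blocks.
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = imdct(mdct(padded[start:start + 2 * N], window), window)
        for n, value in enumerate(block):
            out[start + n] += value  # overlap-add cancels the aliasing
    return out[N:N + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
restored = analyze_and_resynthesize(signal, N=16)
max_error = max(abs(a - b) for a, b in zip(signal, restored))
print(f"max reconstruction error after overlap-add: {max_error:.2e}")
```

A single inverse-transformed block does not match its input, because it still carries aliasing; only the sum of adjacent overlapping blocks reproduces the signal, up to floating-point rounding.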
The psychoacoustics module 130 determines the masking threshold, which is needed to judge which part of a signal is important to perception and which part is irrelevant. The resulting masking threshold is also used to shape the quantization noise so that no degradation is perceived due to this quantization process. The details of psychoacoustics modeling are known to those of skill in the art and are unnecessary for understanding the embodiments disclosed below.
Bit allocation and quantization form the last crucial module in a typical perceptual audio encoder. A non-uniform quantizer is used to reduce the dynamic range of the data, and two quantization parameters that determine the step size are adjusted such that the quantization noise falls below the masking threshold and the number of bits used stays within the available bit rate. These two conditions are enforced by what are commonly referred to as the distortion control loop and the rate control loop, respectively. Within the quantization stage, more advanced encoders, such as MPEG Layer 3 and AAC, incorporate noiseless coding for redundancy reduction to enhance the compression ratio.
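The interaction of the two loops can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any standard: the power-law quantizer (exponent 0.75, rounding offset 0.4054) is modeled on the one used in MPEG Layer 3 and AAC, the global step and the per-band scale stand in for the two quantization parameters, and `estimate_bits` is a hypothetical placeholder for a real noiseless coder. All names, the starting step, and the step increment are assumptions.

```python
import math

STEP_RATIO = 2.0 ** 0.25  # assumed step-size increment per iteration


def quantize(x, step):
    """Non-uniform (power-law) quantizer: compresses large magnitudes."""
    return int(math.copysign(math.floor((abs(x) / step) ** 0.75 + 0.4054), x))


def dequantize(q, step):
    """Inverse of the power-law quantizer."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)


def estimate_bits(q_values):
    """Hypothetical cost model: 1 bit for zero, else 2 * bit length."""
    return sum(1 if q == 0 else 2 * abs(q).bit_length() for q in q_values)


def rate_loop(bands, scale, budget_bits, start_step=1e-4):
    """Inner loop: coarsen the global step until the bits fit the budget."""
    step = start_step
    while True:
        quantized = [[quantize(x, step / s) for x in band]
                     for band, s in zip(bands, scale)]
        bits = estimate_bits(q for band in quantized for q in band)
        if bits <= budget_bits:
            return step, quantized, bits
        step *= STEP_RATIO


def two_loop_quantize(bands, thresholds, budget_bits, max_iter=32):
    """Outer loop: refine bands whose noise exceeds the masking threshold."""
    if budget_bits < sum(len(band) for band in bands):
        raise ValueError("budget too small for this sketch's cost model")
    scale = [1.0] * len(bands)
    for _ in range(max_iter):
        step, quantized, bits = rate_loop(bands, scale, budget_bits)
        noise = [sum((x - dequantize(q, step / s)) ** 2
                     for x, q in zip(band, qband))
                 for band, qband, s in zip(bands, quantized, scale)]
        loud = [b for b, (n, t) in enumerate(zip(noise, thresholds)) if n > t]
        # Stop when all bands pass, or when all fail (no further progress).
        if not loud or len(loud) == len(bands):
            break
        for b in loud:
            scale[b] *= STEP_RATIO  # finer step for the offending band
    return {"quantized": quantized, "bits": bits,
            "noise": noise, "noise_ok": not loud}


# Illustrative spectral coefficients grouped into two bands.
bands = [[0.9, -0.7, 0.5, 0.3], [0.05, -0.04, 0.02, 0.01]]
result = two_loop_quantize(bands, thresholds=[1e-4, 1e-4], budget_bits=64)
print(result["bits"], result["noise_ok"])
```

The rate condition is always met on return, whereas the distortion condition may not be: when the budget is too tight, the sketch gives up and reports `noise_ok` as false, which mirrors the fact that the two conditions can conflict.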
The psychoacoustics module and the bit allocation-quantization module are two reasons why an encoder is much more complex than a decoder. While audio encoding standards are specific enough to ensure that a valid stream can be correctly decoded by any compliant decoder, they are flexible enough to accommodate variations in encoder implementation, suited to different resource availability and application areas.