In present state-of-the-art audio coders used to code signals representative of, for example, speech and music for purposes of storage or transmission, perceptual models based on the characteristics of the human auditory system are typically employed to reduce the number of bits required to code a given signal. In particular, by taking such characteristics into account, “transparent” coding (i.e., coding having no perceptible loss of quality) can be achieved with significantly fewer bits than would otherwise be necessary. The coding process in perceptual audio coders is computationally intensive and generally requires processors with high computation power to perform real-time coding. The quantization module of the encoder takes up a significant part of the encoding time.
In such coders, the signal to be coded is first partitioned into individual frames, with each frame comprising a small time slice of the signal, such as, for example, a time slice of approximately twenty milliseconds. Then, the signal for the given frame is transformed into the frequency domain, typically with the use of a filter bank. The resulting spectral lines may then be quantized and coded.
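The framing and transform steps can be illustrated with a minimal sketch. It rests on several assumptions not taken from the text above: the function names are hypothetical, the frames do not overlap, and a naive discrete Fourier transform stands in for the filter bank (practical coders such as MP3 and AAC use windowed, overlapping transforms instead).

```python
import cmath
import math

def split_into_frames(samples, sample_rate_hz, frame_ms=20.0):
    """Partition a signal into consecutive, non-overlapping frames.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000.0)
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def dft_magnitudes(frame):
    """Magnitudes of the spectral lines of one frame (naive O(n^2) DFT).

    Stand-in for a filter bank; only bins 0..n/2 are returned because the
    input is real-valued.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

# Usage: a 500 Hz tone sampled at 8 kHz, cut into 20 ms frames of 160 samples.
tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(320)]
frames = split_into_frames(tone, sample_rate_hz=8000)
spectrum = dft_magnitudes(frames[0])
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

With a 160-sample frame at 8 kHz, each spectral line spans 50 Hz, so the 500 Hz tone is expected to peak at line 10.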
In particular, the quantizer which is used in a perceptual audio coder to quantize the spectral coefficients is advantageously controlled by a psychoacoustic model (i.e., a model based on the performance of the human auditory system) to determine masking thresholds (distortionless thresholds) for groups of neighboring spectral lines, each group being referred to as a scale factor band. The psychoacoustic model gives a set of thresholds that indicate the levels of Just Noticeable Distortion (JND). If the quantization noise introduced by the coder is above this level, then it is audible. As long as the Signal to (quantization) Noise Ratio (SNR) of each spectral band is higher than the Signal to Mask Ratio (SMR) of that band, the quantization noise cannot be perceived. The spectral lines in these scale factor bands are then non-uniformly quantized and noiselessly coded (Huffman coding) to produce a compressed bit stream.
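The SNR-versus-SMR criterion can be sketched as follows. The function names and the sample values are hypothetical, and the SMR values are taken as given inputs, since the psychoacoustic model that would produce them is outside the scope of this sketch.

```python
import math

def band_snr_db(original, reconstructed):
    """Signal to quantization noise ratio of one scale factor band, in dB."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_energy == 0.0:
        return math.inf   # perfect reconstruction: no quantization noise
    if signal_energy == 0.0:
        return -math.inf  # noise present in an otherwise silent band
    return 10.0 * math.log10(signal_energy / noise_energy)

def noise_is_masked(snr_db, smr_db):
    """True when the quantization noise of the band cannot be perceived."""
    return snr_db > smr_db

# Usage: each of the four spectral lines is off by 0.5 after reconstruction.
band = [10.0, -8.0, 6.0, -4.0]
decoded = [10.5, -8.5, 5.5, -4.5]
snr = band_snr_db(band, decoded)  # 10*log10(216 / 1.0), about 23.3 dB
```

For this band the noise would be masked if the model reported an SMR of 20 dB, and audible if it reported 30 dB.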
In MPEG (Moving Picture Experts Group) audio coders (MP3 or AAC), a major portion of the processing time is spent in the quantization module because the process is carried out iteratively. MP3 refers to MPEG-1 and MPEG-2 Layer 3 Audio Coding. AAC refers to MPEG-2/4 Advanced Audio Coding. The quantizer uses different step sizes for different scale factor bands, depending on the distortion thresholds set by a psychoacoustic block.
In one conventional method, quantization is carried out in two loops in order to satisfy perceptual and bit rate criteria. Prior to quantization, the incoming spectral lines are raised to a power of 3/4 (power law quantizer) so as to provide a more consistent SNR over the range of quantizer values. The two loops, which are run over the spectral lines, consist of an outer loop (distortion measure loop) and an inner loop (bit rate loop). In the inner loop, the quantization step size is adjusted in order to fit the spectral lines within a given bit rate. This involves modifying the step size (referred to as the global gain, as it is common to the whole spectrum) until the spectral lines fit into a specified number of bits. The outer loop then checks the distortion caused in the spectral lines on a band-by-band basis, and increases quantization precision for bands that have distortion above JND. The quantization precision is raised through step sizes referred to as local gains. The above iterative process repeats itself until both the bit rate and the distortion conditions are met. The global gain k and the set of local gains r, one for each band, are sent to the decoder along with the quantized spectral lines.
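The two-loop scheme described above can be sketched in simplified form. This is an illustration under stated assumptions, not the procedure of any particular standard or encoder: the step size formula, the bit estimate (a crude stand-in for Huffman coding), the iteration caps, and all function names are hypothetical.

```python
MAX_GLOBAL_GAIN = 255      # assumed cap so the inner loop always terminates
MAX_OUTER_ITERATIONS = 32  # assumed implementation-specific exit condition

def step_size(global_gain, local_gain):
    # Assumed mapping: a larger global gain coarsens every band, while a
    # larger local gain refines one band.
    return 2.0 ** ((global_gain - local_gain) / 4.0)

def quantize(x, step):
    # Power law quantizer: the scaled magnitude is raised to 3/4, then rounded.
    q = int((abs(x) / step) ** 0.75 + 0.5)
    return -q if x < 0 else q

def dequantize(q, step):
    mag = abs(q) ** (4.0 / 3.0) * step
    return -mag if q < 0 else mag

def estimate_bits(values):
    # Stand-in for noiseless coding: larger magnitudes cost more bits.
    return sum(1 + 2 * abs(q).bit_length() for q in values)

def two_loop_quantize(bands, jnd, bit_budget):
    """bands: spectral lines per scale factor band; jnd: allowed noise energy per band."""
    global_gain = 0
    local_gains = [0] * len(bands)
    outer_iterations = 0
    while True:
        outer_iterations += 1
        # Inner (bit rate) loop: coarsen the global step until the frame fits.
        while True:
            steps = [step_size(global_gain, g) for g in local_gains]
            quantized = [[quantize(x, s) for x in band]
                         for band, s in zip(bands, steps)]
            bits = sum(estimate_bits(band) for band in quantized)
            if bits <= bit_budget or global_gain >= MAX_GLOBAL_GAIN:
                break
            global_gain += 1
        # Outer (distortion) loop: find bands whose noise exceeds the JND.
        noisy = [
            b for b, (band, qband, s) in enumerate(zip(bands, quantized, steps))
            if sum((x - dequantize(q, s)) ** 2 for x, q in zip(band, qband)) > jnd[b]
        ]
        if not noisy or outer_iterations >= MAX_OUTER_ITERATIONS:
            return global_gain, local_gains, quantized, outer_iterations
        for b in noisy:
            local_gains[b] += 1  # raise precision, then rerun the inner loop

# Usage: a generous budget and loose thresholds exit after a single pass.
bands = [[100.0, -80.0], [5.0, 3.0]]
easy = two_loop_quantize(bands, jnd=[1e9, 1e9], bit_budget=10000)
# A tight budget with strict thresholds keeps both loops busy.
hard = two_loop_quantize(bands, jnd=[1e-4, 1e-4], bit_budget=12)
```

Each pass of the outer loop reruns quantization, bit counting, and inverse quantization over the whole frame, so the cost grows with the number of outer iterations.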
One significant disadvantage of the above quantization scheme is its complexity. Its implementation involves the two iterative loops described above, each of which involves quantization, noiseless coding, and inverse quantization to find the best possible match. The codebook search mechanism involved in noiseless coding and the complex mathematical operations involved in the quantization and dequantization stages make this a computationally intensive block. Therefore, a significant portion of the processing time in the above encoding scheme is spent in the quantization modules. One conventional system for quantizing the frequency domain coefficients essentially includes an optimized variant of the above two-loop iterative scheme.
The two iterative loops described above terminate when all bands have distortion levels below a threshold estimated by the psychoacoustic model. Such conditions typically occur at higher bit rates (over 96 kbps/channel). Using the above approach at medium to low bit rates can lead to many outer loop iterations before one of the many set exit conditions is reached.
The problem becomes even more severe at lower bit rates, when it is not possible to maintain the quality (SNR above SMR). The two loops can run many times before ending at some compromised quality, depending on implementation-specific exit conditions. These numerous iterations can significantly increase processing time. Therefore, the above conventional quantization technique is highly complex and computationally intensive and can require processors with high computation power to perform real-time encoding. In addition, the above conventional quantization technique can take up a significant part of an encoder's time.