A quantizer used in a perceptual audio coder to quantize spectral coefficients is advantageously controlled by a psychoacoustic model (i.e., a model based on the performance of the human auditory system), which determines the excitation (perceivable energy) for groups of neighboring spectral lines, each group being referred to as a critical band. The perceptual model is used to detect perceptual irrelevancies in the audio data presented to it. Most audio encoders operate on frames of data.
A typical perceptual audio encoder includes a time-frequency analysis block, a psychoacoustic analysis block, and a quantization block. The time-frequency analysis block transforms the input audio signal into the spectral domain, which is amenable to quantization and encoding in accordance with a perceptual distortion metric. The psychoacoustic analysis block determines the amount of quantization noise that the encoder can introduce without the noise becoming perceivable. If the quantization noise introduced by the encoder lies below the perceptual distortion metric, the encoder is said to have maintained perceptually transparent audio quality.
The overall quality of an audio signal is measured by the weighted sum of the noise-to-excitation ratios (NERs) of the individual critical bands. A critical band is a group of spectral lines defined by a psychoacoustic model based on the human auditory system. The inputs to the quality measurement are the original spectral coefficients X[k], the reconstructed (i.e., inverse-quantized) spectral coefficients Xr[k], and a weight array W giving the relative importance of each critical band in the computation of the weighted-sum NER.
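The weighted-sum NER described above can be illustrated with a minimal sketch. It assumes that the noise in a critical band is the summed squared difference between X[k] and Xr[k], and that the per-band excitation has already been supplied by the psychoacoustic model. The names `band_edges` and `excitation` are hypothetical and are introduced only for illustration.

```python
def weighted_ner(X, Xr, band_edges, excitation, W):
    """Weighted sum of per-band noise-to-excitation ratios (illustrative).

    band_edges[b] and band_edges[b + 1] delimit critical band b
    (an assumed layout); excitation[b] is taken as given.
    """
    total = 0.0
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        # Assumed noise measure: energy of the quantization error in the band.
        noise = sum((X[k] - Xr[k]) ** 2 for k in range(lo, hi))
        if excitation[b] <= 0:
            raise ValueError("excitation must be positive for every band")
        total += W[b] * noise / excitation[b]
    return total
```

For example, with two bands of two lines each, `weighted_ner([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 3.0, 2.0], [0, 2, 4], [2.0, 4.0], [0.5, 0.5])` evaluates to 0.5 * 1/2 + 0.5 * 4/4 = 0.75.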
Conventional techniques carry out quantization in two loops in order to satisfy perceptual distortion criteria and bit-rate criteria. The loop that satisfies the perceptual distortion criteria (the quality loop) and the loop that satisfies the bit-rate criteria (the bit-rate loop) are both run over the spectral lines within a frame. In these loops, the quantization step size is adjusted to fit the spectral lines within a given bit rate while keeping distortion minimal, so that a constant bit rate is maintained over a specified period of time.
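The nested-loop structure can be sketched roughly as follows. This is only an illustration under several assumptions that do not come from the text above: a uniform quantizer, a crude bit-count proxy (`estimate_bits`) standing in for a real entropy coder, hypothetical per-band gains that the quality loop raises to refine noisy bands, a fixed per-band noise allowance (`allowed_noise`), and contiguous bands starting at line 0.

```python
def estimate_bits(q):
    # Hypothetical proxy: 1 bit per line plus 2 bits per magnitude bit.
    return sum(1 + 2 * abs(v).bit_length() for v in q)


def quantize(X, band_edges, gains, step):
    q = []
    for b in range(len(band_edges) - 1):
        for k in range(band_edges[b], band_edges[b + 1]):
            q.append(int(round(X[k] * gains[b] / step)))
    return q


def dequantize(q, band_edges, gains, step):
    Xr = []
    for b in range(len(band_edges) - 1):
        for k in range(band_edges[b], band_edges[b + 1]):
            Xr.append(q[k] * step / gains[b])
    return Xr


def two_loop_quantize(X, band_edges, allowed_noise, bit_budget, max_outer=16):
    if bit_budget < len(X):
        raise ValueError("bit budget too small for the bit-count proxy")
    n_bands = len(band_edges) - 1
    gains = [1.0] * n_bands
    for it in range(max_outer):
        # Bit-rate loop: coarsen the step size until the frame fits the budget.
        step = 0.25  # assumed fine starting step
        q = quantize(X, band_edges, gains, step)
        while estimate_bits(q) > bit_budget:
            step *= 2 ** 0.25
            q = quantize(X, band_edges, gains, step)
        # Quality loop: find bands whose noise exceeds the allowance.
        Xr = dequantize(q, band_edges, gains, step)
        noisy = []
        for b in range(n_bands):
            lo, hi = band_edges[b], band_edges[b + 1]
            noise = sum((X[k] - Xr[k]) ** 2 for k in range(lo, hi))
            if noise > allowed_noise[b]:
                noisy.append(b)
        if not noisy or it == max_outer - 1:
            break
        # Refine the noisy bands, then rerun the bit-rate loop.
        for b in noisy:
            gains[b] *= 2 ** 0.5
    return q, step, gains
```

The outer iteration cap (`max_outer`) is there because the two criteria can conflict at low bit rates; in that case the sketch returns the last quantization that fits the budget, even if some bands still exceed their noise allowance.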
As described above, the psychoacoustic analysis is performed on a frame-by-frame basis and feeds the excitation to the quantizer. At low bit rates, some critical bands may be zeroed out due to the coarseness of quantization, which can lead to poor audio quality. The zeroing out of a critical band should be reflected in the NER measurement so that the bits allocated to that critical band can be adjusted to avoid poor audio quality. The zeroing out of a critical band is predominantly indicated when the reconstructed spectral coefficients are used to calculate the excitation. This may force the quality loop to readjust the step size so as to avoid zeroing out the critical band. Hence, in the quality loop, the excitation needs to be recalculated by the psychoacoustic model at every quantization iteration, which can lead to high computational complexity. In summary, computing the perceptual noise while maintaining perceptual quality is generally complex with the above conventional technique.
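To make the zeroing-out condition concrete, the following minimal sketch flags critical bands whose reconstructed coefficients are all zero even though the original band carried energy. It is a simple direct check, not the excitation-based indication used by the conventional technique, and the `band_edges` layout is a hypothetical assumption.

```python
def zeroed_bands(X, Xr, band_edges):
    """Return indices of bands that quantization has zeroed out (illustrative)."""
    zeroed = []
    for b in range(len(band_edges) - 1):
        lo, hi = band_edges[b], band_edges[b + 1]
        had_energy = any(X[k] != 0 for k in range(lo, hi))
        all_zero = all(Xr[k] == 0 for k in range(lo, hi))
        if had_energy and all_zero:
            zeroed.append(b)
    return zeroed
```

For instance, quantizing X = [4.0, 3.0, 0.4, -0.3] with a step size of 1.0 gives Xr = [4.0, 3.0, 0.0, 0.0], so with `band_edges = [0, 2, 4]` the second band (index 1) is reported as zeroed out.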