In present state-of-the-art audio coders, which code signals representative of, for example, speech and music for purposes of storage or transmission, perceptual models based on the characteristics of the human auditory system are typically employed to reduce the number of bits required to code a given signal. In particular, by taking such characteristics into account, “transparent” coding (i.e., coding having no perceptible loss of quality) can be achieved with significantly fewer bits than would otherwise be necessary.
In such coders the signal to be coded is first partitioned into individual frames with each frame comprising a small time slice of the signal, such as, for example, a time slice of approximately twenty milliseconds. Then, the signal for the given frame is transformed into the frequency domain, typically with use of a filter bank. The resulting spectral lines may then be quantized and coded.
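By way of illustration only, the framing and transform steps described above may be sketched as follows. The sketch assumes non-overlapping frames and uses a naive discrete Fourier transform as a stand-in for the filter bank; practical coders typically use overlapping windows and a dedicated filter bank. The function names `split_into_frames` and `spectral_lines`, the 8 kHz sample rate, and the 400 Hz test tone are hypothetical choices made for this example.

```python
import cmath
import math


def split_into_frames(samples, sample_rate_hz, frame_ms=20.0):
    """Partition a signal into consecutive, non-overlapping frames of about frame_ms."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def spectral_lines(frame):
    """Magnitudes of DFT bins 0..N/2 (a simple stand-in for a filter bank)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


# One second of a 400 Hz tone sampled at 8 kHz (assumed test signal).
sample_rate_hz = 8000
tone = [math.sin(2 * math.pi * 400 * t / sample_rate_hz)
        for t in range(sample_rate_hz)]

frames = split_into_frames(tone, sample_rate_hz)   # 20 ms -> 160 samples per frame
lines = spectral_lines(frames[0])
peak_bin = max(range(len(lines)), key=lines.__getitem__)
```

With 160-sample frames at 8 kHz, each spectral line spans 50 Hz, so the energy of the 400 Hz tone falls on line 8.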
In particular, the quantizer which is used in a perceptual audio coder to quantize the spectral coefficients is advantageously controlled by a psychoacoustic model (i.e., a model based on the performance of the human auditory system), which determines masking thresholds (distortionless thresholds) for groups of neighboring spectral lines, each group being referred to as a scale factor band. The psychoacoustic model gives a set of thresholds that indicate the levels of Just Noticeable Distortion (JND); if the quantization noise introduced by the coder is above this level, then it is audible. As long as the Signal to (quantization) Noise Ratio (SNR) of a spectral band is higher than its Signal to Mask Ratio (SMR), the quantization noise in that band cannot be perceived. The spectral lines in these scale factor bands are then non-uniformly quantized and noiselessly coded (Huffman coding) to produce a compressed bit stream. The quantizer uses different step sizes for different scale factor bands, depending on the distortion thresholds set by the psychoacoustic model.
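The SNR and SMR comparison described above may be illustrated by the following minimal sketch. It assumes that the per-band signal energy, quantization noise energy, and masking threshold are already available as positive numbers; the function name `audible_bands` and the example values are hypothetical.

```python
import math


def audible_bands(signal_energy, noise_energy, mask_threshold):
    """Return indices of bands whose quantization noise is audible (SNR < SMR)."""
    audible = []
    for band, (s, n, m) in enumerate(zip(signal_energy, noise_energy, mask_threshold)):
        snr_db = 10.0 * math.log10(s / n)   # signal to quantization noise ratio
        smr_db = 10.0 * math.log10(s / m)   # signal to mask ratio
        if snr_db < smr_db:                 # noise lies above the masking threshold
            audible.append(band)
    return audible


# Three scale factor bands with assumed energies (arbitrary linear units).
result = audible_bands(signal_energy=[100.0, 50.0, 10.0],
                       noise_energy=[1.0, 5.0, 2.0],
                       mask_threshold=[2.0, 4.0, 3.0])
```

In this example only the second band (index 1) is reported, since its noise energy of 5.0 exceeds its masking threshold of 4.0.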
The compression ratio achieved by the encoder is controlled by an externally specified bit rate parameter, which is the data rate of the output bit stream. Depending on the mode of operation, the data rate per frame can be variable, can be constant, or can average around a constant bit rate. For applications involving streaming at low bit rates, the preferred mode of operation is one of constant bit rate.
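As a simple worked example of the constant bit rate mode, the number of bits available to each frame follows directly from the bit rate and the frame duration. The sketch below assumes a whole number of milliseconds per frame; the function name `bits_per_frame` and the 32 kbit/s figure are hypothetical values chosen for illustration.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Bit budget of one frame when every frame receives the same number of bits."""
    return bit_rate_bps * frame_ms // 1000


# A 20 ms frame at an assumed bit rate of 32 kbit/s.
budget = bits_per_frame(32000, 20)
```

Here each twenty millisecond frame would be allotted 640 bits.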
In one conventional method, quantization is carried out in two loops in order to satisfy perceptual and bit rate criteria. Prior to quantization, the incoming spectral lines are raised to the power of ¾ (power law quantizer) so as to provide a more consistent SNR over the range of quantizer values. The two loops, an outer loop (distortion measure loop) and an inner loop (bit rate loop), are run over the spectral lines. In the inner loop, the quantization step size (referred to as the global gain, as it is common to the whole spectrum) is modified until the quantized spectral lines fit into a specified number of bits. The outer loop then checks the distortion caused in the spectral lines on a band-by-band basis and increases the quantization precision for bands that have distortion above the JND. The quantization precision is raised through per-band step sizes referred to as local gains. This iterative process repeats until both the bit rate and the distortion conditions are met.
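The two-loop procedure described above may be sketched as follows. This is an illustrative simplification under stated assumptions, not the method of any particular coder: the step size is assumed to be 2 raised to one quarter of the gain difference, the rounding offset of 0.5 is assumed, `count_bits` is a hypothetical stand-in for Huffman coding, the bands are assumed to be contiguous index ranges covering the spectrum in order, and the iteration caps `max_outer` and `max_gain` are added only to guarantee termination.

```python
import math


def quantize(x, step):
    """Power law quantizer: compress |x| / step by the 3/4 power, then round."""
    q = int((abs(x) / step) ** 0.75 + 0.5)
    return q if x >= 0 else -q


def dequantize(q, step):
    """Inverse of quantize: expand by the 4/3 power and rescale."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)


def count_bits(values):
    """Hypothetical stand-in for Huffman coding: larger magnitudes cost more bits."""
    return sum(1 + 2 * abs(v).bit_length() for v in values)


def step_size(global_gain, local_gain):
    """Assumed mapping: a larger local gain gives a finer step for that band."""
    return 2.0 ** ((global_gain - local_gain) / 4.0)


def quantize_frame(lines, bands, thresholds, bit_budget, max_outer=32, max_gain=200):
    local_gains = [0] * len(bands)
    global_gain = 0
    quantized = []
    for _ in range(max_outer):
        # Inner (bit rate) loop: coarsen the global step until the frame fits.
        while True:
            quantized = []
            for (start, end), lg in zip(bands, local_gains):
                step = step_size(global_gain, lg)
                quantized.extend(quantize(x, step) for x in lines[start:end])
            if count_bits(quantized) <= bit_budget or global_gain >= max_gain:
                break
            global_gain += 1
        # Outer (distortion) loop: find bands whose noise exceeds the threshold.
        noisy = []
        for b, ((start, end), lg) in enumerate(zip(bands, local_gains)):
            step = step_size(global_gain, lg)
            noise = sum((x - dequantize(q, step)) ** 2
                        for x, q in zip(lines[start:end], quantized[start:end]))
            if noise > thresholds[b]:
                noisy.append(b)
        if not noisy:
            break
        for b in noisy:
            local_gains[b] += 1   # raise precision only where distortion is audible
    ok = not noisy and count_bits(quantized) <= bit_budget
    return quantized, global_gain, local_gains, ok


# Four spectral lines in two bands, with assumed thresholds and a generous budget.
q, g, lg, ok = quantize_frame(lines=[10.0, -8.0, 3.0, 0.5],
                              bands=[(0, 2), (2, 4)],
                              thresholds=[2.0, 1.0],
                              bit_budget=100)
```

With these assumed inputs both criteria are met on the first pass, giving quantized values of 6, -5, 2 and 1 at a global gain of 0. With a tighter budget or stricter thresholds the two loops can work against each other, which is why the sketch caps the number of outer iterations and reports whether both conditions were satisfied.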
The masking thresholds are usually computed frame-by-frame, and slight variations of a masking threshold from one frame to the next may lead to very different bit assignments. As a result, at low bit rates some groups of spectral coefficients may appear and disappear from frame to frame. This spurious energy constitutes several auditory objects, which are different from the main energy and are thus clearly perceived. These kinds of artifacts, known as “birdies”, are generally encountered at low bit rates.
A conventional solution for quantizing with minimal distortion is to employ a low pass filter. This ensures that most of the high frequency content is removed, and hence the total number of critical bands to encode is reduced. This generally leads to degradation in signal quality. Moreover, this solution does not prevent the in-band frequency content from disappearing and reappearing, and hence does not ensure complete elimination of the birdie artifact.