In present state-of-the-art audio coders used to code signals such as speech and music for storage or transmission, perceptual models based on the characteristics of the human auditory system are typically employed to reduce the number of bits required to code a given signal. In particular, by taking such characteristics into account, “transparent” coding (i.e., coding with no perceptible loss of quality) can be achieved with significantly fewer bits than would otherwise be necessary. The coding process in perceptual audio coders is computationally intensive and generally requires processors with high computational power to perform real-time coding, and the quantization module of the encoder accounts for a significant part of the encoding time.
In such coders, the signal to be coded is first partitioned into individual frames, each comprising a short time slice of the signal, for example approximately twenty milliseconds. The signal in each frame is then transformed into the frequency domain, typically by means of a filter bank, and the resulting spectral lines are quantized and coded.
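The framing and transform steps can be sketched as follows. This is a minimal illustration under stated assumptions: a mono signal held in a Python list, non-overlapping 20 ms frames with the last frame zero-padded, and a naive DFT standing in for the filter bank (MPEG-1 Layers 1 and 2 actually use a 32-band polyphase filter bank, and practical coders use fast transforms). The function names `split_into_frames` and `dft_magnitudes` are hypothetical.

```python
import cmath
import math


def split_into_frames(signal, sample_rate, frame_ms=20.0):
    """Partition a signal into consecutive, non-overlapping frames of about frame_ms ms.

    The last partial frame, if any, is zero-padded to full length.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    frames = []
    for start in range(0, len(signal), frame_len):
        frame = list(signal[start:start + frame_len])
        frame.extend([0.0] * (frame_len - len(frame)))  # zero-pad the tail frame
        frames.append(frame)
    return frames


def dft_magnitudes(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (a stand-in for a real filter bank)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]  # 1 s of a 1 kHz tone
    frames = split_into_frames(tone, rate)
    mags = dft_magnitudes(frames[0])
    print(len(frames), len(frames[0]), mags.index(max(mags)))
```

For an 8 kHz signal each 20 ms frame holds 160 samples, so the 1 kHz tone peaks in DFT bin 1000 × 160 / 8000 = 20.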
In particular, the quantizer used in a perceptual audio coder to quantize the spectral coefficients is advantageously controlled by a psychoacoustic model (i.e., a model based on the characteristics of the human auditory system), which determines masking thresholds (distortionless thresholds) for groups of neighboring spectral lines, each group being referred to as a critical band. The psychoacoustic model yields a set of thresholds that indicate the level of Just Noticeable Distortion (JND): if the quantization noise introduced by the coder exceeds this level, it is audible. As long as the Signal-to-Noise Ratio (SNR) of each critical band is higher than its Signal-to-Mask Ratio (SMR), the quantization noise cannot be perceived. The quantizer uses the SMRs to control bit allocation among the critical bands. It operates so that the difference between the SNR and the SMR of each band, which is the mask-to-noise ratio (MNR = SNR - SMR, in dB), is constant across all critical bands in the frame. Maintaining equal or nearly equal MNRs across all critical bands ensures peak audio quality, because all the critical bands are then equally distorted in a perceptual sense.
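The audibility criterion and the MNR can be expressed in a few lines. In this sketch the per-band SNR and SMR values (in dB) are hypothetical inputs; in a real encoder the SMRs would come from the psychoacoustic model and the SNRs from the quantization chosen for each band. The helper names are likewise illustrative.

```python
def mask_to_noise_ratios(snr_db, smr_db):
    """MNR = SNR - SMR for each critical band (all values in dB)."""
    return [snr - smr for snr, smr in zip(snr_db, smr_db)]


def audible_bands(snr_db, smr_db):
    """Indices of bands whose quantization noise exceeds the masking threshold (MNR < 0)."""
    return [i for i, mnr in enumerate(mask_to_noise_ratios(snr_db, smr_db)) if mnr < 0]


def mnr_spread(snr_db, smr_db):
    """Max minus min MNR; an ideally balanced allocation drives this toward 0 dB."""
    mnrs = mask_to_noise_ratios(snr_db, smr_db)
    return max(mnrs) - min(mnrs)


if __name__ == "__main__":
    snr = [30.0, 18.0, 42.0]  # hypothetical per-band SNRs
    smr = [20.0, 24.0, 12.0]  # hypothetical per-band SMRs
    print(mask_to_noise_ratios(snr, smr))  # [10.0, -6.0, 30.0]
    print(audible_bands(snr, smr))         # [1]
```

Here band 1 has a negative MNR, so its quantization noise would be audible, while bands 0 and 2 have more precision than they need; the 36 dB spread between MNRs is what an allocator aiming for equal MNRs tries to remove.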
In MPEG (Moving Picture Experts Group) audio coders, a major portion of the processing time is spent in the quantization module because quantization is carried out iteratively. The MPEG-1/2 Layer 1 and Layer 2 encoders use uniform quantization schemes, in which the quantizer uses different step sizes for different critical bands depending on the distortion thresholds set by the psychoacoustic model.
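A generic uniform quantizer shows how the step size shrinks as bits are added and why each extra bit buys roughly 6 dB of SNR. This is a sketch, not the exact Layer 1/2 quantization formula: it assumes samples already normalized to [-1, 1] (as scaling by a scale factor would do) and a symmetric mid-tread quantizer with 2**bits - 1 levels. The names `quantize_uniform` and `snr_db` are hypothetical.

```python
import math


def quantize_uniform(x, bits):
    """Symmetric mid-tread uniform quantizer on [-1, 1] with 2**bits - 1 levels.

    Returns (integer level index, reconstructed value).
    """
    levels = 2 ** bits - 1
    step = 2.0 / levels            # step size roughly halves with each extra bit
    half = (levels - 1) // 2       # largest allowed index magnitude
    x = max(-1.0, min(1.0, x))     # clamp to the normalized range
    index = max(-half, min(half, round(x / step)))
    return index, index * step


def snr_db(samples, bits):
    """Measured SNR (dB) of quantizing the given samples with the given number of bits."""
    signal = sum(x * x for x in samples)
    noise = sum((x - quantize_uniform(x, bits)[1]) ** 2 for x in samples)
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    m = 10000
    ramp = [-1.0 + (2 * i + 1) / m for i in range(m)]  # evenly spread test samples
    for b in (8, 9):
        print(b, round(snr_db(ramp, b), 2))
```

For uniformly distributed input the measured SNR is close to 20·log10(2**bits - 1) dB, so going from 8 to 9 bits adds about 20·log10(2) ≈ 6.02 dB.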
In one conventional method employing uniform quantization, quantization is carried out iteratively to satisfy both perceptual and bit-rate criteria. Each iteration determines the band with the lowest MNR and increases the precision of that band by allocating it the next higher number of bits. Because the quantizer is uniform, this typically raises the SNR of the band by about 6 dB. The new MNR of the band is then calculated, and the count of bits consumed is updated. This procedure is repeated until the bit-rate criterion is met.
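The iterative procedure can be sketched as a greedy loop. This is a simplified model under stated assumptions: every band starts at 0 bits with 0 dB SNR, each extra bit adds a fixed 6.02 dB of SNR, each increment costs one bit per sample in the band, a band is capped at a hypothetical `MAX_BITS`, and side information such as scale factors and bit-allocation fields is ignored. If the lowest-MNR band cannot afford another bit, this sketch moves on to the next-lowest one; an actual encoder may handle that case differently. `allocate_bits` is a hypothetical name.

```python
SNR_PER_BIT_DB = 6.02  # approximate SNR gain per extra bit of a uniform quantizer
MAX_BITS = 15          # hypothetical cap on bits per sample in a band


def allocate_bits(smr_db, samples_per_band, bit_budget):
    """Greedy allocation: repeatedly give one more bit to the band with the lowest MNR.

    Returns (bits per band, bits used, number of iterations).
    """
    n = len(smr_db)
    bits = [0] * n
    snr = [0.0] * n
    used = 0
    iterations = 0
    while True:
        # Bands that are below the precision cap and whose next bit still fits the budget.
        candidates = [b for b in range(n)
                      if bits[b] < MAX_BITS and used + samples_per_band[b] <= bit_budget]
        if not candidates:
            break  # bit-rate criterion met: no affordable refinement remains
        worst = min(candidates, key=lambda b: snr[b] - smr_db[b])  # lowest MNR
        bits[worst] += 1
        snr[worst] += SNR_PER_BIT_DB        # uniform quantizer: about +6 dB per bit
        used += samples_per_band[worst]     # update bits consumed
        iterations += 1
    return bits, used, iterations


if __name__ == "__main__":
    print(allocate_bits([20.0, 10.0, 0.0], [12, 12, 12], 72))  # ([4, 2, 0], 72, 6)
```

With SMRs of 20, 10 and 0 dB, 12 samples per band and a 72-bit budget, the loop runs six times, spending most of the bits on the band with the highest SMR.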
Irrespective of the target bit rate, the conventional method begins encoding by assigning the lowest possible quantization level (i.e., the coarsest quantization, with the fewest bits) to every critical band and refines it one step at a time. The number of iterations, and hence the complexity of the conventional method, therefore grows as the bit rate increases. As a result, conventional methods are highly computationally intensive and can take up a significant part of an encoder's processing time.
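The dependence on bit rate can be made concrete with rough arithmetic. The sketch assumes an MPEG-1 Layer 1-style frame of 384 samples (32 subbands × 12 samples) at 48 kHz and ignores the header and side information. Because each refinement step adds one bit to every one of the 12 samples in a band, it consumes at least 12 bits, which bounds the number of iterations per frame. The function name `max_allocation_iterations` is hypothetical.

```python
def max_allocation_iterations(bit_rate, sample_rate=48000, frame_samples=384,
                              samples_per_band=12):
    """Upper bound on greedy refinement steps per frame when starting from 0 bits.

    Assumes every step spends samples_per_band bits and that all frame bits
    (header and side information ignored) are available for samples.
    """
    bits_per_frame = bit_rate * frame_samples // sample_rate
    return bits_per_frame // samples_per_band


if __name__ == "__main__":
    for rate in (64000, 128000, 384000):
        print(rate, max_allocation_iterations(rate))
```

Under these assumptions, 64 kbit/s leaves 512 bits per frame (at most 42 iterations), while 384 kbit/s leaves 3072 bits (up to 256 iterations), and each iteration also scans all 32 bands to find the lowest MNR, so the work per frame grows roughly in proportion to the bit rate.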