This application claims the priority of Korean Patent Application No. 2003-2718, filed on Jan. 15, 2003, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to compression of audio data, and more particularly, to a method and apparatus for shaping quantization noise generated when compressing audio data at a low bit rate.
2. Description of the Related Art
Compression of audio data is achieved by sampling, quantizing, encoding, and so forth. Quantization refers to expressing sampled signal values as stepped integers, so that the sampled values are represented by predetermined representative values. Such a quantization process generates quantization noise. The quantization noise is the error component between an original signal and the quantized signal, and it decreases as the number of bits used for the quantization process increases. In quantization according to the Moving Picture Experts Group (MPEG) standards, which are standards for coded representation of moving pictures and digital audio, a coefficient generated by a Discrete Cosine Transform (DCT) or a Modified DCT (MDCT) is divided by a predetermined value so that the coefficient is expressed as a smaller value, which reduces the amount of data to be encoded.
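The relationship between the number of quantization bits and the quantization noise can be illustrated with a minimal sketch. The uniform quantizer, the test signal, and the names `quantize`, `dequantize`, and `noise_power` are assumptions chosen for illustration and are not taken from any MPEG standard.

```python
import math

def quantize(value, step):
    """Divide the value by the step size and round to an integer index."""
    return round(value / step)

def dequantize(index, step):
    """Map an integer index back to its representative value."""
    return index * step

def noise_power(samples, bits, full_scale=1.0):
    """Mean squared quantization error of a uniform quantizer with 2**bits levels."""
    step = 2.0 * full_scale / (2 ** bits)
    errors = [s - dequantize(quantize(s, step), step) for s in samples]
    return sum(e * e for e in errors) / len(errors)

# Hypothetical test signal: a sine wave that stays inside the full-scale range.
samples = [0.9 * math.sin(2 * math.pi * 5 * n / 256 + 0.3) for n in range(256)]

# Quantization noise power for several bit depths.
powers = {bits: noise_power(samples, bits) for bits in (4, 8, 12)}
for bits, power in powers.items():
    print(bits, "bits ->", power)
```

Under these assumptions, the printed noise power shrinks as the bit depth grows, because each additional bit halves the step size.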
Audio data should be compressed in consideration of the properties of the human auditory system. In general, one sound cannot be heard when a much louder sound is present. For example, if a person in an office speaks loudly, the others in the office can easily perceive who is speaking. However, if an airplane passes over the office building, the listeners cannot hear at all what the speaker is saying. In addition, after the airplane has passed over the building, the listeners still cannot hear what the speaker is saying due to the lingering sound of the airplane. This is called a masking effect.
FIG. 1 illustrates the masking effect. Referring to FIG. 1, assume that a masking curve 130 indicates, across the audio frequency range, the sound energy level at which the average human can hear a sound. Since an audio signal A 110 has a sound energy level above the masking curve 130, the audio signal A 110 is audible to the average human. In contrast, since an audio signal B 120 has a sound energy level below the masking curve 130, the audio signal B 120 is inaudible to the average human.
Psychoacoustic model quantization refers to sectioning the audio frequency range into frequency bands at predetermined intervals and quantizing only audio data with a sound energy level above a masking threshold. The psychoacoustic model quantization is used in compression standards such as MPEG. However, in a case where audio data is compressed at a low bit rate below 64 kbps, the number of bits available for quantization is limited. Thus, a general compression technique according to MPEG standards is not suitable for effective compression of an audio signal.
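The band selection described above can be sketched as follows. This is a simplified illustration only: the function name `audible_bands` and the per-band energy and threshold values in decibels are hypothetical, and an actual psychoacoustic model derives its thresholds from the signal itself.

```python
def audible_bands(band_energy_db, masking_threshold_db):
    """Return the indices of bands whose energy lies above the masking threshold."""
    return [
        band
        for band, (energy, threshold) in enumerate(
            zip(band_energy_db, masking_threshold_db)
        )
        if energy > threshold
    ]

# Hypothetical per-band values; only bands above the threshold would be quantized.
band_energy_db = [60.0, 20.0, 45.0, 52.0]
masking_threshold_db = [40.0, 30.0, 45.0, 50.0]

selected = audible_bands(band_energy_db, masking_threshold_db)
print(selected)
```

In this sketch a band whose energy merely equals the threshold is treated as masked, which is one possible convention and not a requirement of the text.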
FIGS. 2A and 2B show a quantization noise spectrum with respect to frequency, the spectrum being generated after performing quantization.
In a psychoacoustic model, an audio signal is received, and then a Fast Fourier Transform (FFT) is performed to calculate and output a quantization threshold 210 in each frequency band. The quantization threshold 210 may be calculated so that the average human cannot distinguish between an original signal and a quantized signal. A quantization threshold in actual quantization may appear as reference numeral 210 or 240. If the quantization threshold 210 is obtained in the actual quantization, quantization noise may fall within the quantization threshold 210 according to the psychoacoustic model, and thus does not affect sound quality. If the quantization threshold 240 is obtained in the actual quantization, sound quality degrades. Thus, quantization noise has to be shaped so as to fall within the quantization threshold 210. However, since a low bit rate audio signal is expressed and quantized with a limited number of bits, quantization noise cannot always be shaped to fall within a quantization threshold.
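The requirement that quantization noise fall within the quantization threshold in each frequency band can be checked as in the sketch below. The function name, the band layout, and the numeric values are assumptions for illustration; the thresholds are simply given here rather than computed from an FFT.

```python
def bands_exceeding_threshold(original, quantized, band_edges, thresholds):
    """Return the bands whose quantization noise energy exceeds the threshold.

    band_edges holds (start, stop) index pairs into the coefficient lists.
    """
    exceeded = []
    for band, (start, stop) in enumerate(band_edges):
        noise = sum(
            (o - q) ** 2
            for o, q in zip(original[start:stop], quantized[start:stop])
        )
        if noise > thresholds[band]:
            exceeded.append(band)
    return exceeded

# Hypothetical coefficients split into two bands of two coefficients each.
original = [1.0, 2.0, 3.0, 4.0]
quantized = [1.0, 2.0, 2.0, 4.0]
band_edges = [(0, 2), (2, 4)]
thresholds = [0.5, 0.5]

print(bands_exceeding_threshold(original, quantized, band_edges, thresholds))
```

A band listed in the result is one in which the noise would be audible under the assumed thresholds, so its noise would need further shaping.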
Accordingly, a conventional quantization algorithm used for the compression of an audio signal simply limits the number of times quantization noise is shaped, so that the shaping of the quantization noise ends when the quantization noise cannot be brought below the quantization threshold calculated in the psychoacoustic model. This limit may leave the quantization noise with a predetermined shape, which causes the quantization noise to exceed the quantization threshold in a predetermined number of frequency bands. As a result, sound quality deteriorates.
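The effect of limiting the number of shaping passes can be illustrated with a toy sketch. It is not the MPEG quantization loop: the halving of the step size, the names `band_noise` and `shape_noise`, and all numeric values are assumptions made only to show how a fixed iteration limit can leave bands above their thresholds.

```python
def band_noise(coeffs, step):
    """Noise energy of a band quantized with a uniform step size."""
    return sum((c - round(c / step) * step) ** 2 for c in coeffs)

def failing_bands(coeffs_by_band, thresholds, steps):
    """Bands whose noise energy is still above the threshold."""
    return [
        band
        for band, coeffs in enumerate(coeffs_by_band)
        if band_noise(coeffs, steps[band]) > thresholds[band]
    ]

def shape_noise(coeffs_by_band, thresholds, initial_step, max_iterations):
    """Halve the step size of failing bands, for at most max_iterations passes."""
    steps = [initial_step] * len(coeffs_by_band)
    for _ in range(max_iterations):
        failing = failing_bands(coeffs_by_band, thresholds, steps)
        if not failing:
            break
        for band in failing:
            steps[band] /= 2.0  # a finer step lowers noise but costs more bits
    return steps, failing_bands(coeffs_by_band, thresholds, steps)

# Hypothetical bands and thresholds.
coeffs_by_band = [[1.0, 2.0], [0.3]]
thresholds = [0.01, 0.001]

print(shape_noise(coeffs_by_band, thresholds, 1.0, max_iterations=2))
print(shape_noise(coeffs_by_band, thresholds, 1.0, max_iterations=10))
```

With the small limit, the second band in this example is left above its threshold when shaping stops, which corresponds to the deterioration in sound quality described above.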