1. Field of the Invention
The present invention relates to an encoding method and apparatus for encoding digital audio data, and more particularly, to a method and apparatus in which an advanced psychoacoustic model is used so that the amount of computation and complexity needed in the encoding method and apparatus is reduced without degradation of sound quality.
2. Description of the Related Art
A moving picture experts group (MPEG) audio encoder prevents a listener from perceiving the quantization noise generated when data is encoded, while at the same time achieving a high compression rate. The MPEG-1 audio encoder standardized by the MPEG encodes an audio signal at a bit rate of 32 kbps to 448 kbps. The MPEG-1 audio standard specifies 3 different algorithms for encoding data.
The MPEG-1 encoder has 3 modes: layer 1, layer 2, and layer 3. Layer 1 implements the basic algorithm, while layers 2 and 3 are enhanced modes. The higher layers achieve a higher compression rate, but require larger hardware.
The MPEG audio encoder uses a psychoacoustic model, which closely mirrors the characteristics of human hearing, in order to reduce the perceptual redundancy of an audio signal. The MPEG-1 and MPEG-2 standards employ a perceptual coding method in which a psychoacoustic model reflecting the characteristics of human perception removes perceptual redundancy, such that good sound quality can be maintained after the data is decoded.
The perceptual coding method, by which a human psychoacoustic model is analyzed and applied, uses a threshold in quiet and a masking effect. The masking effect is a phenomenon in which a quiet sound below a predetermined threshold is rendered inaudible by a louder sound; masking between signals existing in an identical time interval is also referred to as frequency masking. The threshold of the masked sound varies depending on the frequency band.
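The threshold in quiet mentioned above is commonly approximated by Terhardt's formula, a closed-form fit to the absolute threshold of hearing that is also used in the ISO psychoacoustic models. The sketch below is an illustration of that approximation, not a procedure mandated by the text above.

```python
import math

def threshold_in_quiet_db(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) at f_hz,
    using Terhardt's formula; valid roughly for 20 Hz - 20 kHz.
    A tone below this level is inaudible even with no masker."""
    f = f_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

The curve dips to its minimum near 3-4 kHz, where human hearing is most sensitive, and rises steeply at low and very high frequencies.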
By using the psychoacoustic model, a maximum noise level that is inaudible in each subband of a filter bank can be determined. With this noise level in each subband, that is, with the masking threshold, a signal to mask ratio (SMR) value of each subband can be obtained.
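As a hedged illustration of the relationship just described, the SMR of a subband can be expressed as the ratio, in decibels, of the subband signal energy to its masking threshold. The function name below is an assumption for illustration, not a name from the standard.

```python
import math

def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio (dB) for one subband.

    signal_energy: spectral energy of the signal in the subband
    masking_threshold: maximum inaudible noise energy in the subband
    """
    return 10.0 * math.log10(signal_energy / masking_threshold)
```

A subband whose signal energy is 100 times its masking threshold has an SMR of 20 dB; the quantizer must then keep quantization noise at least 20 dB below the signal for the noise to stay masked.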
A coding method using the psychoacoustic model is disclosed in U.S. Pat. No. 6,092,041, "System and method of encoding and decoding a layered bitstream by re-applying psychoacoustic analysis in the decoder," assigned to Motorola, Inc.
FIG. 1 is a block diagram showing an ordinary MPEG audio encoding apparatus. Here, among MPEG audio encoders, the MPEG-1 layer 3 audio encoder, that is, the MP3 audio encoder, will be explained as an example.
The MP3 encoder comprises a filter bank 110, a modified discrete cosine transform (MDCT) unit 120, a fast Fourier transform (FFT) unit 130, a psychoacoustic model unit 140, a quantization and Huffman encoding unit 150, and a bitstream formatting unit 160.
The filter bank 110 divides an input time domain audio signal into 32 frequency domain subbands in order to remove statistical redundancy of the audio signal.
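The subband split performed by the filter bank 110 can be sketched in simplified form. The standard uses a 512-tap polyphase analysis filter bank, which is not reproduced here; instead, as a stand-in, the sketch groups DFT bins of a sample block into 32 uniform bands and reports the energy per band, which conveys the same idea of uniform frequency-domain partitioning.

```python
import cmath

def split_into_subbands(block, num_bands=32):
    """Return the energy in each of num_bands uniform frequency bands.

    Simplified stand-in for the 512-tap polyphase analysis filter
    bank of the standard: a plain DFT is taken and its positive-
    frequency bins are grouped into num_bands equal-width bands.
    """
    n = len(block)
    spectrum = [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
                for k in range(n // 2)]
    bins_per_band = (n // 2) // num_bands
    return [sum(abs(spectrum[b * bins_per_band + i]) ** 2
                for i in range(bins_per_band))
            for b in range(num_bands)]
```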
By using window switching information input from the psychoacoustic model unit 140, the MDCT unit 120 divides the subbands, which are divided in the filter bank 110, into finer frequency bands in order to increase frequency resolution. For example, if the window switching information input from the psychoacoustic model unit 140 indicates a long window, the 32 subbands are divided into finer frequency bands by using a 36-point MDCT, and if the window switching information indicates a short window, the 32 subbands are divided into finer frequency bands by using a 12-point MDCT.
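The transform-length selection described above can be sketched as follows. The `mdct` function is the standard lapped transform producing N/2 coefficients from N inputs; `mdct_for_window` is a hypothetical helper name (not from the standard) that picks the 36-point or 12-point transform from the window switching information.

```python
import math

def mdct(x):
    """Modified discrete cosine transform: N inputs -> N/2 outputs."""
    n_len = len(x)      # N: 36 for a long window, 12 for a short one
    half = n_len // 2   # number of output coefficients
    return [
        sum(x[n] * math.cos((math.pi / half)
                            * (n + 0.5 + half / 2.0) * (k + 0.5))
            for n in range(n_len))
        for k in range(half)
    ]

def mdct_for_window(samples, window_type):
    """Choose the MDCT length from the window-switching decision."""
    n_len = 36 if window_type == "long" else 12
    return mdct(samples[:n_len])
```

Short windows trade frequency resolution for time resolution, which limits the spread of quantization noise (pre-echo) around transients.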
The FFT unit 130 converts the input audio signal into a frequency domain spectrum and outputs the spectrum to the psychoacoustic model unit 140.
In order to remove perceptual redundancy according to the characteristics of human hearing, the psychoacoustic model unit 140 uses the frequency spectrum output from the FFT unit 130 to determine the masking threshold, that is, the noise level that is inaudible in each subband, and from it an SMR. The SMR value determined in the psychoacoustic model unit 140 is input to the quantization and Huffman encoding unit 150.
In addition, the psychoacoustic model unit 140 calculates a perceptual energy level to determine whether or not to perform window switching, and outputs window switching information to the MDCT unit 120.
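The switching decision described above can be sketched as a simple threshold test: a high perceptual energy (or perceptual entropy) indicates a transient, for which short windows are selected. The default threshold value below is illustrative only and is not fixed by the text above.

```python
def select_window(perceptual_energy, switch_threshold=1800.0):
    """Return the window type for the next block.

    High perceptual energy signals an attack/transient, where short
    windows limit pre-echo; otherwise a long window gives better
    frequency resolution. switch_threshold is illustrative.
    """
    return "short" if perceptual_energy > switch_threshold else "long"
```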
In order to process the frequency domain data input from the MDCT unit 120 after the MDCT is performed, the quantization and Huffman encoding unit 150 performs bit allocation and quantization based on the SMR value input from the psychoacoustic model unit 140, thereby removing perceptual redundancy and encoding the audio data.
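Layer 3 quantization is nonuniform: spectral values are raised to the 3/4 power before rounding, so larger values are quantized more coarsely. The sketch below illustrates that power law; it omits the small bias constant of the reference formula, and the function name and step-size handling are assumptions for illustration.

```python
def quantize(spectral_values, step_size):
    """Power-law quantizer of the kind used in layer 3.

    Each value is scaled by the step size, raised to the 3/4 power,
    and rounded to the nearest integer. The small bias constant of
    the reference formula is omitted in this sketch.
    """
    return [int(round((abs(x) / step_size) ** 0.75))
            for x in spectral_values]
```

A larger step size yields smaller quantized values and thus fewer bits, at the cost of more quantization noise; the encoder chooses the step size so that this noise stays below the masking threshold given by the SMR.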
The bitstream formatting unit 160 formats the encoded audio signal, which is input from the quantization and Huffman encoding unit 150, into bitstreams specified by the MPEG and outputs the bitstreams.
As described above, the prior art psychoacoustic model shown in FIG. 1 uses the FFT spectrum obtained from the input audio signal in order to calculate the masking threshold. However, the filter bank introduces aliasing, and values obtained from components in which aliasing has occurred are used in the quantization step. Accordingly, if the psychoacoustic model obtains an SMR based on the FFT spectrum, which does not reflect this aliasing, and the SMR is used in the quantization step, an optimal result cannot be obtained.