The present invention relates to a digital audio encoding method and device thereof, and more particularly, to a digital audio encoding method utilizing a look-up table (LUT) and device thereof.
Current communication technology increasingly converts analog data into digital data. In keeping with this trend, digital transmission is becoming necessary in all audio devices and even in audio data transmission systems. Such digital audio data transmission is more robust against ambient noise than existing analog transmission methods. Furthermore, sound can be reproduced as clearly as with a compact disc (CD). However, as the quantity of data to be transmitted increases, many problems arise with respect to the capacity of memories and/or transmission channels.
To solve such problems, compression technology is necessary. The target of audio compression technology is to reproduce a sound that is the same as the original sound. The original sound is compressed, the compressed sound data is transmitted, and the transmitted sound data is then decompressed so that the sound can be heard again.
At present, such technology is being rapidly developed all over the world. At the forefront of this technology is Sony Corp. with its mini disc (MD), introduced in Japan in 1992. Philips Corp. is also at the forefront of this technology with its digital compact cassette (DCC). In the case of the MD, for example, the disc is smaller than the existing CD while reproducing sound at CD-level quality. Also, with a compression ratio of about 5:1, the MD can store a much larger amount of data than the CD. Further, the MD is resistant to external shock.
On the other hand, with respect to video, an international standardization group for digital compression encoding technology, namely, MPEG (Moving Picture Experts Group), was established under the International Organization for Standardization (ISO). The MPEG standard is largely divided into three parts, that is, system, video and audio parts. Among them, the audio part is subdivided into three layers.
The MPEG compares, analyzes and tests various proposed low-transmission-rate encoding technologies in order to enact international standards for the coded expression of moving pictures and the audio signals corresponding thereto. Once such international standards are established, data must be encoded and stored to meet such standards for all digital storage media. Here, the digital storage media include a compact disc read-only memory (CD-ROM), a digital audio tape (DAT), a magneto-optical disc (MOD) and a computer disk (for example, a hard disk drive).
In compression encoding of an audio signal, a human psychoacoustic model is generally utilized. Using the masking phenomenon and critical bands among the acoustic characteristics, inaudible signals are removed, and only the requisite signals are coded and have bits allocated thereto. As a result, a sound quality of almost the same level as that of the original sound is obtained even though the signal is coded with fewer bits than the original signal.
Here, the masking phenomenon is a phenomenon in which, due to interference between audio signals, one signal is masked by another signal so that a human does not sense it at all. Also, the critical bands, by which a human distinguishes sound frequencies, are generally divided into twenty-four bands. The higher the frequency is, the wider the bands become on a logarithmic scale. Accordingly, a higher-frequency signal is harder to distinguish than a lower-frequency signal.
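The widening of the critical bands with frequency can be illustrated with a short sketch. It assumes the commonly cited Zwicker and Terhardt approximation of the critical-band (Bark) scale, which is not part of the text above; the function name `hz_to_bark` is hypothetical and chosen only for illustration.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Approximate critical-band rate (in Bark) for a frequency in Hz.

    Assumption: Zwicker & Terhardt approximation; about 24 Bark
    cover the audible range, matching the twenty-four critical bands.
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


if __name__ == "__main__":
    # The same 100 Hz step spans far more of a critical band at low
    # frequencies than at high frequencies, i.e. the bands get wider.
    low_step = hz_to_bark(200.0) - hz_to_bark(100.0)
    high_step = hz_to_bark(10100.0) - hz_to_bark(10000.0)
    print(f"100 Hz step near 100 Hz:   {low_step:.3f} Bark")
    print(f"100 Hz step near 10 kHz:   {high_step:.3f} Bark")
```

Under this approximation, a fixed frequency step covers a much smaller fraction of a critical band at high frequencies, which is consistent with the reduced frequency resolution described above.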
To allocate bits using such acoustic characteristics, a signal-to-noise ratio (SNR) and a signal-to-mask ratio (SMR) are obtained, and then a mask-to-noise ratio (MNR) must be calculated therefrom. Here, a mask level is the signal level below which a signal cannot be sensed by a human. Accordingly, it is not necessary to allocate bits to a signal below this mask level.
A final MNR is obtained through the above process and then a bit is repeatedly allocated based on the final MNR. However, a lot of operation time could be required during such a process, which means that a real time delay is increased in an encoder. Thus, it becomes necessary to reduce complexity of the operations.
Now, referring to FIG. 1, a general MPEG audio encoding apparatus will be described briefly.
A frequency mapping part 11 converts audio data of a time domain into audio data of a frequency domain having 32 equal bands by using a band analysis filter. At this time, each band includes twelve samples in the case of layer I and thirty-six samples in the case of layer II. Since there are sixty-four scale factors in total, six bits are necessary for encoding a scale factor. The encoding methods differ slightly depending on the layer. In layer I, the largest value among the twelve samples included in each band is obtained, and the same or a slightly larger value is chosen as the scale factor. In layer II, since three scale factors exist in each band, the similarity of the respective scale factors is investigated in order to decide how many of the three scale factors are to be coded. In other words, the number of scale factors to be coded differs depending on the range of the difference between adjacent scale factors. Accordingly, unlike in layer I, additional information indicating the scale factor selection is needed, and this information is coded with 2 bits.
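The layer I scale factor choice described above can be sketched as follows. This is only an illustration under stated assumptions: the table `SCALE_FACTORS` is a hypothetical 64-entry table whose entries decrease by a factor of the cube root of 2, and the function name is invented for this example; neither is taken from the text.

```python
# Hypothetical scale factor table (assumption): 64 entries, decreasing
# from 2.0 by a factor of 2**(1/3) per index.
NUM_SCALE_FACTORS = 64
SCALE_FACTORS = [2.0 * 2.0 ** (-i / 3.0) for i in range(NUM_SCALE_FACTORS)]

# Number of bits needed to index the table: 64 entries -> 6 bits.
SCALE_FACTOR_BITS = (NUM_SCALE_FACTORS - 1).bit_length()


def choose_scale_factor_index(samples):
    """Return the index of the smallest table entry that is >= the peak.

    'samples' holds the twelve layer I samples of one band. The chosen
    scale factor is the same as, or slightly larger than, the largest
    absolute sample value.
    """
    peak = max(abs(s) for s in samples)
    # The table is decreasing, so scan from the smallest entry upward.
    for index in range(NUM_SCALE_FACTORS - 1, -1, -1):
        if SCALE_FACTORS[index] >= peak:
            return index
    return 0  # peak exceeds the table: fall back to the largest entry


if __name__ == "__main__":
    band = [0.1, -0.9, 0.3, 0.0, 0.25, -0.4, 0.6, 0.05, -0.2, 0.7, 0.15, -0.35]
    idx = choose_scale_factor_index(band)
    print(idx, SCALE_FACTORS[idx], SCALE_FACTOR_BITS)
```

Only the 6-bit index, not the scale factor value itself, would need to be transmitted, which is why a table of sixty-four entries fits the six bits mentioned above.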
A psychoacoustics model 13 is the part having the largest operation complexity in the encoding apparatus. The final output value of the psychoacoustics model is an SMR of each band, which serves as the standard for bit allocation. The SMR value is calculated by a series of steps as follows. An audio signal of a time domain is converted into the frequency domain by a fast Fourier transform (FFT) in a first step. A sound pressure level for each band is calculated in a second step. An absolute threshold is calculated in a third step. Voiced and voiceless sound components of the audio signal are decided in a fourth step. A masker is decided in a fifth step. An absolute threshold of each band is calculated in a sixth step. A total absolute threshold is calculated in a seventh step. A minimum absolute threshold of each band is calculated in an eighth step. An SMR value of each band is calculated in a ninth step.
In a bit allocating and quantizing part 15, the quantity of bits allocated to each band is first obtained in a bit allocating step by repeatedly performing the following steps in sequence, based on the SMR value obtained by the psychoacoustics model 13. In a first step, the initially allocated bits are set to zero (0). In a second step, an MNR value of each band is obtained by subtracting the SMR value from the SNR value. In a third step, the band having the minimum MNR value among the MNRs obtained for the respective bands is searched for, and the number of bits allocated to that band is increased by 1. In a fourth step, the second and third steps are repeated for the remaining bands as long as the required number of bits is not exceeded.
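The iterative bit allocation described above can be sketched as follows. This is a simplified illustration, not the apparatus's actual procedure: the SNR for a given number of bits is assumed to be about 6.02 dB per bit, the cost of one extra bit is assumed to be one bit per sample of the band (twelve samples, as in layer I), and side information is ignored. All names and default values are hypothetical.

```python
def allocate_bits(smr_db, bit_budget, samples_per_band=12, max_bits=15):
    """Greedy bit allocation driven by the minimum MNR.

    smr_db:      SMR of each band in dB (from the psychoacoustics model).
    bit_budget:  total number of bits available for the samples.
    Assumptions: SNR(bits) = 6.02 * bits dB; raising a band's allocation
    by 1 costs 'samples_per_band' bits.
    """
    alloc = [0] * len(smr_db)          # step 1: start from zero bits
    used = 0
    while used + samples_per_band <= bit_budget:
        # Step 2: MNR = SNR - SMR for every band that can still grow.
        candidates = [
            (6.02 * alloc[band] - smr_db[band], band)
            for band in range(len(smr_db))
            if alloc[band] < max_bits
        ]
        if not candidates:
            break
        # Step 3: give one more bit to the band with the minimum MNR.
        _, worst_band = min(candidates)
        alloc[worst_band] += 1
        used += samples_per_band
        # Step 4: loop again while the budget is not exceeded.
    return alloc


if __name__ == "__main__":
    print(allocate_bits([20.0, 5.0, -3.0], bit_budget=60))
```

Because the MNR of every band is recomputed and searched on each pass, the loop runs once per allocated bit, which illustrates why this step contributes to the encoder delay discussed in this section.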
On the other hand, the quantization process is performed through the following steps in sequence. In a first step, the samples of each band are divided by the scale factor, and the value obtained thereby is set to X. In a second step, a value of A*X+B is calculated (here, A and B are predetermined values). In a third step, as many most significant bits of the calculated value are taken as the number of allocated bits obtained in the bit allocating step. In a fourth step, the most significant bit (MSB) is inverted.
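The four quantization steps can be sketched as follows. The sketch assumes that A*X+B is treated as a two's-complement fraction in the range [-1, 1), and the default values of A and B (derived here from the number of bits) are illustrative assumptions rather than values given in the text; the function name is hypothetical.

```python
import math


def quantize_sample(sample, scale_factor, n_bits, a=None, b=None):
    """Quantize one sample to an n_bits code following the four steps.

    Assumed defaults (not from the text): A = (2**n - 1) / 2**n and
    B = -1 / 2**n, which suit a quantizer with 2**n - 1 levels.
    """
    if a is None:
        a = (2 ** n_bits - 1) / 2 ** n_bits
    if b is None:
        b = -1.0 / 2 ** n_bits

    x = sample / scale_factor                  # step 1: normalize
    y = a * x + b                              # step 2: A*X + B
    # Step 3: keep the n_bits most significant bits of the
    # two's-complement fraction (truncation toward minus infinity).
    half_range = 1 << (n_bits - 1)
    q = math.floor(y * half_range)
    q = max(-half_range, min(half_range - 1, q))
    code = q & ((1 << n_bits) - 1)
    # Step 4: invert the most significant bit.
    return code ^ half_range


if __name__ == "__main__":
    for s in (-1.0, 0.0, 0.5):
        print(s, format(quantize_sample(s, 1.0, 3), "03b"))
```

Inverting the MSB turns the two's-complement code into an offset code, so that the most negative input maps to the all-zero code and a zero input maps to the middle code.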
As mentioned above, since the conventional digital audio encoding apparatus utilizes a psychoacoustics model, a nine-step process is necessary for obtaining the SMR value. Accordingly, the complexity of the operation increases, which greatly affects the overall performance time. Also, an MNR value is calculated again using the SMR values obtained by such a method. A time delay also occurs in this procedure due to the repeated performance of the bit allocation loop based on the thus-calculated MNR.
As a result of an actual experiment, it was found, as shown in the following Table 1, that the complexity of the operation in the psychoacoustics model generation process and the bit allocation process is high, accounting for about 49.9% of the performance time of the overall encoding process.
TABLE 1
______________________________________________________________________
Overall performance time       Psychoacoustic model &
of encoding apparatus          bit allocation process
(1/60 sec)                     (1/60 sec)                  Rate (%)
______________________________________________________________________
22662                          11590                       49.9
______________________________________________________________________