1. Field of the Invention
The present invention relates to an apparatus for decoding audio data and a method thereof, and more particularly, to an apparatus for decoding audio data with scalability and a method thereof.
2. Description of the Background Art
Bit sliced arithmetic coding (BSAC) has been proposed as a Moving Picture Experts Group (MPEG)-4 audio compression method that partially improves on the performance of the advanced audio coding (AAC) compression method.
In BSAC, a transmitting end encodes a signal into an audio signal of a base layer and an audio signal of an enhancement layer. At a receiving end, a user who has a low quality decoder decodes only the audio signal of the base layer to reproduce a basic audio signal, whereas a user who has a high quality decoder adds the audio signal of the enhancement layer to the audio signal of the base layer to reproduce a high quality audio signal.
For this purpose, MPEG-4 introduces a fine grain scalability (FGS) method in which the audio signal of each layer is transmitted in units of bit planes. With FGS, the receiving end need not wait until it has received the entire bit stream transmitted by the transmitting end, and the audio signal can be restored using only the portion of the bit stream received up to that point.
FGS is a compression and transmission method in which decoding can be performed using only a part of the entire bit stream. In FGS, the audio signal to be transmitted to the receiving end is divided into bit planes, and the bit plane of the most significant bit (MSB) is coded and transmitted first. The bit planes of the next most significant bits are then coded and transmitted in succession.
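The MSB-first ordering of bit planes can be illustrated with a minimal sketch. It assumes non-negative 4-bit quantized magnitudes, uses hypothetical function names and sample values, and performs no arithmetic coding, so it is not the BSAC bit stream syntax. It only shows that a usable approximation is available after the first planes arrive.

```python
def to_bit_planes(samples, num_bits):
    """Return bit planes ordered from MSB to LSB; each plane holds one bit per sample."""
    return [[(s >> b) & 1 for s in samples] for b in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes, num_bits):
    """Rebuild samples from the planes received so far; missing planes count as 0."""
    samples = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i  # plane 0 carries the MSB
        samples = [s | (bit << shift) for s, bit in zip(samples, plane)]
    return samples


samples = [5, 12, 3, 9]                   # hypothetical 4-bit quantized magnitudes
planes = to_bit_planes(samples, 4)
coarse = from_bit_planes(planes[:2], 4)   # only the two most significant planes
exact = from_bit_planes(planes, 4)        # all four planes
```

With only two of the four planes, `coarse` already approximates `samples`, and each further plane refines the values.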
FIG. 1 illustrates the structure of a bit stream in accordance with a conventional audio coding method.
Referring to FIG. 1, a frame of the bit stream is coded so that quantization samples and side information are mapped to a layer structure for FGS. That is, in the layer structure, the bit stream of a lower layer is included in the bit stream of an upper layer, and the side information items required for each layer are divided by layer and coded.
A header region in which header information is stored is provided at the head of the bit stream, followed by the information on layer 0 and then, in order, the information items on layers 1 to N (N is an integer larger than or equal to 1), which are enhancement layers. The portion from the header region to the information on layer 0 is referred to as the base layer. The portion from the header region to the information on layer 1 is referred to as layer 1. The portion from the header region to the information on layer 2 is referred to as layer 2. In the same manner, the portion from the header region to the information on layer N, that is, from the base layer to the enhancement layer N, is referred to as the top layer. Side information and a coded audio signal are stored as the information on each layer. For example, side information 2 and coded quantization samples are stored as the information on layer 2.
In such a structure, the decoder of the receiving end does not always decode the bit stream at the same bit rate at which it was compressed by the encoder of the transmitting end. Instead, decoding can be performed at a bit rate selected in units of 1 kbps, with the encoding bit rate of a target layer, which is one of the enhancement layers, as the maximum bit rate and the bit rate of the base layer as the minimum bit rate.
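The nesting of layers, in which each layer spans from the header region through that layer's information, can be sketched as follows. The byte strings and the function name `layer_prefix` are hypothetical placeholders chosen for illustration. A real frame carries coded side information and quantization samples rather than these literal bytes.

```python
def layer_prefix(header, layer_payloads, target_layer):
    """Return the bytes from the header region through the information on target_layer."""
    if not 0 <= target_layer < len(layer_payloads):
        raise ValueError("target_layer out of range")
    return header + b"".join(layer_payloads[: target_layer + 1])


header = b"H"                        # placeholder for the header region
payloads = [b"0", b"11", b"222"]     # placeholder information on layers 0, 1, 2
base_layer = layer_prefix(header, payloads, 0)
layer_1 = layer_prefix(header, payloads, 1)
top_layer = layer_prefix(header, payloads, 2)
```

Every lower layer is a prefix of every upper layer, which is why a decoder can stop at any layer boundary and still hold a decodable stream.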
FIG. 2 illustrates a full search method of obtaining the maximum significance value max_snf in a conventional audio decoding method.
The receiving end receives the bit stream illustrated in FIG. 1 and performs arithmetic decoding on each frame. The full search method of FIG. 2 searches for the maximum significance value max_snf, which is required for determining whether arithmetic decoding is required for an arbitrary layer among the base layer to the top layer.
Even when the maximum significance value max_snf indicates that arithmetic decoding is required, the current significance value current_snf of each frequency component of the audio signal is examined to determine whether arithmetic decoding is required for that component.
However, the full search method is used for all of these searches, that is, both the search for the maximum significance value max_snf and the comparison between the current significance value current_snf and the maximum significance value max_snf.
For example, when it is assumed that the frequency search range is 510, that the number of channels is 2, and that the number of window groups is 8 as illustrated in FIG. 2, the number of comparisons to be performed in order to find the maximum significance value max_snf is 510*2*8=8,160 per layer, and this is repeated in each frame for every layer. For example, when the number of base sub layers base_sublayer is 10 and the number of layers is 48, giving 10+48=58 layers in total, the comparison must be performed 8,160*58=473,280 times.
As described above, the full search method refers to a method of finding the maximum significance value max_snf in an arbitrary frequency search range by comparing the current significance values current_snf across all of the coefficients and taking the largest value.
In the full search method, the amount of calculation per frame for finding the maximum significance value max_snf is ‘the frequency search range*the number of channels*the number of window groups*the number of layers’. In such a method, since the current significance values current_snf must be compared across the coefficients to find the maximum significance value max_snf in each layer, channel, window group, and frequency search range, the number of unnecessary operations increases, which degrades the performance of the decoder and increases cost.
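The arithmetic of this example can be checked with a short sketch. The function name is hypothetical, and the total of 58 layers is taken as the sum of the 10 base sub layers and the 48 layers given in the example.

```python
def full_search_comparisons(freq_range, channels, window_groups, layers):
    """Return (comparisons per layer, comparisons per frame) for a full search."""
    per_layer = freq_range * channels * window_groups
    return per_layer, per_layer * layers


# Values from the example: 510 frequency positions, 2 channels, 8 window groups,
# and 10 base sub layers plus 48 layers.
per_layer, per_frame = full_search_comparisons(510, 2, 8, 10 + 48)
```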
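The cost of the full search can be made concrete with a minimal sketch that counts comparisons. The data layout `cur_snf[channel][window_group][frequency]`, the function name, and the sample values are assumptions made for illustration. The sketch also assumes that the same frequency search range is scanned for every layer, and it is not the reference decoder's implementation.

```python
def full_search_max_snf(cur_snf, num_layers):
    """Find max_snf for each layer by visiting every coefficient, counting comparisons.

    cur_snf[ch][g][k] is the assumed current significance of coefficient k
    in window group g of channel ch.
    """
    comparisons = 0
    max_snf_per_layer = []
    for _layer in range(num_layers):
        max_snf = 0
        for channel in cur_snf:
            for group in channel:
                for snf in group:
                    comparisons += 1          # one comparison per coefficient
                    if snf > max_snf:
                        max_snf = snf
        max_snf_per_layer.append(max_snf)
    return max_snf_per_layer, comparisons


# Hypothetical toy input: 2 channels, 2 window groups, 3 frequency positions.
cur_snf = [[[1, 3, 0], [2, 0, 1]], [[0, 4, 2], [1, 1, 0]]]
max_per_layer, comparisons = full_search_max_snf(cur_snf, num_layers=5)
```

The comparison count equals the product of the four loop sizes, here 3*2*2*5=60, and it grows in the same way with the frequency search range, the number of channels, the number of window groups, and the number of layers.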