1. Field of the Invention
The present invention relates to audio encoding and decoding, and more particularly, to a scalable audio encoding/decoding method and apparatus using bit-sliced arithmetic coding. The present invention is adopted as ISO/IEC JTC1/SC29/WG11 N1903 (ISO/IEC Committee Draft 14496-3 SUBPART 4).
2. Description of the Related Art
The MPEG audio standards and the AC-2/AC-3 methods provide almost the same audio quality as a compact disc at a bitrate of 64 to 384 Kbps, which is one-sixth to one-eighth that of conventional digital coding. For this reason, the MPEG audio standards play an important role in storing and transmitting audio signals, as in digital audio broadcasting (DAB), internet phone, or audio on demand (AOD).
Research into methods by which clear audio quality faithful to the original sound can be reproduced at a lower bitrate has been ongoing. One such method is MPEG-2 Advanced Audio Coding (AAC), authorized as a new international standard. MPEG-2 AAC, which provides clear audio quality faithful to the original sound at 64 kbps, has been recommended by the experts group.
In conventional techniques, a fixed bitrate is given to an encoder, and the optimal state for the given bitrate is searched for before quantization and coding are performed, thereby exhibiting considerably better efficiency. However, with the advent of multimedia technology, there is an increasing demand for a coder/decoder (codec) having versatility at a low bitrate. One such demand is for a scalable audio codec. A scalable audio codec can turn bitstreams coded at a high bitrate into low-bitrate bitstreams and then restore only some of them. By doing so, signals can be restored with reasonable efficiency from only some of the bitstreams, with little deterioration in performance due to the lowered bitrate, when the system is overloaded, when the performance of a decoder is poor, or at a user's request.
According to general audio coding techniques such as the MPEG-2 AAC standards, a fixed bitrate is given to a coding apparatus, the optimal state for the given bitrate is searched to then perform quantization and coding, thereby forming bitstreams in accordance with the bitrate. One bitstream contains information for one bitrate. In other words, bitrate information is contained in the header of a bitstream and a fixed bitrate is used. Thus, a method exhibiting the best efficiency at a specific bitrate can be used. For example, when a bitstream is formed by an encoder at a bitrate of 64 Kbps, the best quality sound can be restored by a decoder corresponding to an encoder having a bitrate of 64 Kbps.
According to such methods, bitstreams are formed without consideration of other bitrates: each bitstream is given a size suitable for its bitrate, with no regard for the order of the data within it. In practice, if the thus-formed bitstreams are transmitted via a communications network, they are sliced into several slots for transmission. When the transmission channel is overloaded, or when only some of the slots sent from the transmission end are received at the reception end due to a narrow channel bandwidth, the data cannot be reconstructed properly. Also, since the bitstreams are not formed according to the significance of their contents, the quality is severely degraded if only some of the bitstreams are restored, and the reconstructed audio sounds objectionable to the ear.
In the case of a scalable audio codec for solving the above-described problems, coding for a base layer is performed and then a difference signal between the original signal and the coded signal is coded in the next enhancement layer (K. Brandenburg, et al., "First Ideas on Scalable Audio Coding", 97th AES-Convention, preprint 3924, San Francisco, 1994) and (K. Brandenburg, et al., "A Two- or Three-Stage Bit Rate Scalable Audio Coding System", 99th AES-Convention, preprint 4132, New York, 1995). Thus, the more layers there are, the poorer the performance at a high bitrate. In the case of using a scalable coding apparatus, a signal having good audio quality is reproduced initially. However, if the state of the communication channels worsens or the load applied to the decoder of a receiving terminal increases, a sound having a low bitrate quality is reproduced. Therefore, the aforementioned encoding method is not suitable for practically attaining scalability.
To solve the above problems, it is an objective of the present invention to provide a scalable digital audio data encoding method, apparatus, and recording medium for recording the encoding method, using a bit-sliced arithmetic coding (BSAC) technique, instead of a lossless coding module, with all other modules of the conventional coder remaining unchanged.
It is another objective of the present invention to provide a scalable digital audio data decoding method, apparatus, and recording medium for recording the decoding method, using a bit-sliced arithmetic coding (BSAC) technique, instead of a lossless coding module, with all other modules of the conventional audio decoder remaining unchanged.
To achieve the first objective of the present invention, there is provided a scalable audio encoding method for coding audio signals into a layered datastream having a base layer and enhancement layers of a predetermined number, comprising the steps of: signal-processing input audio signals and quantizing the same for each predetermined coding band; and packing the quantized data to generate bitstreams, wherein the bitstream generating step comprises: coding the quantized data corresponding to the base layer; coding the quantized data corresponding to the next enhancement layer of the coded base layer and the remaining quantized data uncoded due to a layer size limit and belonging to the coded layer; and sequentially performing the layer coding steps for all enhancement layers to form bitstreams, wherein the base layer coding step, the enhancement layer coding step and the sequential coding step are performed such that the side information and quantized data corresponding to a layer to be coded are represented by digits of a same predetermined number; and then arithmetic-coded using a predetermined probability model in the order ranging from the MSB sequences to the LSB sequences, the side information containing scale factors and probability models to be used in the arithmetic coding.
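The carry-over of data left uncoded because of a layer size limit can be illustrated with a minimal Python sketch. It is not the claimed method itself: the function name `pack_layers`, the fixed per-item bit costs, and the choice to place carried-over items ahead of the next layer's own items are all assumptions made for illustration. In a real arithmetic coder the cost of each symbol is only known as it is coded.

```python
def pack_layers(layer_items, layer_budgets):
    """Assign items to layers; items that do not fit are carried forward.

    layer_items   -- one list per layer of (name, bit_cost) tuples (hypothetical costs)
    layer_budgets -- bit budget of each layer
    Returns (items packed per layer, items still uncoded after the last layer).
    """
    carried = []   # data left uncoded by earlier layers
    packed = []
    for items, budget in zip(layer_items, layer_budgets):
        queue = carried + list(items)   # assumption: leftovers are coded first
        used, taken = 0, []
        while queue and used + queue[0][1] <= budget:
            name, cost = queue.pop(0)
            taken.append(name)
            used += cost
        packed.append(taken)
        carried = queue                 # whatever did not fit moves on
    return packed, carried


base = [("a", 4), ("b", 4), ("c", 4)]
enh1 = [("d", 4), ("e", 4)]
packed, leftover = pack_layers([base, enh1], [8, 12])
print(packed)    # [['a', 'b'], ['c', 'd', 'e']]
print(leftover)  # []
```

Here item "c" belongs to the base layer but exceeds its 8-bit budget, so it is coded as part of the first enhancement layer.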
The step of coding the scale factors comprises the steps of: obtaining the maximum scale factor; and obtaining differences between the maximum scale factor and the respective scale factors and arithmetic-coding the differences.
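As a rough illustration of this step, the following Python sketch computes the maximum scale factor and the differences that would then be passed to the arithmetic coder. The function name and the sample values are hypothetical, and the arithmetic coding itself is omitted.

```python
def scale_factor_differences(scale_factors):
    """Return the maximum scale factor and each factor's distance below it."""
    max_sf = max(scale_factors)
    # Differences are non-negative because every factor is <= the maximum.
    diffs = [max_sf - sf for sf in scale_factors]
    return max_sf, diffs


max_sf, diffs = scale_factor_differences([60, 57, 60, 52])
print(max_sf)  # 60
print(diffs)   # [0, 3, 0, 8]
```

Because the differences are always non-negative and tend to be small, they are a convenient input for a probability model.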
When the quantized data is composed of sign data and magnitude data, the coding step comprises the steps of: coding by a predetermined encoding method the most significant bit sequences composed of most significant bits of the magnitude data of the quantized data represented by the same number of bits; coding sign data corresponding to non-zero data among the coded most significant bit sequences; coding the most significant bit sequences among uncoded magnitude data of the digital data by a predetermined encoding method; coding uncoded sign data among the sign data corresponding to non-zero magnitude data among bit sequences; and performing the magnitude data coding step and the sign data coding step on the respective bits of the digital data.
The coding steps are performed by coupling bits composing the respective bit sequences for the magnitude data and sign data, into units of bits of a predetermined number.
A four-dimensional vector coupled in units of bits is divided into two subvectors according to its pre-states in coding the respective samples.
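The ordering described in the preceding paragraphs can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `bit_slice` is hypothetical, the "pre-state" of a sample is taken to mean whether a non-zero bit of that sample has already been emitted in a more significant bit sequence, and the symbols are simply listed rather than arithmetic-coded.

```python
def bit_slice(samples, num_bits):
    """List the symbols for four quantized samples, from MSB to LSB sequences."""
    assert len(samples) == 4
    signs = [1 if s < 0 else 0 for s in samples]
    mags = [abs(s) for s in samples]
    pre_state = [0, 0, 0, 0]  # assumption: 1 once a non-zero bit has been emitted
    symbols = []
    for plane in range(num_bits - 1, -1, -1):
        # One bit from each of the four samples forms a four-dimensional vector.
        bits = [(m >> plane) & 1 for m in mags]
        # Split the vector into two subvectors according to the pre-states.
        sub0 = [b for b, p in zip(bits, pre_state) if p == 0]
        sub1 = [b for b, p in zip(bits, pre_state) if p == 1]
        symbols.append(("mag", plane, sub0, sub1))
        # A sign is emitted once, right after the sample's first non-zero bit.
        for i, b in enumerate(bits):
            if b and not pre_state[i]:
                symbols.append(("sign", i, signs[i]))
                pre_state[i] = 1
    return symbols


for symbol in bit_slice([5, -2, 0, 1], num_bits=3):
    print(symbol)
# ('mag', 2, [1, 0, 0, 0], [])
# ('sign', 0, 0)
# ('mag', 1, [1, 0, 0], [0])
# ('sign', 1, 1)
# ('mag', 0, [0, 1], [1, 0])
# ('sign', 3, 0)
```

The sample with value 0 never produces a sign symbol, which matches the statement that sign data is coded only for non-zero magnitude data.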
The bitrate of the base layer is 16 kbps and the interlayer bitrate is 8 kbps.
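For orientation, the Python sketch below lists the bitrate of each layer given these two figures and converts a bitrate into a per-frame bit budget. The number of enhancement layers, the frame length of 1024 samples, and the 48 kHz sampling rate are hypothetical values chosen for the example and are not stated in this section.

```python
def layer_bitrates_kbps(num_enhancement_layers, base=16, step=8):
    """Bitrate of the base layer followed by each enhancement layer, in kbps."""
    return [base + step * i for i in range(num_enhancement_layers + 1)]


def frame_bit_budget(bitrate_kbps, frame_len, sample_rate):
    """Whole bits available per frame at the given bitrate (rounded down)."""
    return bitrate_kbps * 1000 * frame_len // sample_rate


print(layer_bitrates_kbps(6))             # [16, 24, 32, 40, 48, 56, 64]
print(frame_bit_budget(16, 1024, 48000))  # 341
```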
To achieve the second objective of the present invention, there is provided a scalable audio coding apparatus comprising: a quantizing portion for signal-processing input audio signals and quantizing the same for each coding band; and a bit packing portion for generating bitstreams by band-limiting for a base layer so as to be scalable, coding side information corresponding to the base layer, coding the quantized information sequentially from the most significant bit sequence to the least significant bit sequence, and from lower frequency components to higher frequency components, and coding side information corresponding to the next enhancement layer of the base layer and the quantized data, to perform coding on all layers.
The quantizing portion comprises: a time/frequency mapping portion for converting the input audio signals of a temporal domain into signals of a frequency domain; a psychoacoustic portion for coupling the converted signals by signals of predetermined subbands by time/frequency mapping and calculating a masking threshold at each subband using a masking phenomenon generated by interaction of the respective signals; and a quantizing portion for quantizing the signals for each predetermined coding band while the quantization noise of each band is compared with the masking threshold.
To achieve the third objective of the present invention, there is provided a scalable audio decoding method for decoding audio data coded to have layered bitrates, comprising the steps of: decoding side information having at least scale factors and arithmetic-coding model information allotted to each band, in the order of creation of the layers in datastreams having layered bitrates, by analyzing the significance of bits composing the datastreams, from upper significant bits to lower significant bits, using the arithmetic coding models corresponding to the quantized data; restoring the decoded scale factors and quantized data into signals having the original magnitudes; and converting inversely quantized signals into signals of a temporal domain.
The decoding of the scale factors is performed by the steps of: decoding the maximum scale factor in the bitstream, arithmetic-decoding differences between the maximum scale factor and the respective scale factors, and subtracting the differences from the maximum scale factor.
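A minimal Python sketch of the final subtraction step is given below. It assumes the maximum scale factor and the differences have already been recovered from the bitstream; the function name and values are hypothetical.

```python
def decode_scale_factors(max_sf, diffs):
    """Rebuild each scale factor by subtracting its difference from the maximum."""
    return [max_sf - d for d in diffs]


print(decode_scale_factors(60, [0, 3, 0, 8]))  # [60, 57, 60, 52]
```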
Also, there is provided a scalable audio decoding apparatus comprising: a bitstream analyzing portion for decoding side information having at least scale factors and arithmetic model information and quantized data, in the order of creation of the layers in layered bitstreams; an inverse quantizing portion for restoring the decoded scale factors and quantized data into signals having the original magnitudes; and a frequency/time mapping portion for converting inversely quantized signals of a frequency domain into signals of a temporal domain.
The invention may be embodied in a general purpose digital computer that is running a program from a computer usable medium, including but not limited to storage media such as magnetic storage media (e.g., ROMs, floppy disks, hard disks, etc.), optically readable media (e.g., CD-ROMs, DVDs, etc.) and carrier waves (e.g., transmissions over the Internet). For instance, there is provided a computer usable medium, tangibly embodying a program of instructions executable by the machine to perform a scalable audio coding method for coding audio signals into a layered datastream having a base layer and enhancement layers of a predetermined number, the method comprising the steps of: signal-processing input audio signals and quantizing the same for each predetermined coding band; and packing the quantized data to generate bitstreams, wherein the bitstream generating step comprises: coding the quantized data corresponding to the base layer; coding the quantized data corresponding to the next enhancement layer of the coded base layer and the remaining quantized data uncoded due to a layer size limit and belonging to the coded layer; and sequentially performing the layer coding steps for all enhancement layers to form bitstreams, wherein the base layer coding step, the enhancement layer coding step and the sequential coding step are performed such that the side information and quantized data corresponding to a layer to be coded are represented by digits of a predetermined same number; and then arithmetic-coded using a predetermined probability model in the order ranging from the MSB sequences to the LSB sequences, the side information containing scale factors and probability models to be used in the arithmetic coding.
The scale factor coding step comprises the steps of: obtaining the maximum scale factor; and obtaining differences between the maximum scale factor and the respective scale factors and arithmetic-coding the same.
The coding of the information for the probability models is performed by the steps of: obtaining the minimum value of the probability model information values; and obtaining differences between the minimum probability model information value and the respective model information values and arithmetic-coding the same using the probability models listed in Tables 5.5 through 5.9.
Also, there is provided a computer usable medium, tangibly embodying a program of instructions executable by the machine to perform a scalable audio decoding method for decoding audio data coded to have layered bitrates, comprising the steps of: decoding side information having at least scale factors and arithmetic-coding model information allotted to each band, in the order of creation of the layers in datastreams having layered bitrates, by analyzing the significance of bits composing the datastreams, from upper significant bits to lower significant bits, using the arithmetic coding models corresponding to the quantized data; restoring the decoded scale factors and quantized data into signals having the original magnitudes; and converting inversely quantized signals into signals of a temporal domain. There is further provided a computer-readable recording medium storing a program for executing the scalable audio encoding method on a computer.
The bitstreams are decoded in units of four-dimensional vectors, and bit-sliced information of four samples in the four-dimensional vectors is decoded.
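The following Python sketch shows how four samples might be reassembled from decoded bit-sliced information. It is an illustration only: the function name `reassemble`, the representation of each slice as a list of four bits, and the treatment of undecoded lower bits as zero are assumptions.

```python
def reassemble(bit_planes, signs, num_bits):
    """Rebuild four samples from bit slices given in MSB-first order.

    bit_planes -- decoded slices, each a list of four bits
    signs      -- one sign flag per sample (1 means negative)
    num_bits   -- total bits per magnitude; missing lower slices count as zero
    """
    mags = [0, 0, 0, 0]
    for bits in bit_planes:
        mags = [(m << 1) | b for m, b in zip(mags, bits)]
    # Shift up by the number of slices that were not received.
    missing = num_bits - len(bit_planes)
    mags = [m << missing for m in mags]
    return [-m if s else m for m, s in zip(mags, signs)]


planes = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
signs = [0, 1, 0, 0]
print(reassemble(planes, signs, num_bits=3))      # [5, -2, 0, 1]
print(reassemble(planes[:2], signs, num_bits=3))  # [4, -2, 0, 0]
```

The second call shows the scalable behavior: when only the two most significant slices are available, the decoder still produces a coarser approximation of the same samples.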
The decoding of the scale factors is performed by decoding the maximum scale factor, arithmetic-decoding the differences between the maximum scale factor and the respective scale factors, and subtracting the differences from the maximum scale factor.
The decoding of the arithmetic model indices is performed by decoding the minimum arithmetic model index in the bitstream, decoding differences between the minimum index and the respective indices in the side information of the respective layers, and adding the minimum index and the differences.
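The relationship between the minimum index and the differences can be sketched in Python as a round trip. The function names and index values are hypothetical, and the arithmetic coding and decoding of the differences is omitted.

```python
def encode_model_indices(indices):
    """Return the minimum index and each index's offset above it."""
    min_idx = min(indices)
    return min_idx, [i - min_idx for i in indices]


def decode_model_indices(min_idx, diffs):
    """Rebuild each index by adding its difference to the minimum index."""
    return [min_idx + d for d in diffs]


min_idx, diffs = encode_model_indices([5, 3, 7, 3])
print(min_idx, diffs)                        # 3 [2, 0, 4, 0]
print(decode_model_indices(min_idx, diffs))  # [5, 3, 7, 3]
```

Note the asymmetry with the scale factors: those are coded relative to their maximum and recovered by subtraction, whereas the model indices are coded relative to their minimum and recovered by addition.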