1. Field of the Invention
The present invention relates to audio encoding and decoding, and more particularly, to a scalable stereo audio encoding/decoding method and apparatus using bit-sliced arithmetic coding.
2. Description of the Related Art
In a conventional scalable audio encoding/decoding apparatus, scalability of a 1-channel mono signal was taken into consideration [K. Brandenburg, et al., "First Ideas on Scalable Audio Coding", 97th AES-Convention, preprint 3924, San Francisco, 1994] and [K. Brandenburg, et al., "A Two- or Three-Stage Bit Rate Scalable Audio Coding System", 99th AES-Convention, preprint 4132, New York, 1995]. However, MPEG audio standards [MPEG Committee ISO/IEC JTC1/SC29/WG11, Information technology - Coding of moving pictures and associated audio for data storage media to about 1.5 Mbit/s - Part 3: Audio, ISO/IEC IS 11172-3, 1998] or AC-2/AC-3 methods [Dolby, "Dolby AC-3 Multi-Channel Audio Coding - Submission to the Grand Alliance Audio Specialist Group", Dolby Lab., August, 1993] provide a technology for processing stereo and multi-channel signals as well as mono signals. In practice, most musical signals are composed of stereo signals. Thus, a scalable audio codec adaptable to signals composed of two or more channels is needed for use over the Internet or other communications networks.
Generally, musical signals are stereo signals. Stereo signals are provided through a compact disc (CD), a communications network or a broadcast network, and will be provided under multimedia environments in the future. However, existing scalable audio codecs have mostly treated mono signals and have not yet processed stereo signals. To process stereo signals with such codecs, transmission must be performed such that all signals for one channel are transmitted first and the signals for the other channel are transmitted afterwards. In this case, however, since the quantities of bits generated in the two channels are not always the same, the performance of the scalable audio codec is considerably lower at lower bitrates for stereo signals.
To solve the above problems, it is an objective of the present invention to provide a scalable stereo digital audio data encoding method and apparatus, and a recording medium for recording the encoding method. Encoding is performed by generating bitstreams comprised of several enhancement layers based on a base layer using a bit-sliced arithmetic coding (BSAC) technique.
To achieve the objective of the present invention, there is provided a scalable stereo audio encoding method for coding audio signals into a layered datastream having a base layer and at least two enhancement layers, including the steps of: signal-processing input audio signals and quantizing the same for each predetermined coding band, coding the quantized data corresponding to the base layer among the quantized data, coding the quantized data corresponding to the next enhancement layer of the coded base layer and the remaining quantized data uncoded due to a layer size limit and belonging to the coded layer, and sequentially performing the layer coding steps for all enhancement layers to form bitstreams, wherein the base layer coding step, the enhancement layer coding step and the sequential coding step are performed such that the side information and quantized data corresponding to a layer to be coded are represented by the same predetermined number of digits, and then arithmetic-coded using a predetermined probability model in order from the Most Significant Bit (MSB) sequences to the Least Significant Bit (LSB) sequences, bit-sliced left-channel data and right-channel data being alternately coded in units of predetermined vectors. The side information includes at least scale factors and information on a probability model to be used in arithmetic coding. The predetermined vectors are four-dimensional vectors produced by coupling four bit-sliced audio channel data into one vector. Each four-dimensional vector is divided into two subvectors according to prestates indicating whether non-zero bit-sliced frequency components have been coded or not, and the subvectors are then coded.
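The bit-slicing and vector formation described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes that each four-dimensional vector gathers one bit plane of four consecutive quantized magnitudes of one channel, that left and right vectors alternate vector by vector, and that a prestate of 1 marks a sample for which a non-zero bit has already been coded in a higher bit plane. The function names and the tuple layout are hypothetical, and the arithmetic coder itself is omitted.

```python
def interleaved_bit_plane_vectors(left, right, num_bits):
    """Slice quantized magnitudes into 4-bit vectors, MSB plane first.

    Returns a list of (plane, channel, start_index, bits) tuples in which
    left-channel and right-channel vectors alternate (assumed ordering).
    """
    vectors = []
    for plane in range(num_bits - 1, -1, -1):          # MSB to LSB
        for start in range(0, len(left), 4):           # low to high frequency
            for name, data in (("L", left), ("R", right)):
                group = data[start:start + 4]
                bits = [(abs(v) >> plane) & 1 for v in group]
                vectors.append((plane, name, start, bits))
    return vectors


def split_by_prestate(bits, prestate):
    """Divide one vector into two subvectors according to the prestates.

    prestate[i] == 0: no non-zero bit coded so far for sample i.
    prestate[i] == 1: a non-zero bit was already coded for sample i.
    """
    zero_state = [b for b, s in zip(bits, prestate) if s == 0]
    nonzero_state = [b for b, s in zip(bits, prestate) if s == 1]
    return zero_state, nonzero_state


if __name__ == "__main__":
    for vector in interleaved_bit_plane_vectors([5, 0, 1, 2], [0, 3, 0, 0], 3):
        print(vector)
    print(split_by_prestate([0, 0, 0, 1], [1, 0, 0, 0]))
```

In this sketch each subvector would then be passed to the arithmetic coder with the probability model selected for the layer.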
Also, the step of coding the scale factors includes the steps of obtaining the maximum scale factor, obtaining the difference between the maximum scale factor and the first scale factor and arithmetic-coding the difference, and obtaining differences between the immediately previous arithmetic-coded scale factor and the respective scale factors subsequent to the first scale factor, mapping the differences into predetermined values and arithmetic-coding the mapped values.
Alternatively, the step of coding the scale factors includes the steps of obtaining the maximum scale factor, and obtaining differences between the maximum scale factor and the respective scale factors and arithmetic-coding the differences.
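The two scale factor coding variants can be sketched as follows. The sketch only produces the symbols that would be handed to the arithmetic coder. The text does not specify the "predetermined value" mapping, so the `zigzag` function below is a hypothetical stand-in that maps signed differences to non-negative integers, and all function names are illustrative.

```python
def zigzag(diff):
    """Hypothetical mapping of a signed difference to a non-negative value."""
    return 2 * diff if diff >= 0 else -2 * diff - 1


def symbols_relative_to_max(scale_factors):
    """Variant in which every scale factor is coded as (max - scale factor)."""
    max_sf = max(scale_factors)
    return max_sf, [max_sf - sf for sf in scale_factors]


def symbols_chained(scale_factors):
    """Variant in which the first scale factor is coded relative to the
    maximum and each later one relative to the previous scale factor."""
    max_sf = max(scale_factors)
    symbols = [max_sf - scale_factors[0]]               # never negative
    for prev, cur in zip(scale_factors, scale_factors[1:]):
        symbols.append(zigzag(prev - cur))              # may be negative: map
    return max_sf, symbols


if __name__ == "__main__":
    sfs = [60, 62, 59, 62]
    print(symbols_relative_to_max(sfs))   # (62, [2, 0, 3, 0])
    print(symbols_chained(sfs))           # (62, [2, 3, 6, 5])
```

Neighbouring scale factors tend to be close in value, so the chained variant usually produces small symbols, which suits a skewed probability model.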
The header information commonly used for all bands is coded, and the side information and the quantized frequencies necessary for each layer are formed into bit-sliced information and then coded so as to have a layered structure.
The quantization is performed by the steps of converting the input audio signals of a time domain into signals of a frequency domain, coupling the converted signals as signals of predetermined scale factor bands by time/frequency mapping and calculating a masking threshold at each scale factor band, performing temporal-noise shaping for controlling the temporal shape of the quantization noise within each window for conversion, performing intensity stereo processing such that only the quantized information of a scale factor band for one of two channels is coded, and only the scale factor for the other channel is transmitted, predicting frequency coefficients of the present frame, performing Mid/Side (M/S) stereo processing for converting a left-channel signal and a right-channel signal into an additive signal of two signals and a subtractive signal thereof, and quantizing the signals for each predetermined coding band so that quantization noise of each band is smaller than the masking threshold.
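As an illustration of the M/S stereo step, the following Python sketch converts a left/right pair into an additive and a subtractive signal and back. The factor of 1/2 is an assumption (the text only says "additive" and "subtractive"), chosen so that the inverse is a plain sum and difference. The function names are hypothetical.

```python
def ms_encode(left, right):
    """Convert left/right samples into additive (mid) and subtractive (side)
    signals. The 1/2 normalization is an assumption of this sketch."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse of ms_encode: recover the left and right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    mid, side = ms_encode([1.0, 0.5], [0.5, 0.5])
    print(mid, side)
    print(ms_decode(mid, side))
```

When the two channels are strongly correlated, the subtractive signal is small and needs fewer bits after quantization.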
When the quantized data is composed of sign data and magnitude data, the steps of coding the base layer and the enhancement layers and forming bitstreams include the steps of: arithmetic-coding the most significant digit sequences composed of the most significant digits of the magnitude data, coding sign data corresponding to non-zero data among the coded most significant digit sequences, coding the most significant digit sequences among the uncoded magnitude data of the digital data, coding uncoded sign data among the sign data corresponding to non-zero magnitude data among the coded digit sequences, and performing the magnitude coding step and the sign coding step on the respective digits of the digital data, the respective steps being alternately performed on the left-channel data and the right-channel data in units of predetermined vectors.
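The ordering of magnitude and sign coding can be sketched for a single channel as follows. The sketch assumes the sign of a sample is emitted once, directly after the bit plane in which its magnitude first shows a '1'. It lists the symbols in coding order and leaves out the arithmetic coder and the left/right alternation. The function name and symbol layout are hypothetical.

```python
def bit_sliced_symbols(values, num_bits):
    """List magnitude bit planes (MSB first) and the sign data that follows
    each plane for samples that became non-zero in that plane."""
    symbols = []
    sign_sent = [False] * len(values)
    for plane in range(num_bits - 1, -1, -1):
        bits = [(abs(v) >> plane) & 1 for v in values]
        symbols.append(("mag", plane, bits))
        signs = []
        for i, (value, bit) in enumerate(zip(values, bits)):
            if bit == 1 and not sign_sent[i]:
                signs.append((i, 1 if value < 0 else 0))   # 1 means negative
                sign_sent[i] = True
        if signs:
            symbols.append(("sign", plane, signs))
    return symbols


if __name__ == "__main__":
    for symbol in bit_sliced_symbols([5, -3, 0, 1], 3):
        print(symbol)
```

A sample that stays zero in every coded plane never has a sign emitted for it, so no bits are spent on its sign.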
The scalable stereo audio decoding apparatus further includes an M/S stereo processing portion for checking whether or not M/S stereo processing has been performed in the bitstream encoding method, and converting a left-channel signal and a right-channel signal into an additive signal of two signals and a subtractive signal thereof if the M/S stereo processing has been performed, a predicting portion for checking whether or not a predicting step has been performed in the bitstream encoding method, and predicting frequency coefficients of the current frame if the predicting step has been performed, an intensity stereo processing portion for checking whether or not intensity stereo processing has been performed in the bitstream encoding method, and, if the intensity stereo processing has been performed, then since only the quantized information of the scale factor band for one channel (the left channel) of the two channels is coded, performing the intensity stereo processing for restoring the quantized information of the other channel (the right channel) from the left-channel value, and a temporal noise shaping (TNS) portion for checking whether or not a temporal noise shaping step has been performed in the bitstream encoding method, and if the TNS step has been performed, performing temporal-noise shaping for controlling the temporal shape of the quantization noise within each window for conversion.
According to another aspect of the present invention, there is provided a scalable stereo audio coding apparatus including a quantizing portion for signal-processing input audio signals and quantizing the same for each coding band, a bit-sliced arithmetic-coding portion for coding bitstreams for all layers so as to have a layered structure, by band-limiting for a base layer so as to be scalable, coding side information corresponding to the base layer, coding the quantized information sequentially from the most significant bit sequence to the least significant bit sequence, and from lower frequency components to higher frequency components, alternately coding left-channel data and right-channel data in units of predetermined vectors, and coding side information corresponding to the next enhancement layer of the base layer and the quantized data, and a bitstream forming portion for collecting data formed in the quantizing portion and the bit-sliced arithmetic coding portion and generating bitstreams.
The quantizing portion includes a time/frequency mapping portion for converting the input audio signals of a temporal domain into signals of a frequency domain, a psychoacoustic portion for coupling the converted signals as signals of predetermined scale factor bands by time/frequency mapping and calculating a masking threshold at each scale factor band using a masking phenomenon generated by interaction of the respective signals, and a quantizing portion for quantizing the signals for each predetermined coding band while the quantization noise of each band is compared with the masking threshold. Also, the apparatus further includes a temporal noise shaping (TNS) portion for performing temporal-noise shaping for controlling the temporal shape of the quantization noise within each window for conversion, an intensity stereo processing portion for performing intensity stereo processing such that only the quantized information of a scale factor band for one of two channels is coded, and only the scale factor for the other channel is transmitted, a predicting portion for predicting frequency coefficients of the present frame, and an M/S stereo processing portion for performing M/S stereo processing for converting a left-channel signal and a right-channel signal into an additive signal of two signals and a subtractive signal thereof.
According to still another aspect of the present invention, there is provided a scalable stereo audio decoding method for decoding audio data coded to have layered bitrates, including the steps of analyzing data necessary for the respective modules in the bitstreams having a layered structure, decoding at least scale factors and arithmetic-coding model indices and quantized data, in the order of creation of the layers in bitstreams having a layered structure, the quantized data decoded alternately for the respective channels by analyzing the significance of bits composing the bitstreams, from upper significant bits to lower significant bits, restoring the decoded scale factors and quantized data into signals having the original magnitudes, and converting inversely quantized signals into signals of a temporal domain.
The scalable stereo audio decoding method further includes the steps of checking whether or not M/S stereo processing has been performed in the bitstream encoding method, and converting a left-channel signal and a right-channel signal into an additive signal of two signals and a subtractive signal thereof if the M/S stereo processing has been performed, checking whether or not a predicting step has been performed in the bitstream encoding method, and predicting frequency coefficients of the current frame if the predicting step has been performed, checking whether or not an intensity stereo processing step has been performed in the bitstream encoding method, and, if the intensity stereo processing has been performed, then since only the quantized information of the scale factor band for one channel (the left channel) of the two channels is coded, performing the intensity stereo processing for restoring the quantized information of the other channel (the right channel) from the left-channel value, and checking whether or not a temporal noise shaping (TNS) step has been performed in the bitstream encoding method, and if the TNS step has been performed, performing temporal-noise shaping for controlling the temporal shape of the quantization noise within each window for conversion.
When the quantized data is composed of sign data and magnitude data, the quantized frequency components are restored by sequentially decoding the magnitude data and the sign bits of the quantized frequency components and coupling the magnitude data and the sign bits.
The decoding step is performed from the most significant bits to the least significant bits, and the restoring step is performed by coupling the decoded bit-sliced data and restoring the coupled data into quantized frequency component data.
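A minimal Python sketch of the restoring step follows, assuming the bit planes have already been arithmetic-decoded and are supplied MSB first, and that a sign value of 1 means negative (both are conventions of this sketch, not of the text). It also shows why the scheme is scalable: when only the upper planes are available, the same routine yields a coarser approximation. The function name is hypothetical.

```python
def restore_from_slices(planes, signs, num_bits):
    """Couple decoded bit planes (MSB first) and sign bits back into
    quantized frequency components. Missing lower planes count as zero."""
    magnitudes = [0] * len(signs)
    for k, bits in enumerate(planes):
        shift = num_bits - 1 - k                 # weight of this bit plane
        for i, bit in enumerate(bits):
            magnitudes[i] |= bit << shift
    return [-m if s else m for m, s in zip(magnitudes, signs)]


if __name__ == "__main__":
    planes = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
    signs = [0, 1, 0, 0]
    print(restore_from_slices(planes, signs, 3))        # all planes
    print(restore_from_slices(planes[:2], signs, 3))    # upper planes only
```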
The data is decoded in the decoding step such that bit-sliced information of four samples is decoded in units of four-dimensional vectors.
The four-dimensional vector decoding is performed such that the two subvectors coded according to prestates indicating whether non-zero bit-sliced frequency components have been coded or not are arithmetic-decoded, and the two subvectors decoded according to the coding states of the respective samples are restored into four-dimensional vectors.
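The restoration of a four-dimensional vector from its two subvectors can be sketched as below. The sketch assumes the decoder tracks the same prestates as the encoder, with 1 meaning that a non-zero bit has already been decoded for that sample, and that bits inside each subvector keep their original sample order. Function names are hypothetical.

```python
def merge_subvectors(zero_state, nonzero_state, prestate):
    """Rebuild one four-dimensional vector from its two decoded subvectors."""
    zero_bits = iter(zero_state)
    nonzero_bits = iter(nonzero_state)
    return [next(nonzero_bits) if s else next(zero_bits) for s in prestate]


def update_prestate(prestate, bits):
    """A sample's prestate becomes 1 once any decoded bit for it is 1."""
    return [s | b for s, b in zip(prestate, bits)]


if __name__ == "__main__":
    prestate = [1, 0, 0, 0]
    bits = merge_subvectors([0, 0, 1], [0], prestate)
    print(bits)
    print(update_prestate(prestate, bits))
```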
Also, while the bit-sliced data of the respective frequency components is decoded from the MSBs, decoding is skipped if the bit-sliced data is '0', and sign data is arithmetic-decoded when the bit-sliced data '1' appears for the first time. The decoding of the scale factors is performed by decoding the maximum scale factor in the bitstream, arithmetic-decoding differences between the maximum scale factor and the respective scale factors, and subtracting the differences from the maximum scale factor. Alternatively, the step of decoding the scale factors includes the steps of decoding the maximum scale factor from the bitstreams, arithmetic-decoding the mapped values of the differences and inversely mapping the mapped values to obtain the differences, obtaining the first scale factor by subtracting the first difference from the maximum scale factor, and obtaining the scale factors for the remaining bands by subtracting the differences from the respective previous scale factors.
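Both scale factor decoding variants can be sketched in Python as follows, operating on symbols that are assumed to have been arithmetic-decoded already. The inverse mapping `unzigzag` is a hypothetical stand-in for the unspecified inverse mapping and assumes the encoder mapped a signed difference d to 2d when d is non-negative and to -2d - 1 otherwise. Function names are illustrative.

```python
def unzigzag(mapped):
    """Hypothetical inverse mapping from a non-negative symbol to a signed
    difference (even symbols are non-negative, odd symbols negative)."""
    return mapped // 2 if mapped % 2 == 0 else -((mapped + 1) // 2)


def decode_relative_to_max(max_sf, diffs):
    """Variant 1: every scale factor is max_sf minus its decoded difference."""
    return [max_sf - d for d in diffs]


def decode_chained(max_sf, symbols):
    """Variant 2: the first scale factor is relative to max_sf, each later
    one is the previous scale factor minus the inversely mapped difference."""
    scale_factors = [max_sf - symbols[0]]
    for mapped in symbols[1:]:
        scale_factors.append(scale_factors[-1] - unzigzag(mapped))
    return scale_factors


if __name__ == "__main__":
    print(decode_relative_to_max(62, [2, 0, 3, 0]))
    print(decode_chained(62, [2, 3, 6, 5]))
```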
The decoding of the arithmetic-coded model indices is performed by the steps of decoding the minimum arithmetic model index in the bitstream, decoding differences between the minimum index and the respective indices in the side information of the respective layers, and adding the minimum index and the differences.
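The model index reconstruction amounts to adding each decoded difference to the minimum index, as in this short Python sketch. The function name is hypothetical and the differences are assumed to be arithmetic-decoded already.

```python
def decode_model_indices(min_index, diffs):
    """Recover the model index of each band: minimum index plus difference."""
    return [min_index + d for d in diffs]


if __name__ == "__main__":
    print(decode_model_indices(3, [0, 2, 1]))
```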
Alternatively, according to the present invention, there is provided a scalable stereo audio decoding apparatus for decoding audio data coded to have layered bitrates, including a bitstream analyzing portion for analyzing data necessary for the respective modules in the bitstreams having a layered structure, a decoding portion for decoding at least scale factors and arithmetic-coding model indices and quantized data, in the order of creation of the layers in bitstreams having a layered structure, the quantized data decoded alternately for the respective channels by analyzing the significance of bits composing the bitstreams, from upper significant bits to lower significant bits, a restoring portion for restoring the decoded scale factors and quantized data into signals having the original magnitudes, and a frequency/time mapping portion for converting inversely quantized signals into signals of a temporal domain.
The apparatus further includes an M/S stereo processing portion for checking whether or not M/S stereo processing has been performed in the bitstream encoding method, and converting a left-channel signal and a right-channel signal into an additive signal of two signals and a subtractive signal thereof if the M/S stereo processing has been performed, a predicting portion for checking whether or not a predicting step has been performed in the bitstream encoding method, and predicting frequency coefficients of the current frame if the predicting step has been performed, an intensity stereo processing portion for checking whether or not intensity stereo processing has been performed in the bitstream encoding method, and, if the intensity stereo processing has been performed, then since only the quantized information of the scale factor band for one channel (the left channel) of the two channels is coded, performing the intensity stereo processing for restoring the quantized information of the other channel (the right channel) from the left-channel value, and a temporal noise shaping (TNS) portion for checking whether or not a temporal noise shaping step has been performed in the bitstream encoding method, and if the TNS step has been performed, performing temporal-noise shaping for controlling the temporal shape of the quantization noise within each window for conversion.
Further, the present invention may be embodied as a program executable on a computer. Also, the invention may be embodied in a general purpose digital computer that is running a program from a computer usable medium, including but not limited to storage media such as magnetic storage media (e.g., floppy disks, hard disks, etc.), optically readable media (e.g., CD-ROMs, DVDs, etc.) and carrier waves (e.g., transmissions over the Internet).
For instance, there is provided a computer usable medium, tangibly embodying a program of instructions executable by a machine to perform a scalable audio coding method for coding audio signals into a layered datastream having a base layer and a predetermined number of enhancement layers, the method including the steps of signal-processing input audio signals and quantizing the same for each predetermined coding band, coding the quantized data corresponding to the base layer, coding the quantized data corresponding to the next enhancement layer of the coded base layer and the remaining quantized data uncoded due to a layer size limit and belonging to the coded layer, and sequentially performing the layer coding steps for all enhancement layers to form bitstreams, wherein the base layer coding step, the enhancement layer coding step and the sequential coding step are performed such that the side information and quantized data corresponding to a layer to be coded are represented by the same predetermined number of digits, and then arithmetic-coded using a predetermined probability model in order from the MSB sequences to the LSB sequences, while the bit-sliced left-channel data and the right-channel data are alternately coded in units of predetermined vectors.