1. Field of the Invention
The present invention relates to audio coding and decoding, and more particularly, to a scalable audio coding/decoding method and apparatus for coding/decoding layered bitstreams by representing data of various enhancement layers, based on a base layer, within a bitstream. This invention has been adopted in ISO/IEC JTC1/SC29/WG11 N1903 (ISO/IEC 14496-3 Subpart 4 Committee Draft).
2. Description of the Related Art
In general, a waveform including information is basically a continuous analog signal. To express the waveform as a discrete signal, analog-to-digital (A/D) conversion is necessary.
To perform the A/D conversion, two procedures are necessary: 1) a sampling procedure for converting a temporally continuous signal into a discrete signal; and 2) an amplitude quantizing procedure for limiting the number of possible amplitudes to a finite value, that is to say, for converting input amplitude x(n) into an element y(n) belonging to a finite set of possible amplitudes at time n.
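The two procedures can be sketched as follows. This is a minimal illustration only, assuming a uniform quantizer; the 1 kHz test tone, the 8 kHz sampling rate, the 4-bit resolution and the function names are hypothetical and are not taken from the invention.

```python
import math

def sample(signal, sample_rate_hz, duration_s):
    """Sampling: evaluate a continuous-time signal at the discrete times n / sample_rate_hz."""
    count = int(round(sample_rate_hz * duration_s))
    return [signal(n / sample_rate_hz) for n in range(count)]

def quantize(x, bits, full_scale=1.0):
    """Amplitude quantization: map each x(n) to y(n), one of 2**bits uniformly spaced levels."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    y = []
    for value in x:
        index = math.floor(value / step)                          # quantization cell of x(n)
        index = max(-(levels // 2), min(levels // 2 - 1, index))  # clip to the finite set
        y.append((index + 0.5) * step)                            # mid-point of the cell
    return y

def tone(t):
    """Hypothetical analog input: a 1 kHz sine wave."""
    return 0.8 * math.sin(2.0 * math.pi * 1000.0 * t)

x = sample(tone, sample_rate_hz=8000, duration_s=0.001)  # 8 samples x(n)
y = quantize(x, bits=4)                                  # 8 amplitudes y(n) from 16 levels
```

With 4 bits and a full scale of 1.0, the step is 0.125, so each y(n) differs from x(n) by at most half a step as long as x(n) stays within the full-scale range.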
With the recent development of digital signal processing technology, an audio signal storage/restoration method has been proposed and widely used, in which an analog signal is converted into digital PCM (Pulse Code Modulation) data through sampling and quantization, the converted signal is stored in a recording/storage medium such as a compact disc or digital audio tape, and the stored signal is reproduced when the user needs it. The digital storage/restoration method solves the deterioration in audio quality and considerably improves the audio quality, in contrast to the conventional analog method. However, this method still has a problem in storing and transmitting data when the amount of digital data is large.
To reduce the amount of digital data, DPCM (Differential Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation) have been developed for compressing digital audio signals. However, such methods have the disadvantage that their efficiency differs greatly according to signal type. The MPEG (Moving Picture Experts Group)/audio technique recently standardized by the ISO (International Organization for Standardization), and the AC-2/AC-3 techniques developed by Dolby, use a human psychoacoustic model to reduce the quantity of data.
In conventional audio signal compression methods such as MPEG-1/audio, MPEG-2/audio or AC-2/AC-3, signals of a temporal domain are grouped into blocks of constant size and converted into signals of a frequency domain. Then, the converted signals are scalar-quantized using the human psychoacoustic model. This quantizing technique is simple but is not optimal even if input samples are statistically independent. Further, if input samples are statistically dependent on one another, the quantization is even less appropriate. Thus, coding is performed with additional lossless coding such as entropy coding, or with a certain kind of adaptive quantization. Consequently, the coding procedure becomes very complex compared to the simple PCM data storing method. A bitstream includes side information for compressing signals as well as the quantized PCM data.
The MPEG/audio standards or the AC-2/AC-3 method provide almost the same audio quality as that of a compact disc, at a bitrate of 64~384 Kbps, which is one-sixth to one-eighth of that in conventional digital coding. For this reason, MPEG/audio standards play an important role in storing and transmitting audio signals, as in digital audio broadcasting (DAB), Internet phone, or audio on demand (AOD).
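As an illustration of the differential principle behind DPCM, the following minimal sketch codes each sample as the quantized difference from the previously reconstructed sample. It assumes a first-order predictor and a fixed step size; the sample values and function names are hypothetical, and ADPCM would additionally adapt the step size.

```python
def dpcm_encode(samples, step):
    """Code each sample as the quantized difference from the previous reconstruction."""
    codes, prediction = [], 0
    for s in samples:
        code = round((s - prediction) / step)  # quantized prediction error
        codes.append(code)
        prediction += code * step              # track what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step):
    """Accumulate the dequantized differences to rebuild the samples."""
    out, prediction = [], 0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out

pcm = [0, 12, 25, 37, 46, 52, 55, 53]  # hypothetical, slowly varying PCM samples
codes = dpcm_encode(pcm, step=2)
restored = dpcm_decode(codes, step=2)
```

For a slowly varying input such as this one, the codes are much smaller in magnitude than the samples themselves. For a rapidly varying input the differences are as large as the samples, which is one way to see why the efficiency depends on the signal type.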
In the conventional techniques, a fixed bitrate is given to an encoder, and the optimal state for the given bitrate is searched for before quantization and coding are performed, thereby exhibiting considerably better efficiency. However, with the advent of multimedia technology, there are increasing demands for a coder/decoder (codec) having versatile functions together with low-bitrate coding efficiency. One such demand is for a scalable audio codec. The scalable audio codec can turn bitstreams coded at a high bitrate into low-bitrate bitstreams and restore only some of them. By doing so, when an overload is applied to the network, when the performance of a decoder is poor, or at a user's request, signals can be restored with reasonable efficiency from only some of the bitstreams, with only a slight deterioration in performance due to the lowered bitrate.
According to general audio coding techniques, a fixed bitrate is given to a coding apparatus, and the optimal state for the given bitrate is searched for before quantization and coding are performed, thereby forming bitstreams in accordance with the bitrate. One bitstream contains information for only one bitrate. In other words, bitrate information is contained in the header of a bitstream and a fixed bitrate is used. Thus, a method exhibiting the best efficiency at a specific bitrate can be used. For example, when a bitstream is formed by an encoder at a bitrate of 96 Kbps, the best quality sound can be restored by a decoder that corresponds to the encoder and operates at a bitrate of 96 Kbps.
According to such methods, bitstreams are formed without consideration of other bitrates; only the size of the bitstream is made suitable for the given bitrate, and the order of the data within the bitstream is not considered. In practice, if the thus-formed bitstreams are transmitted via a communications network, the bitstreams are sliced into several slots and then transmitted. When an overload is applied to a transmission channel, or only some of the slots sent from a transmission end are received at a reception end due to a narrow bandwidth of the transmission channel, data cannot be reconstructed properly. Also, since bitstreams are not formed according to the significance of their data, if only some of the bitstreams are restored, the quality is severely degraded. In the case of digital audio data, sound objectionable to the ear is reproduced.
For example, when a broadcasting station forms bitstreams and transmits them to various users, different bitrates may be requested by the users. The users may also have decoders of different efficiencies. In such a case, if only bitstreams supporting a fixed bitrate are transmitted from the broadcasting station to meet the users' requests, the bitstreams must be formed and transmitted separately for the respective users, which is costly in both transmission and formation of bitstreams.
However, if an audio bitstream has bitrates of various layers, it is possible to meet the various users' requests and the given environment appropriately. To this end, in a simple approach, coding is performed on lower layers, as shown in FIG. 1, and then decoding is performed. Then, a difference between the signal obtained by decoding and the original signal is again input to an encoder for the next layer to then be processed. In other words, the base layer is coded first to generate a bitstream and then a difference between the original signal and the coded signal is coded to generate a bitstream of the next layer, which is repeated. This method increases the complexity of the encoder. Further, to restore the original signal, a decoder also repeats this procedure in a reverse order, which increases the complexity of the decoder. Thus, as the number of layers increases, the encoder and decoder become more complex.
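The simple layered approach described above can be sketched as follows. The sketch assumes, purely for illustration, that each layer's codec is a uniform quantizer with its own step size; the function names, sample values and step sizes are hypothetical. It shows why the encoder must run one coding pass and one decoding pass per layer.

```python
def code_layer(signal, step):
    """Hypothetical per-layer coder: uniform quantization with the given step."""
    return [round(v / step) for v in signal]

def decode_layer(codes, step):
    """Hypothetical per-layer decoder matching code_layer."""
    return [c * step for c in codes]

def encode_layers(signal, steps):
    """Code the base layer, then repeatedly code what the previous layers missed."""
    layers, residual = [], list(signal)
    for step in steps:
        codes = code_layer(residual, step)
        decoded = decode_layer(codes, step)  # the encoder runs a decoder for every layer
        residual = [r - d for r, d in zip(residual, decoded)]
        layers.append(codes)
    return layers

def decode_layers(layers, steps):
    """Decode and sum the layers that were actually received."""
    out = [0] * len(layers[0])
    for codes, step in zip(layers, steps):
        out = [o + d for o, d in zip(out, decode_layer(codes, step))]
    return out

signal = [37, -18, 5, 90, -64]  # hypothetical integer samples
steps = [16, 4, 1]              # base layer is coarsest, last layer is finest
layers = encode_layers(signal, steps)
base_only = decode_layers(layers[:1], steps)
all_layers = decode_layers(layers, steps)
```

Decoding only the base layer gives a coarse approximation, and each further layer refines it. The cost is that both loops grow with the number of layers, which is the complexity problem noted above.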
To solve the above problems, it is an object of the present invention to provide a scalable audio coding/decoding method and apparatus, which can control the magnitude of bitstreams and the complexity of a decoder, according to the state of a transmission channel, the performance of the decoder or a user's request, by representing data for bitrates of various layers in a bitstream.
To accomplish the object, there is provided a scalable audio coding method for coding audio signals into a layered datastream having a base layer and enhancement layers of a predetermined number, comprising the steps of: (a) signal-processing input audio signals and quantizing the same for each predetermined coding band; (b) coding the quantized data corresponding to the base layer within a predetermined layer size; (c) coding the quantized data corresponding to the next enhancement layer of the coded base layer and the remaining quantized data uncoded and belonging to the enhancement layer, within a predetermined layer size; and (d) sequentially performing the layer coding steps for all layers, wherein the steps (b), (c) and (d) each comprise the steps of: (e) representing the quantized data corresponding to a layer to be coded by digits of a predetermined same number; and (f) coding the most significant digit sequences composed of most significant digits of the magnitude data composing the represented digital data.
The steps (e) and (f) are performed sequentially from low frequency to high frequency.
The coding steps (b), (c) and (d) are performed on side information having at least quantization step size information and quantization bit information allotted to each band, by a predetermined coding method.
In one embodiment, the digits in the steps (e) and (f) are bits, and the coding in the step (f) can be performed by coupling bits composing the bit sequences into units of bits of a predetermined number.
The predetermined coding method is preferably lossless coding, and the lossless coding is preferably Huffman coding or Arithmetic coding.
When the quantized data is composed of sign data and magnitude data, the step (f) preferably comprises the steps of: i) coding the most significant digit sequences composed of most significant digits of the magnitude data composing the represented digital data by a predetermined coding method; ii) coding sign data corresponding to non-zero data among the coded most significant digit sequences; iii) coding the most significant digit sequences among uncoded magnitude data of the digital data by a predetermined coding method; iv) coding uncoded sign data among the sign data corresponding to non-zero magnitude data among digit sequences coded in the step (iii); and v) performing the steps (iii) and (iv) on the respective digits of the digital data.
In one embodiment of the invention, the step (e) represents the digital data as binary data having bits of the same number, and the digits are bits.
The coding steps are performed by coupling bits composing the respective bit sequences for the magnitude data and sign data, into units of bits of a predetermined number in one embodiment of the invention.
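The ordering of magnitude and sign data described in the steps (i) through (v) can be sketched as follows for the case in which the digits are bits. This is an illustrative sketch only: the bit sequences are emitted as raw lists rather than being passed to Huffman or Arithmetic coding, the function names and sample values are hypothetical, and every magnitude is assumed to fit in `nbits` bits.

```python
def bit_slice_encode(quantized, nbits):
    """Return one (plane, magnitude_bits, sign_bits) tuple per bit plane, MSB first.

    Assumes abs(q) < 2**nbits for every quantized value q.
    """
    magnitudes = [abs(q) for q in quantized]
    sign_sent = [False] * len(quantized)
    stream = []
    for plane in range(nbits - 1, -1, -1):             # most significant bits first
        bits = [(m >> plane) & 1 for m in magnitudes]  # low frequency to high frequency
        signs = []
        for i, bit in enumerate(bits):
            if bit and not sign_sent[i]:               # first non-zero bit of this value
                signs.append(1 if quantized[i] < 0 else 0)
                sign_sent[i] = True
        stream.append((plane, bits, signs))
    return stream

def couple(bits, unit):
    """Couple a bit sequence into units of `unit` bits (the last unit may be shorter)."""
    return [bits[i:i + unit] for i in range(0, len(bits), unit)]

quantized = [5, -3, 0, 9]  # hypothetical quantized data, low to high frequency
planes = bit_slice_encode(quantized, nbits=4)
msb_units = couple(planes[0][1], unit=2)
```

Each sign is emitted once, immediately after the bit plane in which the value first becomes non-zero, so a value that is still zero at the received precision costs no sign bit.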
The quantization is performed by the steps of converting the input audio signals of a temporal domain into signals of a frequency domain, coupling the converted signals as signals of predetermined subbands by time/frequency mapping and calculating a masking threshold at each subband, and quantizing the signals for each predetermined coding band so that quantization noise of each band is smaller than the masking threshold.
According to another aspect of the present invention, there is provided a scalable audio coding apparatus for coding audio signals to have layered bitrate data of a predetermined number, comprising: a quantizing portion for signal-processing input audio signals and quantizing the same for each coding band; and a bit packing portion for generating bitstreams by coding side information corresponding to a base layer and the quantized data, and coding side information corresponding to the next layer of the base layer and the quantized data, to perform coding on all layers, wherein the bit packing portion performs the coding by representing the quantized data by binary data having bits of a predetermined same number to slice the same into units of bits, and coding the bit-sliced data from the most significant bit sequence to the least significant bit sequence, by a predetermined coding method.
When digital data is composed of sign data and magnitude data, the bit packing portion collects and codes the magnitude data for the bits having the same significance level among the bit-sliced data, and codes uncoded sign data among the sign data corresponding to non-zero magnitude data. In one embodiment, the magnitude and sign data coding is performed sequentially from the MSBs to lower significant bits.
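The last of these steps, choosing a quantization for each coding band so that the quantization noise stays below the masking threshold, can be sketched as follows. The sketch assumes a uniform quantizer whose step size is halved until the condition holds, and it measures noise as mean squared error; the search strategy, the function name, the coefficient values and the threshold values are all hypothetical. The time/frequency mapping and the psychoacoustic calculation of the thresholds are not shown.

```python
def quantize_band(coeffs, masking_threshold, initial_step=64.0):
    """Halve the step size until the band's mean squared quantization error
    is smaller than the masking threshold; return (step, quantized integers)."""
    if masking_threshold <= 0:
        raise ValueError("masking threshold must be positive")
    step = initial_step
    while True:
        quantized = [round(c / step) for c in coeffs]
        noise = sum((c - q * step) ** 2 for c, q in zip(coeffs, quantized)) / len(coeffs)
        if noise < masking_threshold:
            return step, quantized
        step /= 2.0

# Hypothetical frequency-domain coefficients and masking thresholds for two coding bands.
step0, q0 = quantize_band([100.3, -42.7, 13.1], masking_threshold=4.0)
step1, q1 = quantize_band([3.2, -1.1, 0.4], masking_threshold=30.0)
```

In this example the second band lies entirely below its threshold, so it is quantized to zeros at the coarsest step, while the first band needs a finer step. The step size chosen for each band is the kind of quantity that would be carried as side information.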
When the bit packing portion collects and codes the bits according to significance, coding is performed by coupling the bits in a predetermined number unit in one embodiment.
Also, there is provided a scalable audio decoding method for decoding audio data coded to have layered bitrates, comprising the steps of: decoding side information having at least quantization step size information and quantization bit information allotted to each band, in the order of creation of the layers in datastreams having layered bitrates, by analyzing the significance of bits composing the datastreams, from upper significant bits to lower significant bits; restoring the decoded quantization step size and quantized data into signals having the original magnitudes; and converting inversely quantized signals into signals of a temporal domain.
The digits in the decoding step are bits, and the datastreams are bitstreams in this embodiment.
The decoding step according to significance is performed in units of vectors comprised of bits of a predetermined number in the embodiment.
When the quantized data is composed of sign data and magnitude data, the decoding step may be performed by: decoding side information having at least quantization step size information and quantization bit information allotted to each band, and the quantized data, in the order of creation of layers in datastreams having layered bitrates, by analyzing the significance of bits composing the datastreams, from upper significant bits to lower significant bits; and decoding the sign data of the quantized data and combining the same with the decoded magnitude data.
In one embodiment, the decoding step is performed by Arithmetic decoding or Huffman decoding.
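The decoding order from upper significant bits to lower significant bits can be sketched as follows, under stated assumptions. The received data is modeled as a list of (plane, magnitude_bits, sign_bits) tuples, which is a hypothetical container format chosen for illustration; lossless decoding and the frequency/time mapping are omitted, and the step size of 4.0 stands in for decoded side information.

```python
def bit_slice_decode(planes, count):
    """Rebuild quantized values from whatever bit planes were received, MSB first."""
    magnitudes = [0] * count
    negative = [None] * count                # None: sign not received yet
    for plane, mag_bits, sign_bits in planes:
        signs = iter(sign_bits)
        for i, bit in enumerate(mag_bits):
            if bit:
                magnitudes[i] |= 1 << plane
                if negative[i] is None:      # the sign follows the first non-zero bit
                    negative[i] = bool(next(signs))
    return [-m if neg else m for m, neg in zip(magnitudes, negative)]

def inverse_quantize(values, step_size):
    """Restore the original magnitudes from quantized data and the decoded step size."""
    return [v * step_size for v in values]

# Hypothetical received data for the quantized values 5, -3, 0, 9 (4 bits each).
received = [
    (3, [0, 0, 0, 1], [0]),
    (2, [1, 0, 0, 0], [0]),
    (1, [0, 1, 0, 0], [1]),
    (0, [1, 1, 0, 1], []),
]
full = bit_slice_decode(received, count=4)           # all planes received
truncated = bit_slice_decode(received[:2], count=4)  # only the two upper planes
restored = inverse_quantize(truncated, step_size=4.0)
```

When only the upper planes arrive, the decoder still produces a coarser version of every value instead of failing, which is the behavior a scalable bitstream is meant to provide.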
Alternatively, according to the present invention, there is provided a scalable audio decoding apparatus for decoding audio data coded to have layered bitrates, comprising: a bitstream analyzing portion for decoding side information having at least quantization step size information and quantization bit information allotted to each band, and the quantized data, in the order of creation of the layers in layered bitstreams, by analyzing the significance of bits composing the bitstreams, from upper significant bits to lower significant bits; an inverse quantizing portion for restoring the decoded quantization step size and quantized data into signals having the original magnitudes; and a frequency/time mapping portion for converting inversely quantized signals into signals of a temporal domain.