1. Field of the Invention
The present invention relates to audio coding and decoding, and more particularly, to a scalable audio coding/decoding method and apparatus for coding/decoding layered bitstreams by representing data of various enhancement layers, based on a base layer, within a bitstream. This invention has been adopted in ISO/IEC JTC1/SC29/WG11 N1903 (ISO/IEC 14496-3 SUBPART 4 Committee Draft).
2. Description of the Related Art
In general, a waveform including information is basically a continuous analog signal. To express the waveform as a discrete signal, analog-to-digital (A/D) conversion is necessary.
To perform the A/D conversion, two procedures are necessary: 1) a sampling procedure for converting a temporally continuous signal into a discrete signal; and 2) an amplitude quantizing procedure for limiting the number of possible amplitudes to a finite number, that is to say, for converting an input amplitude x(n) into an element y(n) belonging to a finite set of possible amplitudes at time n.
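The two procedures can be illustrated with a minimal Python sketch. It assumes a uniform quantizer, which the text above does not specify, and the names sample, quantize and full_scale are hypothetical and chosen only for this illustration.

```python
import math

def sample(signal, sample_rate_hz, duration_s):
    """Sampling: evaluate a continuous-time function at the instants n / sample_rate_hz."""
    count = int(round(sample_rate_hz * duration_s))
    return [signal(n / sample_rate_hz) for n in range(count)]

def quantize(x, bits, full_scale=1.0):
    """Amplitude quantization: map x(n) to y(n), one of 2**bits uniformly spaced levels.

    Assumption: a uniform quantizer over [-full_scale, +full_scale] that
    reconstructs at the centre of each interval; inputs outside the range are clipped.
    """
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    index = int(math.floor((x + full_scale) / step))
    index = max(0, min(levels - 1, index))  # clip to the finite set of levels
    return -full_scale + (index + 0.5) * step

# Example: 1 ms of a 1 kHz sine sampled at 8 kHz, then quantized to 4 bits.
x_n = sample(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 0.001)
y_n = [quantize(x, 4) for x in x_n]
```

With 4 bits there are 16 possible output amplitudes, and for inputs inside the range the quantization error never exceeds half a step (0.0625 here).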
With the recent development of digital signal processing technology, an audio signal storage/restoration method has been proposed and widely used, in which an analog signal is converted into digital PCM (Pulse Code Modulation) data through sampling and quantization, the converted signal is stored in a recording/storage medium such as a compact disc or digital audio tape, and the stored signal is then reproduced when a user needs it. The digital storage/restoration method avoids the deterioration in audio quality of the conventional analog method and considerably improves the audio quality. However, this method still has a problem in storing and transmitting data when the amount of digital data is large.
To reduce the amount of the digital data, DPCM (Differential Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation) for compressing digital audio signals have been developed. However, such methods have a disadvantage in that their efficiency differs greatly according to signal type. An MPEG (Moving Picture Experts Group)/audio technique recently standardized by the ISO (International Organization for Standardization), and the AC-2/AC-3 techniques developed by Dolby, use a human psychoacoustic model to reduce the quantity of data.
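The dependence on signal type can be seen in a simplified sketch of differential coding. It assumes the simplest possible predictor (the previous sample) and omits the quantizer that a practical DPCM or ADPCM coder places in the prediction loop; the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Encode integer PCM samples as differences from the previous sample.

    Assumption: first-order prediction with an initial predictor value of 0
    and no quantization of the differences (so the scheme is lossless).
    """
    previous = 0
    residuals = []
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def dpcm_decode(residuals):
    """Rebuild the PCM samples by accumulating the differences."""
    previous = 0
    samples = []
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

smooth = [100, 102, 104, 105, 105, 104]
alternating = [100, -100, 100, -100]
smooth_residuals = dpcm_encode(smooth)            # small values after the first
alternating_residuals = dpcm_encode(alternating)  # larger than the samples themselves
```

For the slowly varying input the differences are much smaller than the samples and need fewer bits, whereas for the rapidly alternating input the differences are larger than the samples, so nothing is gained.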
In a conventional audio signal compressing method such as MPEG-1/audio, MPEG-2/audio or AC-2/AC-3, signals of a temporal domain are grouped into blocks of constant size and converted into signals of a frequency domain. Then, the converted signals are scalar-quantized using the human psychoacoustic model. This quantizing technique is simple but is not optimal even if the input samples are statistically independent. Further, if the input samples are statistically dependent on one another, the quantization is even less appropriate. Thus, coding is performed that includes lossless coding such as entropy coding, or a certain kind of adaptive quantization. Consequently, the coding procedure becomes very complex compared to the simple PCM data storing method. A bitstream includes side information for compressing signals as well as the quantized PCM data.
The MPEG/audio standards and the AC-2/AC-3 method provide almost the same audio quality as that of a compact disc, at a bitrate of 64 to 384 Kbps, which is one-sixth to one-eighth of that in conventional digital coding. For this reason, the MPEG/audio standards play an important role in storing and transmitting audio signals, as in digital audio broadcasting (DAB), Internet phone, or audio on demand (AOD).
In the conventional techniques, a fixed bitrate is given to an encoder, and the optimal state for the given bitrate is searched for before quantization and coding are performed, thereby exhibiting considerably better efficiency. However, with the advent of multimedia technology, there are increasing demands for a coder/decoder (codec) having versatile functions as well as low-bitrate coding efficiency. One such demand is for a scalable audio codec. A scalable audio codec can turn bitstreams coded at a high bitrate into low-bitrate bitstreams and restore only some of them. By doing so, when the network is overloaded, when the performance of a decoder is poor, or at a user's request, signals can be restored with reasonable efficiency from only some of the bitstreams, with only a slight deterioration in performance due to the lowered bitrate.
According to general audio coding techniques, a fixed bitrate is given to a coding apparatus, and the optimal state for the given bitrate is searched for before quantization and coding are performed, thereby forming bitstreams in accordance with the bitrate. One bitstream contains information for only one bitrate. In other words, bitrate information is contained in the header of a bitstream and a fixed bitrate is used. Thus, a method exhibiting the best efficiency at a specific bitrate can be used. For example, when a bitstream is formed by an encoder at a bitrate of 96 Kbps, the best quality sound can be restored by a decoder corresponding to the encoder having a bitrate of 96 Kbps.
According to such methods, bitstreams are formed without consideration of other bitrates: each bitstream is formed to have a size suitable for the given bitrate, without regard to the order of the data within it. In practice, if the thus-formed bitstreams are transmitted via a communications network, the bitstreams are sliced into several slots and then transmitted. When the transmission channel is overloaded, or when only some of the slots sent from the transmission end are received at the reception end due to a narrow bandwidth of the transmission channel, the data cannot be reconstructed properly. Also, since the bitstreams are not ordered according to significance, the quality is severely degraded if only some of the bitstreams are restored. In the case of digital audio data, sound objectionable to the ear is reproduced.
For example, when a broadcasting station forms bitstreams and transmits the same to various users, different bitrates may be requested by the users. The users may have decoders of different efficiencies. In such a case, if only the bitstreams supported by a fixed bitrate are transmitted from the broadcasting station to meet the users' request, the bitstreams must be transmitted separately to the respective users, which is costly in transmission and formation of bitstreams.
However, if an audio bitstream has bitrates of various layers, it is possible to meet the various users' requests and the given environment appropriately. The simplest way to achieve this is to perform coding on a lower layer, as shown in FIG. 1, and then to decode the result. The difference between the signal obtained by decoding and the original signal is then input to an encoder for the next layer and processed. In other words, the base layer is coded first to generate a bitstream, and then the difference between the original signal and the coded signal is coded to generate a bitstream of the next layer, and this is repeated for each further layer. This method increases the complexity of the encoder. Further, to restore the original signal, a decoder also repeats this procedure in reverse order, which increases the complexity of the decoder. Thus, as the number of layers increases, the encoder and decoder become more complex.
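The layered procedure described above can be sketched in Python as follows. This is an illustration only: each layer is assumed to be a plain uniform quantizer with its own step size, standing in for the full coder of each layer in FIG. 1, and the names encode_layers, decode_layers and steps are hypothetical.

```python
def encode_layers(signal, steps):
    """Code the base layer, then repeatedly code what the previous layers missed.

    Assumption: the coder of each layer is a uniform quantizer with step size
    steps[k]; steps[0] belongs to the base layer. Returns one list of
    quantizer indices per layer.
    """
    residual = list(signal)
    layers = []
    for step in steps:
        indices = [round(r / step) for r in residual]           # code this layer
        decoded = [i * step for i in indices]                   # decode it again
        residual = [r - d for r, d in zip(residual, decoded)]   # input to the next layer
        layers.append(indices)
    return layers

def decode_layers(layers, steps):
    """Sum the contributions of the received layers (at least the base layer)."""
    output = [0.0] * len(layers[0])
    for indices, step in zip(layers, steps):
        output = [o + i * step for o, i in zip(output, indices)]
    return output

signal = [0.9, -0.35, 0.12, 0.6]
steps = [0.5, 0.125, 0.03125]
layers = encode_layers(signal, steps)
base_only = decode_layers(layers[:1], steps)   # coarse reconstruction
all_layers = decode_layers(layers, steps)      # finest reconstruction
```

Every layer added requires one more code, decode and subtract pass in the encoder and one more decode and add pass in the decoder, which is the growth in complexity noted above. In this sketch the reconstruction error after k layers is bounded by half the step size of layer k.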