In recent years, a variety of audio compression methods have been developed. MPEG-2 Advanced Audio Coding (MPEG-2 AAC) is one of such compression methods, and is defined in detail in “ISO/IEC 13818-7 (MPEG-2 Advanced Audio Coding, AAC)”.
The following describes conventional encoding and decoding procedures with reference to FIG. 1. FIG. 1 is a block diagram showing a conventional encoding device 300 and a conventional decoding device 400 conforming to MPEG-2 AAC. The encoding device 300 receives and encodes an audio signal in accordance with MPEG-2 AAC, and comprises an audio signal input unit 310, a transforming unit 320, a quantizing unit 331, an encoding unit 332, and a stream output unit 340.
The audio signal input unit 310 receives digital audio data that has been generated as a result of sampling at a 44.1-kHz sampling frequency. From this digital audio data, the audio signal input unit 310 extracts 1,024 consecutive samples. Such 1,024 samples are a unit of encoding and are called a frame.
The transforming unit 320 transforms the extracted samples (hereafter called “sampled data”) in the time domain into spectral data composed of 1,024 samples in the frequency domain in accordance with Modified Discrete Cosine Transform (MDCT). This spectral data is then divided into a plurality of groups, each of which contains at least one sample and simulates a critical band of human hearing. Each such group is called a “scale factor band”.
The quantizing unit 331 receives the spectral data from the transforming unit 320, and quantizes it with a normalizing factor corresponding to each scale factor band. This normalizing factor is called a “scale factor”, and each set of spectral data quantized with the scale factor is hereafter called “quantized data”.
In accordance with Huffman coding, the encoding unit 332 encodes the quantized data and each scale factor used for the quantized data. Before encoding scale factors, the encoding unit 332 specifies, for every scale factor, a difference in values of two scale factors in two consecutive scale factor bands. The encoding unit 332 then encodes each specified difference and a scale factor used in a scale factor band at the start of the frame.
The stream output unit 340 receives the encoded signal from the encoding unit 332, transforms it into an MPEG-2 AAC bit stream and outputs it. This bit stream is either transmitted to the decoding device 400 via a transmission medium, or recorded on a recording medium, such as an optical disc including a compact disc (CD) and a digital versatile disc (DVD), a semiconductor, and a hard disk.
The decoding device 400 decodes this bit stream encoded by the encoding device 300, and includes a stream input unit 410, a decoding unit 421, a dequantizing unit 422, an inverse-transforming unit 430, and an audio signal output unit 440.
The stream input unit 410 receives the MPEG-2 AAC bit stream encoded by the encoding device 300 via a transmission medium, or reconstructs the bit stream from a recording medium. The stream input unit 410 then extracts the encoded signal from the bit stream.
The decoding unit 421 decodes the extracted encoded signal that has the format for the stream so that quantized data is produced.
The dequantizing unit 422 dequantizes the quantized data (which is Huffman-encoded when MPEG-2 AAC is used) to produce spectral data in the frequency domain.
The inverse-transforming unit 430 transforms the spectral data into the sampled data in the time domain. For MPEG-2 AAC, this conversion is performed based on Inverse Modified Discrete Cosine Transform (IMDCT).
The audio signal output unit 440 combines sets of sampled data outputted from the inverse-transforming unit 430, and outputs it as digital audio data.
In MPEG-2 AAC, the length of the sampled data subject to MDCT conversion can be changed in accordance with an inputted audio signal. When sampled data for which MDCT is to be performed is composed of 256 samples, this sampled data is based on short blocks. When sampled data for which MDCT is to be performed is composed of 2,048 samples, the sampled data is based on long blocks. The short and long blocks represent a block size.
When digital audio data is sampled at the 44.1-kHz sampling frequency and a short block is applied, the encoding device 300 extracts, from the sampled audio data, 128 samples together with two sets of 64 samples obtained immediately before and after the 128 samples, that is, 256 samples in total. These two sets of 64 samples overlap with other two sets of 128 samples that are extracted immediately before and after the present 128 samples. The extracted audio data is transformed based on MDCT into spectral data composed of 256 samples, out of which only half, that is, 128 samples are quantized and encoded. Eight consecutive windows that each include spectral data composed of 128 samples are regarded as a frame composed of 1,024 samples, and this frame is a unit subject to the subsequent processing including quantizing and encoding.
In this way, a window based on a short block includes 128 samples while a window based on a long block includes 1,024 samples. When audio data of a 22.05-kHz reproduction band represented by short blocks is compared with the same audio data represented by long blocks, audio data represented by short blocks has a better time resolution even for an audio signal based on short cycles, although audio data represented by long blocks achieves better sound quality because more samples are used to represent the same audio data. That is to say, if an extracted audio signal within a window contains an attack (a high-amplitude spike pulse), its damage is more extensive in long blocks than in short blocks because the attack affects as many as 1,024 samples within a window based on long bocks. With the short blocks, however, damage of the attack is confined within one window composed of 128 samples and spectrums in other windows are not susceptible to the attack, which allows more accurate reproduction of original sound.
The quality of audio data encoded by the encoding device 300 and sent to the decoding device 400 can be measured, for instance, by a reproduction band of the encoded audio data. When an input signal is sampled at the 44.1-kHz sampling frequency, for instance, a reproduction band of this signal is 22.05 kHz. When the audio signal with the 22.05-kHz reproduction band or wider reproduction band close to 22.05 kHz is encoded into encoded audio data without degradation, and all the encoded audio data is transmitted to the decoding device, then this audio data can be reproduced as high-quality sound. The width of a reproduction band, however, affects the number of values of spectral data, which in turn affects the amount of data for transmission. For instance, when an input audio signal is sampled at the sampling frequency of 44.1 kHz, spectral data generated from this signal is composed of 1,024 samples, which has the 22.05-kHz reproduction band. In order to secure the 22.05-kHz reproduction band, all the 1,024 samples of the spectral data needs to be transmitted. This requires efficient encoding of an audio signal so as to restrict a bit amount of the encoded audio signal to a range of a transfer rate of a transmission channel.
It is not realistic to transmit as many as 1,024 samples of the spectral data via a low-rate transmission channel of, for instance, a portable phone. This is to say, when all the spectral data with a wide reproduction band is transmitted at such low transfer rate while the bit amount of the entire spectral data is adjusted for the low transfer rate, amounts of bits of data assigned to each frequency band becomes extremely small. This intensifies the effect of quantization noise, so that sound quality decreases after encoding.
In order to prevent such degradation, efficient audio signal transmission is achieved in many of audio signal encoding methods, including MPEG-2 AAC, according to which appropriate weights are assigned to each set of the spectral data, and low-weighted values are not transmitted. With this method, a sufficient bit amount is assigned to spectral data in a low frequency band, which is important for human hearing, to enhance its encoding accuracy, while spectral data in a high frequency band is regarded as less important and is often not transmitted.
Although such techniques are used in MPEG-2 AAC, audio encoding technology that achieves reproduction at higher quality and higher compression efficiency is now required. In other words, there is an increasing demand for technology of transmitting an audio signal in both high and low frequency bands at a low transfer rate.