One well-known technique for encoding a general audio signal with a small number of bits while still obtaining a high-quality reproduction signal is band division encoding. In this method, an input audio signal is either divided into signals in a plurality of frequency bands through a band division filter, or transformed into a frequency-domain signal through a time-frequency transform such as the Fourier transform and then divided into a plurality of bands in the frequency domain; an appropriate number of encoding bits is then allocated to each band. Band division encoding provides a high-quality reproduction signal because processing based on human auditory characteristics is performed at the encoding stage. Generally, the human sense of hearing is less sensitive to a high-frequency sound at around 10 kHz and to a sound at a low level. Furthermore, there is a well-known phenomenon called frequency masking: when a high-level sound is present in a certain frequency band, it is difficult for a person to perceive a lower-level sound in the proximity of that band. Allocating a large number of bits to sounds that are hard to perceive because of these auditory characteristics contributes little to the quality of the reproduction signal, so there is no point in encoding them in detail. Conversely, the quality of the reproduction signal can be improved by taking the encoding bits that would be spent on such hard-to-perceive parts if the human auditory characteristics were ignored, reallocating them to the parts to which human hearing is sensitive, and encoding those parts in detail.
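The reallocation idea described above can be illustrated with a minimal sketch. It is not the allocation procedure of any particular codec: the function name `allocate_bits`, the band levels, the masking thresholds, and the 6.02 dB-per-bit rule of thumb are all assumptions chosen for illustration.

```python
def allocate_bits(band_level_db, mask_threshold_db, total_bits):
    """Greedily give each bit to the band whose quantization noise is most audible.

    Hypothetical helper: levels and thresholds are per-band values in dB.
    """
    if len(band_level_db) != len(mask_threshold_db):
        raise ValueError("one masking threshold is needed per band")
    bits = [0] * len(band_level_db)
    if not bits:
        return bits
    # With zero bits the whole band counts as noise, so the audible excess is
    # the band level minus its masking threshold.
    excess_db = [lvl - thr for lvl, thr in zip(band_level_db, mask_threshold_db)]
    for _ in range(total_bits):
        worst = max(range(len(excess_db)), key=lambda i: excess_db[i])
        if excess_db[worst] <= 0:
            break  # all remaining noise is at or below its masking threshold
        bits[worst] += 1
        excess_db[worst] -= 6.02  # assumed: about 6 dB less noise per extra bit
    return bits


levels = [60.0, 30.0, 50.0]      # assumed band levels in dB
thresholds = [20.0, 40.0, 44.0]  # assumed masking thresholds in dB
print(allocate_bits(levels, thresholds, 10))
```

In this example the second band lies below its masking threshold and therefore receives no bits, while the bits it would otherwise consume go to the bands whose noise would be audible.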
A representative example of encoding that uses the above-mentioned band division is the ISO standard MPEG-4 AAC (ISO/IEC 14496-3). The following explains the operation of MPEG-4 AAC (hereinafter referred to as “AAC”) with reference to the figures.
FIG. 1 is a block diagram showing the configuration of an encoding device 100 in accordance with the conventional AAC system. The encoding device 100 evaluates an input signal 109 on the basis of the human auditory characteristics and encodes the input signal 109 by allocating a number of bits according to the result of the evaluation. The encoding device 100 comprises an auditory characteristics evaluating unit 101, a transform block length selecting unit 102, an MDCT transforming unit 103, a band dividing unit 104, a spectral signal processing unit 105, a bit allocating unit 106, a quantizing unit 107, and a code multiplexing unit 108. The input signal 109 is divided into frames of 1024 samples, which is the basic frame length, and is then inputted to the auditory characteristics evaluating unit 101 and the MDCT transforming unit 103. The auditory characteristics evaluating unit 101 evaluates the input signal 109 according to the human auditory characteristics and outputs an auditory characteristics evaluated value 110. The transform block length selecting unit 102 selects a transform block length suited to encoding the input signal 109 according to the auditory characteristics evaluated value 110 and outputs the selected transform block length to the MDCT transforming unit 103. Then the MDCT transforming unit 103 transforms the input signal 109 into MDCT coefficients 111 with the selected transform block length. In the case of the AAC, the transform block length is either 128 samples or 1024 samples. The shorter transform block length is used if the input signal 109 is a transient signal, while the longer transform block length is used if the input signal 109 is a stationary signal.
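The choice between the two transform block lengths can be sketched as follows. The text above only states that the selection follows the auditory characteristics evaluated value 110; the energy-jump test used here, the function name `select_block_length`, and the threshold of 8.0 are simplifying assumptions, not the criterion of the AAC system.

```python
FRAME_LEN = 1024   # basic frame length named in the text
SHORT_LEN = 128    # shorter transform block length named in the text


def select_block_length(frame, ratio_threshold=8.0):
    """Return 128 for a frame that looks transient, otherwise 1024.

    Assumed criterion: a frame is treated as transient when the energy of one
    128-sample segment exceeds that of the preceding segment by more than
    ratio_threshold.
    """
    if len(frame) != FRAME_LEN:
        raise ValueError("expected a frame of 1024 samples")
    energies = [
        sum(x * x for x in frame[start:start + SHORT_LEN])
        for start in range(0, FRAME_LEN, SHORT_LEN)
    ]
    for prev, cur in zip(energies, energies[1:]):
        # The small constant keeps a silent previous segment from hiding an onset
        # while still treating an all-silent frame as stationary.
        if cur > ratio_threshold * (prev + 1e-9):
            return SHORT_LEN
    return FRAME_LEN


steady = [1.0, -1.0] * 512            # constant energy in every segment
onset = [0.0] * 512 + [1.0] * 512     # sudden attack in the middle of the frame
print(select_block_length(steady), select_block_length(onset))
```

A short block confines the quantization noise of an attack to a short time span, which is why the transient frame is mapped to the 128-sample length in this sketch.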
The MDCT (Modified Discrete Cosine Transform) employed here is a kind of cosine transform, and the resulting MDCT coefficients 111 serve as coefficients representing the frequency spectrum of the input signal 109. The MDCT coefficients 111 are divided into a plurality of frequency bands (sub-bands) by the band dividing unit 104. Then, for the MDCT coefficients 112 divided into the frequency bands, the spectral signal processing unit 105 performs prediction that contributes to more efficient encoding and performs noise shaping on the basis of the auditory characteristics evaluated value 110. Moreover, if the input signal 109 is a signal made up of a plurality of channels, such as a stereo signal, the spectral signal processing unit 105 performs processing called joint stereo, which enhances the efficiency of encoding by utilizing the inter-channel correlation of the signals. Furthermore, processing called PNS (Perceptual Noise Substitution) is carried out in some cases; a detailed explanation is given later.
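For reference, the textbook definition of the MDCT can be written directly as a short sketch. It maps a block of 2N samples to N coefficients. Windowing, the overlap between successive blocks, and any fast algorithm are omitted, and the function name `mdct` is an assumption; this is not the implementation of the MDCT transforming unit 103.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples produce N coefficients."""
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    n_coeffs = two_n // 2
    phase = 0.5 + n_coeffs / 2  # time offset used in the standard MDCT definition
    return [
        sum(
            block[n] * math.cos(math.pi / n_coeffs * (n + phase) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_coeffs)
    ]


# A block equal to one MDCT basis function concentrates its energy
# in the corresponding coefficient.
N, K0 = 8, 3
basis = [
    math.cos(math.pi / N * (n + 0.5 + N / 2) * (K0 + 0.5))
    for n in range(2 * N)
]
spectrum = mdct(basis)
```

Because each coefficient responds to one frequency component of the block, the list returned by `mdct` can be grouped by index into sub-bands in the manner described above.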
Meanwhile, information indicating what kind of processing is performed in the spectral signal processing unit 105 is outputted as an auxiliary information code 114. The bit allocating unit 106 determines the bit allocation 115 required for quantization and outputs the bit allocation 115 to the quantizing unit 107. The quantizing unit 107 quantizes the MDCT coefficients 113 processed in the spectral signal processing unit 105 with the number of bits indicated by the bit allocation 115. Quantization is performed on a combination of normalized gain information for each sub-band, called the scale factor, and the values of the MDCT coefficients normalized by the scale factor. The code multiplexing unit 108 multiplexes the auxiliary information code 114 outputted from the spectral signal processing unit 105 with a spectral code 116 outputted from the quantizing unit 107 and then puts the resulting code into a specified format to output it as an output code 117. Note that in the case of the AAC, since the number of bits allocated to the basic frame can be determined arbitrarily on a per-frame basis, encoding is basically performed at a variable bit rate. However, by providing a buffer called a bit reservoir before the final output processing so that the buffer absorbs the bit rate variations of each frame, the signal can be transmitted at a fixed bit rate.
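The combination of a per-band scale factor and normalized coefficient values can be sketched as below. The uniform quantizer, the choice of the peak magnitude as the scale factor, and the names `quantize_band` and `dequantize_band` are assumptions made for illustration; the actual quantizer and entropy coding of the AAC are not reproduced here.

```python
def quantize_band(coeffs, bits):
    """Return (scale_factor, codes) for one sub-band.

    Assumed scheme: the scale factor is the peak magnitude in the band, and each
    coefficient is normalized by it and rounded to a signed integer of `bits` bits.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed value")
    scale_factor = max((abs(c) for c in coeffs), default=0.0)
    if scale_factor == 0.0:
        return 0.0, [0] * len(coeffs)
    levels = 2 ** (bits - 1) - 1  # largest positive code
    codes = [round(c / scale_factor * levels) for c in coeffs]
    return scale_factor, codes


def dequantize_band(scale_factor, codes, bits):
    """Rebuild approximate coefficients from the scale factor and the codes."""
    levels = 2 ** (bits - 1) - 1
    return [code / levels * scale_factor for code in codes]


band = [0.8, -0.1, 0.35, -0.6]          # hypothetical MDCT coefficients
gain, codes = quantize_band(band, bits=4)
restored = dequantize_band(gain, codes, bits=4)
```

With this scheme the reconstruction error of every coefficient is bounded by half a quantization step, that is, by the scale factor divided by twice the number of levels, so allocating more bits to a band directly lowers its quantization noise.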
Next, an explanation is provided for PNS processing. In PNS, it is judged whether or not each of the above-mentioned sub-bands has noise characteristics in terms of the auditory sense. When a sub-band is judged to have noise characteristics, the MDCT coefficients in that band are substituted with a randomly generated noise signal. Since there is no need to quantize the values of the MDCT coefficients in a band substituted with a noise signal, only the gain information corresponding to the scale factor needs to be quantized, which makes it possible to reduce significantly the number of encoding bits required for quantization.
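The substitution described above can be sketched as follows. How a band is judged to be noise-like is not specified in the text; the spectral flatness measure, the threshold of 0.5, and all function names here are assumptions for illustration only.

```python
import math
import random


def spectral_flatness(coeffs):
    """Geometric mean over arithmetic mean of the power; close to 1 for noise."""
    power = [c * c + 1e-12 for c in coeffs]  # small floor avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def pns_encode_band(coeffs, flatness_threshold=0.5):
    """Keep only the band energy when the band looks noise-like (assumed test)."""
    if spectral_flatness(coeffs) >= flatness_threshold:
        return {"noise": True, "energy": sum(c * c for c in coeffs)}
    return {"noise": False, "coeffs": list(coeffs)}


def pns_decode_band(band, width, rng):
    """Regenerate a noise band with the transmitted energy, or pass coefficients on."""
    if not band["noise"]:
        return list(band["coeffs"])
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    norm = math.sqrt(sum(v * v for v in noise))
    scale = math.sqrt(band["energy"]) / norm if norm > 0 else 0.0
    return [v * scale for v in noise]


tonal = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
noisy = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
coded_tonal = pns_encode_band(tonal)
coded_noisy = pns_encode_band(noisy)
decoded = pns_decode_band(coded_noisy, len(noisy), random.Random(0))
```

The decoded noise band differs from the original sample by sample but carries the same energy, which is all that is transmitted for such a band; the tonal band is left to normal quantization.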
Through such encoding processing, the AAC enables high-quality encoding of a stereo signal in a wide band from 20 Hz to 16 kHz or over at around 96 kbps, for instance.
However, when the bit rate is further lowered to around 48 kbps, for example, there occurs a problem that the bandwidth of a stereo signal for which high-quality encoding is possible becomes narrower, resulting in a muffled sound from an audibility standpoint.
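The effect of halving the bit rate on the per-frame bit budget can be checked with simple arithmetic. The sampling rate is not stated in the text; 48 kHz is assumed here purely for illustration, and the function names are hypothetical.

```python
def bits_per_frame(bit_rate_bps, sample_rate_hz, frame_len=1024):
    """Average number of bits available for one basic frame."""
    return bit_rate_bps * frame_len / sample_rate_hz


def bits_per_sample(bit_rate_bps, sample_rate_hz, channels=2):
    """Average number of bits available per sample and per channel."""
    return bit_rate_bps / (sample_rate_hz * channels)


SAMPLE_RATE_HZ = 48_000  # assumed; not given in the text
for rate in (96_000, 48_000):
    print(rate, bits_per_frame(rate, SAMPLE_RATE_HZ), bits_per_sample(rate, SAMPLE_RATE_HZ))
```

Under this assumption a stereo frame has 2048 bits at 96 kbps but only 1024 bits at 48 kbps, or half a bit per sample per channel, so fewer sub-bands can be encoded in detail and the encodable bandwidth shrinks.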
Furthermore, if PNS is used too frequently in order to reduce the number of encoding bits for MDCT coefficients when the bit rate is lowered, the number of parts substituted with a noise signal increases, and the resulting sound contains noise and distortion that are noticeable to human ears.
In view of the above-mentioned problems, the present invention aims at providing an encoding device, a decoding device, an encoding method, and a decoding method that allow a decoding device receiving a code to decode an audio signal in a wide band with high quality, even when the code of the audio signal encoded in the encoding device is transmitted at a low bit rate.