Generally, encoding processing is performed on an audio signal that has been converted into a digital signal in order to reduce the amount of information of the audio signal. Examples of an audio encoding method include MPEG-2 AAC (Moving Picture Experts Group-2 Advanced Audio Coding), MPEG-4 AAC, MPEG-2 HE-AAC (High Efficiency-AAC), MPEG-4 HE-AAC, MPEG2 HE-AAC-version2, MPEG Surround, and MPEG-4 BSAC (Bit Sliced Arithmetic Coding).
In an audio encoding method such as the MPEG-2 AAC, an audio signal in a time domain is converted into an audio signal in a frequency domain, the audio signal in the frequency domain is quantized, and the quantized audio signal is encoded, whereby a bit stream is generated. An error (quantization error) caused by the quantization of the audio signal in the frequency domain causes noise when the audio signal is decoded and reproduced, resulting in deterioration of audio quality.
In particular, when the audio signal changes abruptly, for example due to generation of large sound, the quantization error generated in the portion in which the abrupt change occurs affects the entire block which has been subjected to the quantization, resulting in generation of noise over the entire block.
Human beings have a hearing characteristic in which it is difficult to perceive sound immediately before and immediately after large sound is generated. This hearing characteristic is referred to as a “masking effect”. Although the period of time in which sound is not perceived after large sound is generated varies among individuals, it is approximately 100 milliseconds. On the other hand, the period of time in which the masking effect remains before the large sound is generated is short, e.g., approximately five to six milliseconds. Therefore, noise generated before the large sound is generated is likely to be detected, since the period of time in which the masking effect remains is short. A phenomenon in which noise is perceived before large sound is generated is referred to as a “pre-echo”.
In general, in the MPEG-2 AAC, encoding and decoding are performed with a conversion block length of 1024 samples. For example, at a sampling frequency of 48 kHz, the time length of a conversion block is approximately 21 milliseconds, obtained as 1024 × (1/48000) seconds. That is, the time length is shorter than the period of time in which the masking effect remains after large sound is generated, i.e., approximately 100 milliseconds. Since the influence of the quantization error caused by an abrupt change of the audio signal is confined to the conversion block, when the encoding is performed using the block length of 1024 samples, the noise that follows the abrupt change is not detected by human beings due to the masking effect and is therefore tolerated.
However, since the period of time in which the masking effect remains before the large sound is generated is short, i.e., approximately five to six milliseconds, when the encoding is performed with the conversion block length of 1024 samples, the period of time in which noise caused by the quantization error is present before the large sound is generated may be longer than the period of time in which the masking effect remains. If so, human beings perceive the pre-echo.
In the audio encoding method, the pre-echo is prevented by detecting an abrupt change of an input signal and shortening the conversion block length.
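The comparison above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function name `block_duration_ms` and the two constants are hypothetical, and the masking durations are simply the approximate figures quoted above (about 100 milliseconds after, and about five milliseconds before, a large sound).

```python
def block_duration_ms(block_length: int, sampling_rate_hz: int) -> float:
    """Duration of one conversion block in milliseconds."""
    return block_length / sampling_rate_hz * 1000.0


# Approximate masking durations quoted in the text (assumed values).
POST_MASKING_MS = 100.0  # masking that remains after a large sound
PRE_MASKING_MS = 5.0     # masking before a large sound (lower end of 5 to 6 ms)

long_block_ms = block_duration_ms(1024, 48000)

# Noise after the abrupt change stays inside the post-masking period ...
masked_after = long_block_ms < POST_MASKING_MS
# ... but noise before the abrupt change can outlast the pre-masking period.
may_pre_echo = long_block_ms > PRE_MASKING_MS

print(f"long block: {long_block_ms:.1f} ms")
print("masked after the large sound:", masked_after)
print("pre-echo possible:", may_pre_echo)
```

With a 1024-sample block at 48 kHz, the block is several times longer than the pre-masking period, which is why the noise preceding the large sound can become audible.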
For example, in the MPEG-2 AAC, when an abrupt change of an audio signal caused by large sound is not included in a frame, encoding is performed with a conversion block length of 1024 samples. A block having a conversion block length of 1024 samples is referred to as a “long block”. On the other hand, when an abrupt change of an audio signal caused by large sound is included in a frame, encoding is performed with a conversion block length of 128 samples. A block having a conversion block length of 128 samples is referred to as a “short block”.
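The selection between a long block and a short block can be sketched as follows. The text does not specify how the abrupt change is detected, so this Python sketch assumes a simplistic energy-ratio test between consecutive 128-sample segments; the names `detect_attack` and `choose_block_length` and the threshold of 10 are hypothetical and are not taken from any standard.

```python
from typing import List, Optional

LONG_BLOCK = 1024   # samples in a "long block"
SHORT_BLOCK = 128   # samples in a "short block"


def detect_attack(frame: List[float], ratio_threshold: float = 10.0) -> Optional[int]:
    """Return the index of the first 128-sample segment whose energy exceeds
    that of the previous segment by more than ratio_threshold, else None.

    Assumed, simplistic detector: a change in the very first segment is not
    seen because there is no earlier segment in this frame to compare with.
    """
    energies = [
        sum(s * s for s in frame[i:i + SHORT_BLOCK])
        for i in range(0, len(frame), SHORT_BLOCK)
    ]
    floor = 1e-9  # avoids division by zero on silent segments
    for idx in range(1, len(energies)):
        if energies[idx] / (energies[idx - 1] + floor) > ratio_threshold:
            return idx
    return None


def choose_block_length(frame: List[float]) -> int:
    """Short blocks when the frame contains an abrupt change, else a long block."""
    return SHORT_BLOCK if detect_attack(frame) is not None else LONG_BLOCK


steady = [0.1] * 1024                       # no abrupt change
transient = [0.001] * 640 + [1.0] * 384     # large sound starts at sample 640

print(choose_block_length(steady))
print(choose_block_length(transient))
```

For the steady frame the sketch selects the 1024-sample long block, and for the frame with the sudden level jump it selects the 128-sample short block.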
When the audio signal is encoded in units of short blocks, the influence of the quantization error caused by the abrupt change is confined to the short block. At a sampling frequency of 48 kHz, the time length of the short block is approximately 2.7 milliseconds, obtained as 128 × (1/48000) seconds. The time length of the short block is shorter than the period of time in which the masking effect remains before the audio signal is abruptly changed, i.e., approximately five to six milliseconds. Therefore, when the frame includes the abrupt change of the audio signal, the influence of the quantization error can be kept within the period of time in which the masking effect remains by performing the encoding in units of short blocks. Accordingly, the noise detected by human beings is negligible, and consequently, the pre-echo is prevented.
Such quantization performed in units of short blocks when the audio signal is abruptly changed is employed not only in the MPEG-2 AAC but also in the MPEG-4 AAC, the MPEG-2 HE-AAC, the MPEG-4 HE-AAC, the MPEG2 HE-AAC-version2, the MPEG Surround, and the MPEG-4 BSAC.
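The confinement of quantization noise to a block can be illustrated numerically. The following Python sketch is a toy model under stated assumptions, not an AAC implementation: it uses a plain non-overlapping DCT-II instead of the windowed transform of an actual encoder, a uniform quantizer with an arbitrary step, and scaled-down block lengths of 64 and 8 samples in place of 1024 and 128. All function names are hypothetical.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    """Orthonormal DCT scaling factor."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II (time domain to frequency domain)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c: List[float]) -> List[float]:
    """Inverse of dct() (frequency domain back to time domain)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
        for i in range(n)
    ]


def code_signal(signal: List[float], block_length: int, step: float) -> List[float]:
    """Transform, quantize, and reconstruct the signal block by block."""
    out: List[float] = []
    for start in range(0, len(signal), block_length):
        coeffs = dct(signal[start:start + block_length])
        quantized = [round(c / step) * step for c in coeffs]  # uniform quantizer
        out.extend(idct(quantized))
    return out


ATTACK = 48  # sample index at which the large sound starts
signal = [0.0] * ATTACK + [math.cos(2.0 * math.pi * 0.13 * n) for n in range(16)]

long_out = code_signal(signal, block_length=64, step=0.25)   # one "long" block
short_out = code_signal(signal, block_length=8, step=0.25)   # eight "short" blocks

# The original is silent before the attack, so any output there is pure noise.
pre_noise_long = max(abs(v) for v in long_out[:ATTACK])
pre_noise_short = max(abs(v) for v in short_out[:ATTACK])

print("noise before attack, long block  :", pre_noise_long)
print("noise before attack, short blocks:", pre_noise_short)
```

With the single long block, the quantization error spreads over all 64 samples, including the silent portion before the attack. With short blocks, the six all-zero blocks preceding the attack are reconstructed as exact silence, so the noise stays inside the blocks that contain the large sound.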
Furthermore, in the audio encoding method in which the block length is changed as described above, a plurality of consecutive short blocks included in a frame are grouped so that the group is used as a unit of encoding. When the plurality of short blocks are grouped, auxiliary information on the audio signals is shared among the short blocks in the group. Accordingly, when compared with a case where the audio signals included in the short blocks are encoded for individual short blocks, the amount of the auxiliary information included in one frame is reduced.
When an abrupt change of an audio signal is detected in an audio frame, short blocks are grouped using the abrupt change as a reference. The abrupt change of an audio signal is referred to as an “attack” hereinafter.
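The grouping and the resulting reduction of auxiliary information can be sketched as follows. The text states only that the short blocks are grouped using the attack as a reference, so the rule used here is an assumption made for illustration: the short blocks preceding the attack form one group, and the short blocks from the attack onward form another. The function name `group_around_attack` is hypothetical, and one set of auxiliary information per group is assumed.

```python
from typing import List

SHORT_BLOCKS_PER_FRAME = 1024 // 128  # eight short blocks in one frame


def group_around_attack(
    attack_block: int, num_blocks: int = SHORT_BLOCKS_PER_FRAME
) -> List[List[int]]:
    """Split the short-block indices of a frame into groups at the attack.

    Assumed rule (not taken from any standard): blocks before the attack form
    one group, and blocks from the attack onward form a second group.
    """
    if not 0 <= attack_block < num_blocks:
        raise ValueError("attack_block must be a valid short-block index")
    before = list(range(0, attack_block))
    from_attack = list(range(attack_block, num_blocks))
    return [group for group in (before, from_attack) if group]


groups = group_around_attack(attack_block=5)

# One set of auxiliary information per short block versus one set per group.
side_info_ungrouped = SHORT_BLOCKS_PER_FRAME
side_info_grouped = len(groups)

print(groups)
print(side_info_ungrouped, "sets without grouping,", side_info_grouped, "sets with grouping")
```

In this example the frame carries two sets of auxiliary information instead of eight, while the group boundary at the attack keeps the blocks before the abrupt change separate from those after it.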