The present technology relates to an audio encoder, an audio encoding method, and a program, and particularly to an audio encoder, an audio encoding method, and a program capable of preventing deterioration of sound quality due to encoding when audio signals of a plurality of channels are encoded with high efficiency.
Known techniques for encoding stereo audio signals constituted of audio signals of a plurality of channels include M/S stereo encoding, which enhances encoding efficiency by taking advantage of the relationship between the channels, intensity stereo encoding, and the like. Hereinafter, for convenience of explanation, the stereo audio signals are assumed to have two channels, a left channel and a right channel, but the same explanation applies when there are three or more channels.
The M/S stereo encoding generates, as encoding results, a sum component and a difference component of the audio signals of the left and right channels constituting the stereo audio signals. Accordingly, when the audio signals of the left and right channels are similar to each other, the difference component is small and encoding efficiency is high. However, when the audio signals of the left and right channels are significantly different from each other, the difference component is large and it is difficult to attain high encoding efficiency. This can cause quantization noise in the quantization after the encoding and thus artificial noise in decoding.
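The sum and difference components described above can be illustrated with a minimal sketch. The function names and the scaling by 1/2 are assumptions made for illustration only; the text above specifies only that a sum and a difference are generated.

```python
def ms_encode(left, right):
    """Return (mid, side) for two equal-length sample lists.

    The 1/2 scaling is an assumption; it keeps the decoder a plain
    sum and difference.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover (left, right) from the mid and side components."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(x):
    """Sum of squared samples, used here as a rough size measure."""
    return sum(v * v for v in x)
```

When the two channels are nearly identical, energy(side) is close to zero, so few bits are needed for the difference component; when the channels differ strongly, the side component carries about as much energy as the mid component and the advantage disappears.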
In the intensity stereo encoding, the encoding is performed based on the principles that human hearing is insensitive to phase in a high-frequency region, and that positions are sensed mainly based on level ratios between frequency spectra (for example, see ISO/IEC 13818-7 Information technology “Generic coding of moving pictures and associated audio information Part 7”, Advanced Audio Coding (AAC)). Specifically, for frequencies below a predetermined frequency FIS, the intensity stereo encoding outputs the frequency spectra of the left and right channels as the encoding results as they are. On the other hand, for frequencies equal to or greater than the predetermined frequency FIS, it generates, as the encoding results, a common spectrum obtained by mixing the frequency spectra of the left and right channels, together with the levels of the frequency spectra of the individual channels.
Accordingly, for the frequencies below the frequency FIS, a decoder outputs the frequency spectra of the left and right channels contained in the encoding results as the decoding results as they are. On the other hand, for the frequencies equal to or greater than the frequency FIS, it applies the levels of the frequency spectra of the individual channels to the common spectrum contained in the encoding results to generate the decoding results.
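The encoder and decoder behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited standard: the boundary is given as a hypothetical bin index k_is corresponding to the frequency FIS, the common spectrum is taken as the average of the two spectra, a single level (the root of the energy) is kept per channel, and all function names are illustrative.

```python
import math


def level(x):
    """Level of a spectrum segment (square root of its energy)."""
    return math.sqrt(sum(v * v for v in x))


def is_encode(spec_l, spec_r, k_is):
    """Keep both spectra below bin k_is; above it keep one common
    spectrum plus the level of each channel (assumed definitions)."""
    low_l, low_r = spec_l[:k_is], spec_r[:k_is]
    high_l, high_r = spec_l[k_is:], spec_r[k_is:]
    common = [(a + b) / 2 for a, b in zip(high_l, high_r)]
    return low_l, low_r, common, level(high_l), level(high_r)


def is_decode(low_l, low_r, common, lev_l, lev_r):
    """Pass the low bins through; scale the common spectrum so that
    its level matches the transmitted level of each channel."""
    c = level(common)
    gain_l = lev_l / c if c > 0 else 0.0
    gain_r = lev_r / c if c > 0 else 0.0
    out_l = list(low_l) + [gain_l * v for v in common]
    out_r = list(low_r) + [gain_r * v for v in common]
    return out_l, out_r
```

In this sketch the decoded high-frequency parts of both channels share the shape of the common spectrum and differ only in level, which is why the scheme relies on the two channels being similar.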
Such intensity stereo encoding, like the M/S stereo encoding, also presupposes that the audio signals of the left and right channels are similar to each other. Accordingly, when the audio signals of the left and right channels are completely different from each other, for example, when the audio signal of the left channel is an audio signal of cymbals and the audio signal of the right channel is an audio signal of a trumpet, the common spectrum differs from the frequency spectra of both channels, and artificial noise can arise in decoding.
Therefore, it has been proposed that a scale of the distance between the frequency spectra of the audio signals of the left and right channels is calculated, and that common encoding such as the M/S stereo encoding is performed when this scale is equal to or smaller than a threshold value, whereas the channels are encoded individually when it is greater than the threshold value (for example, see Japanese Patent No. 3421726, which is hereinafter referred to as Patent Document 1).
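The threshold-based switching can be sketched as below. The distance measure (a normalized squared Euclidean distance), the threshold value of 0.1, and the function names are hypothetical choices for illustration; they are not taken from Patent Document 1.

```python
def spectral_distance(spec_l, spec_r):
    """Normalized squared distance between two spectra.

    Assumed measure: 0.0 for identical spectra, 1.0 for spectra
    with no overlapping bins.
    """
    num = sum((a - b) ** 2 for a, b in zip(spec_l, spec_r))
    den = sum(a * a + b * b for a, b in zip(spec_l, spec_r))
    return num / den if den > 0 else 0.0


def choose_encoding(spec_l, spec_r, threshold=0.1):
    """Select common encoding for similar channels and individual
    encoding otherwise (threshold is a hypothetical value)."""
    if spectral_distance(spec_l, spec_r) <= threshold:
        return "common"
    return "individual"
```

Because the decision is a hard comparison made for each frame, a signal whose distance hovers around the threshold makes the result alternate between the two modes from frame to frame.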
Moreover, it has also been proposed that the frequency spectra of stereo audio signals are divided into predetermined frequency bands, and that, for each frequency band, an index indicating that intensity stereo encoding is applied is transmitted using a specific Huffman codebook number (for example, see Japanese Patent No. 3622982, which is hereinafter referred to as Patent Document 2). Thereby, the intensity stereo encoding can be switched ON and OFF for each predetermined frequency band.
However, with the technologies of Patent Documents 1 and 2, when the common encoding or the intensity stereo encoding is frequently switched ON and OFF, the sensed positions can become unstable or abnormal sound can arise.
Moreover, there are situations in which a high compression ratio is desirable for encoding. Such a situation can force the use of the intensity stereo encoding to enhance encoding efficiency even when the audio signals of the left and right channels are significantly different from each other. In this case, clearly perceptible artificial noise can arise in decoding.
Meanwhile, it has been considered that stereo audio signals divided into bands are mixed, before being encoded, in mixing ratios based on distortion factors of the encoding (for example, see Japanese Patent No. 3951690). In this case, since the left-right separation (stereophonic feeling) of the encoding object is continuously controlled based on the distortion factors, the sensed positions can be prevented from becoming unstable and abnormal sound can be prevented from occurring.
FIG. 1 is a block diagram illustrating one example of a configuration of an audio encoder performing such encoding.
The audio encoder 10 in FIG. 1 is configured to include a filter bank 11, a filter bank 12, an adaptive mixing part 13, a T/F transformation part 14, a T/F transformation part 15, an encoding control part 16, an encoding part 17, a multiplexer 18 and a distortion factor detection part 19.
To the audio encoder 10 in FIG. 1, an audio signal xL as a time signal of a left channel and an audio signal xR as a time signal of a right channel are inputted as stereo audio signals of an encoding object.
The filter bank 11 of the audio encoder 10 divides the audio signal xL inputted as the encoding object into subband signals for B respective frequency bands (bands). The filter bank 11 supplies the resulting subband signals xbL with band numbers b (b=1, 2, . . . , B) to the adaptive mixing part 13.
Similarly, the filter bank 12 divides the audio signal xR inputted as the encoding object into subband signals for the B respective bands. The filter bank 12 supplies the resulting subband signals xbR with band numbers b (b=1, 2, . . . , B) to the adaptive mixing part 13.
The adaptive mixing part 13 determines the mixing ratios of the subband signals xbL supplied from the filter bank 11 and the subband signals xbR supplied from the filter bank 12 based on the distortion factors supplied from the distortion factor detection part 19, that is, the distortion factors detected in the encoding of past encoding objects.
Specifically, the adaptive mixing part 13 makes the mixing ratio larger as the distortion factor is larger, that is, as the S/N ratio is smaller. Thereby, the left-right separation (stereophonic feeling) of the subband signals obtained by the mixing becomes small, and encoding efficiency is enhanced. On the other hand, the adaptive mixing part 13 makes the mixing ratio smaller as the distortion factor is smaller, that is, as the S/N ratio is larger. Thereby, the left-right separation (stereophonic feeling) of the subband signals obtained by the mixing becomes large.
The adaptive mixing part 13 mixes the subband signal xbL and the subband signal xbR for each band based on the mixing ratio determined for the subband signal xbL to generate a subband signal xbLmix. Similarly, the adaptive mixing part 13 mixes the subband signal xbL and the subband signal xbR for each band based on the mixing ratio determined for the subband signal xbR to generate a subband signal xbRmix. The adaptive mixing part 13 supplies the generated subband signals xbLmix to the T/F transformation part 14 and supplies the generated subband signals xbRmix to the T/F transformation part 15.
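The behavior of the adaptive mixing part 13 for one band can be sketched as follows. The linear mapping from distortion factor to mixing ratio, the upper bound d_max, the use of a single symmetric ratio for both outputs, and the function names are all assumptions for illustration; the description above states only that the ratio grows with the distortion factor.

```python
def mixing_ratio(distortion, d_max=1.0):
    """Map a distortion factor to a mixing ratio in [0, 0.5].

    Assumed mapping: linear, clamped at d_max. A ratio of 0 leaves
    the channels untouched; 0.5 makes both outputs identical.
    """
    d = min(max(distortion, 0.0), d_max)
    return 0.5 * d / d_max


def adaptive_mix(sub_l, sub_r, distortion):
    """Mix one band of left and right subband samples."""
    a = mixing_ratio(distortion)
    mix_l = [(1 - a) * l + a * r for l, r in zip(sub_l, sub_r)]
    mix_r = [(1 - a) * r + a * l for l, r in zip(sub_l, sub_r)]
    return mix_l, mix_r
```

With this mapping the separation between the two outputs shrinks continuously as the distortion factor rises, rather than jumping between an ON state and an OFF state.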
The T/F transformation part 14 performs time-frequency transformation such as MDCT (Modified Discrete Cosine Transform) on the subband signals xbLmix and supplies the resulting frequency spectrum XL to the encoding control part 16 and the encoding part 17.
Similarly, the T/F transformation part 15 performs the time-frequency transformation such as the MDCT on the subband signals xbRmix and supplies the resulting frequency spectrum XR to the encoding control part 16 and the encoding part 17.
The encoding control part 16 selects one encoding scheme from among dual encoding, M/S stereo encoding, and intensity stereo encoding based on the correlation between the frequency spectrum XL supplied from the T/F transformation part 14 and the frequency spectrum XR supplied from the T/F transformation part 15. The encoding control part 16 notifies the encoding part 17 of the selected encoding scheme.
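A correlation-based selection of this kind might look like the sketch below. The description does not state which correlation measure is used or how its value maps to the three schemes, so the normalized correlation, the thresholds t_ms and t_is, the ordering of the schemes, and the function names are all hypothetical.

```python
import math


def correlation(spec_l, spec_r):
    """Normalized correlation between two spectra, in [-1, 1]."""
    num = sum(a * b for a, b in zip(spec_l, spec_r))
    den = math.sqrt(sum(a * a for a in spec_l) * sum(b * b for b in spec_r))
    return num / den if den > 0 else 0.0


def select_scheme(spec_l, spec_r, t_ms=0.5, t_is=0.9):
    """Pick a scheme from the correlation magnitude.

    Assumed rule: very similar spectra use intensity stereo,
    moderately similar spectra use M/S, and the rest are encoded
    as two independent channels (dual).
    """
    c = abs(correlation(spec_l, spec_r))
    if c >= t_is:
        return "intensity"
    if c >= t_ms:
        return "ms"
    return "dual"
```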
The encoding part 17 encodes each of the frequency spectrum XL supplied from the T/F transformation part 14 and the frequency spectrum XR supplied from the T/F transformation part 15 using the encoding scheme supplied from the encoding control part 16. The encoding part 17 supplies the encoded spectrum obtained by the encoding and additional information regarding the encoding to the multiplexer 18.
The multiplexer 18 multiplexes, in a predetermined format, the encoded spectrum, the additional information regarding the encoding, and the like supplied from the encoding part 17, and outputs the resulting encoded data.
The distortion factor detection part 19 detects a distortion factor in the encoding of the encoding part 17 and supplies it to the adaptive mixing part 13.