The present disclosure relates to an encoding apparatus, an encoding method, a decoding apparatus, a decoding method, and a program, and more particularly to an encoding apparatus, an encoding method, a decoding apparatus, a decoding method, and a program capable of generating an audio signal for concealment having a more natural sound.
In recent years, audio signals are often digitized, and the resultant digital signals are compressed and encoded and then transmitted or saved. Encoding of audio signals is generally categorized into waveform coding and analysis/synthesis coding. Waveform coding includes band division coding, in which an audio signal is divided into a plurality of frequency components using a band division filter and encoded, and transform coding, in which a digital audio signal is subjected to a time-frequency transform on a block-by-block basis and the resultant spectra are encoded. In waveform coding, an audio signal that has been divided into frequency components using a band division filter or a time-frequency transform is quantized on a band-by-band basis and subjected to highly efficient coding utilizing the so-called auditory masking effect or the like.
FIG. 1 is a block diagram illustrating an example of the configuration of an encoding apparatus that performs transform coding.
An encoding apparatus 10 illustrated in FIG. 1 includes a time-frequency transform unit 11, a spectrum normalization unit 12, a spectrum quantization unit 13, an entropy encoding unit 14, a scale factor encoding unit 15, and a multiplexer 16.
The time-frequency transform unit 11 of the encoding apparatus 10 receives an audio signal, which is a time signal. The time-frequency transform unit 11 performs a time-frequency transform such as a modified discrete cosine transform (MDCT) on the input audio signal on a frame-by-frame basis. The time-frequency transform unit 11 supplies the resultant frequency spectral coefficients (MDCT coefficients) for each frame to the spectrum normalization unit 12.
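The transform performed by the time-frequency transform unit 11 can be illustrated with a minimal sketch of a direct MDCT. The function name `mdct` is hypothetical, and the sketch assumes a block of 2N samples and omits the windowing and 50% frame overlap that practical codecs use; the disclosure does not specify these details.

```python
import math

def mdct(block):
    """Direct O(N^2) MDCT: maps a block of 2N samples to N coefficients.

    Assumption: no analysis window is applied and overlap between
    successive blocks is left to the caller.
    """
    two_n = len(block)
    if two_n == 0 or two_n % 2:
        raise ValueError("block length must be a positive even number")
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0  # phase offset of the MDCT basis functions
    return [
        sum(block[n] * math.cos(math.pi / n_half * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]
```

A block that equals one of the MDCT basis functions yields a single non-zero coefficient, which is a convenient sanity check for the sketch.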
The spectrum normalization unit 12 groups the frequency spectral coefficients for the frames supplied from the time-frequency transform unit 11 into units of quantization (quantization units), each covering a certain bandwidth. The spectrum normalization unit 12 normalizes the grouped frequency spectral coefficients for the quantization units on a frame-by-frame basis using the following expression (1) and a coefficient $2^{-\lambda \times SF[n]}$ of a certain step size.
$$X_{\mathrm{Norm}}(k) = X(k) \times 2^{-\lambda \times SF[n]} \quad (1)$$
In the expression (1), $X(k)$ denotes a k-th frequency spectral coefficient of an n-th quantization unit, and $X_{\mathrm{Norm}}(k)$ denotes a normalized frequency spectral coefficient. In addition, $\lambda$ is a value for determining the step size. For example, if $\lambda = 0.5$, the step size is approximately 3 dB, since $20 \log_{10} 2^{0.5} \approx 3.01$ dB. Here, $\lambda$ is assumed to be constant regardless of the frame. In addition, here, an index SF[n] (integer) as information regarding the coefficient $2^{-\lambda \times SF[n]}$ is called a “scale factor”.
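The normalization of expression (1) can be sketched as follows. The rule used to choose SF[n] (the smallest integer that brings the peak magnitude of the quantization unit to 1 or below) is an assumption made for illustration only, as are the function names; the disclosure does not state how the scale factor is selected.

```python
import math

LAMBDA = 0.5  # value determining the step size; 0.5 gives about 3 dB per step

def choose_scale_factor(coeffs, lam=LAMBDA):
    """Assumed rule: smallest integer SF with max|X(k)| * 2**(-lam * SF) <= 1."""
    peak = max(abs(x) for x in coeffs)
    if peak == 0.0:
        return 0
    return math.ceil(math.log2(peak) / lam)

def normalize_unit(coeffs, sf, lam=LAMBDA):
    """Apply expression (1): XNorm(k) = X(k) * 2**(-lam * SF[n])."""
    gain = 2.0 ** (-lam * sf)
    return [x * gain for x in coeffs]

def step_size_db(lam=LAMBDA):
    """Amplitude change, in dB, caused by incrementing the scale factor by one."""
    return 20.0 * math.log10(2.0 ** lam)
```

For a quantization unit with coefficients 3.0, -8.0, and 0.5, this assumed rule selects a scale factor of 6, so each coefficient is multiplied by 2 to the power -3.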
The spectrum normalization unit 12 supplies the frequency spectral coefficient for each frame that has been normalized as described above to the spectrum quantization unit 13 and a scale factor for each frame that has been used for the normalization to the scale factor encoding unit 15.
The spectrum quantization unit 13 quantizes the normalized frequency spectral coefficient for each frame supplied from the spectrum normalization unit 12 using a certain number of bits, and supplies the quantized frequency spectral coefficient for each frame to the entropy encoding unit 14. In addition, the spectrum quantization unit 13 supplies, to the multiplexer 16, quantization information indicating the number of bits used in the quantization for each quantization unit of the normalized frequency spectral coefficient for each frame.
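As an illustration of quantizing normalized coefficients with a given number of bits, the following sketch uses a uniform, symmetric quantizer. The quantizer design and the function names are assumptions; the disclosure states only that a certain number of bits is used for each quantization unit.

```python
def quantize_unit(norm_coeffs, bits):
    """Uniformly quantize values assumed to lie in [-1, 1] to signed integers.

    Assumption: a symmetric quantizer with 2**(bits - 1) - 1 positive levels.
    """
    if bits < 2:
        raise ValueError("at least 2 bits are needed for a signed quantizer")
    levels = 2 ** (bits - 1) - 1
    quantized = []
    for x in norm_coeffs:
        q = round(x * levels)
        quantized.append(max(-levels, min(levels, q)))  # clamp out-of-range input
    return quantized

def dequantize_unit(quantized, bits):
    """Map quantized integers back to approximate normalized values."""
    levels = 2 ** (bits - 1) - 1
    return [q / levels for q in quantized]
```

In this sketch, the value of `bits` chosen for each quantization unit corresponds to the quantization information that is passed to the multiplexer 16.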
The entropy encoding unit 14 performs reversible compression on the quantized frequency spectral coefficient for each frame supplied from the spectrum quantization unit 13 by Huffman coding, arithmetic coding, or the like, and supplies a resultant frequency spectral coefficient to the multiplexer 16 as encoded spectrum data.
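The reversible compression mentioned above can be illustrated with a minimal Huffman coding sketch over quantized coefficients. The function names and the use of bit strings rather than packed bytes are assumptions for readability; the disclosure does not specify the code tables or the bitstream layout.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the code words of all symbols."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Recover the symbol sequence; valid because the code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Because frequent values such as 0 receive shorter code words, the encoded spectrum data is usually smaller than a fixed-length representation, and decoding restores the quantized coefficients exactly.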
The scale factor encoding unit 15 encodes the scale factor for each frame supplied from the spectrum normalization unit 12. The scale factor encoding unit 15 supplies the encoded scale factor for each frame to the multiplexer 16 as an encoded scale factor.
The multiplexer 16 multiplexes the encoded spectrum data from the entropy encoding unit 14, the encoded scale factors from the scale factor encoding unit 15, and the quantization information from the spectrum quantization unit 13, in order to generate encoded data for each frame. The multiplexer 16 outputs the encoded data.
In the above-described encoding apparatus 10, an encoding error may occur, for example, because the number of bits available for a frame is smaller than the number of bits necessary for encoding, or because encoding takes longer than the period of time within which real-time processing can be performed. In this case, since it is difficult to perform encoding again, it is necessary to prepare error concealment means that outputs encoded data for concealment instead of irregular data, so that the irregular data is not output as encoded data.
As the error concealment means, for example, a technique has been proposed in which, if encoding does not end before a time limit, encoded data of a frame located prior to a frame to be encoded is output as encoded data for concealment instead of encoded data of the frame to be encoded (for example, refer to Japanese Patent No. 3463592).
In addition, as the error concealment means, another technique has been proposed in which encoded data for concealment is prepared in advance by encoding a silent signal or the like and the encoded data is output instead of encoded data of a frame in which an encoding error has occurred (for example, refer to Japanese Unexamined Patent Application Publication No. 2003-5798).
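The two concealment techniques described above, namely reusing the encoded data of a prior frame and substituting encoded data prepared in advance from a silent signal, can be sketched as follows. The names `conceal_stream`, `encode_frame`, and `SILENT_FRAME` are hypothetical, and the convention that the encoder returns `None` on an encoding error is an assumption made for illustration.

```python
SILENT_FRAME = b"\x00"  # hypothetical stand-in for pre-encoded silent data

def conceal_stream(frames, encode_frame, strategy="repeat"):
    """Return one encoded payload per input frame.

    encode_frame(frame) is a hypothetical encoder that returns bytes on
    success and None on an encoding error (for example, when the bit
    budget or the real-time deadline is exceeded).
    """
    if strategy not in ("repeat", "silence"):
        raise ValueError("strategy must be 'repeat' or 'silence'")
    output = []
    previous = None  # encoded data of the most recent successfully encoded frame
    for frame in frames:
        encoded = encode_frame(frame)
        if encoded is None:
            if strategy == "repeat" and previous is not None:
                encoded = previous      # reuse the prior frame's encoded data
            else:
                encoded = SILENT_FRAME  # fall back to pre-encoded silence
        else:
            previous = encoded
        output.append(encoded)
    return output
```

In this sketch, the "repeat" strategy falls back to the silent data when no prior frame exists; that fallback is a design choice of the example rather than something stated in the cited techniques.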
On the other hand, an audio compression transmission apparatus has been proposed that, if a synchronization abnormality of encoded data has been detected during decoding, outputs, as encoded data for concealment, silent encoded data stored in advance instead of the encoded data (for example, refer to Japanese Patent No. 2731514).
In addition, an apparatus has been proposed that replaces, in accordance with a mute instruction from outside, encoded data with silent encoded data created in advance and outputs the silent encoded data (for example, refer to Japanese Unexamined Patent Application Publication No. 9-294077).