1. Field of the Invention
The present invention relates to a decoding apparatus, an encoding apparatus, a decoding method and an encoding method. More particularly, the present invention relates to a decoding apparatus and an encoding apparatus in which an input signal is compressed with high efficiency and encoded or decoded, and to a decoding method and an encoding method in which the input signal is compressed with high efficiency and encoded or decoded.
2. Description of the Related Art
Presently, there are various kinds of encoding and decoding apparatuses and methods that compress speech and acoustic signals with high efficiency. One such method is a scalable encoding method, in which a part of an encoded sequence can be decoded according to a required quality or the status of a network. The scalable encoding process has an architecture that successively encodes an input signal in such a way that an error signal between the input signal and a decoded signal of a lower layer encoder is further encoded by a higher layer encoder. The lowest layer is called a core layer, and the layers above it are called enhancement layers. A representative scalable encoding method is described in ISO/IEC 14496-3, known as MPEG-4 Audio, standardized by ISO/IEC. FIG. 1 shows a block diagram of the scalable encoding process. In FIG. 1, a core layer encoder 101 uses the Code-Excited Linear Prediction (CELP) encoding method; a parametric encoding method such as, for example, the Harmonic Vector Excitation Coding (HVXC) method or the Harmonic Individual Line with Noise (HILN) method; or a transform coding method such as, for example, the Advanced Audio Coding (AAC) method or the Transform Domain Weighted Interleave Vector Quantization (TwinVQ) method. Encoders that perform the transform coding method are used as enhancement layer encoders 104.
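The layered structure described above, in which each higher layer encodes the error left by the layers below it, can be illustrated with a minimal sketch. The uniform quantizer used here is only a stand-in for a real core or enhancement layer codec, and the names quantize, scalable_encode, scalable_decode and the step sizes are hypothetical choices made for illustration; they do not appear in ISO/IEC 14496-3.

```python
def quantize(values, step):
    """Uniform scalar quantizer, used here as a stand-in for a real layer codec."""
    return [step * round(v / step) for v in values]


def scalable_encode(signal, steps):
    """Encode layer by layer; each layer codes the error left by the previous ones."""
    layers = []
    residual = list(signal)
    for step in steps:
        decoded = quantize(residual, step)  # decoded contribution of this layer
        layers.append(decoded)
        # Error signal passed on to the next (higher) layer.
        residual = [r - d for r, d in zip(residual, decoded)]
    return layers


def scalable_decode(layers, num_layers):
    """Decode only the first num_layers layers by summing their contributions."""
    out = [0.0] * len(layers[0])
    for layer in layers[:num_layers]:
        out = [o + v for o, v in zip(out, layer)]
    return out


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


signal = [0.83, -0.41, 0.27, 0.95, -0.66, 0.12]
layers = scalable_encode(signal, steps=[0.5, 0.1, 0.02])  # core + 2 enhancement layers
core_only = scalable_decode(layers, 1)   # coarse reconstruction from the core layer
all_layers = scalable_decode(layers, 3)  # refined reconstruction from all layers
```

Decoding only the first layer corresponds to decoding a part of the encoded sequence; adding layers reduces the reconstruction error.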
FIG. 2 shows a block diagram of a CELP encoder. The CELP encoder as shown in FIG. 2 mainly has a linear prediction analyzer 201, a linear prediction coefficient quantization part 202, a linear prediction synthesis filter 203, an adaptive code book 204, a fixed code book 206, a perceptual weighting filter 208, a controller 209, an adder 212 and a subtracter 213. An input signal 200 is supplied to the CELP encoder every 5 to 40 ms, and linear prediction analysis is performed on the input signal by the linear prediction analyzer 201. Then, the linear prediction coefficients 210 obtained by the linear prediction analysis are quantized by the linear prediction coefficient quantization part 202. The linear prediction synthesis filter 203 is constructed using the quantized linear prediction coefficients thus obtained. Excitation vectors 211 to drive the linear prediction synthesis filter 203 are stored in the adaptive code book 204. The adaptive code book excitation vector is output from the adaptive code book 204 and the fixed code book excitation vector is output from the fixed code book 206 according to an output signal from the controller 209. The adaptive code book excitation vector is multiplied by an adaptive code book gain 205, and the fixed code book excitation vector is multiplied by a fixed code book gain 207. The excitation vector 211 is then generated at the output of the adder 212 by adding the two gain-scaled vectors, and is supplied to the linear prediction synthesis filter 203. The output signal of the linear prediction synthesis filter 203 is a synthesis signal. An error signal between the input signal and the synthesis signal is calculated by the subtracter 213 and supplied to the perceptual weighting filter 208. The perceptual weighting filter 208 supplies the perceptually weighted error signal to the controller 209.
The controller 209 searches for the excitation vector 211 that minimizes the power level of the perceptually weighted error signal. Using the adaptive code book excitation vector and the fixed code book excitation vector selected by the search, the controller 209 then determines the adaptive code book gain 205 and the fixed code book gain 207, respectively, so that the power level of the perceptually weighted error signal is minimized.
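The search performed by the controller can be sketched as a simple analysis-by-synthesis loop. This is a deliberately reduced illustration under several assumptions: only a fixed code book is searched, the perceptual weighting filter is omitted (treated as identity), the filter starts from zero state, and the synthesis filter uses the sign convention s[n] = e[n] - sum of lpc[k] * s[n-1-k]. The function names synthesize and search_codebook and the example code book are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the entry whose synthesis best matches target."""
    best = None
    for index, vector in enumerate(codebook):
        y = synthesize(vector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # a silent entry cannot be scaled to match anything
        # Gain that minimizes the squared error for this entry (least squares).
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


lpc = [-0.9]  # hypothetical first-order predictor
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, 0.0, 0.0],
]
# Build a target that is exactly entry 2 scaled by 0.7, then recover it.
target = [0.7 * v for v in synthesize(codebook[2], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
```

In a full CELP encoder the adaptive code book would be searched as well, and the error would be weighted perceptually before its power is measured.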
FIG. 3 shows a block diagram of a CELP decoder 300. In the decoder 300 as shown in FIG. 3, the coefficients for a linear prediction synthesis filter 305, an adaptive code book 301, an adaptive code book gain 302, a fixed code book 303, and a fixed code book gain 304 are extracted from a code word sequence 311. The adaptive code book excitation vector and the fixed code book excitation vector are multiplied by their respective gains and then added by the adder 307 to form an excitation vector 306. The linear prediction synthesis filter 305 is driven by the excitation vector 306, and a decoded signal 312 is supplied as an output signal.
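The decoder data flow described above can be sketched as follows, assuming the code book vectors, gains and filter coefficients have already been extracted from the code word sequence. The function name decode_frame, the zero initial filter state, and the sign convention s[n] = e[n] - sum of lpc[k] * s[n-1-k] are assumptions made for illustration.

```python
def decode_frame(adaptive_vec, fixed_vec, gain_a, gain_f, lpc):
    """Form the excitation from two gain-scaled vectors and run the synthesis filter."""
    # Gain scaling followed by the adder.
    excitation = [gain_a * a + gain_f * f for a, f in zip(adaptive_vec, fixed_vec)]
    # All-pole linear prediction synthesis filter driven by the excitation.
    decoded = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * decoded[n - 1 - k]
        decoded.append(acc)
    return excitation, decoded


excitation, decoded = decode_frame(
    adaptive_vec=[1.0, 0.0, 0.0, 0.0],
    fixed_vec=[0.0, 1.0, 0.0, 0.0],
    gain_a=0.5,
    gain_f=2.0,
    lpc=[-0.5],  # hypothetical first-order coefficients
)
```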
On the other hand, FIG. 4 shows an encoder 400 for transform coding. The encoder 400 mainly has an orthogonal transformation part 401, a transform coefficient quantization part 402 and a quantized transform coefficient encoding part 403. The transform coefficients 405 are calculated by performing the orthogonal transform on the input signal in the orthogonal transformation part 401. The transform coefficients 405 are quantized by the transform coefficient quantization part 402, and the quantized transform coefficients 406 are then encoded into an encoded code sequence 407 by the quantized transform coefficient encoding part 403.
FIG. 5 shows a block diagram of a decoder 500 for decoding a transform-encoded code sequence 504. In the decoder 500 as shown in FIG. 5, the encoded code sequence 504 is decoded into the quantized transform coefficients by the quantized transform coefficient decoding part 501, and the quantized transform coefficients are then de-quantized into the transform coefficients by the transform coefficient de-quantization part 502. The transform coefficients thus obtained are inverse orthogonally transformed into a decoded signal by the inverse orthogonal transformation part 503.
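The transform coding chain of FIG. 4 and FIG. 5 can be sketched as a round trip through an orthogonal transform and a quantizer. This sketch assumes an orthonormal DCT-II as the orthogonal transform and a uniform quantizer with a hypothetical step size; the encoding of the quantized coefficients into a code sequence (for example, entropy coding) is omitted, so the integer indices stand in for the encoded code sequence.

```python
import math


def dct(block):
    """Orthonormal DCT-II of one transform block."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block)))
    return out


def idct(coeffs):
    """Inverse of dct(): reconstruct time samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


def encode(block, step):
    """Transform, then quantize each coefficient to an integer index."""
    return [round(c / step) for c in dct(block)]


def decode(indices, step):
    """De-quantize the indices, then apply the inverse transform."""
    return idct([q * step for q in indices])


block = [0.2, 0.5, -0.3, 0.9, -0.7, 0.1, 0.4, -0.6]
indices = encode(block, step=0.05)
decoded = decode(indices, step=0.05)
max_error = max(abs(a - b) for a, b in zip(block, decoded))
```

Because the transform is orthonormal, the energy of the time-domain error equals the energy of the quantization error in the coefficients, so a smaller step gives a smaller reconstruction error.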
As described above, in transform coding, the input signal in the time domain is orthogonally transformed into coefficients in the frequency domain, and then the quantization and the encoding are performed. Therefore, when the encoded code sequence is inversely transformed into a signal in the time domain, quantization noise generated by the quantization in the frequency domain spreads over the whole transform block (that is, the unit of the transform coding) at approximately the same level. Consequently, if there is a steep rising transition of amplitude, a so-called ‘attack’, in a part of the input signal within the transform block, a pre-echo, which is a jarring noise, occurs in the part preceding the steep rising transition of the amplitude. The longer the transform block, the longer the interval in which the pre-echo occurs, and the further the subjective quality is degraded. When transform coding is used in the scalable encoding described above, the same problem arises.
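The spreading of quantization noise ahead of an attack can be reproduced with a toy example. The sketch below assumes an orthonormal DCT-II as the transform, a very short block of four samples, and a coarse hypothetical step size chosen only to make the effect visible; the names dct_basis and code_block are illustrative.

```python
import math


def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors for a block of n samples."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)]
            for k in range(n)]


def code_block(block, step):
    """Transform, quantize coarsely in the frequency domain, and transform back."""
    n = len(block)
    basis = dct_basis(n)
    coeffs = [sum(b * x for b, x in zip(row, block)) for row in basis]
    quantized = [step * round(c / step) for c in coeffs]
    return [sum(basis[k][i] * quantized[k] for k in range(n)) for i in range(n)]


# Three silent samples followed by an attack in the last sample of the block.
block = [0.0, 0.0, 0.0, 1.0]
decoded = code_block(block, step=0.5)

# Noise that appears before the attack, where the input was exactly zero.
pre_echo = max(abs(v) for v in decoded[:3])
```

The input is silent before the attack, yet the decoded block is not: the quantization error made in the frequency domain is distributed over every sample of the block, including those that precede the attack.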
To solve this problem, a technology of adaptive block length conversion is used in MPEG-4 Audio (ISO/IEC 14496-3) as described above. In this technology, if there is a steep rising transition of the amplitude in the input signal, a short transform block is used, and otherwise a long transform block is used. To switch the block length, however, it is necessary to detect whether a steep rising transition of the amplitude exists in the input signal. One example of such a detection method is as follows. First, the input signal is divided into transform blocks, and a Fourier transformation is performed on each transform block. Next, the obtained Fourier transform coefficients are divided into several frequency bands. Then, a parameter called perceptual entropy is calculated based on a signal-to-masking ratio (SMR), which is the ratio between the input signal power and the minimum audible noise calculated using a psychoacoustic model, for each of the frequency bands. The steep rising transition of the amplitude is detected by comparing the perceptual entropy with a predetermined threshold value. This method is used in the scalable encoding in MPEG-4 Audio (ISO/IEC 14496-3).
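The switching decision itself can be illustrated with a much simpler detector than the one described above. The sketch below does not compute perceptual entropy and uses no psychoacoustic model; it substitutes a plain time-domain energy-ratio test as a stand-in, purely to show how a detection result compared against a threshold selects the block length. The names has_attack and select_block and all parameter values are hypothetical.

```python
def has_attack(frame, num_sub_blocks=4, ratio_threshold=8.0, floor=1e-9):
    """Flag a steep rise when the energy of a sub-block jumps relative to the previous one."""
    size = len(frame) // num_sub_blocks
    energies = [
        sum(v * v for v in frame[i * size:(i + 1) * size]) + floor  # floor avoids divide by zero
        for i in range(num_sub_blocks)
    ]
    return any(cur / prev > ratio_threshold
               for prev, cur in zip(energies, energies[1:]))


def select_block(frame):
    """Choose a short transform block if an attack is detected, otherwise a long one."""
    return "short" if has_attack(frame) else "long"


steady = [0.5, -0.5] * 8                # constant level: no attack
attack = [0.01] * 8 + [1.0, -1.0] * 4   # quiet start, then a sudden rise

steady_choice = select_block(steady)
attack_choice = select_block(attack)
```

Whatever detector is used, the decoder must be told which block length was chosen for each block.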
However, in the prior art method described above, the transform block length is merely shortened in order to shorten the interval in which the pre-echo exists. Further, because the transform block length varies, supplementary information that indicates the transform block length is required in order to decode the encoded code sequence at the decoding side. Therefore, the structure of the system becomes complex.