Numerous methods and apparatuses have been proposed for efficiently compressing and encoding sound signals and/or image signals. Typical encoding methods for sound signals include a method using MPEG-2 Audio, standardized by ISO/IEC. Typical encoding methods for image signals include a method using MPEG-4 Visual, standardized by ISO/IEC, and a method using ITU-T Recommendation H.263.
These encoding methods can encode various kinds of input signals because they do not rely on a model aimed at a specific input signal (such as CELP, which is a basic algorithm of voice encoding). Instead, in these encoding methods, a time domain signal (or a space domain signal) is transformed into a frequency domain signal for each block, and the encoding process is then performed on the transformed signal. This transformation localizes the temporal redundancy existing in the input signal into the frequency domain, and as a result the encoding efficiency is enhanced.
Meanwhile, human auditory characteristics and human visual characteristics are generally said to depend on frequency. For this reason, transforming a time domain signal into a frequency domain signal as described above is convenient in that the encoding process can be performed in consideration of these characteristics.
Methods for transforming a time domain signal (or a space domain signal) into a frequency domain signal include, for example, the Fourier transform, the discrete cosine transform (DCT), the modified discrete cosine transform (MDCT), and the wavelet transform (WT).
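The energy-localization property described above can be illustrated with a short numerical sketch. The following is not part of any standard codec; it is a minimal, naive DCT-II written directly from its definition, showing that a slowly varying time domain block, whose energy is spread over every sample, is packed by the transform into a handful of frequency coefficients:

```python
import numpy as np

def dct_ii(x):
    """Naive orthonormal DCT-II of a 1-D block (illustrative, O(n^2))."""
    n = len(x)
    k = np.arange(n)
    # Basis functions: cos(pi/n * k * (t + 0.5)) for frequency index k
    basis = np.cos(np.pi / n * np.outer(k, np.arange(n) + 0.5))
    coeffs = basis @ x
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)          # DC term scaled for orthonormality
    return scale * coeffs

# A highly correlated block: a slow cosine over 64 samples.
t = np.arange(64)
block = np.cos(2 * np.pi * 2 * t / 64)
c = dct_ii(block)

# Fraction of the block's energy captured by its 4 largest coefficients.
top4 = np.sort(np.abs(c))[::-1][:4]
print(np.sum(top4 ** 2) / np.sum(c ** 2))
```

Because the transform is orthonormal, the total energy is preserved; only its distribution across coefficients changes, which is what makes coarse quantization of the remaining near-zero coefficients cheap.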
Here, in the DCT coding method (or the MDCT coding method), a time domain input signal is first transformed into a frequency domain signal. The transformed signal is then quantized. In this quantizing process, a predetermined weight is applied to each DCT coefficient (or MDCT coefficient) based on an auditory psychological model (a model derived from the human auditory characteristics) and on the amplitude characteristics of the frequency domain input signal. By this weighting process, the quantization noise included in the decoded signal can be controlled to an extent of being virtually imperceptible to the user. In this case, for DCT (or MDCT), the transforming process is performed on each of certain blocks of the input signal. For this reason, a fixed weight is assigned to the DCT coefficients (or the MDCT coefficients) of each block.
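The weighted quantization described above can be sketched as follows. This is a simplified illustration, not the method of any particular standard: the weight values are hypothetical stand-ins for what a real auditory psychological model would produce, with larger weights marking perceptually sensitive frequency bands:

```python
import numpy as np

def weighted_quantize(coeffs, weights, step):
    """Quantize transform coefficients with perceptual weights.

    A larger weight gives a band a finer effective step, so quantization
    noise is pushed into bands where it is harder to hear. The weights
    are hypothetical; a real codec derives them from a psychoacoustic
    model and the amplitude characteristics of the signal.
    """
    eff_step = step / weights                    # finer step where weight is large
    indices = np.round(coeffs / eff_step).astype(int)
    return indices, eff_step

def dequantize(indices, eff_step):
    return indices * eff_step

coeffs = np.array([10.0, 4.0, 0.5, 0.2])         # example transform coefficients
weights = np.array([4.0, 2.0, 1.0, 1.0])         # hypothetical perceptual weights
idx, eff = weighted_quantize(coeffs, weights, step=1.0)
recon = dequantize(idx, eff)
err = np.abs(coeffs - recon)
# The reconstruction error in each band is bounded by half its effective
# step, so perceptually sensitive bands end up with less noise.
```

The design choice this models is exactly the one the text describes: a single step size is shaped per band by the weights, rather than coding every band at uniform precision.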
However, the above-described prior art has the following problems. When each block is equal to or longer than a predetermined length, the characteristics of the input voice signal corresponding to the block often vary from one consecutive short time period to the next. For example, a portion where the input voice signal rises sharply and a portion where it does not change may both exist within the same block. Heretofore, a fixed weighting corresponding to the block length has been performed, and this weighting does not take the characteristics of such portions into account. For this reason, it has not been possible to control the quantization noise (the quantization noise caused by error signals) to an extent of being virtually imperceptible to the user.
Meanwhile, for DCT (or MDCT), there is also a technique of performing the transforming process on each of certain short blocks of the input signal. In this technique, a fixed weight is assigned to the DCT coefficients (or the MDCT coefficients) of each short block.
According to this technique, even when the characteristics of the input voice signal vary over consecutive short time periods, a weighting process corresponding to those characteristics can be performed. By this weighting process, the quantization noise can be controlled to an extent of being virtually imperceptible to the user.
However, performing the transforming process on each of certain short blocks causes the following problems. The frequency resolution of the input signal is reduced because the observation interval of the input signal is shortened. Moreover, supplementary information for decoding the encoded signal (such as information indicating the quantization width necessary for decoding) is required for every short block. Accordingly, the encoding efficiency of the input signal is reduced.
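The loss of frequency resolution with short blocks can be demonstrated numerically. In the sketch below (the sample rate, tone frequencies, and block lengths are illustrative choices, not values from any standard), two tones 50 Hz apart are clearly separated when analyzed over a long block, but merge into a single spectral peak when the block is short:

```python
import numpy as np

FS = 8000  # illustrative sample rate in Hz

def two_tones(n):
    """n samples of two sinusoids at 1000 Hz and 1050 Hz."""
    t = np.arange(n) / FS
    return np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1050 * t)

def resolved_peaks(x):
    """Count distinct spectral peaks above half the maximum magnitude."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    above = mag > mag.max() / 2
    # Count contiguous runs of bins above the threshold.
    return int(np.sum(above[1:] & ~above[:-1]) + above[0])

# Long block: bin spacing FS/1024 ~ 7.8 Hz, so the tones are resolved.
n_long = resolved_peaks(two_tones(1024))
# Short block: bin spacing FS/128 = 62.5 Hz > 50 Hz, so they merge.
n_short = resolved_peaks(two_tones(128))
print(n_long, n_short)
```

This is the first half of the tradeoff stated above; the second half, the per-block supplementary information, grows linearly with the number of blocks and needs no experiment to see.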
Therefore, there has been a demand for a signal encoding apparatus capable of controlling the quantization noise to an extent of being virtually imperceptible to the user even when the characteristics of the input signal vary over consecutive short time periods, and of preventing reduction in the frequency resolution and in the encoding efficiency.
An object of the present invention is to provide a signal encoding apparatus, a signal encoding method, and a program which are capable of controlling the quantization noise to an extent of being virtually imperceptible to the user even when the characteristics of the input signal vary over consecutive short time periods, and of preventing reduction in the frequency resolution and in the encoding efficiency.