Speech and audio coding technologies, which use psychoacoustics to remove information that humans do not necessarily perceive and thereby compress the amount of data in a signal by a factor of several tens, are extremely important for the transmission and storage of signals. One widely used perceptual audio coding technique is MPEG-4 AAC (Advanced Audio Coding), standardized by ISO/IEC MPEG (Moving Picture Experts Group).
Further, bandwidth extension technology, which generates the high frequency band components of speech from its low frequency band components, has recently come into wide use as a way to improve speech coding performance and obtain high speech quality at low bit rates. A typical example of bandwidth extension technology is the SBR (Spectral Band Replication) technology used in MPEG-4 AAC. SBR transforms a signal into the frequency domain with a QMF (Quadrature Mirror Filter) bank, generates high frequency band components by copying spectral coefficients from the low frequency band to the high frequency band, and then adjusts those high frequency band components by modifying the spectral envelope and tonality of the copied coefficients. Adjustment of the spectral envelope and tonality is hereinafter referred to as “adjustment of frequency envelope”. A speech coding method that uses such a bandwidth extension technology can reproduce the high frequency band components of a signal from only a small amount of supplementary information, which makes it effective for lowering the bit rate of speech coding.
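The copy-up and envelope-adjustment steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the MPEG-4 SBR algorithm: the input is assumed to already be in the QMF domain (the filter bank itself is not shown), a single patch copies the low band directly above itself, the hypothetical parameter `target_energy` stands in for the transmitted supplementary information with one value per high-band subband, and tonality adjustment is omitted.

```python
import math


def sbr_like_extend(X, kx, target_energy):
    """Simplified SBR-style high-band generation (illustrative sketch only).

    X: list of time slots, each a list of kx low-band subband coefficients
       (assumed to come from a QMF analysis bank, not shown here).
    kx: number of low-band subbands; the high band is assumed to span
        subbands kx .. 2*kx - 1 (a single hypothetical copy-up patch).
    target_energy: hypothetical side information, one mean energy per
        high-band subband over the frame.
    Returns a new list of time slots with 2*kx subbands each.
    """
    n_slots = len(X)
    # 1) Copy-up: replicate the low-band coefficients into the high band.
    Y = [list(slot) + list(slot) for slot in X]
    # 2) Envelope adjustment: one gain per high-band subband for the frame,
    #    chosen so the subband's mean energy matches the target.
    for j in range(kx):
        k = kx + j
        current = sum(abs(Y[t][k]) ** 2 for t in range(n_slots)) / n_slots
        gain = math.sqrt(target_energy[j] / current) if current > 0 else 0.0
        for t in range(n_slots):
            Y[t][k] *= gain
    # Tonality adjustment (e.g. inverse filtering, noise addition) is omitted.
    return Y


# Usage: two time slots, two low-band subbands.
Y = sbr_like_extend([[1.0, 2.0], [3.0, 0.0]], 2, [5.0, 8.0])
print(Y)
```

Applying one gain per subband across the whole frame reflects the idea that the side information describes energy over a time-frequency region rather than per time slot; real SBR groups subbands and time slots into envelope tiles in ways this sketch does not model.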
In frequency-domain bandwidth extension such as SBR, the frequency envelope is adjusted on spectral coefficients expressed in the frequency domain. Consequently, when a signal whose time envelope varies greatly, such as speech, hand clapping, or castanets, is encoded, reverberant noise called pre-echo or post-echo may be perceived in the decoded signal. This occurs because the time envelope of the high frequency band components is deformed during the adjustment and, in many cases, becomes flatter than it was before the adjustment. The flattened time envelope no longer matches the time envelope of the high frequency band components of the original signal before encoding, and this mismatch causes pre-echo or post-echo.
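A toy numerical illustration of why a flattened time envelope produces pre-echo: the per-slot energies below are made-up values, and flattening the envelope completely is an extreme case chosen for clarity, not a model of what SBR's adjustment actually does. The helper names are hypothetical.

```python
def flatten_envelope(energies):
    """Spread a frame's total energy evenly over its time slots (extreme flattening)."""
    mean = sum(energies) / len(energies)
    return [mean] * len(energies)


def energy_before_onset(energies, onset):
    """Energy present in the time slots before the transient onset."""
    return sum(energies[:onset])


# Hypothetical high-band energy per time slot: a sharp transient in the last slot.
original = [0.0, 0.0, 0.0, 8.0]
flat = flatten_envelope(original)

print(energy_before_onset(original, 3))  # no energy before the onset
print(energy_before_onset(flat, 3))      # energy smeared ahead of the onset
```

The frame's total energy is the same in both cases, so a frequency-envelope adjustment alone cannot tell them apart; the difference lies entirely in how the energy is distributed over time, and the energy that appears before the onset is heard as pre-echo.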
One known solution to this problem is the following method (see WO/2010/114123). The method computes the power of the low frequency band components for each time slot of the frequency-domain signal, extracts time envelope information from that power, and superimposes the extracted time envelope information onto the high frequency band components whose frequency envelope has already been adjusted using the supplementary information. This method is hereinafter referred to as “a method of time envelope deformation”.
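The three steps described above (per-slot low-band power, time envelope extraction, superimposition onto the adjusted high band) can be sketched as follows. This is a minimal illustration, not the procedure of WO/2010/114123: the normalization of the envelope to the frame average and the per-subband energy renormalization are assumptions of this sketch, added so that the energies set by the frequency envelope adjustment are preserved.

```python
import math


def deform_time_envelope(Y, kx, eps=1e-12):
    """Sketch of time envelope deformation in a QMF-like domain.

    Y: list of time slots; subbands 0..kx-1 hold the decoded low band and
       subbands kx.. hold high-band components whose frequency envelope has
       already been adjusted.
    The normalization (step 2) and energy renormalization (step 4) are
    illustrative assumptions, not details taken from WO/2010/114123.
    """
    n_slots = len(Y)
    n_bands = len(Y[0])
    # 1) Power of the low-band components in each time slot.
    power = [sum(abs(c) ** 2 for c in slot[:kx]) for slot in Y]
    mean_power = sum(power) / n_slots
    # 2) Time envelope: per-slot amplitude relative to the frame average.
    env = [math.sqrt(p / (mean_power + eps)) for p in power]
    out = [list(slot) for slot in Y]
    for k in range(kx, n_bands):
        before = sum(abs(Y[t][k]) ** 2 for t in range(n_slots))
        # 3) Superimpose the extracted envelope onto this high-band subband.
        for t in range(n_slots):
            out[t][k] = Y[t][k] * env[t]
        after = sum(abs(out[t][k]) ** 2 for t in range(n_slots))
        # 4) Restore the subband's frame energy so only its time shape changes.
        scale = math.sqrt(before / after) if after > eps else 0.0
        for t in range(n_slots):
            out[t][k] *= scale
    return out


# Usage: low band has a transient in slot 0; the high band starts out flat.
out = deform_time_envelope([[2.0, 1.0], [0.0, 1.0]], kx=1)
print(out)
```

After deformation the high-band subband follows the low band's time shape (all of its energy moves into slot 0) while its total energy over the frame is unchanged.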
This makes it possible to give the time envelope of the decoded signal a less distorted shape and to obtain a reproduced signal with less pre-echo and post-echo.