Mobile communication systems are required to transmit speech signals compressed at a low bit rate for effective utilization of radio wave resources or the like. Meanwhile, mobile communication systems are also required to improve the quality of call speech and to realize call services with a high level of realism. To meet these requirements, it is preferable to encode wider-band speech signals, music signals, or the like with high quality.
To respond to these two mutually contradictory demands, a technique that integrates a plurality of encoding techniques hierarchically is regarded as promising. This technique hierarchically combines a first layer that encodes an input signal up to a wideband (0 to 7 kHz) with a band extension layer that uses the input signal and a decoded signal of the first layer to perform encoding up to ultra-wideband (0 to 14 kHz).
In the following description, a signal band (0 to 7 kHz) encoded in the first layer is called a “wideband region” and a signal band (7 kHz to 14 kHz) encoded in the band extension layer is called an “extension band region.” FIG. 1 illustrates the wideband region and the extension band region in an input signal spectrum. In a technique that performs encoding hierarchically in this way, the bit stream obtained from a coding apparatus has scalability, that is, the property that a decoded signal can be acquired even from part of the bit stream, and therefore the technique is generally called “scalable encoding (hierarchical encoding).”
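The scalability property described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the coding scheme of any cited document: the spectrum is a short list of bin values, no quantization is performed, the extension layer simply carries the high-band bins rather than deriving them from the first-layer decoded signal, and the names WIDEBAND_BINS, Bitstream, encode, truncate, and decode are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: number of spectral bins that cover the wideband region (0 to 7 kHz).
WIDEBAND_BINS = 4


@dataclass
class Bitstream:
    first_layer: List[float]                       # wideband region
    extension_layer: Optional[List[float]] = None  # extension band region


def encode(spectrum: List[float]) -> Bitstream:
    """Split the spectrum into a first layer and a band extension layer."""
    return Bitstream(
        first_layer=list(spectrum[:WIDEBAND_BINS]),
        extension_layer=list(spectrum[WIDEBAND_BINS:]),
    )


def truncate(bitstream: Bitstream) -> Bitstream:
    """Model a network node that forwards only the first layer."""
    return Bitstream(first_layer=list(bitstream.first_layer), extension_layer=None)


def decode(bitstream: Bitstream, total_bins: int) -> List[float]:
    """Decode whatever layers are present; a missing layer yields an empty band."""
    if bitstream.extension_layer is None:
        high = [0.0] * (total_bins - len(bitstream.first_layer))
    else:
        high = list(bitstream.extension_layer)
    return list(bitstream.first_layer) + high
```

Decoding the full bit stream returns the whole spectrum, whereas decoding the truncated bit stream still returns a valid signal that is limited to the wideband region.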
Since the scalable coding scheme can, by its nature, flexibly support communication between networks of different bit rates, it can be said to be suitable for future network environments in which a variety of networks are integrated using IP (Internet Protocol).
There is a technique disclosed in NPL 1 as an example of realizing scalable encoding using a technique standardized in ITU-T (International Telecommunication Union Telecommunication Standardization Sector). This technique encodes signals of the wideband region in the first layer and performs encoding in the band extension layer by extending signals of the extension band region using the signals of the wideband region.
Using such a scalable configuration makes it possible to provide, with high quality, speech signals and music signals having a wider band than conventional speech signals.
However, when encoding is performed at a low bit rate, fewer bits are assigned to the band extension layer, and the output signal (decoded signal) therefore has a sound quality that is quite offensive to the ear (a feeling of abnormal sound). In such a case where only a few bits are assigned to a certain frequency band, a scheme may be adopted whereby abnormal sounds are reduced by limiting the frequency band of the output signal in accordance with the bit rate and intensively assigning bits to the remaining band (NPL 2). At the same time, however, this scheme has a drawback in that the band limitation impairs a feeling of clarity (a feeling of bandwidth) and degrades subjective quality. That is, when the above-described band limiting scheme is adopted, a feeling of abnormal sound and a feeling of bandwidth are in a trade-off relationship.
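The band limiting idea can be sketched as follows. This is only an illustration of the trade-off, not the rule used in NPL 2: the function name allocate_with_band_limit and the parameter min_bits_per_band (the assumed smallest bit budget at which a band can be coded without audible artifacts) are hypothetical.

```python
from typing import List


def allocate_with_band_limit(total_bits: int, num_bands: int,
                             min_bits_per_band: int) -> List[int]:
    """Keep only as many low bands as the bit budget can code adequately.

    Bands are ordered from low to high frequency. The highest bands are
    dropped (0 bits) and all bits are concentrated on the retained bands.
    """
    kept = min(num_bands, total_bits // min_bits_per_band)
    if kept == 0:
        return [0] * num_bands
    base, extra = divmod(total_bits, kept)
    # Spread the remainder over the lowest retained bands.
    allocation = [base + (1 if i < extra else 0) for i in range(kept)]
    return allocation + [0] * (num_bands - kept)
```

With a small budget the upper bands receive no bits at all, which removes the poorly coded content but also narrows the output bandwidth; with a larger budget every band is retained.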
In order to avoid the above-described problems, a scheme may be considered which, instead of completely limiting the bandwidth of the output signal, applies a low-pass filter having a moderate characteristic to the output signal and attenuates the high-band energy so as to reduce abnormal sounds while maintaining a feeling of bandwidth. In that case, it is preferable to adaptively switch filter coefficients in accordance with characteristics of the (output) signal. PTL 1 is an example of a scheme for adaptively switching filter coefficients. In high band emphasis processing of a post filter, this scheme adjusts coefficients of a high band emphasis filter in accordance with the ratio of high-band energy and weakens the high band emphasis when the energy ratio is high. This makes it possible to design a filter with appropriate intensity in accordance with characteristics of the signal (decoded signal) inputted to the filter and to suppress a feeling of abnormal sound while maintaining a feeling of bandwidth to a certain degree.
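A minimal sketch of adaptive coefficient switching is given below. It is not the filter of PTL 1: the one-tap low-pass structure, the threshold table COEFF_TABLE and its values, and the choice to smooth more strongly as the high-band energy ratio rises are all assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical table of (upper bound of high-band energy ratio, coefficient mu).
# The values are illustrative only.
COEFF_TABLE = [(0.1, 0.0), (0.3, 0.3), (1.0, 0.6)]


def high_band_ratio(bin_energies: Sequence[float], split: int) -> float:
    """Ratio of the energy at or above bin index `split` to the total energy."""
    total = sum(bin_energies)
    if total == 0.0:
        return 0.0
    return sum(bin_energies[split:]) / total


def select_coefficient(ratio: float) -> float:
    """Pick the coefficient of the first table row whose bound covers the ratio."""
    for upper, mu in COEFF_TABLE:
        if ratio <= upper:
            return mu
    return COEFF_TABLE[-1][1]


def low_pass(signal: Sequence[float], mu: float) -> List[float]:
    """y[n] = (x[n] + mu * x[n-1]) / (1 + mu).

    The gain is 1 at DC and falls toward the Nyquist frequency; mu = 0
    leaves the signal unchanged, and a larger mu attenuates the high band more.
    """
    out = []
    prev = 0.0
    for x in signal:
        out.append((x + mu * prev) / (1.0 + mu))
        prev = x
    return out
```

For a frame in which the high band carries 40% of the energy, the sketch selects mu = 0.6 and attenuates an alternating (Nyquist-rate) input to a quarter of its amplitude, while a frame with little high-band energy passes through unfiltered.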