Sound coding technology that compresses an audio signal or speech signal at a low bit rate is important for the efficient use of radio resources in mobile communications and of recording media. Speech coding methods, in which a speech signal is coded, include G.726 and G.729, standardized by the ITU (International Telecommunication Union). These methods code narrowband signals (300 Hz to 3.4 kHz) and enable high-quality coding at bit rates of 8 kbits/s to 32 kbits/s.
Standard methods for wideband signals (50 Hz to 7 kHz) include G.722 and G.722.1 from the ITU and AMR-WB from 3GPP (the 3rd Generation Partnership Project). These methods enable high-quality coding of wideband speech signals at bit rates of 6.6 kbits/s to 64 kbits/s.
CELP (Code Excited Linear Prediction) is an effective method for highly efficient coding of speech signals at low bit rates. CELP performs coding based on an engineering model of human speech production. Specifically, in CELP an excitation signal consisting of random values is passed through a pitch filter, which corresponds to the strength of periodicity, and a synthesis filter, which corresponds to vocal tract characteristics, and the coding parameters are determined so that the squared error between the output signal and the input signal, weighted according to auditory characteristics (perceptual weighting), is minimized.
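The analysis-by-synthesis search described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular standard: the codebook is a small set of Gaussian random vectors, the pitch filter is a single-tap long-term filter 1/(1 - b z^-T) applied within the frame (practical codecs use an adaptive codebook built from past excitation), the linear prediction coefficients `a` of A(z) are taken as given, and the perceptual weighting uses the simplified form W(z) = A(z)/A(z/γ) with γ = 0.9. All function names are hypothetical.

```python
import random

def all_pole(x, a):
    """Synthesis filter 1/A(z), A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, zero initial state."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y

def all_zero(x, a):
    """Inverse (FIR) filter A(z), zero initial state."""
    return [xn + sum(ak * x[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0)
            for n, xn in enumerate(x)]

def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): a_k becomes a_k * gamma**k."""
    return [ak * gamma ** k for k, ak in enumerate(a, start=1)]

def pitch_filter(c, lag, gain):
    """Long-term filter 1 / (1 - gain * z^-lag): adds periodicity to the excitation."""
    v = []
    for n, cn in enumerate(c):
        v.append(cn + (gain * v[n - lag] if n - lag >= 0 else 0.0))
    return v

def celp_search(target, a, codebook, lags, pitch_gains, gamma=0.9):
    """Exhaustive search; returns (weighted_error, code_index, lag, pitch_gain, gain)."""
    a_w = bandwidth_expand(a, gamma)
    # Perceptually weighted target: W(z) = A(z) / A(z/gamma) applied to the input.
    x_w = all_pole(all_zero(target, a), a_w)
    best = None
    for idx, c in enumerate(codebook):
        for lag in lags:
            for bp in pitch_gains:
                # 1/A(z) followed by W(z) reduces to 1/A(z/gamma) on the excitation.
                y_w = all_pole(pitch_filter(c, lag, bp), a_w)
                energy = sum(v * v for v in y_w)
                if energy == 0.0:
                    continue
                # Closed-form least-squares gain for this candidate.
                g = sum(p * q for p, q in zip(x_w, y_w)) / energy
                err = sum((p - g * q) ** 2 for p, q in zip(x_w, y_w))
                if best is None or err < best[0]:
                    best = (err, idx, lag, bp, g)
    return best
```

For example, a target synthesized from `codebook[5]` with lag 20, pitch gain 0.5, and gain 2.0 through A(z) = 1 - 1.2z^-1 + 0.5z^-2 is recovered exactly by `celp_search(target, [-1.2, 0.5], codebook, range(18, 24), [0.0, 0.25, 0.5, 0.75])`. The joint exhaustive search is shown only for clarity; practical coders typically search the pitch and fixed-codebook parameters sequentially to keep complexity manageable.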
Many recent standard speech coding methods are based on CELP. For example, G.729 enables narrowband signal coding at 8 kbits/s, and AMR-WB enables wideband signal coding at 6.6 kbits/s to 23.85 kbits/s.
Meanwhile, in audio coding, which codes audio signals, the commonly used methods convert an audio signal to the frequency domain and perform coding using a psychoacoustic model; examples are the Layer III method and the AAC method standardized by MPEG (Moving Picture Experts Group). These methods are known to cause almost no degradation at 64 kbits/s to 96 kbits/s per channel for a signal sampled at 44.1 kHz.
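As a rough check on these figures, assume the input is 16-bit linear PCM as on an audio CD (an assumption; the sampling rate alone does not fix the word length). The uncompressed rate per channel, the resulting compression ratios, and the highest representable frequency (the Nyquist limit) are then:

```latex
\begin{aligned}
R_{\text{PCM}} &= 44\,100~\text{samples/s} \times 16~\text{bits/sample} = 705.6~\text{kbits/s} \\
\frac{705.6}{64} &\approx 11.0, \qquad \frac{705.6}{96} = 7.35 \\
f_{\max} &= \frac{44.1~\text{kHz}}{2} = 22.05~\text{kHz}
\end{aligned}
```

That is, these methods reduce the bit rate by roughly a factor of 7 to 11 while keeping a signal band of about 22 kHz.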
Audio coding of this kind is designed for high-quality coding of music. It can also code with high quality a speech signal that has music or environmental sound in the background, and it can handle a signal band of up to approximately 22 kHz, which is CD quality.
However, when a speech coding method is used to code a signal in which speech is predominant and music or environmental sound is superimposed in the background, the background music or environmental sound degrades not only the background signal but also the speech signal, and overall quality deteriorates.
This problem occurs because speech coding methods are specialized for the CELP speech model. A further problem is that speech coding methods can handle signal bands only up to 7 kHz, so a signal that has components in higher bands cannot be represented adequately.
Moreover, an audio coding method requires a high bit rate to achieve high-quality coding. If the bit rate is held down to 32 kbits/s, the quality of the decoded signal deteriorates severely, so audio coding methods cannot be used on communication networks with low transmission rates.