In mobile communication systems, in order to make effective use of radio spectrum resources and the like, there is a need to compress a speech signal to a low bit rate for transmission thereof. There is also a desire for a telephone service with improved speech quality and a good feeling of naturalness, and the achievement thereof makes desirable the high-quality encoding of not only monaural signals, but also multichannel audio signals, and in particular stereo audio signals.
A known method for encoding a stereo audio signal at low bit rate is the intensity stereo method. In the intensity stereo method, a monaural signal is multiplied by scaling coefficients to generate an L-channel signal (left-channel signal) and an R-channel signal (right-channel signal). A method such as this is called amplitude panning.
The most basic method of amplitude panning is that of multiplying a monaural signal in the time domain by gain coefficients for amplitude panning (panning gain coefficients) to determine the L-channel signal and the R-channel signal (refer to, for example, Non-Patent Literature 1). Another method is that of multiplying a monaural signal by panning gain coefficients for each frequency component (or each frequency group) in the frequency domain to determine the L-channel signal and the R-channel signal (refer to, for example, Non-Patent Literature 2).
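The two forms of amplitude panning described above can be illustrated by the following minimal sketch. It is not taken from the cited literature; the function names, the representation of frequency groups as `(start, stop)` index pairs, and the choice to leave coefficients outside every group unchanged are all assumptions made for illustration.

```python
def pan_time_domain(mono, gain_l, gain_r):
    """Time-domain panning: one gain pair is applied to every sample."""
    left = [gain_l * s for s in mono]
    right = [gain_r * s for s in mono]
    return left, right


def pan_per_band(spectrum, band_edges, gains_l, gains_r):
    """Frequency-domain panning: one gain pair per frequency group.

    spectrum   : frequency-domain coefficients of the monaural signal
    band_edges : hypothetical list of (start, stop) index pairs, stop exclusive
    gains_l/r  : one panning gain per group for the L and R channels
    Coefficients not covered by any group are copied unchanged (an assumption).
    """
    left = list(spectrum)
    right = list(spectrum)
    for (start, stop), g_l, g_r in zip(band_edges, gains_l, gains_r):
        for k in range(start, stop):
            left[k] = g_l * spectrum[k]
            right[k] = g_r * spectrum[k]
    return left, right


if __name__ == "__main__":
    # Louder on the right in the time domain.
    print(pan_time_domain([1.0, -2.0], 0.5, 1.5))
    # Two frequency groups: the first is centered, the second leans right.
    print(pan_per_band([1.0, 2.0, 3.0, 4.0], [(0, 2), (2, 4)],
                       [1.0, 0.5], [1.0, 1.5]))
```

In the per-group form, only one gain pair per group needs to be conveyed rather than one per coefficient, which is presumably why grouping is attractive at low bit rates.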
If panning gain coefficients are used as encoding parameters of parametric stereo, scalable encoding (monaural-stereo scalable encoding) of a stereo signal can be done (refer to, for example, Patent Literatures 1 and 2). The panning gain coefficients are described in Patent Literature 1 as balance parameters and are described in Patent Literature 2 as ILDs (level differences).
In a mobile communication system, a technique known as intermittent transmission (DTX: discontinuous transmission) exists for making effective use of radio spectrum resources (refer to, for example, Non-Patent Literature 3). In the DTX technique, when no speech is being emitted, information representing the background noise is transmitted intermittently at an ultra-low bit rate. This reduces the average bit rate during a conversation and allows more mobile terminals to be accommodated in the same frequency band.
For example, in Non-Patent Literature 3, once every eight frames within a section judged to be a non-speech section (inactive speech section, background noise section), LPC (linear prediction coding) coefficients are quantized with 29 bits (for example, after converting the LPC coefficients to LSF (line spectral frequency) coefficients), and the frame energy is quantized with 6 bits, making a total of 35 bits (bit rate: 1.75 kbits/s). In the decoding section, ten pulses per frame generated based on random numbers are multiplied by the decoded frame energy, and the result is passed through a synthesis filter constituted by the decoded LPC coefficients to generate a decoded signal. This decoding processing is performed while updating the LPC coefficients and the frame energy every eight frames.
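The decoding processing described above can be sketched roughly as follows. This is an illustrative approximation, not the procedure of Non-Patent Literature 3. Several details are assumptions: the frame length of 160 samples (the figure of 35 bits at 1.75 kbits/s is consistent with a 20 ms frame, and a sampling rate of 8 kHz is assumed), the random placement and random signs of the pulses, the use of a single `gain` value derived from the decoded frame energy, and the filter convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. All function names are hypothetical.

```python
import random


def random_pulse_excitation(gain, frame_len=160, num_pulses=10, rng=random):
    """Place num_pulses pulses of random sign at random positions (assumed layout)."""
    excitation = [0.0] * frame_len
    for pos in rng.sample(range(frame_len), num_pulses):
        excitation[pos] = gain * rng.choice((-1.0, 1.0))
    return excitation


def synthesis_filter(excitation, lpc, state):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_k lpc[k-1] * y[n-k].

    state holds the most recent outputs, newest first, one per LPC coefficient.
    """
    state = list(state)
    out = []
    for e in excitation:
        y = e - sum(a * s for a, s in zip(lpc, state))
        state = ([y] + state)[:len(lpc)]
        out.append(y)
    return out, state


def decode_noise(updates, frames_per_update=8, frame_len=160, rng=random):
    """Generate background noise, taking new (lpc, gain) parameters every eight frames.

    All LPC coefficient sets in updates are assumed to have the same order.
    """
    signal = []
    state = [0.0] * len(updates[0][0]) if updates else []
    for lpc, gain in updates:
        for _ in range(frames_per_update):
            exc = random_pulse_excitation(gain, frame_len, rng=rng)
            frame, state = synthesis_filter(exc, lpc, state)
            signal.extend(frame)
    return signal


if __name__ == "__main__":
    # One parameter update with a hypothetical first-order filter.
    noise = decode_noise([([-0.5], 1.0)], rng=random.Random(0))
    print(len(noise))  # 8 frames of 160 samples each
```

The filter state is carried across frames and across parameter updates so that the output stays continuous at frame boundaries; a practical decoder would likely also smooth the parameters between updates, which this sketch omits.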