Most speech codecs encode only monaural speech signals. Unlike stereo speech signals, monaural speech signals do not provide spatial information. Such monaural codecs are generally employed in communication equipment such as mobile phones and teleconference equipment, where signals are generated from a single source such as human speech. In the past, such monaural signals were sufficient because transmission bandwidth was limited. However, as technical advances have increased the available bandwidth, this limitation has gradually become less important. On the other hand, the quality of speech has become a more important factor for consideration, and so it is important to provide high-quality speech at bit rates as low as possible.
Stereo functionality is useful in improving the perceptual quality of speech. One application of stereo functionality is high-quality teleconference equipment that can identify the location of the speaker when a plurality of speakers are present at the same time.
At present, stereo speech codecs are less common than stereo audio codecs. In audio coding, stereophonic coding can be realized by a variety of methods, and stereo functionality is considered the norm. The stereo effect can be achieved by independently coding the right and left channels as dual mono signals. Alternatively, joint stereo coding can be performed by making use of the redundancy between the right and left channels, thereby reducing the bit rate while maintaining good quality. Joint stereo coding can be performed by using mid-side (MS) stereo coding and intensity (I) stereo coding. By using these two methods together, a higher compression ratio can be achieved.
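As an illustration of the mid-side idea mentioned above, the following is a minimal sketch, assuming the common sum/difference formulation in which the mid signal is the average of the two channels and the side signal is half their difference. The function names `ms_encode` and `ms_decode` are hypothetical and are not taken from any particular codec.

```python
def ms_encode(left, right):
    """Convert left/right samples into mid/side samples (assumed formulation)."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: recover left/right samples from mid/side samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.5, 0.25, -0.75]
    right = [0.5, 0.0, -0.25]
    mid, side = ms_encode(left, right)
    # When the channels are highly correlated, the side signal is small
    # and can be coded with fewer bits than a full second channel.
    print(mid, side)
    print(ms_decode(mid, side))
```

When the two channels are similar, most of the energy falls into the mid signal, which is where the bit-rate saving of this form of joint stereo coding comes from.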
These audio coding methods have the following disadvantages. When the right and left channels are encoded independently, no reduction in the bit rate is obtained from the correlation redundancy between the channels, and so bandwidth is wasted. Consequently, stereo channels require twice the bit rate of a monaural channel.
MS stereo coding utilizes the correlation between the stereo channels. However, when MS stereo coding is performed at low bit rates for narrow-bandwidth transmission, aliasing distortion is likely to occur and the stereo imaging of the signals also suffers.
Intensity stereo coding relies on the fact that the ability of the human auditory system to resolve frequency components is reduced in the high-frequency band, and so intensity stereo coding is effective only in the high-frequency band and is not effective in the low-frequency band.
Most speech coding methods are considered to be parametric coding, which works by modeling the human vocal tract with parameters using variations of the linear prediction method, and so the joint stereo coding methods described above are also unsuitable for stereo speech codecs.
One speech coding method, similar to that of audio codecs, is to independently encode the stereo speech channels, thereby achieving the stereo effect. However, this coding method has the same disadvantage as the corresponding audio coding method, namely that it uses twice the bandwidth of a method that codes only a monaural source.
Another speech coding method employs cross channel prediction (for example, see Non-patent Document 1). This method makes use of the interchannel correlation in stereophonic signals, thereby modeling the redundancies such as the intensity difference, delay difference, and spatial difference between stereophonic channels.
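The idea of exploiting interchannel correlation can be sketched as follows. This is not the method of Non-patent Document 1; it is a simplified illustration that assumes a one-tap predictor, in which one channel is modeled as a delayed and scaled copy of the other. The function name `predict_channel` and the parameter `max_delay` are hypothetical.

```python
def predict_channel(left, right, max_delay):
    """Search for the delay d and gain g that minimize the squared error of
    the prediction right[n] ~ g * left[n - d] (illustrative assumption)."""
    best_delay, best_gain, best_err = 0, 0.0, float("inf")
    for d in range(max_delay + 1):
        xs = left[:len(left) - d]   # left[n - d]
        ys = right[d:]              # right[n]
        energy = sum(x * x for x in xs)
        if energy == 0.0:
            continue                # no usable reference signal at this delay
        # Least-squares gain for this delay
        gain = sum(x * y for x, y in zip(xs, ys)) / energy
        err = sum((y - gain * x) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best_delay, best_gain, best_err = d, gain, err
    return best_delay, best_gain, best_err


if __name__ == "__main__":
    left = [0.0, 1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0]
    # Right channel: left channel delayed by 2 samples and attenuated by 0.5
    right = [0.0, 0.0] + [0.5 * x for x in left[:-2]]
    print(predict_channel(left, right, max_delay=3))
```

In this simplified model, the gain stands in for the intensity difference and the delay for the delay difference; only the prediction residual and these two parameters would need to be transmitted for the second channel.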
Still another speech coding method employs parametric spatial audio (for example, see Patent Document 1). The fundamental idea of this method is to use a set of parameters to represent speech signals. These parameters are used on the decoding side to resynthesize signals perceptually similar to the original speech. In this method, after the band is divided into a plurality of subbands, parameters are calculated on a per-subband basis. Each subband is made up of a number of frequency components or band coefficients, and the number of these components increases in higher-frequency subbands. For instance, one of the parameters calculated per subband is the interchannel level difference, which is the power ratio between the left (L) channel and the right (R) channel. This interchannel level difference is employed on the decoder side to correct the band coefficients. Because one interchannel level difference is calculated per subband, the same interchannel level difference, and hence the same modification coefficient, is applied to all the coefficients in that subband.
Patent Document 1: International Publication No. 03/090208 Pamphlet
Non-patent Document 1: Ramprashad, S. A., “Stereophonic CELP coding using Cross Channel Prediction”, Proc. IEEE workshop on speech encoding, pages 136-138, (17-20 Sep. 2000)
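The per-subband interchannel level difference described above can be sketched as follows. The computation of the L/R power ratio follows the description; the way the ratio is applied on the decoder side (scaling a monaural coefficient vector with a pair of gains whose squares sum to 2) is an assumption made for illustration, since the text does not specify it. The names `level_differences`, `apply_level_differences`, and `band_edges` are hypothetical.

```python
import math


def level_differences(left, right, band_edges):
    """Return one L/R power ratio per subband.
    band_edges[i]..band_edges[i+1] delimits the coefficients of subband i."""
    ratios = []
    for start, end in zip(band_edges[:-1], band_edges[1:]):
        power_l = sum(c * c for c in left[start:end])
        power_r = sum(c * c for c in right[start:end])
        ratios.append(power_l / max(power_r, 1e-12))  # guard against silence
    return ratios


def apply_level_differences(mono, ratios, band_edges):
    """Decoder-side sketch (assumed gain rule): every coefficient in a
    subband is scaled by the same pair of gains derived from its ratio."""
    left, right = [], []
    for ratio, start, end in zip(ratios, band_edges[:-1], band_edges[1:]):
        gain_l = math.sqrt(2.0 * ratio / (1.0 + ratio))
        gain_r = math.sqrt(2.0 / (1.0 + ratio))
        left.extend(gain_l * c for c in mono[start:end])
        right.extend(gain_r * c for c in mono[start:end])
    return left, right


if __name__ == "__main__":
    edges = [0, 2, 6]  # the higher subband holds more coefficients
    left = [2.0, 2.0, 1.0, 1.0, 1.0, 1.0]
    right = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    ratios = level_differences(left, right, edges)
    print(ratios)
    print(apply_level_differences([1.0] * 6, ratios, edges))
```

Because a single ratio is stored per subband, any level variation between individual coefficients inside a subband is lost, and the loss affects more coefficients in the wider, higher-frequency subbands.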