As broadband transmission in mobile communication and IP communication has become the norm and the services offered over such communication have diversified, speech communication with higher sound quality and higher fidelity is in demand. For example, demand is expected to grow for communication in hands-free video phone services, speech communication in video conferencing, multi-point speech communication in which a number of callers at different locations hold a conversation simultaneously, and speech communication capable of transmitting background sound without loss of fidelity. In these cases, it is preferable to implement speech communication using a stereo signal, which offers higher fidelity than a monaural signal and makes it possible to identify the locations of a plurality of calling parties. To implement speech communication using a stereo signal, stereo speech encoding is essential.
Further, to implement traffic control and multicast communication in speech data communication over an IP network, speech encoding employing a scalable configuration is preferred. A scalable configuration refers to a configuration in which speech data can be decoded on the receiving side even from part of the coded data.
Even when stereo speech is encoded, it is preferable to employ a monaural-stereo scalable configuration, in which the receiving side can select between decoding a stereo signal and decoding a monaural signal using part of the coded data.
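The monaural-stereo scalable idea described above can be illustrated with a minimal sketch. It assumes a simple sum/difference split into a core layer and an enhancement layer and omits quantization entirely; the function names `encode_scalable` and `decode_scalable` are hypothetical and chosen only for illustration, and this split is not stated in the text as the method actually used.

```python
from typing import List, Optional, Tuple


def encode_scalable(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Split a stereo frame into a core layer and an enhancement layer.

    Assumption for illustration: the core layer is the monaural downmix
    (L + R) / 2 and the enhancement layer is the difference (L - R) / 2.
    """
    core = [(l + r) / 2.0 for l, r in zip(left, right)]
    enhancement = [(l - r) / 2.0 for l, r in zip(left, right)]
    return core, enhancement


def decode_scalable(core: List[float],
                    enhancement: Optional[List[float]] = None) -> List[List[float]]:
    """Decode from whichever layers were received.

    With only the core layer, a single monaural channel is returned.
    With both layers, the left and right channels are reconstructed.
    """
    if enhancement is None:
        return [list(core)]
    left = [m + s for m, s in zip(core, enhancement)]
    right = [m - s for m, s in zip(core, enhancement)]
    return [left, right]


left_in = [1.0, 2.0, 3.0]
right_in = [0.0, 2.0, 5.0]
core_layer, enh_layer = encode_scalable(left_in, right_in)

mono_out = decode_scalable(core_layer)              # part of the coded data only
stereo_out = decode_scalable(core_layer, enh_layer)  # all of the coded data
```

In this sketch, a receiver that obtains only `core_layer` still decodes a usable monaural signal, while a receiver that obtains both layers recovers the stereo pair.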
Speech coding methods employing a monaural-stereo scalable configuration include, for example, a method of predicting signals between channels (abbreviated as “ch” where appropriate), that is, predicting a second channel signal from a first channel signal or predicting the first channel signal from the second channel signal, using pitch prediction between channels. In other words, encoding is performed utilizing the correlation between the two channels (see Non-Patent Document 1).
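As a rough illustration of prediction between channels, the following sketch predicts the second channel from the first using a single delay and gain, selected in the same way a pitch predictor selects its lag. This is a simplified stand-in under stated assumptions and not the CELP-based scheme of Non-Patent Document 1; the function names `predict_channel` and `prediction_residual` and the `max_delay` parameter are hypothetical.

```python
from typing import List, Tuple


def predict_channel(first: List[float], second: List[float],
                    max_delay: int) -> Tuple[int, float]:
    """Search for the delay and gain that best predict `second` from `first`.

    For each candidate delay d, the optimal gain is corr / energy, and the
    reduction in squared error is corr**2 / energy, so the delay with the
    largest such value is kept.
    """
    best_delay, best_gain, best_score = 0, 0.0, -1.0
    for delay in range(max_delay + 1):
        corr = 0.0
        energy = 0.0
        for n in range(delay, len(second)):
            corr += first[n - delay] * second[n]
            energy += first[n - delay] ** 2
        if energy == 0.0:
            continue  # no usable samples of the first channel at this delay
        score = corr * corr / energy
        if score > best_score:
            best_delay, best_gain, best_score = delay, corr / energy, score
    return best_delay, best_gain


def prediction_residual(first: List[float], second: List[float],
                        delay: int, gain: float) -> List[float]:
    """Return second[n] - gain * first[n - delay]; this is what remains to be coded."""
    residual = []
    for n in range(len(second)):
        predicted = gain * first[n - delay] if n >= delay else 0.0
        residual.append(second[n] - predicted)
    return residual


# Toy input: the second channel is the first channel delayed by 2 samples
# and scaled by 0.5, so the two channels are strongly correlated.
ch1 = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 0.0, 0.0]
ch2 = [0.0, 0.0] + [0.5 * x for x in ch1[:-2]]

delay, gain = predict_channel(ch1, ch2, max_delay=4)
residual = prediction_residual(ch1, ch2, delay, gain)
```

When the correlation between the two channels is high, as in this toy input, the residual carries little energy and is therefore cheap to encode; when the correlation is low, the residual remains large and the prediction gives little benefit.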
Non-Patent Document 1: Ramprashad, S. A., “Stereophonic CELP coding using cross channel prediction”, Proc. IEEE Workshop on Speech Coding, pp. 136-138, September 2000.