Conventionally, speech coding (speech codecs) has been used in communication applications that carry telephony narrowband speech (200 Hz to 3.4 kHz). Monophonic narrowband speech codecs are widely used in communication applications including voice communication over mobile phones, teleconferencing equipment and packet networks (e.g. the Internet).
One step toward a more realistic speech communication system is the move from monophonic speech representation to stereophonic speech representation. Wideband stereophonic communication provides a more natural-sounding environment. Scalable stereo speech coding is a core technology for realizing voice communication with superior quality and usability.
One popular method of encoding a stereo speech signal employs a signal prediction scheme based on monaural speech. That is, a reference channel signal is transmitted using a known monaural speech codec, and the left or right channel is predicted from this reference channel signal using additional information and parameters. In many applications, a monaural signal in which the left channel signal and the right channel signal are mixed is selected as the reference channel signal.
Known stereo signal coding methods include intensity stereo coding (ISC), binaural cue coding (BCC) and inter-channel prediction (ICP). These parametric stereo coding methods each have different strengths and weaknesses and are suited to encoding different source materials.
Non-Patent Document 1 discloses a technique of predicting stereo signals based on monaural signals using these coding methods. Specifically, a monaural signal is acquired by synthesizing the channel signals forming a stereo signal (e.g. a left channel signal and a right channel signal), the acquired monaural signal is encoded and decoded using a known speech codec, and, furthermore, a difference signal between the left channel and the right channel (i.e. a side signal) is predicted from the monaural signal using prediction parameters. With this coding method, the coding side models the relationship between the monaural signal and the side signal using time-dependent adaptive filters and transmits the filter coefficients calculated per frame to the decoding side. By filtering the high-quality monaural signal transmitted by the monaural codec, the decoding side regenerates the difference signal and calculates the left channel signal and the right channel signal from the regenerated difference signal and the monaural signal.
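The prediction scheme described above can be illustrated with a minimal sketch. It is not the procedure of Non-Patent Document 1. It assumes a downmix of mono = (L + R) / 2 and side = (L - R) / 2, a short FIR filter fitted per frame by least squares, no quantization of the coefficients, no filter state carried across frames, and a monaural signal that reaches the decoder unchanged. All function names (downmix, estimate_filter, decode) and the filter order are hypothetical.

```python
def downmix(left, right):
    """Return (mono, side), assuming mono = (L + R) / 2 and side = (L - R) / 2."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mono, side


def fir(coeffs, x):
    """y[n] = sum_k coeffs[k] * x[n - k], taking x[n] = 0 for n < 0."""
    return [
        sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = acc / aug[i][i]
    return x


def estimate_filter(mono, side, order=2, eps=1e-9):
    """Encoder side: least-squares FIR coefficients h so that fir(h, mono) ~ side.

    Solves the normal equations for one frame; eps is a small diagonal
    regularizer (an assumption of this sketch) that keeps the system solvable
    for silent frames.
    """
    delayed = [[0.0] * k + mono[:len(mono) - k] for k in range(order)]
    r = [
        [
            sum(u * v for u, v in zip(delayed[i], delayed[j])) + (eps if i == j else 0.0)
            for j in range(order)
        ]
        for i in range(order)
    ]
    p = [sum(u * v for u, v in zip(delayed[i], side)) for i in range(order)]
    return solve(r, p)


def decode(mono, coeffs):
    """Decoder side: regenerate the side signal, then L = M + S and R = M - S."""
    side_hat = fir(coeffs, mono)
    left = [m + s for m, s in zip(mono, side_hat)]
    right = [m - s for m, s in zip(mono, side_hat)]
    return left, right
```

In this sketch the encoder would call downmix and estimate_filter once per frame and send only the coefficients alongside the coded monaural signal; the decoder calls decode with the received coefficients. Reconstruction is exact only when the side signal really is a filtered version of the monaural signal, which is the modelling assumption behind this kind of prediction.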
Further, Non-Patent Document 2 discloses a coding method referred to as a “cross-channel correlation canceller,” whereby applying the cross-channel correlation canceller technique to the ICP coding method makes it possible to predict one channel from the other channel.
Further, audio compression techniques have developed rapidly in recent years, and the modified discrete cosine transform (MDCT) scheme has become a major technique for high-quality audio coding (see Non-Patent Documents 3 and 4).
MDCT has been applied to audio compression without major auditory problems, provided that a proper window such as a sine window is employed. Recently, MDCT has come to play an important role in multimode transform predictive coding paradigms.
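Why the choice of window matters can be seen in a small sketch of MDCT analysis and synthesis with 50% overlap. It is a direct O(N^2) evaluation of the textbook MDCT definition, not code from any of the cited documents, and it assumes an even number of coefficients per frame, a sine window applied at both analysis and synthesis, and a 2/N scaling on the inverse transform. The function names are hypothetical. Because the sine window satisfies w[n]^2 + w[n + N]^2 = 1, the time-domain aliasing introduced in each frame cancels in the overlap-add.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N: w[n] = sin(pi * (n + 0.5) / (2N))."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, window):
    """Map a windowed frame of 2N samples to N MDCT coefficients."""
    n_coeffs = len(frame) // 2
    return [
        sum(
            window[n] * frame[n]
            * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs)
        )
        for k in range(n_coeffs)
    ]


def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    n_coeffs = len(coeffs)
    return [
        window[n] * (2.0 / n_coeffs) * sum(
            coeffs[k]
            * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs)
        )
        for n in range(2 * n_coeffs)
    ]


def analysis_synthesis(signal, n_coeffs):
    """Run MDCT then IMDCT over 50%-overlapping frames and overlap-add."""
    window = sine_window(n_coeffs)
    tail = (-len(signal)) % n_coeffs  # pad up to a whole number of hops
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * (tail + n_coeffs)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        frame = padded[start:start + 2 * n_coeffs]
        rebuilt = imdct(mdct(frame, window), window)
        for n, value in enumerate(rebuilt):
            out[start + n] += value
    return out[n_coeffs:n_coeffs + len(signal)]
```

Each frame of 2N samples yields only N coefficients, so the transform is critically sampled even though neighbouring frames overlap by half. A practical coder would quantize the coefficients between mdct and imdct and would use a fast algorithm rather than the direct sums shown here.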
Multimode transform predictive coding refers to combining speech and audio coding principles in a single coding structure (see Non-Patent Document 4). It should be noted that the MDCT-based coding structure and application in Non-Patent Document 4 are designed for encoding signals in only one channel, and quantize MDCT coefficients in different frequency regions using different quantization schemes.
Non-Patent Document 1: Extended AMR Wideband Speech Codec (AMR-WB+): Transcoding functions, 3GPP TS 26.290.
Non-Patent Document 2: S. Minami and O. Okada, “Stereophonic ADPCM voice coding method,” in Proc. ICASSP'90, April 1990.
Non-Patent Document 3: Ye Wang and Miikka Vilermo, “The modified discrete cosine transform: its implications for audio coding and error concealment,” in AES 22nd International Conference on Virtual, Synthetic and Entertainment, 2002.
Non-Patent Document 4: Sean A. Ramprashad, “The multimode transform predictive coding paradigm,” IEEE Trans. Speech and Audio Processing, vol. 11, pp. 117-129, March 2003.
Non-Patent Document 5: Wai C. Chu, “Speech coding algorithms: foundation and evolution of standardized coders,” ISBN 0-471-37312-5, 2003.