In mobile communication systems and packet communication systems using IP, restrictions on the digital signal processing speed of DSPs (Digital Signal Processors) and on bandwidth are gradually being relaxed. As transmission rates rise, enough bandwidth becomes available to transmit a plurality of channels, so communication using the stereo scheme (i.e. stereo communication) is expected to become popular even in speech communication, where the monaural scheme is currently mainstream.
Current mobile telephones are already equipped with a multimedia player that provides a stereo function, and with FM radio functions. It therefore follows naturally that fourth-generation mobile telephones and IP telephones will have functions for stereo speech communication and for recording and playing back stereo speech signals, in addition to stereo audio signals.
One popular method of encoding a stereo speech signal adopts a signal prediction technique based on a monaural speech codec. That is, a fundamental channel signal is transmitted using a known monaural speech codec, and the left channel or right channel is predicted from this fundamental channel signal using additional information and parameters. In many applications, a mixed monaural signal is selected as the fundamental channel signal.
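The idea in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular codec: the downmix is taken to be a plain average, the "additional information" is reduced to a single least-squares gain per frame, and the monaural codec itself is omitted (the monaural signal is passed through unchanged). The function names `downmix`, `predict_gain` and `decode_stereo` are hypothetical.

```python
def downmix(left, right):
    """Mix two channels into one monaural signal (plain average; an assumption)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def predict_gain(mono, target):
    """Least-squares gain g that minimizes sum((target[n] - g * mono[n]) ** 2)."""
    energy = sum(m * m for m in mono)
    if energy == 0.0:
        return 0.0
    return sum(m * t for m, t in zip(mono, target)) / energy


def decode_stereo(mono, gain):
    """Predict the left channel from mono, then derive right from mono = (L + R) / 2."""
    left_hat = [gain * m for m in mono]
    right_hat = [2.0 * m - l for m, l in zip(mono, left_hat)]
    return left_hat, right_hat


# Toy frame in which the left channel is exactly twice the right channel.
left = [1.0, 2.0, 3.0, 4.0]
right = [0.5, 1.0, 1.5, 2.0]

mono = downmix(left, right)          # would be sent through the monaural codec
gain = predict_gain(mono, left)      # side information sent next to the mono stream
left_hat, right_hat = decode_stereo(mono, gain)
```

Because the monaural signal is the average of the two channels in this sketch, only one channel has to be predicted; the other follows from the downmix relation.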
Methods of encoding a stereo signal to date include ISC (Intensity Stereo Coding), BCC (Binaural Cue Coding), ICP (Inter-Channel Prediction), and so on. These parametric stereo coding methods have different strengths and weaknesses, which make them suitable for coding different excitations (source materials).
Non-Patent Document 1 discloses a technique of predicting a stereo signal based on a monaural codec, using those coding methods. To be more specific, a monaural signal is generated by synthesis from the channel signals forming a stereo signal, such as the left channel signal and the right channel signal, and the resulting monaural signal is encoded and decoded using a known speech codec. Furthermore, the difference signal (i.e. side signal) between the left channel and the right channel is predicted from the monaural signal using prediction parameters. In such a coding method, the coding side models the relationship between the monaural signal and the side signal using time-dependent adaptive filters, and transmits filter coefficients, calculated on a per-frame basis, to the decoding side. The decoding side reconstructs the difference signal by filtering the high-quality monaural signal delivered by the monaural codec, and calculates the left channel signal and the right channel signal from the reconstructed difference signal and the monaural signal.
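The following sketch illustrates the scheme described above for a single frame. It is not the algorithm of Non-Patent Document 1. As assumptions made here for illustration, the monaural and side signals are defined as (L + R) / 2 and (L - R) / 2, the adaptive filter is a short FIR filter fitted by least squares, a tiny ridge term keeps the normal equations solvable, and the monaural codec is omitted. All function names are hypothetical.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def fir(x, h):
    """y[n] = sum_k h[k] * x[n - k], taking x as zero before the frame starts."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fit_frame_filter(mono, side, order=2, ridge=1e-9):
    """Per-frame least-squares FIR coefficients that predict side from mono."""
    def lag(n, k):
        return mono[n - k] if n - k >= 0 else 0.0

    count = len(mono)
    r = [[sum(lag(n, i) * lag(n, j) for n in range(count)) for j in range(order)]
         for i in range(order)]
    for i in range(order):
        r[i][i] += ridge  # small regularization (an assumption of this sketch)
    p = [sum(side[n] * lag(n, i) for n in range(count)) for i in range(order)]
    return solve(r, p)


# Toy frame: the side signal is an exact 2-tap filtering of the mono signal.
m = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0, 0.0, 1.0]
s = fir(m, [0.5, -0.25])
left = [a + b for a, b in zip(m, s)]
right = [a - b for a, b in zip(m, s)]

# Coding side: downmix, form the side signal, fit the filter for this frame.
mono = [(l + r) / 2.0 for l, r in zip(left, right)]
side = [(l - r) / 2.0 for l, r in zip(left, right)]
coeffs = fit_frame_filter(mono, side)

# Decoding side: rebuild the side signal from mono, then both channels.
side_hat = fir(mono, coeffs)
left_hat = [a + b for a, b in zip(mono, side_hat)]
right_hat = [a - b for a, b in zip(mono, side_hat)]
```

In a real system the same fit would be repeated for every frame, so the coefficients track the changing relationship between the two signals.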
Further, Non-Patent Document 2 discloses a coding method using a so-called “cross-channel correlation canceller.” When this technique is applied to a coding method of the ICP scheme, one channel can be predicted from the other channel.
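Predicting one channel from the other can be illustrated with a generic adaptive filter. The sketch below uses a normalized LMS (NLMS) filter as a stand-in; it is an assumption chosen for brevity and is not claimed to be the canceller of Non-Patent Document 2. The function name `nlms_cross_channel`, the filter order, the step size and the synthetic test signal are all hypothetical.

```python
import random


def nlms_cross_channel(reference, target, order=4, mu=0.5, eps=1e-8):
    """Predict `target` from `reference` sample by sample with an NLMS filter.

    Returns the prediction residual and the final filter weights.
    """
    weights = [0.0] * order
    history = [0.0] * order          # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, target):
        history = [x] + history[:-1]
        prediction = sum(w * h for w, h in zip(weights, history))
        error = d - prediction
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Synthetic stereo pair: the right channel is a filtered copy of the left channel.
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
right = [0.8 * left[n] + (0.1 * left[n - 1] if n > 0 else 0.0)
         for n in range(len(left))]

residual, weights = nlms_cross_channel(left, right)

# Residual energy relative to the target energy over the last 100 samples.
tail_ratio = (sum(e * e for e in residual[-100:])
              / sum(r * r for r in right[-100:]))
```

Once the filter has adapted, the residual carries far less energy than the predicted channel, which is the redundancy that inter-channel prediction exploits.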
Audio compression technology has developed rapidly in recent years, and, in particular, the modified discrete cosine transform (“MDCT”) scheme has become the predominant method in high-quality audio coding (see Non-Patent Document 3 and Non-Patent Document 4).
In addition to its energy compaction capability, MDCT simultaneously achieves critical sampling, reduced block effects and flexible window switching. MDCT uses the concepts of time domain alias cancellation (“TDAC”) and frequency domain alias cancellation. Further, MDCT is designed to achieve perfect reconstruction.
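The properties named above (critical sampling, time domain alias cancellation and perfect reconstruction) can be checked numerically with a direct, unoptimized MDCT. The sketch below is an illustration only: it assumes a sine window, 50% overlapped blocks of 2N samples that each yield N coefficients, and a synthesis scale of 2/N. The function names are hypothetical, and a practical codec would use a fast algorithm instead of the O(N^2) sums shown here.

```python
import math


def sine_window(length):
    """Sine window over one block of `length` = 2N samples."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """2N windowed samples -> N coefficients (critical sampling with hop N)."""
    half = len(block) // 2
    return [sum(window[n] * block[n]
                * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased samples.

    The 2/N scale pairs with a window whose squared halves sum to 1.
    """
    half = len(coeffs)
    return [window[n] * (2.0 / half)
            * sum(coeffs[k]
                  * math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))
                  for k in range(half))
            for n in range(2 * half)]


def analysis_synthesis(signal, half):
    """MDCT each overlapped block, invert it, and overlap-add the results.

    Assumes len(signal) is a multiple of `half`.
    """
    window = sine_window(2 * half)
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = padded[start:start + 2 * half]
        recon = imdct(mdct(block, window), window)
        for i, value in enumerate(recon):
            out[start + i] += value   # aliasing terms cancel between neighbours
    return out[half:half + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(16)]
rebuilt = analysis_synthesis(signal, half=4)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each individual inverse block is aliased; only the sum of two neighbouring blocks restores the input, which is the alias cancellation the paragraph refers to.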
MDCT is widely used in audio coding. Further, when a proper window (e.g. a sine window) is employed, MDCT has been applied to audio compression without major perceptual problems. In recent years, MDCT has come to play an important role in the multimode transform predictive coding paradigm.
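The text does not define what makes a window "proper". One common reading, assumed here, is the Princen-Bradley condition w[n]^2 + w[n + N]^2 = 1 for a window of 2N samples. The sketch below measures how far a window deviates from that condition; the function names are hypothetical, and the Hann window is included only as a contrasting example.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def hann_window(length):
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * (n + 0.5) / length))
            for n in range(length)]


def princen_bradley_error(window):
    """Largest deviation of w[n]^2 + w[n + N]^2 from 1 over the first half."""
    half = len(window) // 2
    return max(abs(window[n] ** 2 + window[n + half] ** 2 - 1.0)
               for n in range(half))


sine_error = princen_bradley_error(sine_window(64))
hann_error = princen_bradley_error(hann_window(64))
```

The sine window meets the condition to within rounding error, while the plain Hann window does not, so it would leave residual aliasing if used unchanged for both analysis and synthesis.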
The multimode transform predictive coding paradigm combines speech coding principles and audio coding principles in a single coding structure (see Non-Patent Document 4). However, the MDCT-based coding structure and its application in Non-Patent Document 4 are designed for encoding signals of only one channel, and use different quantization schemes to quantize MDCT coefficients in different frequency regions.
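As a purely illustrative sketch of quantizing coefficients differently by frequency region, the code below applies a uniform scalar quantizer with one step size below a split index and another above it. The split point, the step sizes and the use of scalar quantization are hypothetical choices made here; they are not taken from Non-Patent Document 4.

```python
def region_step(k, split, low_step, high_step):
    """Step size for coefficient index k (hypothetical two-region layout)."""
    return low_step if k < split else high_step


def quantize_by_region(coeffs, split, low_step, high_step):
    """Map each coefficient to an integer index using its region's step size."""
    return [round(c / region_step(k, split, low_step, high_step))
            for k, c in enumerate(coeffs)]


def dequantize_by_region(indices, split, low_step, high_step):
    """Rebuild coefficient values from the integer indices."""
    return [i * region_step(k, split, low_step, high_step)
            for k, i in enumerate(indices)]


coeffs = [0.93, -0.41, 0.12, 0.07]       # toy MDCT coefficients, low to high
indices = quantize_by_region(coeffs, split=2, low_step=0.1, high_step=0.5)
rebuilt = dequantize_by_region(indices, split=2, low_step=0.1, high_step=0.5)
```

With these toy values the two low-frequency coefficients keep a nonzero index while the two high-frequency ones are quantized to zero, which shows how the region-dependent step trades accuracy for bit rate.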
Non-Patent Document 1: Extended AMR Wideband Speech Codec (AMR-WB+): Transcoding functions, 3GPP TS 26.290.
Non-Patent Document 2: S. Minami and O. Okada, “Stereophonic ADPCM voice coding method,” in Proc. ICASSP '90, April 1990.
Non-Patent Document 3: Ye Wang and Miikka Vilermo, “The modified discrete cosine transform: its implications for audio coding and error concealment,” in AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
Non-Patent Document 4: Sean A. Ramprashad, “The multimode transform predictive coding paradigm,” IEEE Trans. Speech and Audio Processing, vol. 11, pp. 117-129, March 2003.