The present invention relates to coding and decoding of multi-channel audio signals. The main object of the present invention is to code digital audio signals while maintaining their perceptual quality as much as possible, even under a bit rate constraint. A reduced bit rate is advantageous because it lowers the required transmission bandwidth and storage capacity.
A number of conventional techniques have been proposed for achieving the bit rate reduction mentioned above.
In the “mid-side (MS) stereo” approach, stereo channels L and R are represented in the form of their “sum” (L+R) and “difference” (L−R) channels. If the stereo channels are highly correlated, the “difference” signal carries little information and can therefore be quantized more coarsely, with fewer bits, than the “sum” signal. In the extreme case where L=R, no information needs to be transmitted for the difference signal.
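The sum/difference representation can be illustrated with a minimal sketch. The function names `ms_encode` and `ms_decode` are hypothetical, quantization is omitted, and the sample values are chosen only for illustration.

```python
def ms_encode(left, right):
    """Form the "sum" (L+R) and "difference" (L-R) channels."""
    s = [l + r for l, r in zip(left, right)]
    d = [l - r for l, r in zip(left, right)]
    return s, d


def ms_decode(s, d):
    """Recover L and R: L = (S + D) / 2, R = (S - D) / 2."""
    left = [(a + b) / 2 for a, b in zip(s, d)]
    right = [(a - b) / 2 for a, b in zip(s, d)]
    return left, right


# Extreme case L = R: the difference channel is all zeros,
# so nothing needs to be transmitted for it.
left = [0.5, 0.25, -0.75, 1.0]
right = [0.5, 0.25, -0.75, 1.0]
s, d = ms_encode(left, right)
print(d)  # [0.0, 0.0, 0.0, 0.0]
```

Without quantization the transform is lossless; the bit rate saving comes from spending fewer bits on the difference channel when its energy is small.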
In the “intensity stereo” approach, psychoacoustic properties of the ear are exploited, and only the “sum” signal is transmitted for the high frequency region, together with frequency-dependent scale factors, which are to be applied to the “sum” signal at the decoder so as to synthesize the L and R channels.
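As a rough sketch of this idea for a single high-frequency band: the encoder keeps only the sum coefficients and one scale factor per channel, and the decoder scales the sum to synthesize each channel. Choosing the scale factors so that each channel's band energy is preserved is an assumption made here for illustration, and all function names are hypothetical.

```python
import math


def band_energy(coeffs):
    """Energy of one band of real spectral coefficients."""
    return sum(c * c for c in coeffs)


def intensity_encode(left_band, right_band):
    """Return the sum coefficients and one scale factor per channel."""
    s = [l + r for l, r in zip(left_band, right_band)]
    e_s = band_energy(s)
    if e_s == 0.0:
        # Silent (or fully cancelled) sum: nothing can be synthesized from it.
        return s, 0.0, 0.0
    # Assumed rule: match each channel's band energy after scaling.
    scale_l = math.sqrt(band_energy(left_band) / e_s)
    scale_r = math.sqrt(band_energy(right_band) / e_s)
    return s, scale_l, scale_r


def intensity_decode(s, scale_l, scale_r):
    """Synthesize L and R by applying the scale factors to the sum signal."""
    return [scale_l * c for c in s], [scale_r * c for c in s]


left_band = [0.8, -0.4, 0.2]
right_band = [0.2, -0.1, 0.05]   # same shape as the left band, lower level
s, scale_l, scale_r = intensity_encode(left_band, right_band)
left_hat, right_hat = intensity_decode(s, scale_l, scale_r)
```

Only the level relationship between the channels survives in this scheme; any phase difference within the band is discarded.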
In the “binaural cue coding” approach, binaural cues are generated to shape a downmix signal in the decoding process. The binaural cues include, for example, inter-channel level/intensity difference (ILD), inter-channel phase/delay difference (IPD), and inter-channel coherence/correlation (ICC). The ILD cue measures the relative signal power; the IPD cue measures the difference in sound arrival time at the ears; and the ICC cue measures the similarity between the channels. In general, the level/intensity cue and phase/delay cue control the balance and lateralization of sound, whereas the coherence/correlation cue controls the width and diffusiveness of the sound. Taken together, these cues are spatial parameters that help the listener mentally compose an auditory scene.
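The three cues can be sketched on time-domain frames as follows. The particular definitions used here (power ratio in dB for the level cue, normalized zero-lag correlation for the coherence cue, and the lag that maximizes the cross-correlation for the delay cue) are common textbook choices assumed for illustration, not definitions taken from the references; the function names are hypothetical.

```python
import math


def ild_db(left, right):
    """Level cue: power of L relative to R, in decibels."""
    p_l = sum(x * x for x in left)
    p_r = sum(x * x for x in right)
    return 10.0 * math.log10(p_l / p_r)


def icc(left, right):
    """Coherence cue: normalized correlation at zero lag (1.0 = same shape)."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den


def delay_samples(left, right, max_lag):
    """Delay cue: lag in samples at which R best matches L (positive: R lags)."""
    def xcorr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left))
                   if 0 <= n + lag < len(right))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


left = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.5, 0.25, 0.0]   # L delayed by 2 samples, half amplitude

level = ild_db(left, right)                      # about 6 dB: L has 4x the power
lag = delay_samples(left, right, 3)              # 2
shape = icc(left, [0.5 * x for x in left])       # scaled copy of L: fully coherent
```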
FIG. 1 is a diagram showing a typical codec (coding and decoding) that employs a coding and decoding method based on the binaural cue coding approach. In the coding process, an audio signal is processed on a frame-by-frame basis. A downmix unit (500) downmixes the left and right channels L and R to generate M=(L+R)/2. A binaural cue extraction module (502) processes L, R and M to generate binaural cues. The binaural cue extraction module (502) usually includes a time-frequency transform module. This time-frequency transform module transforms L, R and M into, for example, full spectral representations through FFT, MDCT or the like, or hybrid time-frequency representations through QMF or the like. Alternatively, M can be generated after the spectral transform of L and R by averaging their spectral representations. Binaural cues can be obtained by comparing these representations of L, R and M on a spectral-band-by-spectral-band basis.
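The downmix and a band-wise comparison can be sketched as follows. A naive DFT stands in for the time-frequency transform, only the level cue is computed, and the band edges, the small power floor, and all function names are assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for FFT, MDCT, QMF, ...)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def downmix(left, right):
    """M = (L + R) / 2, sample by sample."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def band_ild_db(spec_l, spec_r, bands, floor=1e-12):
    """Level difference in dB per band; bands holds (start_bin, stop_bin) pairs."""
    out = []
    for start, stop in bands:
        # The floor (an assumption) avoids log(0) in empty bands.
        p_l = sum(abs(c) ** 2 for c in spec_l[start:stop]) + floor
        p_r = sum(abs(c) ** 2 for c in spec_r[start:stop]) + floor
        out.append(10.0 * math.log10(p_l / p_r))
    return out


# One 8-sample frame: a low tone (bin 1) and a high tone (bin 3),
# mixed with different levels in each channel.
n = 8
low = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]
high = [math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
left = [a + b for a, b in zip(low, high)]
right = [0.5 * a + 2.0 * b for a, b in zip(low, high)]

spec_l, spec_r = dft(left), dft(right)
spec_m = dft(downmix(left, right))

# Two hypothetical bands covering bins 1-2 and bins 3-4.
cues = band_ild_db(spec_l, spec_r, [(1, 3), (3, 5)])
```

Because the transform is linear, `spec_m` equals the average of `spec_l` and `spec_r` bin by bin, which is why the downmix may be formed either before or after the transform. In this frame the level cue is roughly +6 dB in the lower band and roughly -6 dB in the upper band.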
An audio encoder (504) codes the M signal to generate a compressed bit stream. Examples of this audio encoder include encoders for MP3, AAC and the like. The binaural cues are quantized and multiplexed with the compressed M at a multiplexer (506) to form a complete bit stream. In the decoding process, a demultiplexer (508) separates the bit stream of M from the binaural cue information. An audio decoder (510) decodes the bit stream of M to reconstruct the downmix signal M. A multi-channel synthesis module (512) processes the downmix signal and the dequantized binaural cues to reconstruct the multi-channel signals.
Documents related to the conventional techniques are as follows:
Non-patent Reference 1: [1] ISO/IEC 14496-3:2001/FDAM2, “Parametric Coding for High Quality Audio”
Patent Reference 1: [2] WO03/007656A1, “Efficient and Scalable Parametric Stereo Coding for Low Bitrate Application”
Patent Reference 2: [3] WO03/090208A1, “Parametric Representation of Spatial Audio”
Patent Reference 3: [4] U.S. Pat. No. 6,252,965B1, “Multichannel Spectral Mapping Audio Apparatus and Method”
Patent Reference 4: [5] US2003/0219130A1, “Coherence-based Audio Coding and Synthesis”
Patent Reference 5: [6] US2003/0035553A1, “Backwards-Compatible Perceptual Coding of Spatial Cues”
Patent Reference 6: [7] US2003/0235317A1, “Equalization For Audio Mixing”
Patent Reference 7: [8] US2003/0236583A1, “Hybrid Multi-channel/Cue Coding/Decoding of Audio Signals”