As is well known in the art, the International Organization for Standardization (ISO) founded the Moving Picture Experts Group (MPEG) with the intention of developing and standardizing compression algorithms for video and audio signals. Among the several existing multichannel audio compression algorithms, MPEG-2 Advanced Audio Coding (AAC) is currently the most powerful one in the MPEG family, supporting up to 48 audio channels and perceptually lossless audio at 64 kbits/s per channel. One of the driving forces behind the development of the AAC algorithm has been the quest for an efficient coding method for surround sound signals, such as 5-channel signals including left (L), right (R), center (C), left-surround (LS) and right-surround (RS) signals, as shown in FIG. 1. Additionally, an optional low-frequency enhancement (LFE) channel may be used.
Generally, an N-channel surround sound system running at a bit rate of M bps/ch does not necessarily have a total bit rate of M×N bps; rather, the overall bit rate drops significantly below M×N bps because of cross-channel (inter-channel) redundancy. For example, five channels at 64 kbits/s each would require 320 kbits/s if coded independently, and exploiting inter-channel redundancy reduces the total below that figure. To exploit the inter-channel redundancy, two methods have been used in the MPEG-2 AAC standard: Mid-Side (MS) Stereo Coding and Intensity Stereo Coding/Coupling. Coupling is adopted based on psychoacoustic evidence that at high frequencies (above approximately 2 kHz), the human auditory system localizes sound based primarily on the “envelopes” of critical-band-filtered versions of the signals reaching the ears, rather than on the signals themselves. MS stereo coding encodes the sum and the difference of the signals in two symmetric channels instead of the original signals in the left and right channels.
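By way of illustration only, the sum-and-difference matrixing of MS stereo coding can be sketched as follows. The function names (`ms_encode`, `ms_decode`, `energy`) and the factor of one half are assumptions made for this sketch and are not taken from the text above; the sketch operates on plain lists of sample or spectral values and omits quantization, band-wise switching and all bitstream details.

```python
def ms_encode(left, right):
    """Return (mid, side) for a channel pair.

    Assumed normalization: mid = (L + R) / 2, side = (L - R) / 2.
    """
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(values):
    """Sum of squares, used here as a rough proxy for coding cost."""
    return sum(v * v for v in values)


# Hypothetical, highly correlated channel pair (values chosen for illustration).
left = [1.0, 0.5, -0.25, 0.75]
right = [0.75, 0.5, -0.5, 1.0]

mid, side = ms_encode(left, right)
decoded_left, decoded_right = ms_decode(mid, side)
```

When the two channels are strongly correlated, as in the hypothetical pair above, the side signal carries far less energy than either original channel, which is where the saving in bits would come from; the matrixing itself is exactly invertible before quantization.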
Both the MS Stereo and Intensity Stereo coding methods operate on Channel-Pair Elements (CPEs). As shown in FIG. 1, the signals in channel pairs are denoted by (100L, 100R) and (100LS, 100RS). The rationale behind the application of stereo audio coding is based on the fact that the human auditory system, as well as a stereo recording system, uses two audio signal detectors. While a human being has two ears, a stereo recording system has two microphones. With these two audio signal detectors, the human auditory system or the stereo recording system receives and records an audio signal from the same source twice, once through each audio signal detector. The two sets of recorded data of the audio signal from the same source contain time and signal level differences caused mainly by the positions of the detectors in relation to the source.
It is believed that the human auditory system itself is able to detect and discard the inter-channel redundancy, thereby avoiding extra processing. At low frequencies, the human auditory system locates sound sources mainly based on the inter-aural time difference (ITD) of the arriving signals. At high frequencies, the difference in signal strength or intensity level between the two ears, or inter-aural level difference (ILD), is the major cue. In order to remove the redundancy in the received signals in a stereo sound system, the psychoacoustic model analyzes the received signals in consecutive time blocks and determines, for each block, the spectral components of the received audio signal in the frequency domain so that certain spectral components can be removed, thereby mimicking the masking properties of the human auditory system. Like any perceptual audio coder, the MPEG audio coder does not attempt to retain the input signal exactly after encoding and decoding; rather, its goal is to reduce the amount of audio data while keeping the output signals similar to what the human auditory system would perceive. Thus, the MS Stereo coding technique applies a matrix to the signals of the (L, R) or (LS, RS) pair in order to compute the sum and difference of the two original signals, dealing mainly with the spectral image in the mid-frequency range. Intensity Stereo coding replaces the left and right signals with a single representative signal plus directional information.
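By way of illustration only, the idea of replacing a channel pair with a single representative signal plus directional information can be sketched for one frequency band as follows. This is a simplified sketch under stated assumptions and is not the intensity stereo syntax of any standard: the names `intensity_encode` and `intensity_decode`, the use of an energy-preserving scaled sum as the representative signal, and the representation of the directional information as a single left-energy fraction `pan` are all assumptions introduced here.

```python
import math


def intensity_encode(left_band, right_band):
    """Return (carrier, pan) for one band of a channel pair.

    carrier: the sum L + R, rescaled so that its energy equals the
             combined energy of the two input channels (assumption).
    pan:     fraction of the combined energy that belongs to the left
             channel, standing in for the directional information.
    """
    e_left = sum(v * v for v in left_band)
    e_right = sum(v * v for v in right_band)
    summed = [l + r for l, r in zip(left_band, right_band)]
    e_sum = sum(v * v for v in summed)
    total = e_left + e_right
    if total == 0.0 or e_sum == 0.0:
        # Silent band, or channels that cancel exactly; this sketch
        # simply emits a silent carrier in both cases.
        return [0.0] * len(left_band), 0.5
    scale = math.sqrt(total / e_sum)
    return [scale * v for v in summed], e_left / total


def intensity_decode(carrier, pan):
    """Rebuild both channels from the carrier and the energy split."""
    g_left = math.sqrt(pan)
    g_right = math.sqrt(1.0 - pan)
    left = [g_left * c for c in carrier]
    right = [g_right * c for c in carrier]
    return left, right


# Hypothetical band in which the right channel is a quieter copy of the left.
left_band = [0.8, -0.4, 0.2]
right_band = [0.4, -0.2, 0.1]

carrier, pan = intensity_encode(left_band, right_band)
out_left, out_right = intensity_decode(carrier, pan)
```

In this sketch the decoder restores the per-channel energy of the band but not the phase relationship between the channels, which is consistent with the observation that level differences, rather than time differences, are the major localization cue at high frequencies.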
While conventional audio coding techniques can reduce a significant amount of channel redundancy in channel pairs (L/R or LS/RS) based on the dual channel correlation, they may not be efficient in coding audio signals when a large number of channels are used in a surround sound system.
It is advantageous and desirable to provide a more efficient encoding system and method in order to further reduce the redundancy in the stereo sound signals. In particular, the method can be advantageously applied to a surround sound system having a large number of sound channels (6 or more, for example). Such a system and method can also be used in audio streaming over Internet Protocol (IP) for personal computer (PC) users, in mobile IP and third-generation (3G) systems for mobile laptop users, and in digital radio, digital television, digital archives of movie sound tracks and the like.