1. Field of the Invention
One or more embodiments of the present invention relate to audio decoding, and more particularly, to Moving Picture Experts Group (MPEG) surround audio decoding capable of down-mixing multi-channel signals to 2-channel binaural signals based on channel level differences (CLDs) and head related transfer functions (HRTFs) applied to the multi-channel signals.
2. Description of the Related Art
In conventional signal processing techniques for outputting multi-channel signals as binaural sounds, multi-channel signals are first reconstructed, by using spatial cues, from an input signal obtained by compressing the multi-channel signals into a mono or stereo signal. Separately, the reconstructed multi-channel signals are thereafter down-mixed to 2-channel signals by binaural processing using head related transfer functions (HRTFs). As will be explained in greater detail below, such HRTFs model the acoustic process by which sound from a source localized in free space is transferred to a person's ears, and include important information for detecting the position of the sound source from the perspective of the person. Performing the reconstruction of the multi-channel signals and the HRTF-based down-mixing of the reconstructed multi-channel signals as separate operations is complex, and it becomes difficult to generate the binaural signals in a device having limited hardware resources, such as a mobile audio device.
FIG. 1 illustrates a conventional overall system of an encoder, transmission/storage, and decoder that outputs decompressed multi-channel signals as 2-channel binaural signals.
Referring to FIG. 1, in order to output multi-channel signals as 2-channel binaural signals, the overall system includes a multi-channel encoder 102, a multi-channel decoder 104, and a binaural processing device 106.
Initially, the multi-channel encoder 102 compresses the input multi-channel signals into a mono or stereo signal, which may be considered a ‘down-mixing’ of the multi-channel signals. The multi-channel decoder 104 then receives the mono or stereo signal as an input signal, reconstructs multi-channel signals from the input signal in a quadrature mirror filter (QMF) domain by using spatial cues, and transforms the reconstructed multi-channel signals into time-domain signals, which may be considered an ‘up-mixing’ of the received mono or stereo signal. The spatial cues may include correlations/differences between channels, e.g., correlations/differences between left and right channels, such that a minimal amount of data for both channels can be sent as a single signal along with the spatial cues. Such spatial cues may be sent/input with the input signal and can equally be used for multi-channel arrangements. As a further way to minimize data, the input time-domain signal is represented in the QMF domain, i.e., divided into multiple signals within different respective frequency bands. The different frequency bands permit compression/decompression that removes audio information within a frequency band that would not be audible to a person because it is weaker than other, stronger audio information in the same frequency band.
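As a minimal sketch of how a level-difference spatial cue might be applied during up-mixing, the following Python example assumes a single frequency band, a mono down-mixed signal, and a channel level difference (CLD) expressed in decibels as the power ratio between two output channels. The function names cld_to_gains and upmix_band are hypothetical and do not come from any particular decoder; correlation cues and decorrelation are deliberately ignored, so this illustrates only the level-based part of the reconstruction.

```python
import math


def cld_to_gains(cld_db):
    """Convert a channel level difference in dB into two channel gains.

    Assumption: cld_db = 10 * log10(P1 / P2), where P1 and P2 are the powers
    of the two channels. The gains are normalized so that
    g1**2 + g2**2 == 1, i.e., the power of the down-mixed band is preserved.
    """
    ratio = 10.0 ** (cld_db / 10.0)          # power ratio P1 / P2
    g1 = math.sqrt(ratio / (1.0 + ratio))    # gain for the first channel
    g2 = math.sqrt(1.0 / (1.0 + ratio))      # gain for the second channel
    return g1, g2


def upmix_band(mono_band, cld_db):
    """Split one band of a mono down-mix into two channels using a CLD."""
    g1, g2 = cld_to_gains(cld_db)
    first = [g1 * sample for sample in mono_band]
    second = [g2 * sample for sample in mono_band]
    return first, second


# Usage: a CLD of 0 dB splits the band equally between both channels.
first, second = upmix_band([1.0, -0.5, 0.25], cld_db=0.0)
```

Because only one down-mixed signal and a few cue values per band are needed, the transmitted data remains small, while the decoder must still apply such gains for every band of every reconstructed channel.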
Referring back to FIG. 1, the binaural processing device 106 thereafter transforms the time-domain multi-channel signals into frequency-domain multi-channel signals and down-mixes the transformed multi-channel signals to the 2-channel binaural signals using the aforementioned head related transfer functions (HRTFs). Thereafter, the down-mixed 2-channel binaural signals are transformed into time-domain signals, respectively. Thus, in order to output the input signal, obtained by compressing the multi-channel signals into a mono or stereo signal, as the 2-channel binaural signals, both the operation of reconstructing the multi-channel signals from the input signal in the multi-channel decoder 104 and the operation of down-mixing the reconstructed multi-channel signals to the 2-channel binaural signals are required.
This conventional case has the following problems. Firstly, two processing operations are required, and decoding complexity therefore increases. Secondly, in order to reconstruct the multi-channel signals from the input signal obtained by compressing the multi-channel signals into a mono or stereo signal, the operation in the QMF domain has to be performed for each channel, and many operations are therefore required. Lastly, in order to thereafter down-mix the reconstructed multi-channel signals to the 2-channel binaural signals through the binaural processing, a dedicated binaural processor is typically required.
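The binaural down-mixing described above can be illustrated with a short Python sketch. For simplicity, it swaps the frequency-domain multiplication by HRTFs for the equivalent time-domain operation, namely convolution of each channel with a pair of head related impulse responses, followed by summation into a left-ear and a right-ear signal. The function names, channel names, and impulse response values are hypothetical placeholders chosen for illustration, not measured HRTF data.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_downmix(channels, hrirs):
    """Down-mix named channels to a (left, right) binaural pair.

    channels: dict mapping channel name -> list of time-domain samples
    hrirs:    dict mapping channel name -> (left_ear_ir, right_ear_ir)
    """
    # Output length is the longest convolution result over all channels.
    length = max(
        len(samples) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, samples in channels.items()
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        ir_left, ir_right = hrirs[name]
        for k, value in enumerate(convolve(samples, ir_left)):
            left[k] += value
        for k, value in enumerate(convolve(samples, ir_right)):
            right[k] += value
    return left, right


# Usage with two channels and placeholder impulse responses.
channels = {"L": [1.0, 0.0], "R": [0.0, 1.0]}
hrirs = {"L": ([1.0, 0.5], [0.25]), "R": ([0.25], [1.0, 0.5])}
left, right = binaural_downmix(channels, hrirs)
```

Each reconstructed channel requires two filtering operations, one per ear, so the cost of this stage grows with the number of channels on top of the cost of reconstructing those channels in the first place.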