1. Field of the Invention
One or more embodiments of the present invention relate to audio decoding, and more particularly, in an embodiment, to Moving Picture Experts Group (MPEG) surround audio decoding capable of decoding binaural signals from encoded multi-channel signals using sound localization.
2. Description of the Related Art
In conventional signal processing techniques for generating binaural sounds from encoded multi-channel signals, an operation of reconstructing the multi-channel signals from the input encoded signal is performed first, followed by an operation of transforming the reconstructed multi-channel signals into the frequency domain and separately up-mixing each reconstructed multi-channel signal to 2-channel signals for output by binaural processing using head related transfer functions (HRTFs). These two operations are performed separately and are each complex, making it difficult to generate binaural signals in devices having limited hardware resources, such as mobile audio devices.
Here, the encoded multi-channel signals are obtained by an encoder compressing the original multi-channel signals into a corresponding encoded mono or stereo signal by using respective spatial cues for the different multi-channel signals, and corresponding spatial cues are used by the decoder to decode the encoded mono or stereo signal into the decoded multi-channel signals. This encoding from the multi-channel signals to the encoded mono or stereo signal using respective spatial cues is considered a “down-mixing” of the multi-channel signals, as the different signals are mixed together to generate the encoded mono or stereo signal. This down-mixing is performed in a series of staged down-mixing modules, with corresponding spatial cues being used at each down-mixing module. Similarly, on the decoding side, a received encoded mono or stereo signal can be separated or un-mixed into respective multi-channel signals. This un-mixing is considered an “up-mixing”, and is accomplished through a series of staged up-mixing modules that up-mix the signals using respective spatial cues to eventually output the resultant decoded multi-channel signals. As noted above, when generating binaural sounds from these decoded multi-channel signals, an additional operation is performed using the aforementioned HRTFs.
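As an illustration only, one down-mixing stage and its matching up-mixing stage can be sketched as follows. The sketch assumes a single spatial cue per frame, namely a channel level difference in decibels. That is a simplification, since an actual codec would typically use several cue types computed per frequency band. The function names `downmix_stage` and `upmix_stage` are hypothetical and are not taken from any standard or library.

```python
import math


def downmix_stage(left, right, eps=1e-12):
    """Mix two channels into one and record a level-difference cue.

    Hypothetical helper: the cue is the energy ratio of the two inputs in dB.
    """
    energy_left = sum(x * x for x in left) + eps
    energy_right = sum(x * x for x in right) + eps
    cld_db = 10.0 * math.log10(energy_left / energy_right)
    mono = [l + r for l, r in zip(left, right)]
    return mono, cld_db


def upmix_stage(mono, cld_db):
    """Split one channel back into two using the transmitted cue.

    The gains are chosen so that the energy ratio of the two outputs
    equals the ratio carried by the cue and their squares sum to one.
    """
    ratio = 10.0 ** (cld_db / 10.0)  # energy ratio left/right
    gain_left = math.sqrt(ratio / (1.0 + ratio))
    gain_right = math.sqrt(1.0 / (1.0 + ratio))
    return [gain_left * m for m in mono], [gain_right * m for m in mono]


# Example: the right channel is a half-amplitude copy of the left channel.
left_in = [1.0, -0.5, 0.25]
right_in = [0.5, -0.25, 0.125]
mono_signal, cue = downmix_stage(left_in, right_in)
left_out, right_out = upmix_stage(mono_signal, cue)
```

In this sketch the up-mixing stage restores the level relationship between the two channels rather than the original waveforms, which is why the cue must accompany the down-mixed signal. Cascading several such stages would correspond to the staged modules described above.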
As an example, FIG. 1 illustrates such a conventional operation for generating 2-channel binaural signals from decoded multi-channel signals.
Here, the operations for outputting multi-channel signals as 2-channel binaural signals will now be briefly explained with reference to the illustrated multi-channel encoder 102, multi-channel decoder 104, and binaural processing device 106.
Thus, in this representative example, the multi-channel encoder 102 compresses the input multi-channel signals into a mono or stereo signal, i.e., through the above-mentioned staged down-mixing modules, and then the multi-channel decoder 104 may receive the resultant mono or stereo signal as an input signal. The multi-channel decoder 104 reconstructs multi-channel signals from the input signal by using the aforementioned spatial cues in a quadrature mirror filter (QMF) domain and then transforms the resultant reconstructed multi-channel signals into time-domain signals. The QMF domain represents a domain including signals obtained by dividing time-domain signals according to frequency bands. The binaural processing device 106 then transforms the decoded time-domain multi-channel signals into frequency-domain multi-channel signals, and up-mixes the transformed multi-channel signals to 2-channel binaural signals using HRTFs. Thereafter, the up-mixed 2-channel binaural signals are respectively transformed into time-domain signals. As described above, in order to output an encoded input signal as the 2-channel binaural signals, two separate sequential operations are required: reconstructing the multi-channel signals from the input signal in the multi-channel decoder 104, and transforming the multi-channel signals into the frequency domain and separately up-mixing each reconstructed multi-channel signal into the 2-channel binaural signals. Here, these operations are separate because they must be performed in separate domains.
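As an illustration only, the second of these operations, i.e., the binaural processing applied to already reconstructed time-domain channels, can be sketched as follows. The sketch assumes that each channel has one pair of head-related impulse responses, one per ear. The impulse response values in the example are made up for demonstration and are not measured HRTF data. A naive discrete Fourier transform is used in place of a fast transform to keep the sketch self-contained, and the names `dft`, `idft`, and `binaural_render` are hypothetical.

```python
import cmath


def dft(x, n):
    """Naive n-point discrete Fourier transform of x, zero-padded to length n."""
    padded = list(x) + [0.0] * (n - len(x))
    return [
        sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse transform; returns the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def binaural_render(channels, hrirs):
    """Render time-domain channels to two ear signals in the frequency domain.

    channels: list of time-domain signals, one per reconstructed channel.
    hrirs: list of (left_ear_response, right_ear_response), one pair per channel.
    """
    longest_signal = max(len(c) for c in channels)
    longest_response = max(len(h) for pair in hrirs for h in pair)
    n = longest_signal + longest_response - 1  # long enough to avoid wrap-around
    ear_left = [0j] * n
    ear_right = [0j] * n
    for signal, (h_left, h_right) in zip(channels, hrirs):
        spectrum = dft(signal, n)  # time domain -> frequency domain
        tf_left = dft(h_left, n)
        tf_right = dft(h_right, n)
        for k in range(n):
            ear_left[k] += spectrum[k] * tf_left[k]  # per-bin filtering and mixing
            ear_right[k] += spectrum[k] * tf_right[k]
    return idft(ear_left), idft(ear_right)  # frequency domain -> time domain


# Example with two channels and made-up impulse responses.
example_channels = [[1.0, 0.0], [0.0, 1.0]]
example_hrirs = [([1.0, 0.5], [0.25]), ([0.5], [1.0, 0.5])]
out_left, out_right = binaural_render(example_channels, example_hrirs)
```

The sketch makes the cost visible: every reconstructed channel needs its own forward transform and two filtering passes before the two ear signals can be transformed back, and all of this happens after the decoder has already left its own QMF domain.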
However, as noted above, such conventional systems have several problems. Firstly, because two processing operations are required, decoding complexity is increased. Secondly, since the binaural processing device 106 must additionally operate in the frequency domain, the reconstructed multi-channel signals must be transformed into the frequency domain. Lastly, in order to further up-mix the reconstructed multi-channel signals to generate the two binaural channels through binaural processing, a dedicated chip for performing such binaural processing is typically required.