1. Field of the Invention
One or more embodiments of the present invention relate to a method, medium, and system for encoding and/or decoding a multi-channel audio signal, and more particularly, to a method, medium, and system for encoding and/or decoding a multi-channel audio signal by using spatial cues generated using direction information of a plurality of channels, and to a decoding method, medium, and system for outputting a 2-channel signal from a mono signal down-mixed from multi-channels.
2. Description of the Related Art
According to conventional techniques for encoding and/or decoding a multi-channel audio signal, multi-channel audio signals are encoded and/or decoded based on the fact that a spatial effect that can be felt by a person is mainly caused by binaural influences, resulting in the positions of specific sound sources being recognizable by using interaural level differences (ILDs) and interaural time differences (ITDs) of sounds arriving at the respective ears of the person. Thus, according to the conventional techniques, when a multi-channel audio signal is encoded, the multi-channel audio signal is generally down-mixed to a mono signal, and information regarding the encoded/down-mixed channels is expressed by spatial cues of inter-channel level differences (ICLDs) and inter-channel time differences (ICTDs). Thereafter, the down-mixed/encoded multi-channel audio signal can be decoded using the spatial cues of the ICLDs and ICTDs. Here, the term down-mixed corresponds to a staged mixing of separate input multi-channel signals during encoding, where separate input channel signals are mixed to generate a single down-mixed signal, for example. Through the staging of such down-mixing modules, all multi-channel signals may be down-mixed to such a single mono signal. Similarly, such a down-mixed mono signal can be decoded through a staging of up-mixing modules that perform a series of up-mixing operations until all multi-channel signals are decoded. Here, the respective ICLDs and ICTDs generated during each down-mixing in the encoder, through a tree structure of down-mixing modules, can be used by a decoder in a mirrored tree structure of up-mixing modules to un-mix the down-mixed mono signal.
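As a non-limiting illustration, the staged down-mixing described above may be sketched as follows. The function names (downmix_pair, upmix_pair, downmix_tree), the equal-weight averaging used for mixing, and the energy-ratio definition of the ICLD in decibels are assumptions made only for this example and are not taken from any particular conventional codec. The ICTD is omitted for brevity.

```python
import math

def energy(x):
    """Sum of squared samples of one channel."""
    return sum(s * s for s in x)

def downmix_pair(a, b):
    """One hypothetical down-mixing module: mix two channels into one
    and report their level difference (ICLD) in dB as a spatial cue."""
    eps = 1e-12  # avoids log(0) and division by zero for silent channels
    icld_db = 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))
    mono = [0.5 * (x + y) for x, y in zip(a, b)]  # assumed equal-weight mix
    return mono, icld_db

def upmix_pair(mono, icld_db):
    """Mirror of downmix_pair: redistribute the mono signal between two
    channels so that their amplitude ratio matches the stored ICLD."""
    r = 10.0 ** (icld_db / 20.0)   # amplitude ratio a : b
    gain_a = 2.0 * r / (1.0 + r)   # gains chosen so 0.5 * (a + b) == mono
    gain_b = 2.0 / (1.0 + r)
    return [gain_a * s for s in mono], [gain_b * s for s in mono]

def downmix_tree(channels):
    """Fold the channels pairwise, stage by stage, until a single mono
    signal remains; one ICLD is collected per down-mixing module."""
    if not channels:
        raise ValueError("at least one channel is required")
    cues = []
    layer = list(channels)
    while len(layer) > 1:
        next_layer = []
        for i in range(0, len(layer) - 1, 2):
            mono, icld_db = downmix_pair(layer[i], layer[i + 1])
            next_layer.append(mono)
            cues.append(icld_db)
        if len(layer) % 2:  # an unpaired channel passes to the next stage
            next_layer.append(layer[-1])
        layer = next_layer
    return layer[0], cues
```

In this sketch, N input channels always yield N - 1 level cues, one per down-mixing module. The up-mix restores only the level relationship between the two channels, so the original waveforms are recovered exactly only when the two channels are scaled copies of each other.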
However, in such an implementation of ICLDs, recognition of the position of a sound source using an ICLD is possible only in a high frequency region where the wavelength of sound is less than the diameter of the head of a listener, resulting in accuracy being degraded in regions of low frequencies. Conversely, in the case of the ICTDs, recognition of the position of a sound source is possible only in a low frequency region where the wavelength of sound is greater than the diameter of the head of the listener, resulting in accuracy being degraded in regions of higher frequencies. Thus, position recognition, to the extent that it is possible at all, is frequency dependent.
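As a rough worked example of this frequency dependence, the boundary between the two regions lies where the wavelength of sound equals the head diameter, that is, at a frequency equal to the speed of sound divided by the head diameter. The values used below (343 m/s for the speed of sound and 0.175 m for the head diameter) and the function names are assumptions chosen only for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C
HEAD_DIAMETER_M = 0.175     # assumed: illustrative adult head diameter

def crossover_frequency_hz(head_diameter_m=HEAD_DIAMETER_M,
                           speed_of_sound=SPEED_OF_SOUND_M_S):
    """Frequency at which the wavelength equals the head diameter."""
    return speed_of_sound / head_diameter_m

def dominant_cue(freq_hz, head_diameter_m=HEAD_DIAMETER_M,
                 speed_of_sound=SPEED_OF_SOUND_M_S):
    """Cue that remains usable at freq_hz under the rule stated above:
    level differences when the wavelength is shorter than the head
    diameter, time differences otherwise."""
    wavelength_m = speed_of_sound / freq_hz
    if wavelength_m < head_diameter_m:
        return "level difference"
    return "time difference"
```

Under these assumed values the boundary falls at about 1960 Hz, so a 4 kHz component would be localized mainly by level differences and a 500 Hz component mainly by time differences.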
Meanwhile, in such techniques, in order to further generate a 2-channel virtual stereo sound from the down-mixed mono signal, the mono signal is restored to the multi-channel signals by using the ICLD and ICTD spatial cues, and then the restored multi-channel signals are synthesized into 2 channels based on head related transfer functions (HRTFs). An HRTF expresses an acoustic process in which sound from a sound source localized in a free space is transferred to the ears of a listener, and includes important information with which the listener determines the position of the sound source. Thus, the HRTFs include much information indicating the characteristics of the space through which sound is transferred, as well as information on the ICTDs, ICLDs, and shapes of earlobes, for example.
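A minimal sketch of such HRTF-based synthesis is given below, assuming that each HRTF is available in the time domain as a short impulse response and that each restored channel is filtered by direct convolution. The function names and the dictionary layout (channel name mapped to samples, and channel name mapped to a left/right pair of impulse responses) are hypothetical and serve only to illustrate the principle.

```python
def convolve(x, h):
    """Direct-form linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def binaural_synthesis(channels, hrirs):
    """Filter every restored channel with its left-ear and right-ear
    impulse response and sum the results into a 2-channel output.

    channels: dict mapping a channel name to its list of samples
    hrirs:    dict mapping the same names to (left_ir, right_ir)
    """
    out_len = max(
        len(sig) + max(len(hrirs[name][0]), len(hrirs[name][1])) - 1
        for name, sig in channels.items()
    )
    left = [0.0] * out_len
    right = [0.0] * out_len
    for name, sig in channels.items():
        h_left, h_right = hrirs[name]
        for k, v in enumerate(convolve(sig, h_left)):
            left[k] += v
        for k, v in enumerate(convolve(sig, h_right)):
            right[k] += v
    return left, right
```

Each input channel contributes to both output channels, which is why one impulse response per ear is needed for every channel.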
In order to synthesize the multi-channel signal into the 2-channel signal using the HRTFs, respective HRTFs corresponding to the left ear and the right ear are required for each channel of the multi-channels, resulting in the number of required HRTFs being double the number of the multi-channels. For example, in order to output a 2-channel signal from a 5.1-channel signal, a total of 10 HRTFs are required. HRTFs are conventionally stored in an HRTF database in a decoding system. Accordingly, in order to store many HRTFs in such a database, a large storage capacity is required for the database.
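The count given above can be checked with the short calculation below. The total of 10 HRTFs for a 5.1-channel signal implies that five channels are each given a left-ear and a right-ear HRTF, so the example assumes that the low-frequency effects channel is not counted. The filter length of 512 taps and the 4 bytes per coefficient used in the storage estimate are purely hypothetical figures, not values taken from any actual HRTF database.

```python
def hrtf_count(num_channels):
    """Two HRTFs (left ear, right ear) are needed per channel."""
    return 2 * num_channels

def hrtf_storage_bytes(num_channels, taps_per_hrtf, bytes_per_tap):
    """Storage for one set of HRTFs, i.e. one fixed loudspeaker layout."""
    return hrtf_count(num_channels) * taps_per_hrtf * bytes_per_tap

# 5.1 layout with the low-frequency effects channel excluded (assumption)
count = hrtf_count(5)                        # 10 HRTFs
size = hrtf_storage_bytes(5, 512, 4)         # 20480 bytes under the assumed sizes
```

The estimate covers a single loudspeaker layout only; a database covering many source directions would scale with the number of directions stored.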