The present invention relates to the generation of a room reflection and/or reverberation related contribution of a binaural signal, the generation of a binaural signal itself, and the forming of an inter-similarity decreasing set of head-related transfer functions.
The human auditory system is able to determine the direction or directions from which perceived sounds arrive. To this end, the human auditory system evaluates certain differences between the sound received at the right ear and the sound received at the left ear. This information comprises, for example, so-called inter-aural cues, which refer to the differences between the sound signals at the two ears. Inter-aural cues are the most important means for localization. The pressure level difference between the ears, namely the inter-aural level difference (ILD), is the most important single cue for localization. When sound arrives from the horizontal plane with a non-zero azimuth, it has a different level at each ear. The shadowed ear receives a naturally attenuated sound image compared to the unshadowed ear. Another very important property for localization is the inter-aural time difference (ITD). The shadowed ear is farther from the sound source and thus receives the sound wave front later than the unshadowed ear. The ITD is most significant at low frequencies, which are not attenuated much on reaching the shadowed ear compared to the unshadowed ear. The ITD is less important at higher frequencies because the wavelength of the sound approaches the distance between the ears. In other words, localization exploits the fact that sound traveling from the sound source to the left and right ear, respectively, is subject to different interactions with the head, ears, and shoulders of the listener.
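By way of illustration only, the two cues described above may be estimated numerically as in the following sketch. It is not taken from the present disclosure: the ITD is approximated with the well-known Woodworth spherical-head formula, and the head radius, the speed of sound, and the function names `itd_seconds` and `ild_db` are assumptions chosen for the example.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius (illustrative value)
SPEED_OF_SOUND = 343.0    # m/s, assumed speed of sound in air at room temperature


def itd_seconds(azimuth_deg):
    """Approximate ITD for a distant source in the horizontal plane.

    Uses the Woodworth spherical-head approximation
    ITD = (a / c) * (theta + sin(theta)), intended for |azimuth| <= 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def ild_db(left, right):
    """Level difference in dB of the left ear signal relative to the right."""
    return 20.0 * math.log10(rms(left) / rms(right))
```

Under these assumptions, a source directly to one side (azimuth of 90 degrees) yields an ITD of roughly 0.66 ms, and a signal with twice the amplitude at one ear yields an ILD of about 6 dB.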
Problems occur when a person listens via headphones to a stereo signal that is intended for reproduction by a loudspeaker setup. It is very likely that the listener will regard the sound as unnatural, awkward, and disturbing, as the listener feels that the sound source is located inside the head. This phenomenon is often referred to in the literature as “in-the-head” localization. Long-term listening to “in-the-head” sound may lead to listening fatigue. The phenomenon occurs because the information on which the human auditory system relies when positioning sound sources, i.e. the inter-aural cues, is missing or ambiguous.
In order to render stereo signals, or even multi-channel signals with more than two channels, for headphone reproduction, directional filters may be used to model these interactions. For example, the generation of a headphone output from a decoded multi-channel signal may comprise filtering each signal after decoding by means of a pair of directional filters. These filters typically model the acoustic transmission from a virtual sound source in a room to the ear canal of a listener, the so-called binaural room transfer function (BRTF). The BRTF performs time, level and spectral modifications, and models room reflections and reverberation. The directional filters may be implemented in the time or frequency domain.
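The filtering of each decoded channel by a pair of directional filters may be sketched as follows. This is a minimal time-domain illustration only, assuming that all channels have equal length and that all impulse responses have equal length; the function names `convolve` and `render_binaural` are hypothetical and do not stem from the cited art.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sample sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def render_binaural(channels, filters_left, filters_right):
    """Filter each channel with its left/right directional filter and sum.

    channels:      list of N equally long sample sequences
    filters_left:  N impulse responses (source position -> left ear)
    filters_right: N impulse responses (source position -> right ear)
    Assumes all impulse responses have the same length.
    """
    n_out = len(channels[0]) + len(filters_left[0]) - 1
    left = [0.0] * n_out
    right = [0.0] * n_out
    for x, h_l, h_r in zip(channels, filters_left, filters_right):
        for i, v in enumerate(convolve(x, h_l)):
            left[i] += v
        for i, v in enumerate(convolve(x, h_r)):
            right[i] += v
    return left, right
```

For N channels, the loop performs N×2 convolutions, one per channel and ear. A practical implementation would more likely use fast convolution in the frequency domain rather than the direct form shown here.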
However, since many filters are needed, namely N×2 with N being the number of decoded channels, and since these directional filters are rather long, such as 20000 filter taps at 44.1 kHz, the process of filtering is computationally demanding. Therefore, the directional filters are sometimes reduced to a minimum. These reduced filters, the so-called head-related transfer functions (HRTFs), contain the directional information including the inter-aural cues. A common processing block is used to model the room reflections and reverberation. The room processing module can be a reverberation algorithm in the time or frequency domain, and may operate on a one- or two-channel input signal obtained from the multi-channel input signal by summing the channels of the multi-channel input signal. Such a structure is, for example, described in WO 99/14983 A1. As just described, the room processing block implements room reflections and/or reverberation. Room reflections and reverberation are essential for localizing sounds, especially with respect to distance and externalization, meaning that sounds are perceived outside the listener's head. The aforementioned document also suggests implementing the directional filters as a set of FIR filters operating on differently delayed versions of the respective channel, so as to model the direct path from the sound source to the respective ear and distinct reflections. Moreover, in describing several measures for providing a more pleasant listening experience over a pair of headphones, this document also suggests delaying a mixture of the center channel and the front left channel, and of the center channel and the front right channel, respectively, relative to a sum and a difference of the rear left and rear right channels, respectively.
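The computational burden mentioned above can be made concrete with a rough count of multiply-accumulate operations for direct-form FIR filtering. The sketch below is illustrative only: the channel count of five and the 256-tap length used for the shortened filters are assumptions made for the example, and the cost of the common room processing block is not included.

```python
def fir_macs_per_second(num_channels, taps, sample_rate_hz):
    """Multiply-accumulates per second for N x 2 direct-form FIR filters."""
    return num_channels * 2 * taps * sample_rate_hz


SAMPLE_RATE_HZ = 44100
NUM_CHANNELS = 5        # assumed channel count, for illustration only

# Full-length directional filters with 20000 taps each
full = fir_macs_per_second(NUM_CHANNELS, 20000, SAMPLE_RATE_HZ)

# Shortened filters; 256 taps is an assumed, illustrative length
short = fir_macs_per_second(NUM_CHANNELS, 256, SAMPLE_RATE_HZ)

print(f"full-length filters: {full:.3e} MAC/s")
print(f"shortened filters:   {short:.3e} MAC/s")
print(f"reduction factor:    {full / short:.1f}")
```

With these assumed figures, the full-length filters call for about 8.8 billion multiply-accumulates per second in the direct form, which indicates why the directional filters are shortened and the room effect is delegated to a single shared block.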
However, the listening results achieved thus far still suffer to a large extent from a reduced spatial width of the binaural output signal and from a lack of externalization. Further, it has been realized that, despite the abovementioned measures for rendering multi-channel signals for headphone reproduction, portions of voice in movie dialogs and music are often perceived as unnaturally reverberant and spectrally unequal.