Embodiments of the present invention relate to a device for determining “room-optimized transfer functions” for a listening room, to a corresponding method, and to a device for spatially reproducing an audio signal using corresponding methods. In accordance with preferred embodiments, reproduction takes place by means of a binaural close-range sound transducer, for example a stereo headset or stereo in-ear earphones. Further embodiments relate to a system comprising the two devices and to a computer program for performing the methods mentioned.
The perceived quality when presenting a spatial auditory scene, for example on the basis of a multi-channel audio signal, depends decisively on the acoustic-artistic design of the content presented, on the reproduction system and on the room acoustics of the listening room. A main goal when developing audio reproduction systems is producing auditory events which the listener judges to be plausible. This plays an important role when reproducing audiovisual content, for example. For content to be perceived by the user as plausible, various perceptual quality features, such as localizability, perception of distance, perception of spatiality and the timbre of the reproduction, have to meet the user's expectations. In the ideal case, the perception of the reproduced situation coincides with the real situation in the room.
In loudspeaker-based audio reproduction systems, two-channel or multi-channel audio material is reproduced in a listening room. This audio material may originate from a channel-based mix in which the finished loudspeaker signals are already present. Alternatively, the loudspeaker signals may be generated by an object-based sound reproduction method, in which the loudspeaker reproduction signals are derived from a description of a sound object (for example its position, volume etc.) and from knowledge of the loudspeaker setup in use. In this way, phantom sound sources are generated, which are usually located on the lines connecting the loudspeakers. Depending on the loudspeaker setup chosen and the room acoustics of the listening room, the listener may perceive these phantom sound sources at different directions and distances. The room acoustics thus have a decisive influence on the coherence of the reproduced auditory scene.
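The source does not specify how an object-based renderer derives the loudspeaker signals. Purely as an illustration, the following Python sketch assumes a 2.0 setup with loudspeakers at ±30° and a constant-power (sine/cosine) panning law, which is one common choice among several. The function name `pan_object_stereo`, its parameters and the convention that negative azimuths point toward the left loudspeaker are hypothetical. The sketch shows how an object's position and volume, together with the known loudspeaker setup, yield two loudspeaker signals whose gain ratio places a phantom source between the loudspeakers.

```python
import numpy as np


def pan_object_stereo(signal, azimuth_deg, volume=1.0, speaker_angle_deg=30.0):
    """Hypothetical sketch: render one mono sound object to a 2.0 setup.

    Assumptions (not from the source): loudspeakers at -speaker_angle_deg (left)
    and +speaker_angle_deg (right), negative azimuth = toward the left
    loudspeaker, and a constant-power sine/cosine panning law.
    """
    signal = np.asarray(signal, dtype=float)
    # Map the object azimuth onto [0, 1]: 0 = left loudspeaker, 1 = right loudspeaker.
    p = (azimuth_deg + speaker_angle_deg) / (2.0 * speaker_angle_deg)
    p = float(np.clip(p, 0.0, 1.0))  # objects outside the base snap to a loudspeaker
    theta = p * np.pi / 2.0
    # Constant-power gains: g_left**2 + g_right**2 == 1 for every position.
    g_left, g_right = np.cos(theta), np.sin(theta)
    # Shape (2, n_samples): row 0 feeds the left, row 1 the right loudspeaker.
    return volume * np.vstack([g_left * signal, g_right * signal])
```

For an object at 0° both gains equal √2/2, so the phantom source appears midway between the loudspeakers; at -30° or +30° the object is reproduced by a single loudspeaker only. Where the listener actually localizes such a phantom source still depends on the room acoustics, as described above.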
Reproduction via loudspeakers, however, is not practical in every listening situation, and loudspeakers cannot be installed everywhere. Examples of such situations are listening to music on mobile terminals, use in frequently changing rooms, lack of user acceptance, or acoustic disturbance of others. Close-range sound transducers, such as in-ear earphones or headsets, which are worn directly at or in close proximity to the ear, are frequently used as an alternative to loudspeakers.
Classical stereo reproduction using sound transducers that are equipped, for example, with one acoustic driver for each side or ear causes the listener to perceive the reproduced phantom sound sources as located inside the head on the axis connecting the two ears. This is referred to as “in-head localization”. A plausible perception of the phantom sound sources outside the head (externalization), however, does not occur. The phantom sound sources produced in this way usually carry neither direction information nor distance information that the user can decode, as would be present, for example, when the same acoustic scene is reproduced via a loudspeaker system (for example 2.0 or 5.1) in the listening room.
In order to avoid in-head localization when reproducing via headsets, methods of binaural synthesis are used (without losing any of the artistic design and mix of the audio material). In binaural synthesis, so-called “outer ear transfer functions” (head-related transfer functions, HRTFs) are used for the left and right ears. For each ear, these head-related transfer functions comprise a plurality of direction-dependent transfer functions associated with virtual sound sources, with which the audio signals are filtered during reproduction so that an auditory scene is represented spatially, i.e., spatiality is emulated. Binaural synthesis makes use of the fact that interaural features are decisively responsible for the perception of the direction of a sound source, and these interaural features are represented in the head-related transfer functions. When an audio signal is to be perceived from a defined direction, this signal is filtered using the HRTFs of the left and right ears belonging to this direction. Using binaural synthesis, it is thus possible to reproduce a realistic surround sound scene, for example one stored as multi-channel audio, via the headset. In order to virtually simulate a loudspeaker setup, the direction-specific HRTF pair corresponding to the position of each loudspeaker to be simulated is used. For a plausible representation of the direction and distance of the loudspeaker setup, the direction-dependent acoustic transfer functions of the listening room (room-related transfer functions, RRTFs) additionally have to be emulated. These are combined with the HRTFs, resulting in binaural room impulse responses (BRIRs), which may be applied to the audio signal as filters.
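The filtering steps of binaural synthesis can be illustrated with a minimal Python sketch working on time-domain impulse responses (a head-related impulse response, HRIR, is the time-domain counterpart of an HRTF). The function names `make_brir` and `binaural_render` are hypothetical, and `make_brir` rests on a strong simplifying assumption: it forms a BRIR by convolving one ear's HRIR with a single room impulse response, whereas a real BRIR applies the direction-dependent HRTF to each reflection individually. `binaural_render` then simulates a virtual loudspeaker setup by filtering each loudspeaker feed with the left-ear and right-ear BRIR belonging to that loudspeaker's direction and summing the results per ear.

```python
import numpy as np


def make_brir(hrir, room_ir):
    """Hypothetical simplification: combine one ear's HRIR with a room impulse
    response. Convolution in time = product of transfer functions in frequency.
    Real BRIRs filter every reflection with the HRTF of its own direction."""
    return np.convolve(hrir, room_ir)


def binaural_render(channels, brirs_left, brirs_right):
    """Render virtual loudspeakers to a two-channel headphone signal.

    channels:    list of 1-D loudspeaker feeds (e.g. of a 2.0 or 5.1 mix)
    brirs_left:  list of BRIRs from each virtual loudspeaker to the left ear
    brirs_right: list of BRIRs from each virtual loudspeaker to the right ear
    """
    # Longest full convolution length over all loudspeaker/BRIR pairs.
    n_out = max(len(x) + max(len(hl), len(hr)) - 1
                for x, hl, hr in zip(channels, brirs_left, brirs_right))
    out = np.zeros((2, n_out))
    for x, hl, hr in zip(channels, brirs_left, brirs_right):
        y_left = np.convolve(x, hl)   # feed filtered with its left-ear BRIR
        y_right = np.convolve(x, hr)  # feed filtered with its right-ear BRIR
        # Sum the contributions of all virtual loudspeakers per ear.
        out[0, :len(y_left)] += y_left
        out[1, :len(y_right)] += y_right
    return out
```

Because convolution is linear, the per-ear sum over all virtual loudspeakers corresponds to hearing all loudspeakers simultaneously in the simulated room. For long BRIRs, FFT-based convolution (for example `scipy.signal.fftconvolve`) would be the more efficient choice.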
However, recent research and studies clearly show that the plausibility of an audio reproduction is determined not only by the physically correct synthesis of the reproduction signals but also, decisively, by context-dependent quality parameters and, in particular, by the user's horizon of expectations regarding room acoustics. There is therefore a need for an improved approach to binaural synthesis.
It is the object of the present invention to provide improved spatial reproduction by means of close-range sound transducers, in particular such that the synthesized acoustics coincide with the consumer's horizon of expectations.