Over the past decades, great progress has been made in the field of virtual reality technology, in particular visual virtual reality: 3D TV screens have found their way to the general public, and home theaters and video games in particular take advantage of them. 3D sound technology, however, still lags behind. Yet it is, at least in theory, quite easy to create a virtual 3D acoustic environment, called a Virtual Auditory Space (VAS). When humans localize sound in 3D space, they use the two audio signals picked up by the left and right ear. An important cue is the so-called "interaural time difference" (ITD): depending on the direction of the sound (with respect to the person's head), the sound reaches either the left or the right ear first, and this time difference contains information about the lateral angle ϕ (see FIG. 2). The interaural time difference function (ITDF) describes how the ITD varies with the direction of the sound source (e.g. a loudspeaker); see FIG. 3 for an example.
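As an illustration of how the ITD depends on the lateral angle, the classical Woodworth spherical-head approximation can be used (this model is not part of the present disclosure; the head radius below is an assumed average value):

```python
import numpy as np

def itd_woodworth(lateral_angle_rad, head_radius_m=0.0875, c=343.0):
    """Woodworth spherical-head approximation of the interaural time
    difference (ITD), in seconds, for a distant sound source.

    lateral_angle_rad: lateral angle (0 = straight ahead, positive
                       toward one ear), in radians.
    head_radius_m:     assumed average head radius (~8.75 cm).
    c:                 speed of sound in air, in m/s.
    """
    return (head_radius_m / c) * (lateral_angle_rad + np.sin(lateral_angle_rad))

# A source straight ahead yields zero ITD; a source at 90 degrees
# yields the maximum ITD, roughly 0.66 ms for this head radius.
```

The model captures the qualitative shape of a measured ITDF (zero in front, maximal to the side), which is what the personalized ITDF refines for an individual listener.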
Other cues are contained in the spectral content of the sound as it is registered by the inner ear. Before the sound waves coming from a certain direction reach the tympanic membrane, they interact with the body, the head and the pinna. Through this interaction some frequencies are transmitted more easily than others, so that the sound undergoes a spectral filtering which depends on the direction from which it arrives. This filtering is described by the so-called "Head-Related Transfer Function" (HRTF) (see the example in FIG. 4), which describes, for each direction of the sound source, the proportion of each frequency that is transmitted or filtered out. The spectral content of the signals received in the two ears thus contains additional information (so-called spectral cues) about the location of the sound source, in particular about the elevation φ (see FIG. 2), i.e. the height at which the sound source is located relative to the head, but also about whether the sound source is located in front of or behind the person.
To create a realistic 3D acoustic virtual reality, it is therefore paramount to know the ITDF and HRTF of a particular person. Once these are known, the appropriate time delays and spectral filtering can be applied artificially for any desired direction, giving the listener the necessary cues (time cues and spectral cues) to reconstruct the 3D world.
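The rendering step just described can be sketched as follows. This is a minimal illustration, not the method of any cited reference: the HRTF is applied in its time-domain form (the head-related impulse response, HRIR), and the ITD is rounded to a whole number of samples.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right, itd_seconds, fs=44100):
    """Render a mono signal at one virtual direction by convolving it
    with the left/right HRIRs for that direction and delaying one ear
    by the ITD.  Illustrative sketch only: real systems interpolate
    measured HRIRs between directions and use fractional delays."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    delay = int(round(abs(itd_seconds) * fs))  # whole-sample ITD delay
    if itd_seconds > 0:    # sound reaches the left ear first
        right = np.concatenate([np.zeros(delay), right])
        left = np.concatenate([left, np.zeros(delay)])
    elif itd_seconds < 0:  # sound reaches the right ear first
        left = np.concatenate([np.zeros(delay), left])
        right = np.concatenate([right, np.zeros(delay)])
    return np.stack([left, right])  # stereo output: (2, n_samples)
```

Playing the resulting two channels over headphones gives the listener the time and spectral cues for the chosen direction, provided the HRIRs and ITD match the listener's own.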
Currently there are already many applications on the market that use the HRTF to create a virtual 3D impression, but so far they are not widely used. This is because they rely on a single, generalized ITDF and HRTF set which is supposed to work for a wide audience. Just as 3D-vision systems assume that the distance between the eyes is the same for everyone, these systems use the average ITDF and HRTFs. While this does not pose significant problems for vision, it does for 3D audio. When an individual's interocular distance differs significantly from the average, the user's depth perception may not be optimal, causing the feeling that "something is wrong", but the problems with 3D audio are much more severe: small differences can cause large errors. Equipped with virtual "average ears", the user does experience a spatial effect (the sound is no longer perceived inside the head but somewhere outside it), yet there is often much confusion about the direction the sound is coming from. Most mistakes are made in the perception of elevation, but, more disturbingly, front and rear are often interchanged: sound that should come from the front is perceived as coming from behind, significantly lowering the usefulness of this technology.
Hence, despite the fact that the HRTF and ITDF of different people are similar, even small differences between a person's true HRTF and ITDF and the general HRTF and ITDF cause errors which, in contrast to 3D-vision, are detrimental to the spatial experience. This is probably one of the reasons why VAS through stereo headphones hasn't realized its full potential yet. Hence, to make optimal use of the technology, it is necessary to use a personalized HRTF and ITDF. But how to achieve this on a large scale, so that this technology can be made available to the general public?
The HRTF and ITDF of a person are traditionally recorded using specialized infrastructure: in an anechoic chamber, a sound source is moved around the subject, and for each sampled direction the signals arriving at the left and right ear are recorded by microphones placed in the subject's left and right ears, just at the entrance of the ear canal. Although progress has been made in recent years and new methods have been developed to simplify this procedure, such measurements remain very cumbersome and expensive. It is therefore not feasible to measure the HRTF and ITDF of all potential users in this way, and there is a need for other ways to individualize the HRTF and ITDF.
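Assuming the excitation signal played by the loudspeaker is known and the room adds negligible reflections (as in an anechoic chamber), the per-direction, per-ear transfer function in such a measurement could in principle be estimated by frequency-domain deconvolution. The following sketch is illustrative only and does not reproduce any particular published procedure:

```python
import numpy as np

def estimate_hrtf(played, recorded, eps=1e-8):
    """Estimate the transfer function for one ear and one direction by
    regularized frequency-domain deconvolution: H(f) = Y(f) / X(f),
    where X is the known excitation played by the loudspeaker and Y
    the signal recorded by the in-ear microphone.  The small eps
    stabilizes frequencies where the excitation has little energy.
    A real measurement would also compensate for the loudspeaker and
    microphone responses obtained in a separate calibration step."""
    n = len(played) + len(recorded) - 1   # zero-pad past full overlap
    X = np.fft.rfft(played, n)
    Y = np.fft.rfft(recorded, n)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)
```

Repeating this for many source directions and both ears yields sampled HRTFs; the ITDF can be derived from the arrival-time difference between the two ears' responses.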
U.S. Pat. No. 5,729,612A describes a method and apparatus for measuring a head-related transfer function outside of an anechoic chamber. In this document it is proposed to measure the HRTF using a sound wave output by a loudspeaker mounted on a special support. Left and right audio signals are captured by two in-ear microphones worn by a subject whose head movements are tracked by a position sensor and/or who sits on a chair that can be oriented in particular directions. The data are processed in a remote computer. The document is silent about how exactly the ITDF and HRTF are calculated from the measured audio and position signals. However, a calibration step is used to determine a transfer characteristic of the loudspeaker and microphones, and the method relies heavily on the relative position of the person and the loudspeaker being exactly known.
There is still room for improvement or alternatives.