Human beings detect and localize sound sources in three-dimensional space by means of the human binaural sound localization capability.
The input to the hearing consists of two signals: the sound pressures at the two eardrums. These two sound signals are called binaural sound signals; the term binaural refers to the fact that a set of two signals forms the input to the hearing. It is not fully known how the hearing extracts information about the distance and direction to a sound source, but it is known that the hearing uses a number of cues in this determination. Among the cues are coloration, interaural time differences, interaural phase differences and interaural level differences. Thorough descriptions of cues to directional hearing are given by J. Blauert: "Räumliches Hören", Hirzel Verlag, Stuttgart, Germany, 1974, and "Spatial Hearing", The MIT Press, Cambridge, Mass., 1983.
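Two of these cues can be quantified directly from a pair of binaural signals. The following is a minimal sketch, assuming sampled left and right eardrum signals: the interaural time difference is estimated as the lag that maximizes the cross-correlation of the two channels, and the interaural level difference as the ratio of their RMS levels in dB. The function names and the brute-force correlation search are illustrative assumptions. They are not a model of how the hearing itself extracts these cues.

```python
import math


def estimate_itd_samples(left, right, max_lag):
    """Return the lag (in samples, within +/- max_lag) that maximizes the
    cross-correlation sum_i left[i] * right[i + lag].
    A positive lag means the right channel lags the left one,
    i.e. the sound reached the left ear first."""
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, lv in enumerate(left):
            j = i + lag
            if 0 <= j < len(right):
                acc += lv * right[j]
        if acc > best_val:
            best_lag, best_val = lag, acc
    return best_lag


def ild_db(left, right):
    """Interaural level difference in dB (left relative to right), from RMS levels."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms(left) / rms(right))
```

Dividing the lag by the sampling rate gives the time difference in seconds. For example, an impulse arriving at the right ear 3 samples later and at half the amplitude yields a lag of 3 and a level difference of about 6.02 dB.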
Since these two signals are the only input to the hearing, if the sound pressures at the eardrums are created exactly as they would have been created by a given spatial sound field, a listener would not be able to distinguish this sound experience from the one he would get from being exposed to the spatial sound field itself.
One known way of approaching this ideal sound reproduction situation is the artificial head recording technique. An artificial head is a model of a human head in which the acoustically relevant geometries of a human being, especially those governing diffraction around the body, shoulders, head and ears, are modelled as closely as possible. During a recording, e.g. of a concert, two microphones are positioned in the ear canals of the artificial head to sense the sound pressures, and the electrical output signals from these microphones are recorded.
When these signals are reproduced, e.g. by headphones, the sound pressures that occurred in the ear canals of the artificial head during the concert are reproduced in the ear canals of the listener, and the listener perceives that he is listening to the concert in the concert hall. The signals for the headphones are also called binaural signals.
The term binaural signals designates a set of two signals, left and right, having been coded using transmission characteristics corresponding to the transmission to the two ears of the human listener, for instance to be presented in the left and right ear canals, respectively, of a listener.
The binaural signals may typically be electrical signals, but they may also be, e.g. optical signals, electromagnetic signals or any other type of signal which can be transformed, directly or indirectly, into sound signals in the left and right ears of a human.
The transmission of a sound wave propagating from a sound source positioned at a given direction and distance in relation to the left and right ears of the listener is described in terms of two transfer functions, one for the left ear and one for the right ear, that include any linear distortion, such as coloration, interaural time differences and interaural spectral differences. These transfer functions change with the direction and distance of the sound source in relation to the ears of the listener. It is possible to measure the transfer functions for any direction and distance and to simulate them, e.g. electronically by means of filters. If such filters are inserted in the signal path between a playback unit, such as a tape recorder, and the headphones used by a listener, the listener will perceive the sounds generated by the headphones as originating from a sound source positioned at the distance and in the direction defined by the transfer functions of the filters, because the sound pressures in the ears are reproduced faithfully.
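In a digital implementation, such filters are commonly realized as FIR filters whose coefficients are the time-domain impulse responses corresponding to the two transfer functions. The following is a minimal sketch, assuming such a pair of impulse responses is already available for the desired direction and distance. A mono source signal is filtered separately for each ear by direct convolution. The names `convolve` and `render_binaural` and the toy responses in the usage note are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def render_binaural(mono, h_left, h_right):
    """Filter one mono source with the left- and right-ear impulse responses
    for a single source position, giving the two headphone signals."""
    return convolve(mono, h_left), convolve(mono, h_right)
```

As a toy example, `render_binaural([1.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5])` passes the signal unchanged to the left ear and delays it by two samples and halves it for the right ear. A real pair of responses would be hundreds of taps long and would also shape the spectrum. For such lengths, FFT-based convolution is usually preferred over this direct form.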
A set of two such transfer functions, one for the left ear and one for the right ear, is called a Head-related Transfer Function (HTF). Each transfer function is defined as the ratio of the sound pressure generated by a plane wave at a specific point in or close to the corresponding ear canal (P_L in the left ear canal and P_R in the right ear canal) to a reference sound pressure. The reference traditionally chosen is the sound pressure P_1 generated by the same plane wave at a position in the middle of the head, but with the listener absent. In the frequency domain the HTF is given by:

$$H_L = \frac{P_L}{P_1}, \qquad H_R = \frac{P_R}{P_1} \tag{1}$$
where L designates the left ear and R designates the right ear. The time domain representation or description of the HTF, that is the inverse Fourier transform of the HTF, is often called the Head-related Impulse Response (HIR). Thus, the time domain description of the HTF is a set of two impulse responses, one for the left ear and one for the right ear, each of which is the inverse Fourier transform of the corresponding transfer function of the set of two transfer functions of the HTF in the frequency domain.
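Equation (1) and the definition of the HIR can be illustrated numerically. The following is a minimal sketch, assuming equal-length sampled recordings of the pressure at the ear and of the reference pressure at the head-centre position (listener absent), made with the same excitation. The spectra are divided bin by bin, and the inverse DFT gives the impulse response for one ear. The plain O(N²) DFT and the function names are illustrative assumptions. With finite-length DFTs the division amounts to circular deconvolution. The reference spectrum is assumed to have no zero bins.

```python
import cmath


def dft(x):
    """Discrete Fourier transform X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Inverse DFT x[n] = (1/N) sum_k X[k] exp(+j 2 pi k n / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


def htf_and_hir(p_ear, p_ref):
    """Equation (1) for one ear: H = P_ear / P_1 per frequency bin,
    and the HIR as the (real part of the) inverse DFT of H."""
    P_ear, P_ref = dft(p_ear), dft(p_ref)
    H = [a / b for a, b in zip(P_ear, P_ref)]  # assumes no zero bins in P_ref
    hir = [v.real for v in idft(H)]
    return H, hir
```

Calling the function once with the left-ear recording and once with the right-ear recording, each time against the same reference, gives the two halves of the HTF and the HIR.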
The HTF depends upon the angle of incidence of the plane wave in relation to the listener. It gives a complete description of the sound transmission to the ears of the listener, including diffraction around the head, reflections from shoulders, reflections in the ear canal, etc.
The definitions given in equation (1) were given by J. Blauert: "Räumliches Hören", Hirzel Verlag, Stuttgart, Germany, 1974.
A tutorial about binaural techniques is given by Henrik Møller: "Fundamentals of Binaural Technology", Applied Acoustics, vol. 36, no. 3/4, pp. 171-218, 1992.
As mentioned above, binaural signals may be generated using the artificial head recording and reproduction technique; a human test subject may also be used in place of the artificial head.
Alternatively, binaural signals may be generated by any means that simulate the transmission of sound to the ear canals of humans, such as analog filters, digital filters, signal processors, computers, etc.
U.S. Pat. No. 3,920,904 discloses a method of creating, by means of headphones, sound pressures at the eardrums of a listener that correspond to the sound pressures which would be created at the eardrums of the listener in a predetermined acoustical environment in response to electrical signals applied to a number of loudspeakers. The method comprises measuring the HTFs corresponding to the positions of the loudspeakers in relation to the listener and simulating these HTFs with analog electronic filters.
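The principle of this loudspeaker simulation can be illustrated with a digital analogue of those analog filters. The following is a minimal sketch, not the circuit of the cited patent. Each loudspeaker feed is filtered with the pair of impulse responses measured for that loudspeaker's position, and the contributions are summed per ear. The function `virtual_loudspeakers` and the data layout of `hirs` as a list of `(h_left, h_right)` tuples are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def virtual_loudspeakers(feeds, hirs):
    """feeds: one signal per loudspeaker; hirs: one (h_left, h_right) pair per
    loudspeaker position. Returns the summed left and right headphone signals."""
    length = max(len(x) + max(len(hl), len(hr)) - 1
                 for x, (hl, hr) in zip(feeds, hirs))
    left = [0.0] * length
    right = [0.0] * length
    for x, (hl, hr) in zip(feeds, hirs):
        # Each loudspeaker reaches both ears through its own pair of filters.
        for i, v in enumerate(convolve(x, hl)):
            left[i] += v
        for i, v in enumerate(convolve(x, hr)):
            right[i] += v
    return left, right
```

Because the transmission is linear, the ear signals for several loudspeakers are simply the sums of the per-loudspeaker contributions. This is why one filter pair per loudspeaker position suffices.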
It has also been claimed that the simulating filters can be designed by a different approach that does not involve measuring HTFs but instead relies on knowledge of specific cues to directional hearing. Such an approach is disclosed in U.S. Pat. No. 4,817,149, where a front/back cue is generated by a spectral bias, elevation by a notch filter, and azimuth by a time shift between the two channels.
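The structure of such a cue-based design can be sketched in a few lines. This sketch is not the filter design of U.S. Pat. No. 4,817,149, which is not reproduced here. The specific filter forms are assumptions: a first-order spectral tilt stands in for the spectral bias, a second-order IIR notch in the Audio EQ Cookbook (RBJ) form provides the elevation cue, and an integer-sample delay of one channel provides the azimuth cue. All function names and parameter values, including Q = 4, are hypothetical.

```python
import math


def spectral_bias(x, g):
    """Assumed front/back cue: first-order tilt y[n] = x[n] + g * (x[n] - x[n-1]);
    g > 0 emphasizes high frequencies, g = 0 leaves the signal unchanged."""
    prev, out = 0.0, []
    for xn in x:
        out.append(xn + g * (xn - prev))
        prev = xn
    return out


def notch(x, fs, f0, q):
    """Assumed elevation cue: second-order IIR notch at f0 Hz (RBJ cookbook form)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cosw = math.cos(w0)
    a0 = 1.0 + alpha
    b0, b1, b2 = 1.0 / a0, -2.0 * cosw / a0, 1.0 / a0
    a1, a2 = -2.0 * cosw / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def time_shift(x, samples):
    """Assumed azimuth cue: delay by an integer number of samples."""
    return [0.0] * samples + list(x)


def cue_based_pair(mono, fs, delay_samples, notch_hz, tilt):
    """Apply the three cues; delaying the right channel places the source
    toward the left. The left channel is zero-padded to equal length."""
    shaped = notch(spectral_bias(mono, tilt), fs, notch_hz, q=4.0)
    left = shaped + [0.0] * delay_samples
    right = time_shift(shaped, delay_samples)
    return left, right
```

The notch passes DC with unit gain and removes a steady sinusoid at `notch_hz`. The interchannel delay alone produces the lateralization. In contrast to the measured-HTF approach, each cue in this sketch is controlled by a single parameter, at the cost of ignoring the detailed spectral shape of real HTFs.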