One technique using HRTF-type transfer functions is binaural synthesis. This technique is based on the use of so-called “binaural” filters that reproduce the acoustic transfer functions between the sound source(s) and the listener's ear canals. These filters are used to simulate the auditory localization cues that a listener uses to position sound sources in a real listening situation.
Techniques related to binaural synthesis are therefore based on a pair of binaural signals that are input to a reproduction system. The two binaural signals can be obtained by signal processing, by filtering a monophonic signal with binaural filters that reproduce the acoustic propagation properties between the source placed at a given position and each of the listener's ear canals.
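As an illustration only, the filtering step described above can be sketched as a convolution of the monophonic signal with one impulse response per ear. The function names and the toy impulse responses below are assumptions made for the example; they are not measured HRTF data and do not come from any particular system.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter one monophonic signal with a left/right pair of impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed, not measured): a source on the listener's
# left reaches the left ear first and louder, the right ear later and weaker.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = binaural_synthesis(mono, hrir_left, hrir_right)
print(left)   # signal sent to the left ear
print(right)  # signal sent to the right ear
```

In this toy case the right-ear signal is a delayed and attenuated copy of the left-ear signal, which mimics the interaural time and level differences; real binaural filters are much longer and also encode the spectral effects of the pinna, head and torso.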
Binaural synthesis can be used with different reproduction systems, for example a headset with two earphones or two loudspeakers. The objective is to reconstruct a sound field at the ears of the listener that is practically identical to the sound field that real sources in space would have induced.
Binaural filters take account of all acoustic phenomena that modify the acoustic waves along their path between the source and the listener's ear canals. In particular, these acoustic phenomena include diffraction by the listener's head and reflections on the listener's pinnae and upper torso.
These acoustic phenomena vary depending on the position of the sound source relative to the listener, and it is through these variations that the listener can position the source in space. These variations determine a form of acoustic coding of the position of the source. Through learning, an individual's hearing system can interpret this coding to position the sound source(s).
Nevertheless, acoustic diffraction and reflection phenomena are strongly dependent on the listener's morphology. High-quality binaural synthesis therefore depends on binaural filters that reproduce as closely as possible the acoustic coding that the listener's body produces naturally, taking account of the specific features of his or her morphology.
When these conditions are not met, the performance of the binaural rendering is degraded, which results in particular in intracranial perception of sources and in confusion between front and rear positions.
Thus, binaural filters represent acoustic transfer functions or HRTF transfer functions that model transformations generated by the user's torso, head and pinna on the acoustic signal originating from a sound source. A pair of HRTF functions is associated with each sound source position, with one for each ear. Furthermore, these HRTF transfer functions carry the acoustic footprint of the morphology of the individual on which they were measured.
In a well-known manner, HRTF transfer functions are obtained during a measurement phase. A set of directions covering, more or less finely, the entire space surrounding the listener is fixed. For each direction, the left and right HRTF transfer functions are measured using microphones inserted at the entrance of the listener's ear canals. In general, a sphere centered on the listener is thus defined.
For a good quality measurement, the measurement must be made in an anechoic chamber or soundproof booth, such that only acoustic reflections and phenomena related to the listener are taken into account. Finally, if M directions are measured, the result obtained for a given listener is a database of 2M HRTF type transfer functions (for two auditory channels, right and left) representing each of the source positions for each ear canal. Therefore, these techniques necessitate measurements made directly on the listener. Such a measurement operation takes a very long time because a large number of directions have to be measured.
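To give a sense of the size of such a database, the following minimal sketch enumerates M directions on a regular grid and stores one left/right pair of filters per direction. The angular steps, the filter length of 256 taps and the 4-byte sample size are assumptions chosen for the example, as are all the names; actual measurement grids and filter lengths vary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Direction:
    azimuth_deg: int
    elevation_deg: int


def build_grid(az_step=15, el_min=-45, el_max=75, el_step=15):
    """Enumerate measurement directions on a sphere centered on the listener."""
    return [Direction(az, el)
            for el in range(el_min, el_max + 1, el_step)
            for az in range(0, 360, az_step)]


TAPS = 256             # assumed filter length, in samples
BYTES_PER_SAMPLE = 4   # assumed 32-bit floating-point storage

grid = build_grid()
# One (left, right) pair per direction; zeros stand in for measured responses.
database = {d: ([0.0] * TAPS, [0.0] * TAPS) for d in grid}

num_directions = len(database)          # M
num_functions = 2 * num_directions      # 2M, one per ear and per direction
storage_bytes = num_functions * TAPS * BYTES_PER_SAMPLE
print(num_directions, num_functions, storage_bytes)
```

Under these assumptions a single listener already requires several hundred filters, each of which corresponds to a separate measurement position.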
Thus, some individuals spend many hours in the laboratory so that the details of the acoustic signature associated with their physiognomy, and their capacity to perceive the sound space in three dimensions, can be analyzed. These individuals then benefit from binaural listening shaped from the analysis results, providing comfort and a high-quality sound impression.
Filters personalized to each listener are necessary if this quality and this comfort are to be made available to a larger group of listeners, particularly for services aimed at the general public.
However, it is difficult to imagine that all customers of a service could be measured in soundproof booths (which are rare and expensive). Furthermore, the general public would find it difficult to accept the duration and the discomfort of these measurements.
It is thus desirable to have solutions capable of quickly, reliably and unintrusively providing individual acoustic signatures so that the results obtained in an anechoic chamber on a small number of persons could be generalized to a very large population.
One practical solution that is starting to emerge is to suggest that the user could measure his own HRTF transfer functions in his normal place of listening, so as to emulate on headphones his listening experience in a studio or in his living room. The disadvantages of this type of solution are that only a small number of fixed positions are measured and that it becomes difficult to separate information related to the reproduction device itself from information related to the place of listening. Different studies have been dedicated to methods that reduce some practical constraints, such as dynamic measurement (“Dynamic measurement of room impulse responses using a moving microphone”, Ajdler, Sbaiz, Vetterli, 2007) or reciprocal measurement, in which the roles of the microphone and the loudspeaker are inverted (“Fast head-related transfer function measurement via reciprocity”, Zotkin, Duraiswami, Grassi, Gumerov, 2006). Applications of this solution are limited to professional mixing studios or “home cinema” installations.
Various alternative solutions are being explored. A first approach consists of calculating filters starting from an acquisition of the listener's morphology, and in particular of his pinna. Personalization can also be based on the transformation of non-individual HRTF transfer functions extracted from a database in which morphologies are associated with HRTF transfer functions (“Individualisation des indices spectraux pour la synthèse binaurale: recherche et exploitation des similarités interindividuelles pour l'adaptation ou la reconstruction de HRTF (Individualization of spectral indices for binaural synthesis: search for and use of interindividual similarities for adaptation or reconstruction of HRTF)”, Guillon, P., PhD Thesis, University of Maine, Le Mans, France, 2009).
The transformation of HRTF transfer functions to adapt them to a given individual is then controlled by comparing the morphologies of the pinnae taken from the database with the target pinna of the given individual. This comparison is based on a technique for matching three-dimensional meshes of pinnae. Another method consists of using morphological parameters to create or deform a three-dimensional mesh that is then used for a detailed calculation and a digital simulation of the individual's HRTF transfer functions, for example by a boundary element method. It is also possible, starting from the morphological parameters of a given individual, to search in a database for another individual with similar morphological parameters.
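The database search mentioned last can be pictured as a nearest-neighbor lookup over morphological parameters. The sketch below is only one possible reading: the choice of three parameters (head width, pinna height and pinna width, in millimeters), the unweighted Euclidean distance, the subject names and all values are hypothetical and are not taken from the cited work.

```python
import math

# Hypothetical database: subject name -> (head width, pinna height, pinna width) in mm.
morphology_db = {
    "subject_A": (152.0, 64.0, 31.0),
    "subject_B": (145.0, 58.0, 28.0),
    "subject_C": (160.0, 70.0, 35.0),
}


def closest_subject(target, database):
    """Return the name of the subject whose parameters are nearest to the target."""
    return min(database, key=lambda name: math.dist(database[name], target))


# Parameters measured on the new listener (hypothetical values).
target = (147.0, 59.0, 29.0)
match = closest_subject(target, morphology_db)
print(match)  # the HRTF set stored for this subject would then be reused
```

An unweighted distance treats every parameter as equally important for localization, which is a simplification; a practical system would probably normalize or weight the parameters.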
Some approaches propose to use as input a three-dimensional model of the listener's morphology, and more particularly of his pinna, together with other measurements of the listener's morphological parameters. One method of acquiring the morphology of the pinna consists of using a three-dimensional scan, but this method is sometimes problematic in that it requires special equipment and also special skills.
Alternative solutions are developed either by deriving three-dimensional scans from a set of photographs (“Reconstructing head models from photographs for individualized 3D-audio processing”, Dellepiane, Pietroni, Tsingos, Asselot, Scopigno, 2008), or by using methods derived from image processing to obtain three-dimensional meshes starting from a camera and reconstruction techniques (“shape from shading”, “shape from structured light”) or from Kinect™ type sensors associated with depth analysis techniques.
Other work attempts to develop learning methods, following two opposing approaches.
The first approach consists of studying the capacity of listeners to adapt to generic HRTF transfer functions that were not initially adapted for them. Conversely, the second approach suggests computer learning of the reactions of a user participating in an interactive game or answering an interactive questionnaire. The computer iteratively reconstitutes the set of HRTF transfer functions suitable for the user by observing his localization performance and/or his replies.
However, the storage of sets of transfer functions and their transmission and loading are complicated because of the amount of data representing each set of transfer functions.
Furthermore, solutions for personalizing a set of transfer functions to adapt it to a given listener do not yet exist, apart from measurements in a soundproof booth. As explained above, measurements in soundproof booths are complex and expensive in hardware and software resources and in time, and thus cannot be transposed to a large population.