The invention relates to sound spatialization, known as 3D-rendered sound, of audio signals, integrating in particular a room effect, notably in the field of binaural techniques.
The term “binaural” refers to the reproduction, on a pair of stereophonic headphones or a pair of earpieces, of an audio signal that nevertheless retains spatialization effects. The invention is not, however, limited to this technique and is notably applicable to techniques derived from the “binaural” techniques, such as the “transaural” reproduction techniques, in other words reproduction on remote loudspeakers. TRANSAURAL® is a commercial trademark of the company COOPER BAUCK CORPORATION.
One specific application of the invention is, for example, the enrichment of audio contents by applying acoustic transfer functions of the head of a listener to monophonic signals, in order to immerse the listener in a 3D sound scene, in particular including a room effect.
For the implementation of “binaural” techniques on headphones or loudspeakers, the transfer function, or filter, is defined for a sound signal between a position of a sound source in space and the two ears of a listener. The aforementioned acoustic transfer function of the head is denoted HRTF, for “Head-Related Transfer Function”, in its frequency form and HRIR, for “Head-Related Impulse Response”, in its temporal form. For one direction in space, two HRTFs are ultimately obtained: one for the right ear and one for the left ear.
In particular, the binaural technique consists of applying such acoustic transfer functions for the head to monophonic audio signals, in order to obtain a stereophonic signal which, when listened to on a pair of headphones, provides the listener with the sensation that the sound sources originate from a particular direction in space. The signal for the right ear is obtained by filtering the monophonic signal by the HRTF of the right ear and the signal for the left ear is obtained by filtering this same monophonic signal by the HRTF of the left ear.
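The filtering described above can be illustrated by a minimal sketch in the time domain. The function name `binauralize` and the toy HRIR values are hypothetical and serve only as an illustration; measured HRIRs would be used in practice.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Filter one monophonic signal with a left/right HRIR pair.

    Returns an array of shape (2, len(mono) + len(hrir) - 1):
    row 0 is the left-ear signal, row 1 the right-ear signal.
    Both HRIRs are assumed to have the same length.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs (hypothetical values): the sound reaches the right ear first.
hrir_left = np.array([0.0, 0.0, 0.5, 0.2])
hrir_right = np.array([1.0, 0.3, 0.0, 0.0])

mono = np.array([1.0, 0.0, 0.0])  # a unit impulse as test signal
stereo = binauralize(mono, hrir_left, hrir_right)
```

With a unit impulse as input, each output row simply reproduces the corresponding HRIR, which makes the sketch easy to check by hand.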
The essential physical parameters that allow these transfer functions to be characterized are:
the ITD, for “Interaural Time Difference”, defined as the interaural arrival time difference of the sound waves from the same sound source between the left ear and the right ear of the listener. The ITD is principally linked to the phase of the HRTFs;
the spectral modulus, which notably allows level differences to be perceived between the left ear and the right ear as a function of frequency;
when the HRTF, or the HRIR, of the head of the listener is not considered as corresponding to conditions of free field sound propagation (anechoic condition), the aforementioned transfer functions can take into account reflection, scattering and diffraction phenomena which correspond to the acoustic response of the room in which these transfer functions have been measured or simulated. The aforementioned transfer functions are then called BRIR, for “Binaural Room Impulse Response”, in their temporal form.
The aforementioned binaural techniques may for example be employed in order to simulate a 3D rendering of the 5.1 type on a pair of headphones. In this technique, an HRTF pair corresponds to each loudspeaker position of the multi-speaker, or “surround”, system: one HRTF for the left ear and one HRTF for the right ear. For each ear of the listener, the 5 channels of the 5.1 signal are each convolved with the corresponding HRTF filter and then summed, which yields two binaural channels, right and left, that simulate the 5.1 mode for listening on a pair of audio headphones.
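As a minimal sketch of this downmix, each of the 5 channels can be convolved with its own HRIR pair and the results summed per ear. The function name `virtual_surround`, the array shapes and the random test data are assumptions made for the illustration only.

```python
import numpy as np

def virtual_surround(channels, hrirs_left, hrirs_right):
    """Downmix N loudspeaker channels to two binaural channels.

    channels:    array (N, n_samples), one row per loudspeaker feed
    hrirs_left:  array (N, n_taps), left-ear HRIR of each loudspeaker position
    hrirs_right: array (N, n_taps), right-ear HRIR of each loudspeaker position
    """
    n_out = channels.shape[1] + hrirs_left.shape[1] - 1
    out = np.zeros((2, n_out))
    for x, h_left, h_right in zip(channels, hrirs_left, hrirs_right):
        out[0] += np.convolve(x, h_left)   # contribution to the left ear
        out[1] += np.convolve(x, h_right)  # contribution to the right ear
    return out

# Hypothetical data: 5 channels of noise and random 16-tap "HRIRs".
rng = np.random.default_rng(0)
channels = rng.standard_normal((5, 64))
hrirs_left = rng.standard_normal((5, 16))
hrirs_right = rng.standard_normal((5, 16))
binaural = virtual_surround(channels, hrirs_left, hrirs_right)
```

The cost of this direct form grows with the number of channels, since it requires one filter per channel and per ear.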
In this situation, binaural spatialization simulating a multi-speaker system is referred to as “binaural virtual surround”.
In 3D rendering, the term ‘externalization’ denotes the fact that the listener perceives the sound sources at variable distances away from his head. In a binaural 3D rendering, it frequently happens that the sources are perceived to be inside the head of the listener, independently of the direction of origin of the sound sources. A source perceived in this way is referred to as ‘non-externalized’.
Various studies have shown that the addition of a room effect in the binaural 3D rendering methods allows the externalization of the sound sources to be considerably enhanced. Cf., notably, D. R. Begault and E. M. Wenzel, “Direct comparison of the impact of head tracking, reverberation and individualized head-related transfer functions on the spatial perception of a virtual speech source”, J. Audio Eng. Soc., Vol. 49, No. 10, 2001.
Currently, there are two main methods allowing the room effect to be integrated into the HRIR:
the first, relating to the real room effect, consists of measuring HRIRs in a non-anechoic room, which therefore includes a room effect. The HRIRs obtained, which are actually BRIRs, must be of sufficiently long duration to integrate the first sound reflections, that is, longer than 500 time samples for a sampling frequency of 44,100 Hz. This duration must be even longer, in other words longer than 20,000 time samples at the same sampling frequency, if it is desired to integrate the late reverberation effect. It is however noted that the aforementioned BRIRs may be obtained in an equivalent manner by the convolution of the HRIRs measured in an anechoic environment with the desired room effect, represented by the impulse response of the room;
the second, relating to the artificial room effect, comes from virtual acoustics and consists of synthetically integrating the room effect into the HRIR. This operation is carried out by means of spatializers that introduce artificial reverberation effects. The drawback of such methods is that obtaining a realistic rendering requires significant processing power.
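The sample counts quoted for the first method can be converted into durations, and the equivalence between a BRIR and the convolution of an anechoic HRIR with a room impulse response can be sketched as follows. The helper names and the toy impulse responses are hypothetical; the room response here is a simplified single-channel one chosen for the illustration.

```python
import numpy as np

FS = 44_100  # sampling frequency quoted in the text, in Hz

def samples_to_ms(n_samples, fs=FS):
    """Convert a number of time samples into milliseconds."""
    return 1000.0 * n_samples / fs

early_ms = samples_to_ms(500)     # first reflections: about 11.3 ms
late_ms = samples_to_ms(20_000)   # late reverberation: about 453.5 ms

def brir_from_hrir(hrir, room_ir):
    """Build a BRIR by convolving an anechoic HRIR with a room impulse response."""
    return np.convolve(hrir, room_ir)

# Toy data (hypothetical): direct path plus one reflection 6 samples later.
hrir = np.array([1.0, 0.5])
room_ir = np.zeros(10)
room_ir[0] = 1.0   # direct sound
room_ir[6] = 0.5   # single attenuated reflection
brir = brir_from_hrir(hrir, room_ir)
```

In the resulting BRIR the HRIR appears once at the origin and once more, attenuated, at the position of the reflection.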
As far as “binaural” sound spatialization is concerned, a common method consists of modeling the binaural filters by decomposing the HRTFs, or HRIRs, into a minimum-phase component (a minimum-phase filter determined by the spectral modulus of the HRTF) and a pure delay. For a more detailed description of such a method, reference may usefully be made to the articles by D. J. Kistler and F. L. Wightman, “A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction”, J. Acoust. Soc. Am., 91(3) pp. 1637-1647, 1992 and by A. Kulkarni et al., “On the minimum-phase approximation of head-related transfer functions”, 1995 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE catalog number: 95TH8144).
The difference in delay observed between the HRTFs or the HRIRs of the left ear and of the right ear then corresponds to the ITD localization cue. Various methods exist for extracting the delays from the HRIRs or HRTFs. The main methods are described by S. Busson in “Individualization of acoustic indices for binaural synthesis”, Doctoral thesis from the Université de la Méditerranée Aix-Marseille II, 2006.
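By way of illustration, a minimum-phase filter having the same spectral modulus as a given HRIR can be computed with the real-cepstrum (homomorphic) method. This is one common technique, chosen here as an assumption; it is not necessarily the procedure used in the cited articles, and the function name and FFT size are hypothetical.

```python
import numpy as np

def minimum_phase(hrir, n_fft=1024):
    """Return a minimum-phase impulse response with the modulus of `hrir`.

    Uses the real cepstrum: the cepstrum of the log-modulus is folded onto
    positive quefrencies, which yields a minimum-phase spectrum.
    """
    modulus = np.abs(np.fft.fft(hrir, n_fft))
    log_modulus = np.log(np.maximum(modulus, 1e-12))  # avoid log(0)
    cepstrum = np.fft.ifft(log_modulus).real

    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0

    min_spectrum = np.exp(np.fft.fft(cepstrum * fold))
    return np.fft.ifft(min_spectrum).real[:len(hrir)]

# Toy HRIR (hypothetical): a minimum-phase filter [1, 0.5] delayed by 3 samples.
hrir = np.array([0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0])
hrir_min = minimum_phase(hrir)
```

For this toy input the minimum-phase component is approximately [1, 0.5, 0, ...], so that the original response is recovered by re-applying the pure delay of 3 samples.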
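One simple way of estimating the ITD, given here only as an assumed example among the various existing methods, is to locate the maximum of the cross-correlation between the left and right HRIRs. The function name and the sign convention (positive when the left-ear response is later than the right-ear response) are choices made for this sketch.

```python
import numpy as np

def estimate_itd(hrir_left, hrir_right, fs=44_100):
    """Estimate the ITD in seconds from the cross-correlation maximum.

    A positive value means the left-ear response lags the right-ear response.
    """
    xcorr = np.correlate(hrir_left, hrir_right, mode="full")
    # Index 0 of the 'full' output corresponds to lag -(len(hrir_right) - 1).
    lag = int(np.argmax(np.abs(xcorr))) - (len(hrir_right) - 1)
    return lag / fs

# Toy HRIRs (hypothetical): the left ear receives the sound 5 samples later.
hrir_right = np.zeros(32)
hrir_right[3] = 1.0
hrir_left = np.zeros(32)
hrir_left[8] = 0.8

itd = estimate_itd(hrir_left, hrir_right)
```

The resolution of this estimate is limited to one sampling period; finer estimates would require interpolation or a phase-based method.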
The spectral modulus is obtained by taking the modulus of the Fourier transform of the HRIRs. The number of coefficients can then be reduced, for example by averaging the energy over a reduced number of frequency bands, in accordance with frequency smoothing techniques based on the integration properties of the auditory system.
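The two operations above can be sketched as follows. For simplicity the sketch averages the energy over bands of equal width, which is an assumption; a perceptually motivated band layout, as suggested by the text, would use bands that widen with frequency. The function names and the FFT size are hypothetical.

```python
import numpy as np

def spectral_modulus(hrir, n_fft=512):
    """Modulus of the Fourier transform of an HRIR (non-negative frequencies)."""
    return np.abs(np.fft.rfft(hrir, n_fft))

def band_energies(modulus, n_bands):
    """Average the energy (squared modulus) over n_bands contiguous bands.

    Bands of (nearly) equal width are used here purely for illustration.
    """
    bands = np.array_split(modulus ** 2, n_bands)
    return np.array([band.mean() for band in bands])

# Toy HRIR (hypothetical): a unit impulse has a flat spectrum of modulus 1.
hrir = np.zeros(64)
hrir[0] = 1.0
modulus = spectral_modulus(hrir)          # 257 coefficients
reduced = band_energies(modulus, 8)       # 8 coefficients
```

In this example the 257 modulus coefficients are reduced to 8 band energies.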
Irrespective of the manner in which the HRTF, HRIR or, where appropriate, BRIR filters are modeled, several methods for implementation of binaural sound spatialization exist.
Amongst the latter, the simplest and most direct method is the dual-channel implementation of the binaural technique shown in FIG. 1.
According to this method, each source is spatialized independently of the others. One pair of HRTF filters is associated with each source. The filtering can be carried out either in the time domain, in the form of a convolution product, or in the frequency domain, in the form of a complex multiplication, or alternatively in any other transformed domain, such as for example the PQMF (Pseudo-Quadrature Mirror Filter) domain.
Multi-channel implementation of the binaural technique is a more efficient alternative to dual-channel implementation. It consists of a linear decomposition of the HRTFs in the form of a sum of products of functions of the direction (encoding gains) and of elementary filters (decoding filters). This decomposition allows the encoding and decoding steps to be separated, so that the number of filters is independent of the number of sources to be spatialized. The elementary filters may subsequently be modeled by a minimum-phase filter and a pure delay in order to simplify their implementation. It is also possible to extract the delays from the original HRTFs and to integrate them separately in the encoding.
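The equivalence between filtering in the time domain and in the frequency domain can be checked with a minimal sketch. The function names and the random test signals are hypothetical; the FFT length is chosen long enough for the multiplication of spectra to correspond to a linear, rather than circular, convolution.

```python
import numpy as np

def filter_time(x, h):
    """Filtering in the time domain: convolution product."""
    return np.convolve(x, h)

def filter_freq(x, h):
    """Filtering in the frequency domain: complex multiplication of spectra."""
    n = len(x) + len(h) - 1  # zero-pad so that no circular wrap-around occurs
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)   # hypothetical source signal
h = rng.standard_normal(32)    # hypothetical HRIR
y_time = filter_time(x, h)
y_freq = filter_freq(x, h)
```

Both outputs agree to within numerical precision; the frequency-domain form becomes advantageous as the filter length grows.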
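The principle of this decomposition can be sketched for one ear as follows. The decomposition is obtained here by a singular value decomposition, which is an assumption made for the illustration, since the text does not prescribe a particular decomposition. The HRIR set is synthetic and built to have exactly rank 3 so that the decomposition is exact; all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dirs, n_taps, n_chan = 6, 16, 3

# Synthetic single-ear HRIR set, one row per direction, of rank n_chan.
hrirs = rng.standard_normal((n_dirs, n_chan)) @ rng.standard_normal((n_chan, n_taps))

# Linear decomposition: hrirs[d] = sum_k gains[d, k] * filters[k]
u, s, vt = np.linalg.svd(hrirs, full_matrices=False)
gains = u[:, :n_chan] * s[:n_chan]   # functions of the direction (encoding gains)
filters = vt[:n_chan]                # elementary filters (decoding filters)

def render_multichannel(sources, directions):
    """Encode all sources onto n_chan channels, then apply n_chan filters."""
    bus = np.zeros((n_chan, sources.shape[1]))
    for x, d in zip(sources, directions):
        bus += gains[d][:, None] * x          # encoding: gains only
    return sum(np.convolve(bus[k], filters[k]) for k in range(n_chan))

def render_direct(sources, directions):
    """Reference: one HRIR filter per source."""
    return sum(np.convolve(x, hrirs[d]) for x, d in zip(sources, directions))

sources = rng.standard_normal((4, 128))   # 4 hypothetical sources
directions = [0, 2, 3, 5]                 # direction index of each source
y_multi = render_multichannel(sources, directions)
y_direct = render_direct(sources, directions)
```

In this sketch the multi-channel form uses 3 filters whatever the number of sources, whereas the direct form uses one filter per source. With measured HRTFs the decomposition is generally truncated and therefore approximate.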
The aforementioned prior art techniques exhibit major drawbacks when BRIR filters, which take the room effect into account, are implemented, in particular:
the complexity: owing to the long duration of the room responses, the number of time samples contained in the BRIRs can be very high, greater than 20,000 samples for rooms of average size, this number being linked to the delay of the room echoes and therefore to the dimensions of the room. Consequently, the corresponding BRIR filters require very large processing power and memory size;
externalization: the modeling in the form of a minimum-phase filter, associated with a pure delay, allows the size of the filters to be reduced. However, extracting a single interaural delay for each BRIR filter does not allow the first reflections to be taken into account. In this case, the sound timbre is correctly preserved but the externalization effect is no longer reproduced.
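The order of magnitude of the complexity drawback can be estimated with a short calculation. The sketch assumes a direct time-domain convolution, 5 channels rendered for 2 ears as in the 5.1 example, and coefficients stored as 32-bit floating-point values; these are assumptions made for the illustration, and block-based frequency-domain processing would reduce the operation count.

```python
FS = 44_100          # sampling frequency quoted in the text, in Hz
BRIR_TAPS = 20_000   # BRIR length quoted in the text, in samples

def macs_per_second(n_taps, fs, n_filters):
    """Multiply-accumulate operations per second for direct convolution."""
    return n_taps * fs * n_filters

def memory_bytes(n_taps, n_filters, bytes_per_coeff=4):
    """Storage needed for the filter coefficients (float32 assumed)."""
    return n_taps * n_filters * bytes_per_coeff

N_FILTERS = 5 * 2    # 5 channels, 2 ears (assumed configuration)

per_filter = macs_per_second(BRIR_TAPS, FS, 1)          # 882,000,000
total = macs_per_second(BRIR_TAPS, FS, N_FILTERS)       # 8,820,000,000
storage = memory_bytes(BRIR_TAPS, N_FILTERS)            # 800,000 bytes
```

Under these assumptions a single BRIR filter already requires close to 10^9 multiply-accumulate operations per second.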
The object of the present invention is to overcome the aforementioned drawbacks of the prior art.