This application is a national stage entry of International Application No. PCT/FR2007/050894, filed on Mar. 8, 2007, and claims priority to French Application No. 06 02685, filed Mar. 28, 2006, both of which are hereby incorporated by reference as if fully set forth herein in their entireties.
The invention relates to spatialization, known as 3D-rendered sound, of compressed audio signals.
Such an operation is carried out, for example, when a compressed 3D audio signal represented over a certain number of channels is decompressed into a different number of channels, two for example, in order to allow the 3D audio effects to be reproduced on a pair of headphones.
Thus, the term “binaural” refers to the reproduction of an audio signal on a pair of stereophonic headphones while retaining spatialization effects. The invention is not, however, limited to the aforementioned technique and is notably applicable to techniques derived from the “binaural” technique, such as the reproduction techniques known as TRANSAURAL®, in other words reproduction on remote loudspeakers. TRANSAURAL® is a commercial trademark of the company COOPER BAUCK CORPORATION. Such techniques can then use a “cross-talk cancellation” technique, which consists in eliminating the crossed acoustic channels in such a manner that a sound thus processed, then emitted by the loudspeakers, is heard by only one of the two ears of a listener.
Consequently, the invention also relates to the transmission and reproduction of multichannel audio signals and to their conversion for a reproduction device (transducer) imposed by the equipment of a user. This is for example the case for the reproduction of a 5.1 sound scene by a pair of audio headphones, or by a pair of loudspeakers.
The invention also relates to the reproduction, within the framework of a game or video recording for example, of one or more sound samples stored in files, with a view to their spatialization.
Various approaches have been proposed amongst the techniques known in the field of binaural sound spatialization.
In particular, dual-channel binaural synthesis consists, with reference to FIG. 1a, in filtering the signal from the various sound sources Si that it is desired to position, upon reproduction, at a position in space, by means of left HRTF-l and right HRTF-r acoustic transfer functions in the frequency domain corresponding to the appropriate direction, defined in polar coordinates (θ1, φ1). The aforementioned transfer functions HRTF, abbreviation for “Head-Related Transfer Functions”, are the acoustic transfer functions of the head of the listener between the positions in space and the auditory canal. Their time-domain form is denoted “HRIR”, abbreviation for “Head-Related Impulse Response”. These functions may also comprise a room effect.
For each sound source Si, two signals, left and right, are obtained which are then added to the left and right signals coming from the spatialization of the other sound sources, in order to finally yield the signals L and R transmitted to the left and right ears of the listener.
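The filter-and-sum structure described above can be illustrated by the following minimal sketch, which works in the time domain on the HRIRs. The function names `convolve` and `binaural_mix` and the 2-tap impulse responses are hypothetical and chosen only for illustration; measured HRIRs are much longer.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR filtering (full linear convolution)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_mix(sources, hrirs_left, hrirs_right):
    """Filter each source S_i by its left/right HRIR and sum into L and R."""
    length = max(
        len(s) + max(len(hl), len(hr)) - 1
        for s, hl, hr in zip(sources, hrirs_left, hrirs_right)
    )
    left = [0.0] * length
    right = [0.0] * length
    for s, hl, hr in zip(sources, hrirs_left, hrirs_right):
        for n, v in enumerate(convolve(s, hl)):
            left[n] += v
        for n, v in enumerate(convolve(s, hr)):
            right[n] += v
    return left, right


# Two toy sources with made-up 2-tap HRIRs (illustrative values only).
sources = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
hrirs_left = [[1.0, 0.5], [0.25, 0.0]]
hrirs_right = [[0.25, 0.0], [1.0, 0.5]]
L, R = binaural_mix(sources, hrirs_left, hrirs_right)
```

Each source costs two filtering operations, which matches the count of 2·N filters for the static case.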
The number of filters, or transfer functions, required is then 2·N for static binaural synthesis and 4·N for dynamic binaural synthesis, where N denotes the number of sound sources or audio streams to be spatialized.
Studies entitled “A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction”, conducted by D. Kistler and F. L. Wightman and published in J. Acoust. Soc. Am. 91(3): pp. 1637-1647 (1992), and by A. Kulkarni (1995 “IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics”, IEEE catalog number: 95TH8144), have made it possible to verify that the phase of the HRTF can be decomposed into the sum of two terms, one corresponding to the interaural delay and the other equal to the minimum phase associated with the modulus of the HRTF.
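Thus, for an HRTF transfer function expressed in the form:
$$H(f) = |H(f)|\, e^{-j\varphi(f)}, \qquad \varphi(f) = \varphi_{\mathrm{delay}}(f) + \varphi_{\min}(f)$$
where:
$\varphi_{\mathrm{delay}}(f) = 2\pi f \tau$ corresponds to the interaural delay;
$\varphi_{\min}(f) = \mathcal{H}(\log(|H(f)|))$ is the minimum phase associated with the modulus of the filter $H$.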
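By way of illustration, the minimum phase can be estimated numerically from the modulus alone. The sketch below assumes that the operator applied to the log-modulus is the Hilbert transform and uses the real-cepstrum folding method as its discrete counterpart. The helper names `dft` and `minimum_phase` are hypothetical, and the naive O(N²) transform is used only to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def minimum_phase(magnitude):
    """Return phi_min such that H_min = |H| * exp(-j * phi_min).

    `magnitude` holds |H| on N uniformly spaced DFT bins (N even, all > 0).
    """
    n = len(magnitude)
    cepstrum = [c.real for c in dft([math.log(m) for m in magnitude], inverse=True)]
    # Fold the even real cepstrum onto its causal part.
    folded = [0.0] * n
    folded[0] = cepstrum[0]
    for i in range(1, n // 2):
        folded[i] = 2.0 * cepstrum[i]
    folded[n // 2] = cepstrum[n // 2]
    log_h_min = dft(folded)  # equals log|H| + j * arg(H_min)
    # With the convention H = |H| e^{-j phi}, phi is minus the argument.
    return [-v.imag for v in log_h_min]


# Check on a filter known to be minimum phase: h = [1, 0.5] (zero at z = -0.5).
N = 64
H = [1 + 0.5 * cmath.exp(-2j * math.pi * k / N) for k in range(N)]
phi_min = minimum_phase([abs(v) for v in H])
max_error = max(abs(p + cmath.phase(v)) for p, v in zip(phi_min, H))
```

For this test filter the reconstructed phase agrees with the true phase to within numerical precision, since the filter contains no additional delay term.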
The implementation of binaural filters is generally in the form of two minimum-phase filters and of a pure delay, corresponding to the difference of the left and right delays applied to the ear furthest away from the source. This delay is generally implemented by means of a delay line.
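A minimal sketch of this structure is given below, assuming an integer delay expressed in samples and short illustrative filter taps. The names `fir`, `delay_line`, `binaural_pair` and `itd_samples` are hypothetical.

```python
def fir(signal, taps):
    """Apply an FIR filter by direct convolution."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def delay_line(signal, delay_samples):
    """Pure delay: prepend `delay_samples` zeros."""
    return [0.0] * delay_samples + list(signal)


def binaural_pair(signal, near_taps, far_taps, itd_samples):
    """Two minimum-phase filters, plus a delay on the ear furthest from the source."""
    near = fir(signal, near_taps)
    far = delay_line(fir(signal, far_taps), itd_samples)
    length = max(len(near), len(far))
    near += [0.0] * (length - len(near))
    far += [0.0] * (length - len(far))
    return near, far


# Illustrative values only: an impulse, 2-tap filters, 2 samples of delay.
near, far = binaural_pair([1.0, 0.0, 0.0], [1.0, 0.5], [0.5, 0.25], itd_samples=2)
```

The delay for the nearer ear is taken as zero here, so only one delay line is needed per source.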
The minimum-phase filter is a finite impulse response (FIR) filter and may be applied in the time or frequency domain. Infinite impulse response (IIR) filters may be sought in order to approximate the modulus of the minimum-phase HRTF filters.
As far as binauralization is concerned, reference is made to FIG. 1b, which shows, by way of non-limiting example, a sound scene spatialized in 5.1 mode that is to be reproduced on the audio headphones of a human being HB.
Five loudspeakers C: Center, Lf: Left front, Rf: Right front, Sl: Surround left, Sr: Surround right, each produce a sound which is heard by the human being HB on the two receivers that are his ears. The transformations undergone by the sound are modeled by a filtering function representing the modification that this sound undergoes during its propagation between the loudspeaker which reproduces this sound and a given ear.
In particular, the sound emanating from the loudspeaker Lf affects the left ear LE via an HRTF filter A, but this same sound reaches the right ear RE modified by an HRTF filter B.
The position of the loudspeakers with respect to the aforementioned individual HB may be symmetrical or otherwise.
Each ear therefore receives the contribution from the 5 loudspeakers in the form modeled hereinafter:
Left ear LE: Bl = A·Lf + C·C + B·Rf + D·Sl + E·Sr,
Right ear RE: Br = A·Rf + C·C + B·Lf + D·Sr + E·Sl,
where Bl is the binauralized signal for the left ear LE and Br is the binauralized signal for the right ear RE.
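The two sums above can be sketched as follows, assuming that each filter A to E is given as a list of FIR taps and that each product in the model denotes a filtering operation. The function names, dictionary keys and tap values are hypothetical and purely illustrative.

```python
def fir(signal, taps):
    """Apply an FIR filter by direct convolution."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def add(*signals):
    """Sample-wise sum of signals of possibly different lengths."""
    length = max(len(s) for s in signals)
    return [sum(s[n] for s in signals if n < len(s)) for n in range(length)]


def binauralize_51(ch, f):
    """ch: channels 'C', 'Lf', 'Rf', 'Sl', 'Sr'; f: filters 'A' to 'E'."""
    left = add(fir(ch["Lf"], f["A"]), fir(ch["C"], f["C"]), fir(ch["Rf"], f["B"]),
               fir(ch["Sl"], f["D"]), fir(ch["Sr"], f["E"]))
    right = add(fir(ch["Rf"], f["A"]), fir(ch["C"], f["C"]), fir(ch["Lf"], f["B"]),
                fir(ch["Sr"], f["D"]), fir(ch["Sl"], f["E"]))
    return left, right


# Made-up filters: B and E carry a one-sample delay for the far ear.
filters = {"A": [1.0], "B": [0.0, 0.5], "C": [0.7], "D": [0.6], "E": [0.0, 0.3]}
# Only the left front loudspeaker emits an impulse.
channels = {"C": [0.0], "Lf": [1.0], "Rf": [0.0], "Sl": [0.0], "Sr": [0.0]}
Bl, Br = binauralize_51(channels, filters)
```

Ten filtering operations are carried out, but only five distinct filters are stored, which reflects the symmetry of the loudspeaker layout.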
The filters A, B, C, D and E are most commonly modeled by linear digital filters and, in the configuration shown in FIG. 1b, 10 filtering functions therefore need to be applied, which can be reduced to 5 in view of the symmetries.
In a manner known per se, the aforementioned filtering operations may be carried out in the frequency domain, for example by means of a fast convolution executed in the Fourier domain. An FFT, or Fast Fourier Transform, is then used in order to carry out the binauralization efficiently.
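Fast convolution can be sketched as follows: both sequences are zero-padded to a power-of-two length, transformed, multiplied bin by bin and transformed back. The recursive radix-2 routine below is a compact illustrative FFT, not an optimized one, and the names `fft` and `fast_convolve` are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unnormalized."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fast_convolve(x, h):
    """Linear convolution of x and h computed in the Fourier domain."""
    n_out = len(x) + len(h) - 1
    size = 1
    while size < n_out:
        size *= 2  # zero-pad so circular convolution equals linear convolution
    X = fft(list(x) + [0.0] * (size - len(x)))
    Hf = fft(list(h) + [0.0] * (size - len(h)))
    y = fft([a * b for a, b in zip(X, Hf)], inverse=True)
    return [v.real / size for v in y[:n_out]]


y = fast_convolve([1.0, 2.0, 3.0], [1.0, 0.5])
```

The result matches direct convolution, here [1, 2.5, 4, 1.5] up to rounding, while the cost grows as N log N rather than N² for long filters.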
The HRTF filters A, B, C, D and E may be simplified in the form of a frequency equalizer and a delay. The HRTF filter A may be embodied in the form of a simple equalizer, since this is a direct path, whereas the HRTF filter B includes an additional delay. Conventionally, the HRTF filters may be decomposed into a minimum-phase filter and a pure delay. The delay for the ear closest to the source may be taken equal to zero.
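As a hedged illustration of the equalizer-plus-delay form, the sketch below builds a frequency response whose modulus is a set of per-bin gains and whose phase is that of a pure delay, then applies it to one block in the DFT domain. It assumes an integer delay in samples and gains that are symmetric across the spectrum, so that the output stays real. The helper names are hypothetical, and the processing is circular over the block.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a short sketch)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def equalizer_delay(gains, delay_samples):
    """Per-bin response g_k * exp(-j 2 pi k d / N): equalizer gain times pure delay."""
    n = len(gains)
    return [
        g * cmath.exp(-2j * math.pi * k * delay_samples / n)
        for k, g in enumerate(gains)
    ]


def apply_in_frequency(block, response):
    """Multiply the block spectrum by the response and return to the time domain."""
    spectrum = dft(block)
    filtered = dft([s * r for s, r in zip(spectrum, response)], inverse=True)
    return [v.real for v in filtered]


N = 8
response = equalizer_delay([0.5] * N, delay_samples=3)  # flat gain, 3-sample delay
impulse = [1.0] + [0.0] * (N - 1)
y = apply_in_frequency(impulse, response)
```

With a flat gain of 0.5 and a delay of 3 samples, the input impulse reappears at index 3 with amplitude 0.5. For a direct path such as filter A, the delay would simply be set to zero.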
The operation for reconstruction by spatial decoding of a 3D audio sound scene, using a reduced number of transmitted channels, such as is shown in FIG. 1c, is also known from the prior art. The configuration shown in FIG. 1c is that relating to the decoding of a coded audio channel having localization parameters in the frequency domain, in order to reconstruct a 5.1 spatialized sound scene.
The aforementioned reconstruction is carried out by a spatial decoder by frequency sub-bands, such as is shown in FIG. 1c. The coded audio signal m undergoes 5 spatialization processing steps, which are controlled by complex spatialization parameters or coefficients CLD and ICC calculated by the encoder and which allow, through decorrelation and gain correction operations, the sound scene composed of six channels, the five channels shown in FIG. 1b to which is added a low-frequency effect channel lfe, to be reconstructed in a realistic manner.
When it is desired to carry out a binauralization of the audio channels coming from a spatial decoder such as is shown in FIG. 1c, we are in fact limited, at the present time, to implementing a processing method according to the scheme shown in FIG. 1d. 
With reference to the aforementioned scheme, it seems necessary to transform the audio channels back into the time domain before carrying out the binauralization of the signal. This operation for returning to the time domain is symbolized by the synthesizer blocks “Synth”, which perform the frequency-time transformation for each of the channels coming from the spatial decoder (SD). The HRTF filtering can then be carried out by the filters A, B, C, D, E, with or without application of the equalized scheme, corresponding to a conventional filtering.
One variant for binauralization of the audio channels from a spatial decoder can also consist, as is shown in FIG. 1e, in converting each audio channel delivered by the audio decoder into the time domain by a synthesizer “Synth”, then in executing the spatial decoding and binauralization operation, or spatialization, in the Fourier frequency domain, after transformation by FFT.
In this scenario, each module OTT, corresponding to a matrix of decoding coefficients, must then be converted into the Fourier domain, at the cost of an approximation, since the operations are not carried out within the same domain. Moreover, the complexity is further increased, since the synthesizing operation “Synth” is followed by three FFT transformations.
Thus, in order to binauralize a sound scene coming from a spatial decoder, there exist few other possibilities but to carry out:
either 6 time-frequency transformations, if it is desired to carry out the binauralization outside of the spatial decoder;
or a synthesizing operation followed by 3 FFT Fourier transformations, if it is desired to carry out the operation in the FFT domain.
If need be, another solution could also be used, which consists in carrying out the HRTF filtering directly in the domain of the sub-bands, as is shown in FIG. 1f.
However, in this scenario, the HRTF filtering operations are complex to apply, since the latter impose the use of sub-band filters whose minimum length is fixed and which must take into account the phenomenon of spectral aliasing of the sub-bands.
The saving achieved by the reduction in transformation operations is offset by the dramatic increase in the number of operations required for the filtering, owing to the execution of these operations in the PQMF (Pseudo-Quadrature Mirror Filter) domain.
The objective of the present invention is to overcome the numerous drawbacks of the aforementioned prior art techniques for sound spatialization of 3D audio scenes, and notably for transauralization or binauralization of 3D audio scenes.
In particular, one objective of the present invention is the execution of a specific filtering of spatially coded audio signals or channels in the domain of the frequency sub-bands of a spatial decoding, in order to limit the number of transformation pairs, while at the same time reducing the filtering operations to the minimum, but conserving a good quality of source spatialization, notably in transauralization or binauralization.
According to one particularly noteworthy aspect of the present invention, the execution of the aforementioned specific filtering relies on rendering the spatialization, transaural or binaural filters in the form of an equalizer-delay, for direct application of a filtering by equalization-delay in the domain of the sub-bands.
Another objective of the present invention is the achievement of a 3D rendering quality very close to that obtained using modeling filters such as original HRTF filters, by the simple addition of a transaural spatial processing of very low complexity, following a conventional spatial decoding in the transformed domain.
A final objective of the present invention is a novel source spatialization technique applicable not only to the transaural or binaural rendering of a monophonic sound, but also to several monophonic sounds and notably to the multiple channels of stereo sounds in modes 5.1, 6.1, 7.1, 8.1 or higher.