Techniques for propagating a sound wave in three-dimensional space, involving in particular specialized sound simulation and/or playback, implement audio signal processing methods applied to the simulation of acoustic and psycho-acoustic phenomena. Such processing methods provide for a spatial encoding of the acoustic field, its transmission, and its spatialized reproduction on a set of loudspeakers or on the headphones of a stereophonic headset.
Among spatialized-sound techniques, two categories of processing can be distinguished; they are mutually complementary, but both are generally implemented within one and the same system.
On the one hand, a first category of processing relates to methods for synthesizing a room effect or, more generally, surrounding effects. From a description of one or more sound sources (signal emitted, position, orientation, directivity, or the like) and based on a room-effect model (involving a room geometry, or else a desired acoustic perception), one calculates and describes a set of elementary acoustic phenomena (direct, reflected or diffracted waves), or else a macroscopic acoustic phenomenon (reverberated and diffuse field), making it possible to convey the spatial effect to a listener situated at a chosen point of auditory perception in three-dimensional space. One then calculates a set of signals typically associated with the reflections ("secondary" sources, active through re-emission of a main wave received, and having a spatial position attribute) and/or associated with a late reverberation (decorrelated signals for a diffuse field).
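By way of a hedged illustration of this first category, the elementary reflected waves can be sketched with the classical image-source method: each wall of a rectangular room mirrors the real source into a "secondary" source, whose distance to the listener yields a delay and an attenuation. The room dimensions, positions and absorption value below are illustrative assumptions, not taken from the present text:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, an assumed value for air

def first_order_images(src, room):
    """Image sources for the 6 walls of an axis-aligned box room.

    src  : (x, y, z) position of the real source
    room : (Lx, Ly, Lz) dimensions of the room (one corner at the origin)
    """
    images = []
    for axis, length in enumerate(room):
        for wall in (0.0, length):
            img = list(src)
            img[axis] = 2.0 * wall - src[axis]  # mirror across the wall plane
            images.append(tuple(img))
    return images

def secondary_source(img, listener, absorption=0.3):
    """Delay (s) and gain of a reflection viewed as a 'secondary' source."""
    d = math.dist(img, listener)
    return d / SPEED_OF_SOUND, (1.0 - absorption) / d  # 1/r spreading loss

# Hypothetical room, source and listener positions:
room = (5.0, 4.0, 3.0)
src = (1.0, 2.0, 1.5)
listener = (4.0, 2.0, 1.5)
for img in first_order_images(src, room):
    delay, gain = secondary_source(img, listener)
    print(f"image at {img}: delay {delay * 1000:.1f} ms, gain {gain:.3f}")
```

Each printed line corresponds to one elementary reflected wave with a spatial position attribute, as described above; a real room-effect model would of course add higher-order reflections and frequency-dependent absorption.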
On the other hand, a second category of methods relates to the positional or directional rendition of sound sources. These methods are applied to signals determined by a method of the first category described above (involving primary and secondary sources), as a function of the spatial description (position of the source) associated with them. In particular, such methods according to this second category make it possible to obtain signals to be played over loudspeakers or headphones, so as ultimately to give a listener the auditory impression of sound sources stationed at predetermined respective positions around the listener. Methods of this second category are dubbed "creators of three-dimensional sound images", on account of the distribution, in three-dimensional space, of the positions of the sources as perceived by a listener. Such methods generally comprise a first step of spatial encoding of the elementary acoustic events, which produces a representation of the sound field in three-dimensional space. In a second step, this representation is transmitted or stored for subsequent use. In a third step, that of decoding, the signals are decoded and delivered over the loudspeakers or headphones of a playback device.
The present invention lies rather within the second aforesaid category. It relates in particular to the spatial encoding of sound sources and to a specification of the three-dimensional sound representation of these sources. It applies equally well to an encoding of "virtual" sound sources (applications in which sound sources are simulated, such as games, a spatialized conference, or the like) as to an "acoustic" encoding of a natural sound field, during sound capture by one or more three-dimensional arrays of microphones.
Among the conceivable techniques of sound spatialization, the “ambisonic” approach is preferred. Ambisonic encoding, which will be described in detail further on, consists in representing signals pertaining to one or more sound waves in a base of spherical harmonics (in spherical coordinates involving in particular an angle of elevation and an azimuthal angle, characterizing a direction of the sound or sounds). The components representing these signals and expressed in this base of spherical harmonics are also dependent, in respect of the waves emitted in the near field, on a distance between the sound source emitting this field and a point corresponding to the origin of the base of spherical harmonics. More particularly, this dependence on the distance is expressed as a function of the sound frequency, as will be seen further on.
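As a minimal sketch of such an encoding, the first-order ambisonic components (commonly denoted W, X, Y, Z) reduce to real spherical harmonics evaluated at the azimuth and elevation of the incident wave. The 1/√2 weighting of W below is one common normalization convention among several, assumed here for illustration:

```python
import math

def encode_first_order(signal, azimuth, elevation):
    """First-order ambisonic (B-format) encoding of a far-field plane wave.

    azimuth, elevation : direction of incidence, in radians
    Returns the four components (W, X, Y, Z).
    """
    w = signal * (1.0 / math.sqrt(2.0))                   # omnidirectional
    x = signal * math.cos(azimuth) * math.cos(elevation)  # front/back
    y = signal * math.sin(azimuth) * math.cos(elevation)  # left/right
    z = signal * math.sin(elevation)                      # up/down
    return w, x, y, z

# A unit-amplitude sample arriving from straight ahead (azimuth 0, elevation 0):
w, x, y, z = encode_first_order(1.0, 0.0, 0.0)
print(w, x, y, z)  # W = 0.707..., X = 1.0, Y = 0.0, Z = 0.0
```

The distance and frequency dependence mentioned above for near-field sources is deliberately absent from this directional sketch; it is precisely what the known approach, discussed next, leaves out.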
This ambisonic approach offers a large number of possible functionalities, in particular in terms of simulation of virtual sources, and, in a general manner, exhibits the following advantages:
- it conveys, in a rational manner, the reality of the acoustic phenomena and affords realistic, convincing and immersive spatial auditory rendition;
- the representation of the acoustic phenomena is scalable: it offers a spatial resolution which may be adapted to various situations. Specifically, this representation may be transmitted and utilized as a function of throughput constraints during the transmission of the encoded signals and/or of limitations of the playback device;
- the ambisonic representation is flexible: it is possible to simulate a rotation of the sound field or, on playback, to adapt the decoding of the ambisonic signals to any playback device, of diverse geometries.
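The rotation flexibility just mentioned can be sketched at order 1: rotating the whole sound field about the vertical axis amounts to a plain two-dimensional rotation of the X and Y components, W and Z being invariant. This is a sketch under first-order B-format assumptions; higher orders require larger rotation matrices:

```python
import math

def rotate_field_z(w, x, y, z, angle):
    """Rotate a first-order ambisonic sound field about the vertical axis.

    Only the horizontal components X and Y are affected; the omnidirectional
    component W and the vertical component Z are invariant under this rotation.
    """
    c, s = math.cos(angle), math.sin(angle)
    return w, c * x - s * y, s * x + c * y, z

# A plane wave encoded from azimuth 0 (X = 1, Y = 0), rotated by +90 degrees,
# ends up encoded as if it came from azimuth 90 degrees (X = 0, Y = 1):
w, x, y, z = rotate_field_z(1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0, math.pi / 2)
```

The same operation applied to all components at once rotates every source in the scene simultaneously, which is what makes head-tracking and scene manipulation inexpensive in this representation.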
In the known ambisonic approach, the encoding of the virtual sources is essentially directional. The encoding functions amount to calculating gains which depend on the incidence of the sound wave, expressed by the spherical harmonic functions of the angle of elevation and the azimuthal angle in spherical coordinates. In particular, on decoding, it is assumed that the loudspeakers, on playback, are far removed. This results in a distortion (or a curving) of the shape of the reconstructed wavefronts. Specifically, as indicated hereinabove, the components of the sound signal in the base of spherical harmonics, for a near field, in fact depend also on the distance of the source and on the sound frequency. More precisely, these components may be expressed mathematically in the form of a polynomial whose variable is inversely proportional to the aforesaid distance and to the sound frequency. Thus, the ambisonic components, in the sense of their theoretical expression, are divergent in the low frequencies and, in particular, tend to infinity as the sound frequency decreases to zero, when they represent a near-field sound emitted by a source situated at a finite distance. This mathematical phenomenon is known, in the realm of ambisonic representation, already at order 1, by the term "bass boost", in particular through:
M. A. GERZON, "General Metatheory of Auditory Localisation", preprint 3306 of the 92nd AES Convention, 1992, page 52.
This phenomenon becomes particularly critical at high spherical harmonic orders, which involve polynomials of high degree.
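This low-frequency divergence can be illustrated numerically. Under one common normalization convention (an assumption here, since conventions vary in the literature), the order-m near-field coefficient is a polynomial in 1/(2jkρ), where k is the wavenumber and ρ the source distance, so that for a finite ρ its magnitude grows without bound as the frequency decreases:

```python
import math

def near_field_coeff(order, freq, distance, c=343.0):
    """Magnitude of the order-m near-field coefficient for a source at
    `distance` metres: a polynomial in 1/(2*j*k*distance), with the
    wavenumber k = 2*pi*freq/c.  One common normalization convention is
    assumed; others differ by constant factors."""
    x = 1j * (2.0 * math.pi * freq / c) * distance  # j * k * rho
    total = 0.0j
    for n in range(order + 1):
        coeff = math.factorial(order + n) / (
            math.factorial(order - n) * math.factorial(n))
        total += coeff * (1.0 / (2.0 * x)) ** n
    return abs(total)

# The order-3 coefficient grows without bound as the frequency decreases
# (the "bass boost" phenomenon), while at high frequency it tends to 1:
for f in (1000.0, 100.0, 10.0):
    print(f"{f:6.0f} Hz -> |F_3| = {near_field_coeff(3, f, distance=1.0):.2f}")
```

The highest-degree term dominates at low frequency, so the magnitude diverges as 1/f^m at order m, which is why the phenomenon worsens at high orders as stated above.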
The following document: SONTACCHI and HÖLDRICH, "Further Investigations on 3D Sound Fields using Distance Coding", Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, 6-8 Dec. 2001, discloses a technique for taking account of a curving of the wavefronts within an ambisonic representation, the principle of which consists in:
- applying an ambisonic encoding (of high order) to the signals arising from a (simulated) virtual sound capture, of WFS type (standing for "Wave Field Synthesis");
- reconstructing the acoustic field over a zone according to its values over the boundary of that zone, on the basis of the HUYGENS-FRESNEL principle.
However, the technique presented in this document, although promising on account of the fact that it uses a high-order ambisonic representation, poses a certain number of problems:
- the computer resources required for the calculation of all the surfaces making it possible to apply the HUYGENS-FRESNEL principle, as well as the calculation times required, are excessive;
- processing artifacts referred to as "spatial aliasing" appear on account of the distance between the microphones, unless a tightly spaced virtual microphone grid is chosen, thereby making the processing more cumbersome;
- this technique is difficult to transpose to a real case of sensors disposed in an array, in the presence of a real source, upon acquisition;
- on playback, the three-dimensional sound representation is implicitly bound to a fixed radius of the playback device, since the ambisonic decoding must be done, here, on an array of loudspeakers of the same dimensions as the initial array of microphones; this document proposes no means of adapting the encoding or the decoding to other sizes of playback device.
Above all, this document presents a horizontal array of sensors, thereby assuming that the acoustic phenomena in question propagate only in horizontal directions; it excludes any other direction of propagation and thus does not represent the physical reality of an ordinary acoustic field.
More generally, current techniques do not make it possible to process any type of sound source satisfactorily, in particular a near-field source; they handle only far-removed sound sources (plane waves), which corresponds to a restrictive and artificial situation in numerous applications.