Spatial representations of sound combine techniques for capturing, synthesizing and reproducing a sound environment, giving a listener a much greater degree of immersion in it. In particular, they allow a user to distinguish more sound sources than there are speakers available, and to locate these sources in 3D, even when their direction does not coincide with that of a speaker. Spatial representations of sound have numerous applications, including allowing a user to locate sound sources in three dimensions from sound played over stereo headphones, or allowing users in a room to locate sound sources in three dimensions from sound emitted by speakers, for example a 5.1 speaker setup. Additionally, spatial representations of sound allow new sound effects to be produced. For example, they allow a sound scene to be rotated, or reflections of a sound source to be applied in order to simulate the reproduction of a given sound environment, such as a cinema hall or a concert hall.
Spatial representations are produced in two main steps: ambisonic encoding and ambisonic decoding. To benefit from a spatial representation of sound, real-time ambisonic decoding is always required. Producing or processing a sound in real time may additionally involve real-time ambisonic encoding thereof. Since ambisonic encoding is a complex task, real-time ambisonic encoding capabilities may be limited. For example, a given amount of computational power will only be capable of encoding a limited number of sound sources in real time.
Techniques for spatially representing sound are described in particular by J. Daniel, Représentations de champs acoustiques, application à la transmission et à la reproduction de scènes sonores dans un contexte multimédia (“Representations of acoustic fields, application to the transmission and to the reproduction of sound scenes in a multimedia context”), INIST-CNRS, Cote INIST: T 139957. Ambisonic encoding of a sound field consists in decomposing the sound pressure field at a point, corresponding for example to the position of a user, on a basis of spherical harmonics expressed in spherical coordinates, in the following form:
$$
p(\vec{r}, t) = \sum_{m=0}^{\infty} j^{m}\, j_{m}(kr) \sum_{n=-m}^{+m} B_{mn}(t)\, Y_{mn}(\theta, \varphi)
$$

in which $p(\vec{r}, t)$ represents the sound pressure at time $t$ at the position $\vec{r}$ (distance $r = |\vec{r}|$, direction $(\theta, \varphi)$) relative to the point at which the sound field is calculated, and $k$ is the wave number. $j_m$ represents the spherical Bessel function of order $m$, and $j^m$ is the $m$-th power of the imaginary unit $j$.
$Y_{mn}(\theta, \varphi)$ represents the spherical harmonic of order $m$ and index $n$ in the direction $(\theta, \varphi)$ defined by $\vec{r}$. $B_{mn}(t)$ denotes the ambisonic coefficient associated with the spherical harmonic $Y_{mn}$ at time $t$.
The ambisonic coefficients therefore define, at each time, the entire sound field surrounding a point. Processing sound fields in the ambisonic domain has particularly useful properties. In particular, it is very straightforward to rotate the entire sound field. It is also possible to play back, over speakers, sound including directional information from a set of ambisonic coefficients, for example over a 5.1 speaker setup. Sound including directional information can also be rendered over headphones having only a left and a right speaker, by using head-related transfer functions (HRTFs). These functions make it possible to render a directional signal over two speakers by adding a delay and/or an attenuation to at least one channel of a stereo signal, which the brain interprets as indicating the direction of the sound source.
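The claim that rotating the whole sound field is straightforward can be illustrated at first order. The sketch below assumes a convention the text does not fix: first order only, channel order (B00, B1-1, B10, B11) with SN3D normalisation, θ as azimuth and φ as elevation in radians. The function names `encode_first_order` and `rotate_yaw` are hypothetical. A rotation about the vertical axis leaves B00 and B10 untouched and mixes only the two horizontal components, like a 2D rotation.

```python
import math

# Assumed convention (the text does not fix one): first order only, channel
# order (B00, B1-1, B10, B11), SN3D normalisation, theta = azimuth,
# phi = elevation, angles in radians. Function names are hypothetical.


def encode_first_order(s, theta, phi):
    """First-order coefficients of a distant source with pressure s."""
    return [
        s,                                    # B00 (omnidirectional)
        s * math.sin(theta) * math.cos(phi),  # B1-1 (left/right)
        s * math.sin(phi),                    # B10 (up/down)
        s * math.cos(theta) * math.cos(phi),  # B11 (front/back)
    ]


def rotate_yaw(b, alpha):
    """Rotate a first-order sound field by alpha radians about the vertical axis."""
    b00, b1m1, b10, b11 = b
    c, s = math.cos(alpha), math.sin(alpha)
    # B00 and B10 are unchanged by a yaw rotation; only the horizontal pair
    # (B11, B1-1) is mixed, exactly like a 2D rotation of an (x, y) vector.
    return [b00, s * b11 + c * b1m1, b10, c * b11 - s * b1m1]


# Usage: rotating a source at azimuth 0.3 rad by 0.5 rad gives the same
# coefficients as encoding it directly at azimuth 0.8 rad.
rotated = rotate_yaw(encode_first_order(1.0, 0.3, 0.2), 0.5)
direct = encode_first_order(1.0, 0.8, 0.2)
```

The rotation acts on the coefficients alone, without knowing how many sources produced them, which is why it is cheap in the ambisonic domain.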
The decomposition, referred to as HOA (higher order ambisonics), consists in truncating this infinite sum to an order M, greater than or equal to 1:
$$
p(\vec{r}, t) = \sum_{m=0}^{M} j^{m}\, j_{m}(kr) \sum_{n=-m}^{+m} B_{mn}(t)\, Y_{mn}(\theta, \varphi)
$$
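The truncated sum keeps one coefficient per pair $(m, n)$ with $0 \le m \le M$ and $-m \le n \le m$, that is $(M+1)^2$ coefficients in total. The minimal sketch below enumerates these pairs in the order B00, B1-1, B10, B11, ..., BMM and maps each pair to its position with $m^2 + m + n$, which is the ACN channel-ordering convention, assumed here. The function names are hypothetical.

```python
def coefficient_indices(order):
    """(m, n) pairs kept when the series is truncated at `order`, in the
    order B00, B1-1, B10, B11, ..., BMM."""
    return [(m, n) for m in range(order + 1) for n in range(-m, m + 1)]


def channel_index(m, n):
    """Position of B_mn in that ordering (ACN convention, assumed here)."""
    return m * m + m + n


# Usage: order 3 keeps (3 + 1)**2 = 16 coefficients.
pairs = coefficient_indices(3)
print(len(pairs))  # 16
```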
In general, a sufficiently distant source is modeled as emitting a plane wave at the listening point. The value at time $t$ of an ambisonic coefficient $B_{mn}(t)$ linked to this source may then be considered to depend both on the sound pressure $S(t)$ of the source at time $t$ and on the spherical harmonic evaluated at the orientation $(\theta_s, \varphi_s)$ of this source. For a single sound source, it is therefore possible to state:

$$
B_{mn}(t) = S(t)\, Y_{mn}(\theta_s, \varphi_s)
$$
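For a single source, encoding therefore reduces to multiplying the source signal by one constant gain per coefficient. A minimal first-order sketch, assuming the channel order (B00, B1-1, B10, B11), SN3D normalisation, θ as azimuth and φ as elevation in radians (none of which the text specifies); `first_order_harmonics` and `encode_source` are hypothetical names.

```python
import math


def first_order_harmonics(theta, phi):
    """Y_mn(theta, phi) for m <= 1 (assumed order B00, B1-1, B10, B11,
    SN3D normalisation, theta = azimuth, phi = elevation, radians)."""
    return [
        1.0,
        math.sin(theta) * math.cos(phi),
        math.sin(phi),
        math.cos(theta) * math.cos(phi),
    ]


def encode_source(samples, theta_s, phi_s):
    """B_mn(t) = S(t) * Y_mn(theta_s, phi_s) for every sample t.

    The harmonics depend only on the source direction, so for a fixed source
    they are evaluated once and reused as constant gains."""
    gains = first_order_harmonics(theta_s, phi_s)
    return [[g * s for s in samples] for g in gains]


# Usage: a source straight to the left (azimuth pi/2) at ear height.
channels = encode_source([1.0, -0.5, 0.25], math.pi / 2, 0.0)
```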
In the case of a set of $N_s$ distant sound sources, the ambisonic coefficients describing the sound scene are calculated as the sum of the ambisonic coefficients of each of the sources, each source $i$ having an orientation $(\theta_{s_i}, \varphi_{s_i})$:

$$
B_{mn}(t) = \sum_{i=0}^{N_s - 1} S_i(t)\, Y_{mn}(\theta_{s_i}, \varphi_{s_i})
$$
This calculation may also be represented in vector form:
$$
\begin{pmatrix}
B_{00}(t) \\
B_{1,-1}(t) \\
B_{10}(t) \\
B_{11}(t) \\
\vdots \\
B_{MM}(t)
\end{pmatrix}
= \sum_{i=0}^{N_s - 1} S_i(t)
\begin{pmatrix}
Y_{00}(\theta_{s_i}, \varphi_{s_i}) \\
Y_{1,-1}(\theta_{s_i}, \varphi_{s_i}) \\
Y_{10}(\theta_{s_i}, \varphi_{s_i}) \\
Y_{11}(\theta_{s_i}, \varphi_{s_i}) \\
\vdots \\
Y_{MM}(\theta_{s_i}, \varphi_{s_i})
\end{pmatrix}
$$

The ambisonic coefficients keep the form $B_{mn}$ where, at order $M$, $m$ ranges from 0 to $M$ and $n$ ranges from $-m$ to $m$.
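The vector form can equally be read as a matrix-vector product: stacking the harmonic vectors of all sources as the columns of an $(M+1)^2 \times N_s$ matrix $\mathbf{Y}$ gives $\mathbf{B}(t) = \mathbf{Y}\,\mathbf{S}(t)$. A minimal first-order sketch in plain Python, assuming the channel order (B00, B1-1, B10, B11), SN3D normalisation, θ as azimuth and φ as elevation; `encoding_matrix` and `encode_frame` are hypothetical names.

```python
import math


def first_order_harmonics(theta, phi):
    """Column of Y_mn(theta, phi) for m <= 1 (assumed order B00, B1-1, B10,
    B11, SN3D normalisation, theta = azimuth, phi = elevation, radians)."""
    return [
        1.0,
        math.sin(theta) * math.cos(phi),
        math.sin(phi),
        math.cos(theta) * math.cos(phi),
    ]


def encoding_matrix(directions):
    """(M+1)^2 x Ns matrix whose column i holds Y_mn(theta_si, phi_si)."""
    columns = [first_order_harmonics(t, p) for t, p in directions]
    return [[col[k] for col in columns] for k in range(len(columns[0]))]


def encode_frame(matrix, s):
    """B(t) = Y S(t): the sum over sources written as one matrix-vector product."""
    return [sum(y * s_i for y, s_i in zip(row, s)) for row in matrix]


# Usage: two equal sources, one in front (azimuth 0) and one behind (azimuth pi).
Y = encoding_matrix([(0.0, 0.0), (math.pi, 0.0)])
b = encode_frame(Y, [1.0, 1.0])
```

In this usage the front/back components cancel and only the omnidirectional coefficient remains (B00 = 2). Building the matrix once and only recomputing the column of a source that moves mirrors the pre-calculation of spherical harmonics for fixed sources.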
A device performing ambisonic encoding of at least one source may therefore define a complete sound field by calculating the ambisonic coefficients up to an order $M$. Depending on the order $M$ and on the number of sources, this calculation may be long and resource-intensive. Specifically, at order $M$, $(M+1)^2$ ambisonic coefficients are calculated at each time $t$. For each coefficient, the contribution $S_i(t)\, Y_{mn}(\theta_{s_i}, \varphi_{s_i})$ of each of the $N_s$ sources must be calculated. If a source is fixed, its spherical harmonics $Y_{mn}(\theta_{s_i}, \varphi_{s_i})$ may be pre-calculated; otherwise, they must be recalculated at each time.
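A rough operation count makes this cost concrete. The sketch below is a back-of-the-envelope estimate under stated assumptions: one multiplication $S_i(t)\,Y_{mn}$ per coefficient, per source and per sample, plus one re-evaluation of each $Y_{mn}$ per moving source whenever its direction is updated. `encoding_cost` and its `updates_per_second` parameter are hypothetical.

```python
def encoding_cost(order, n_sources, sample_rate, n_moving=0, updates_per_second=0):
    """Rough operation count for encoding n_sources up to `order`.

    Returns (products per second, harmonic evaluations per second):
    each of the (M+1)^2 coefficients needs one S_i(t) * Y_mn product per
    source and per sample; moving sources also need their Y_mn recomputed
    at each direction update, while fixed sources reuse pre-calculated values."""
    n_coeffs = (order + 1) ** 2
    products = n_coeffs * n_sources * sample_rate
    harmonic_evals = n_coeffs * n_moving * updates_per_second
    return products, harmonic_evals


# Usage: order 3, 8 sources at 48 kHz, 2 of them moving,
# with directions updated 100 times per second.
print(encoding_cost(3, 8, 48_000, n_moving=2, updates_per_second=100))
# (6144000, 3200)
```

Because the count scales with $(M+1)^2$, moving from order 1 (4 coefficients) to order 3 (16 coefficients) multiplies the encoding work by four for the same sources.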
Increasing the ambisonic order $M$ improves the quality of the auditory rendering, but since the number of coefficients grows as $(M+1)^2$, it also increases the computing cost. It may therefore be difficult to obtain good sound quality while keeping computing time and load, electrical consumption and battery usage at reasonable levels. This is all the more true now that ambisonic coefficients are often calculated in real time on mobile devices, for example on a smartphone used to listen to music in real time with directional information calculated from ambisonic coefficients.
The problem becomes even more acute when reflections are calculated in a sound scene.
Calculating reflections makes it possible to simulate a sound scene in a room, for example a cinema or a concert hall. Under these conditions, the sound is reflected off the walls of the hall, giving it a characteristic “ambience”; the reflections are determined by the respective positions of the sound sources and of the listener, as well as by the materials off which the sound waves are reflected, for example the material of the walls. Creating hall-like sound effects using ambisonic audio coding is described in particular by J. Daniel, Représentations de champs acoustiques, application à la transmission et à la reproduction de scènes sonores dans un contexte multimédia (“Representations of acoustic fields, application to the transmission and to the reproduction of sound scenes in a multimedia context”), INIST-CNRS, Cote INIST: T 139957, pp. 283-287.
It is possible to simulate the effect of reflections and to give an “ambience” in ambisonics by adding, for each sound source, a set of secondary sound sources whose intensity and direction are calculated from the reflections of the sound sources off the walls and obstacles of a sound scene. Several secondary sources are generally required for each initial sound source to simulate a sound scene satisfactorily. However, this makes the aforementioned problem of computational power and battery capacity even worse, since the complexity of calculating the ambisonic coefficients is further multiplied by the number of secondary sound sources. The complexity of calculating the ambisonic coefficients for a satisfactory sound rendition may then make this solution impracticable, for example because the ambisonic coefficients can no longer be calculated in real time, because the computing load becomes too great, or because the electrical and/or battery consumption on a mobile device becomes prohibitive.
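One common way to obtain the position and intensity of such secondary sources, swapped in here for illustration since the text does not name a method, is the image-source method: each first-order reflection is modeled as a mirror image of the source across a wall. The minimal sketch below assumes a rectangular (shoebox) room, a single reflection coefficient `beta` for all walls, 1/r amplitude decay, and ignores propagation delay; `first_order_images` and `direction_and_gain` are hypothetical names.

```python
import math


def first_order_images(src, room, beta):
    """Mirror a source across each of the six walls of a rectangular room.

    src: (x, y, z) source position; room: (Lx, Ly, Lz) with walls at 0 and L
    on each axis; beta: reflection coefficient, assumed identical for all walls.
    Returns (image position, amplitude factor) pairs."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            pos = list(src)
            pos[axis] = 2.0 * wall - src[axis]  # mirror image across the wall plane
            images.append((tuple(pos), beta))
    return images


def direction_and_gain(pos, listener, factor):
    """Azimuth, elevation and 1/r-attenuated amplitude of a (secondary) source."""
    dx, dy, dz = (p - q for p, q in zip(pos, listener))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    theta = math.atan2(dy, dx)                # azimuth
    phi = math.atan2(dz, math.hypot(dx, dy))  # elevation
    return theta, phi, factor / dist


# Usage: 5 x 4 x 3 m room, one source, listener 3 m away along x.
images = first_order_images((1.0, 2.0, 1.5), (5.0, 4.0, 3.0), beta=0.8)
secondary = [direction_and_gain(p, (4.0, 2.0, 1.5), f) for p, f in images]
```

Each image then has to be encoded like any other source, so even first-order reflections alone turn one source into seven encoded sources (the direct sound plus six images), multiplying the encoding cost by seven; higher-order reflections increase it further.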
N. Tsingos et al., “Perceptual Audio Rendering of Complex Virtual Environments”, ACM Transactions on Graphics (TOG), Proceedings of ACM SIGGRAPH 2004, Volume 23, Issue 3, August 2004, pp. 249-258, discloses a binaural processing method for overcoming this problem. The solution proposed by Tsingos consists in decreasing the number of sound sources by:
- evaluating the power of each sound source;
- ranking the sound sources from the most to the least powerful;
- removing the least powerful sound sources;
- grouping the remaining sound sources into clusters of sources that are close to one another, and merging each cluster into a single virtual sound source.
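The four steps listed above can be sketched as follows. This is a deliberately simplified illustration, not Tsingos's actual algorithm: it assumes power is the mean square of a block of samples, uses a greedy angular clustering with a hypothetical `max_angle` threshold, and lets each merged cluster keep the direction of its most powerful member. `reduce_sources` and its parameters are hypothetical.

```python
import math


def reduce_sources(sources, keep, max_angle):
    """Simplified sketch of power-based culling followed by clustering.

    sources: list of (samples, unit direction vector). Returns merged
    (samples, direction) pairs. Illustrative only, not Tsingos's exact method."""
    # 1. Evaluate the power of each source (mean square of its samples).
    rated = [(sum(x * x for x in s) / len(s), s, d) for s, d in sources]
    # 2. Rank from the most to the least powerful.
    rated.sort(key=lambda item: item[0], reverse=True)
    # 3. Remove the least powerful sources.
    rated = rated[:keep]
    # 4. Greedy clustering: join the first cluster whose leader lies within
    #    max_angle, otherwise start a new cluster. A cluster keeps the direction
    #    of its leader (its most powerful member) and sums the member signals.
    cos_limit = math.cos(max_angle)
    clusters = []
    for _, s, d in rated:
        for cluster in clusters:
            if sum(a * b for a, b in zip(cluster["dir"], d)) >= cos_limit:
                cluster["samples"] = [a + b for a, b in zip(cluster["samples"], s)]
                break
        else:
            clusters.append({"dir": d, "samples": list(s)})
    return [(c["samples"], c["dir"]) for c in clusters]


# Usage: four sources reduced to two virtual sources.
scene = [
    ([2.0, 2.0], (1.0, 0.0, 0.0)),  # loud, in front
    ([1.0, 1.0], (1.0, 0.0, 0.0)),  # quieter, same direction: merged
    ([1.5, 1.5], (0.0, 1.0, 0.0)),  # to the left
    ([0.1, 0.1], (0.0, 0.0, 1.0)),  # very weak, above: removed
]
reduced = reduce_sources(scene, keep=3, max_angle=0.1)
```

Note that the weak source above the listener is discarded purely on its power, whatever its importance to the overall scene.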
The method disclosed by Tsingos makes it possible to decrease the number of sound sources, and hence the overall processing complexity when reverberations are used. However, this technique has several drawbacks. It does not reduce the complexity of processing the reverberations themselves: the same problem would arise again if, with a smaller number of sources, one wished to increase the number of reverberations. Additionally, determining the sound power of each source and merging the sources into clusters themselves carry a substantial computing load. The described experiments are limited to cases in which the sound sources are known in advance and their respective powers have been pre-calculated. For sound scenes containing multiple sources of varying intensities whose powers must be recalculated, the associated computing load would at least partially cancel out the computing gain obtained by limiting the number of sources.
Lastly, the tests conducted by Tsingos provide satisfactory results when the sound sources are akin to noise, for example in the case of a crowd in the subway. For other types of sound sources, such a method could prove to be deleterious. For example, when recording a concert given by a symphony orchestra, it is often the case that several instruments, although exhibiting a low level of sound power, make an important contribution to the overall harmony. Simply removing the associated sound sources, just because they are relatively weak, would then have a severely negative effect on the quality of the recording.
There is therefore a need for a device and a method for calculating ambisonic coefficients that make it possible to calculate, in real time, a set of ambisonic coefficients representing at least one sound source and one or more of its reflections in a sound scene, while limiting the additional computational complexity due to those reflections, without a priori decreasing the number of sound sources.