The ambisonic technique consists in using, in each frequency band, a sub-set of channels that have the sought directivity characteristics. By way of example of application, mention can be made of:
    Sound source separation:
        For entertainment (karaoke: voice suppression),
        For music (mixing separated sources in a multichannel content),
        For telecommunications (voice boosting, noise suppression),
        For home automation (voice control),
        Multichannel audio encoding.
    Decoding for multichannel diffusion:
        For the cinema,
        For music,
        For virtual reality.
Ambisonics consists in projecting an acoustic field over a base of spherical harmonic functions (base shown in FIG. 1), in order to obtain a spatialised representation of the sound stage. The function $Y_{mn}^{\sigma}(\theta, \phi)$ is the spherical harmonic of order m and of index nσ, depending on the spherical coordinates (θ, ϕ), defined with the following formula:
$$Y_{mn}^{\sigma}(\theta, \phi) = \tilde{P}_{mn}(\cos\phi) \cdot \begin{cases} \cos n\theta & \text{if } \sigma = 1 \\ \sin n\theta & \text{if } \sigma = -1 \text{ and } n \geq 1 \end{cases}$$

where $\tilde{P}_{mn}(\cos\phi)$ is a polar function involving the Legendre polynomial:

$$\tilde{P}_{mn}(x) = \sqrt{\epsilon_n \frac{(m-n)!}{(m+n)!}} \, (-1)^n \, (1 - x^2)^{n/2} \, \frac{d^n}{dx^n} P_m(x)$$

with $\epsilon_0 = 1$ and $\epsilon_n = 2$ for $n \geq 1$, and

$$P_m(x) = \frac{1}{2^m \cdot m!} \, \frac{d^m}{dx^m} (x^2 - 1)^m$$
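By way of illustration, the definition above can be evaluated numerically. The following is a minimal sketch, not a reference implementation: it assumes that the normalisation factor is the square root of ϵn(m−n)!/(m+n)!, that the polar function is evaluated at x = cos ϕ, and that the sign follows the (−1)^n factor exactly as written. The function names (legendre_coeffs, derivative, p_tilde, spherical_harmonic) are hypothetical.

```python
import math


def legendre_coeffs(m):
    """Coefficients (lowest degree first) of the Legendre polynomial P_m,
    built with the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    prev, cur = [1.0], [0.0, 1.0]
    if m == 0:
        return prev
    for k in range(1, m):
        shifted = [0.0] + cur  # coefficients of x * P_k
        nxt = [(2 * k + 1) * c / (k + 1) for c in shifted]
        for i, c in enumerate(prev):
            nxt[i] -= k * c / (k + 1)
        prev, cur = cur, nxt
    return cur


def derivative(coeffs, times):
    """Differentiate a polynomial (lowest degree first) `times` times."""
    for _ in range(times):
        coeffs = [i * c for i, c in enumerate(coeffs)][1:]
    return coeffs


def p_tilde(m, n, x):
    """Polar function, assuming the square-root normalisation (see lead-in)."""
    if not 0 <= n <= m:
        raise ValueError("the index n must satisfy 0 <= n <= m")
    eps = 1.0 if n == 0 else 2.0
    norm = math.sqrt(eps * math.factorial(m - n) / math.factorial(m + n))
    dn = derivative(legendre_coeffs(m), n)
    poly = sum(c * x ** i for i, c in enumerate(dn))
    return norm * (-1) ** n * (1.0 - x * x) ** (n / 2) * poly


def spherical_harmonic(m, n, sigma, theta, phi):
    """Y_mn^sigma(theta, phi): cos(n theta) if sigma = 1, sin(n theta) if sigma = -1."""
    if sigma == 1:
        azimuth_term = math.cos(n * theta)
    elif sigma == -1 and n >= 1:
        azimuth_term = math.sin(n * theta)
    else:
        raise ValueError("sigma must be 1, or -1 with n >= 1")
    return p_tilde(m, n, math.cos(phi)) * azimuth_term
```

With these assumptions, the component of order m=0 is equal to 1 in every direction, which agrees with the omnidirectional first "vector" of the base.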
As shown in FIG. 1, the first “vector” of the spherical harmonic base (at the top in FIG. 1) corresponds to the order m=0, the three “vectors” in the following line correspond to the order m=1 (oriented according to the three directions of space), etc.
In practice, an actual ambisonic encoding is carried out using a network of sensors, generally distributed over a sphere, which are combined in order to synthesise an ambisonic content of which the channels best respect the directivities of the spherical harmonics (as shown in FIG. 2). In reference to FIG. 2, a microphone MIC comprises a plurality of piezoelectric capsules C1, C2, . . . which receive sound waves according to various directions of arrival in space. A processing unit UT, which receives the signals coming from these capsules, carries out an ambisonic encoding using a matrix of filters presented hereinafter, and delivers ambisonic signals (formalised in a base of spherical harmonics of the type shown in FIG. 1).
The basic principles of ambisonic encoding are described hereinafter.
The ambisonic formalism, initially limited to the representation of spherical harmonic functions of order 1, was subsequently extended to the higher orders. The ambisonic formalism with a higher number of components is commonly referred to as “Higher Order Ambisonics” (or “HOA” hereinafter).
To each order m correspond 2m+1 spherical harmonic functions, as shown in FIG. 1. Thus, a content of order M contains a total of $(M+1)^2$ channels (4 channels with order 1, 9 channels with order 2, 16 channels with order 3, and so on).
The term “ambisonic components” hereinafter means the ambisonic signal in each ambisonic channel, in reference to the “vector components” in a vector base that would be formed by each spherical harmonic function. Thus for example, it is possible to count:
    one ambisonic component for the order m=0,
    three ambisonic components for the order m=1,
    five ambisonic components for the order m=2,
    seven ambisonic components for the order m=3, etc.
The ambisonic signals captured for these various components are then distributed over a number N of channels, which is deduced from the maximum order M that is to be captured in the sound stage. For example, if a sound stage is captured with an ambisonic microphone with 20 piezoelectric capsules, then the maximum captured ambisonic order is M=3, since the number of channels $N=(M+1)^2$ cannot exceed the 20 capsules. The number of ambisonic components considered is then 7+5+3+1=16, and the number of channels is N=16, in agreement with the relationship $N=(M+1)^2$ with M=3.
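The counting rules above can be checked with a few lines of arithmetic. The sketch below is only illustrative; the function names are hypothetical, and the rule used for the maximum order (largest M such that (M+1)² does not exceed the number of capsules) is the reading of the 20-capsule example given above.

```python
import math


def components_at_order(m):
    """Number of spherical harmonic functions at a single order m."""
    return 2 * m + 1


def channel_count(max_order):
    """Total number of channels N for a content of order M."""
    return (max_order + 1) ** 2


def max_order_for_capsules(capsules):
    """Largest order M such that (M + 1)^2 does not exceed the capsule count."""
    if capsules < 1:
        raise ValueError("at least one capsule is required")
    return math.isqrt(capsules) - 1


M = max_order_for_capsules(20)   # microphone with 20 capsules
N = channel_count(M)
per_order = [components_at_order(m) for m in range(M + 1)]
```

For the 20-capsule example, this gives M=3, per-order counts of 1, 3, 5 and 7 components, and N=16 channels.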
The ambisonic capture x(t) of order M, comprised of N sound sources $s_i$ of incidence $(\theta_i, \phi_i)$ propagating in a free field, can then be written mathematically in the following matrix form:
$$x(t) = A\,s(t) = \begin{bmatrix} 1 & \dots & 1 \\ \vdots & \ddots & \vdots \\ Y_{Mn}^{\sigma}(\theta_1, \phi_1) & \dots & Y_{Mn}^{\sigma}(\theta_N, \phi_N) \end{bmatrix} s(t)$$
where A is a matrix referred to as the “mixing matrix”, of dimensions $(M+1)^2 \times N$, of which each column $A_i$ contains the mixing coefficients of the source i.
Physically, this matrix A contains the encoding coefficients of each source i, associated with the direction of that source. In order to extract the sources from such a content, a matrix B referred to as the “separating matrix”, inverse of the matrix A, must be estimated. In order to obtain the matrix B, a step of blind source separation can be implemented, for example by using an independent component analysis (or “ICA” hereinafter) algorithm, or a principal component analysis algorithm. The matrix $B = A^{-1}$ allows for the extraction of the sources via the following operation:
s(t)=Bx(t)
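The algebra of mixing and separation can be illustrated on a small case. The sketch below assumes an ideal content of order 1 (four channels) and four sources whose directions are known, so that the matrix A can be built and inverted directly; in a blind separation, B would instead be estimated, for example by ICA, which is not shown here. The directions, the sample values and all function names are hypothetical, and the first-order harmonics are written out by hand with the sign given by the (−1)^n factor of the definition.

```python
import math


def first_order_harmonics(theta, phi):
    """Harmonics of orders 0 and 1 for azimuth theta and polar angle phi."""
    return [1.0,
            math.cos(phi),
            -math.sin(phi) * math.cos(theta),
            -math.sin(phi) * math.sin(theta)]


def mixing_matrix(directions):
    """One column per source, one row per ambisonic channel."""
    columns = [first_order_harmonics(t, p) for t, p in directions]
    return [list(row) for row in zip(*columns)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (square matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("mixing matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical source directions (theta, phi), chosen so that A is invertible.
directions = [(0.0, math.pi / 2), (math.pi / 2, math.pi / 2),
              (math.pi, math.pi / 2), (0.0, 0.0)]
A = mixing_matrix(directions)
B = invert(A)                    # separating matrix, B = A^-1

s = [0.5, -1.0, 2.0, 0.25]       # source values at one time instant (made up)
x = apply(A, s)                  # ambisonic capture x(t) = A s(t)
s_hat = apply(B, x)              # extracted sources s(t) = B x(t)
```

In this ideal case the extracted values s_hat match the original sources up to rounding errors.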
This step amounts to forming beams (or “beamforming” hereinafter), i.e. to combining various channels that have separate directivities in order to create a new component that has the desired directivity. An example of beamforming in order to extract three components, for an HOA content of order 2, 4 or 6, is shown in FIG. 3. The higher the order, the more directive the beamforming and the higher the number of components that can be extracted.
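A very simple instance of such a combination of channels is sketched below. It is not the beamforming of FIG. 3: it assumes an ideal content of order 1 and a basic beam whose weights are the harmonics evaluated in the look direction, which yields the pattern (1 + cos γ)/2, γ being the angle between the look direction and the direction of arrival. The function names and the chosen directions are hypothetical.

```python
import math


def first_order_harmonics(theta, phi):
    """Harmonics of orders 0 and 1 for azimuth theta and polar angle phi."""
    return [1.0,
            math.cos(phi),
            -math.sin(phi) * math.cos(theta),
            -math.sin(phi) * math.sin(theta)]


def beam_gain(look, arrival):
    """Gain of a basic first-order beam steered towards `look` for a plane
    wave arriving from `arrival`; both are (theta, phi) pairs."""
    weights = first_order_harmonics(*look)
    encoded = first_order_harmonics(*arrival)
    # The factor 0.5 normalises the gain to 1 in the look direction.
    return 0.5 * sum(w * y for w, y in zip(weights, encoded))


look = (0.0, math.pi / 2)                       # hypothetical look direction
front = beam_gain(look, (0.0, math.pi / 2))     # same direction as the beam
side = beam_gain(look, (math.pi / 2, math.pi / 2))
back = beam_gain(look, (math.pi, math.pi / 2))  # opposite direction
```

The gain is 1 in the look direction, 0.5 at right angles and 0 in the opposite direction. Higher orders add more channels to the combination and allow narrower beams.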
In practice, generating ambisonic signals x(t)=As(t) passes through an intermediate step of microphone capture, as shown in FIG. 2, where the sources s(t) are captured by the capsules of the microphone MIC in order to form the signals p1, p2, p3, . . . The microphone encoding matrix E is then formalised such that x(t)=E·p(t), in order to obtain the ambisonic components x1, x2, . . . , xN (in N ambisonic channels as shown in FIG. 4). In reference now to FIG. 4, the decoding matrix B, inverse of the matrix A as presented hereinabove, is estimated in order to determine the source signals s1, s2, s3:
s(t)=Bx(t)
To decode an HOA content on a system of speakers, the approach is similar. Ambisonic signals in N channels x1, x2, . . . , xN are acquired but, here, instead of considering s(t) as the sum of the contributions of sources, s(t) is considered as the sum of the signals emitted by a set of speakers (which then effectively makes it possible to supply these speakers with the signals s1, s2, s3 . . . ). The decoding matrix B is therefore formulated here using the positions of the speakers of a sound restitution system, and the signals intended for the speakers are extracted according to the same method as the one used for source separation.
In reality, the sensors used have physical limitations that cause a degradation in the microphone encoding, and therefore a degradation in the directivity of the ambisonic components. For example, the encoding of the high frequencies is degraded when the inter-sensor spacing becomes greater than approximately one half-wavelength: this is due to the phenomenon of spatial aliasing. At low frequencies, the microphone capsules tend to become omnidirectional and it becomes impossible to obtain the sought directivities. More precisely, the degradations at low frequencies are more marked when synthesising ambisonic components of a high order, since the associated directivities are generally more complex and therefore more sensitive to variations in the properties of the sensors. FIG. 5 shows the degree of correlation between a theoretical encoding and an actual encoding using a spherical microphone with 32 capsules, according to the frequency and the ambisonic order. FIG. 5 shows that the highest degree of correlation is generally reached for frequencies between 1 kHz and 10 kHz. However, for the other frequency ranges (except for ambisonic orders 0 and 1), extracting sources would not always lead to the same result for a theoretical encoding and for an actual encoding of these same sources. More precisely, for frequencies outside of the interval [1 kHz-10 kHz], the components extracted are potentially degraded.
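The half-wavelength condition mentioned above gives an order of magnitude for the frequency beyond which spatial aliasing appears. The sketch below is a rough estimate only: it assumes a speed of sound of 343 m/s (air at about 20 °C), and the 2 cm spacing is a made-up value that is not taken from the microphone of FIG. 5. The function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def aliasing_frequency(spacing_m, speed=SPEED_OF_SOUND):
    """Frequency (Hz) at which the inter-sensor spacing equals one
    half-wavelength, i.e. spacing = speed / (2 * frequency)."""
    if spacing_m <= 0:
        raise ValueError("the spacing must be positive")
    return speed / (2.0 * spacing_m)


limit_hz = aliasing_frequency(0.02)  # hypothetical spacing of 2 cm
```

With these assumed values the limit is roughly 8.6 kHz; a smaller spacing pushes the limit to higher frequencies.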
FIG. 6 shows the actual directivity in the horizontal plane of the first components of orders 0, 1, 2 and 3 according to the sound frequency. It appears, in FIG. 6, that the actual components are not suitably encoded. Indeed, considering the example of the component of order 0 at the frequency of 10 kHz, it is observed that it is not circular, contrary to the theoretical component and to the same component calculated at frequencies between 300 and 1000 Hz. Thus, the directivity of this component at the frequency of 10 kHz is not respected, which could induce a degraded spatial resolution. Moreover, the components of orders 1, 2 and 3 also have biased directivities for frequencies that are lower than 10 kHz.
More generally, when the theoretical directivity is not respected, the beamforming carried out no longer makes it possible to suitably extract the sought components. For example, this results in the appearance of interferences during source separation. This can also result in a degradation of the spatial resolution in the frequency bands concerned during a multichannel diffusion. More particularly, a loss of energy in the low frequencies is observed in the high orders during encoding. As a result, the sources extracted using channels of high orders can lose part of their energy in the frequencies concerned.
Beamforming is already used for source separation and for the restitution (multichannel decoding) of an ideal ambisonic content or of a multichannel capture. For source separation, an inversion of the mixing matrix estimated via independent component analysis is used in order to extract the sources. For multichannel decoding, the matrix of the ambisonic coefficients relating to the speakers can be inverted. On the other hand, the processing of an actual ambisonic content, affected by the physical limitations of the recording system, is not addressed in the prior art. The only solution currently proposed is to limit the total bandwidth of the extracted sources, which is not satisfactory.