At a high level, a sound field can be described using different coordinate systems. For example, a Cartesian coordinate system (e.g., x, y, and z coordinates) or a spherical coordinate system (e.g., two angles, namely an angle measured from the vertical axis and a rotational angle measured from the x-axis, together with a radius) may be used. There is also the cylindrical coordinate system, which can be thought of as a combination of the Cartesian and spherical coordinate systems.
Conventional approaches to surround sound are typically based on spherical coordinates, and utilize a spherical microphone to record the sound. However, most users who listen to surround sound do not use a spherical loudspeaker system to reproduce the sound, and even if they did, such a system would require a large number of loudspeakers in order to work well without spatial aliasing. In addition, such a spherical microphone would need a very large number of spherical harmonics to describe the sound field in both elevation and azimuth with sufficient resolution for the loudspeaker array (it should be noted that with a small loudspeaker array, only a correspondingly small number of spherical harmonic signals can be used).
Stereophonic audio reproduction allows for sound to be created from any angle between two loudspeakers. However, stereophonic reproduction cannot produce sound arrivals from outside the angle subtended by the two loudspeakers.
Surround sound systems aim to provide users with a more immersive experience of sound by enabling the creation of sound waves arriving from all directions around a listener. Two-dimensional (2D) systems can generate sound waves in a horizontal plane arriving from any angle over the full 360 degrees, and three-dimensional (3D) arrays can additionally generate sound waves arriving from elevations above the listener (and from below the listener in special-purpose reproduction rooms such as anechoic chambers).
Surround sound reproduction systems typically consist of L loudspeakers in a 2D or 3D array. For example, a common format is to have L=5 loudspeakers in a circular array around the listener. The loudspeakers are positioned with a center loudspeaker in front of the listener, a left and right loudspeaker at +/−30 degrees on either side, and a pair of rear surround loudspeakers at +/−110 degrees.
Often, the sound signals for the loudspeakers are directly generated in a recording studio, where a large number of audio “tracks”—obtained, for example, from electronic sound devices or from recorded instruments—are available. Using existing principles of surround sound reproduction, it is possible to position sound at any angle, including angles between the loudspeakers, in a manner similar to the stereophonic case. This positioning (often referred to as “panning”) is done for a surround system with L loudspeakers, with a known geometry, and for a given audio signal track, by amplitude weighting the audio track with L different amplitude weightings and feeding the resulting L weighted audio signals to all L loudspeakers.
In some cases, “pairwise panning” may be implemented, where an audio signal is sent to only two loudspeakers in a similar manner to stereo, to create a source position between the two loudspeakers. A general approach to achieving this is vector-based amplitude panning. Other modifications to the audio signals may also be made, such as filtering, which is understood by those skilled in the art to improve the quality of the reproduced signal. The net result of this mixing operation is a set of L loudspeaker signals that are played by the loudspeakers, positioned in the required geometry, to produce the desired sound field.
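As a rough illustration of pairwise panning, the following Python sketch computes the L amplitude weightings for a 2D loudspeaker ring using a vector-based formulation. The function names, the constant-power normalization, and the counterclockwise-positive azimuth convention are assumptions made for this example and are not taken from the text above; the loudspeaker angles follow the five-loudspeaker layout described earlier.

```python
import math

# Azimuths (degrees) of the five-loudspeaker layout, listed counterclockwise:
# center, left (+30), left surround (+110), right surround (-110), right (-30).
# The sign convention (positive = counterclockwise) is an assumption.
SPEAKER_AZIMUTHS_DEG = [0.0, 30.0, 110.0, -110.0, -30.0]


def unit_vector(azimuth_deg):
    """Unit vector in the horizontal plane pointing at the given azimuth."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))


def pairwise_pan_gains(source_azimuth_deg, speaker_azimuths_deg=SPEAKER_AZIMUTHS_DEG):
    """Return one gain per loudspeaker; only the adjacent pair that
    brackets the source direction receives non-zero gain."""
    count = len(speaker_azimuths_deg)
    sx, sy = unit_vector(source_azimuth_deg)
    gains = [0.0] * count
    for i in range(count):
        j = (i + 1) % count  # next loudspeaker, counterclockwise
        ax, ay = unit_vector(speaker_azimuths_deg[i])
        bx, by = unit_vector(speaker_azimuths_deg[j])
        det = ax * by - ay * bx
        if abs(det) < 1e-12:
            continue  # degenerate pair (collinear loudspeaker directions)
        # Solve g_i * a + g_j * b = s for the two gains (Cramer's rule).
        g_i = (sx * by - sy * bx) / det
        g_j = (ax * sy - ay * sx) / det
        if g_i >= -1e-9 and g_j >= -1e-9:
            g_i, g_j = max(g_i, 0.0), max(g_j, 0.0)
            norm = math.hypot(g_i, g_j)  # constant-power normalization
            gains[i] = g_i / norm
            gains[j] = g_j / norm
            return gains
    raise ValueError("no loudspeaker pair brackets the source direction")


if __name__ == "__main__":
    # A source halfway between the center and left loudspeakers.
    print(pairwise_pan_gains(15.0))
```

Each audio track would then be multiplied by these gains to form its contribution to the L loudspeaker signals. A source placed exactly at a loudspeaker position collapses to a single non-zero gain.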
In some instances, the audio tracks that are panned are obtained from live recordings using multiple microphones. For example, a microphone may be placed in close proximity to each instrument being played so as to capture the audio signals produced by that individual instrument.
In other scenarios, surround sound systems may be used to reproduce a live recording that has been recorded using a single microphone system that attempts to capture the spatial sound field around a single listener. In this case, the recording microphone must capture the spatial attributes of the sound field with sufficient detail to allow the surround sound reproduction system to recreate the sound field. A technique that is often used for the recreation of a sound field in this manner is Higher Order Ambisonics (HOA). HOA decomposes the sound field recorded using a microphone system into a set of signals that are obtained from the description of the sound field in (typically) spherical coordinates, and which allow reproduction of the sound field using an arbitrary geometry and number of loudspeakers. An equivalent method is Wave Field Synthesis, in which the sound pressure and normal component of the velocity on the surface of a volume of space allow the reproduction of the sound field within that volume of space.
An alternative to the physically based methods described above is the class of perception-based methods, in which only those spatial cues that are perceptually relevant are recorded. Such methods include, for example, DirAC, Binaural Cue Coding, and methods employed in MPEG Surround encoding.
The microphone for recording the sound field may have multiple outputs, each of which represents a component of the spatial sound field. These components are often termed cylindrical or spherical modes of the sound field. One of the earliest existing surround sound microphones produced four audio outputs, representing the sound pressure and the three components of the sound velocity. These signals were obtained from a compact coincident array of four pressure capsules in a tetrahedral configuration. More recently, higher-order surround sound microphone systems have been constructed using circular or spherical arrays of pressure microphones, typically mounted on a solid or open spherical or cylindrical baffle.
Circular arrays of transducers have been used to determine direction of arrival. Open circular arrays of directional microphones, where there is no cylindrical baffle, have been specifically applied to sound field recording. Open arrays of directional microphones have also proven to be useful for sound field decomposition, mainly because they eliminate or reduce the zeros in the response that occur with open arrays of pressure microphones. Other approaches have proposed the use of multiple circular arrays spaced along the z-axis, open arrays where each element is itself a higher-order microphone capable of producing multiple directional outputs, and circular arrays mounted on spherical baffles.
Most existing microphone arrays use conventional microphone elements that are based on capacitive or inductive transduction principles. More recently, micro-electro-mechanical systems (MEMS) microphones have been developed, which implement small transducers in silicon. These devices are typically low-cost and small in size and are commonly used in mobile phones. Arrays of MEMS microphones have been applied to the localization of sound. In some cases, these devices have on-board analog-to-digital converters and produce a single-bit (sigma-delta or pulse density modulation) output. In other cases, the output is a serial pulse code modulation (PCM) representation of the analog signal. Such devices are well suited to the construction of large arrays, because the outputs can be interfaced directly to digital processors without the need for a large number of external analog-to-digital converters.
Spherical Harmonics Decomposition of 3D Sound Fields
The standard format for HOA is based on the use of spherical harmonics. The sound pressure in spherical coordinates at positive harmonic radian frequency ω can be expressed as
$$p(r,\theta,\phi,\omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} j_n(kr)\, A_n^m(k)\, Y_n^m(\theta,\phi) \tag{1}$$

where

$$Y_n^m(\theta,\phi) = \sqrt{\frac{2n+1}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos\theta)\, e^{im\phi} \tag{2}$$

is the (n, m)th normalized complex spherical harmonic.
An alternative description to equation (1) is based on the plane wave expansion of sound fields. A general solution to the wave equation is given by the Herglotz distribution
$$p(r,\theta,\phi,k) = \frac{1}{4\pi} \int_0^{\pi} \int_0^{2\pi} \Psi(\theta_i,\phi_i)\, e^{i\vec{k}\cdot\vec{r}}\, \sin\theta_i \, d\theta_i \, d\phi_i \tag{3}$$

The expansion of the plane wave term is

$$e^{i\vec{k}\cdot\vec{r}} = 4\pi \sum_{n=0}^{\infty} \sum_{m=-n}^{n} i^n\, j_n(kr)\, Y_n^m(\theta,\phi)\, Y_n^m(\theta_i,\phi_i)^{*} \tag{4}$$

Furthermore, the plane wave amplitude function can be expanded in terms of spherical harmonics as

$$\Psi(\theta_i,\phi_i) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} B_n^m(k)\, Y_n^m(\theta_i,\phi_i) \tag{5}$$

Substituting the expressions shown in equations (4) and (5) into equation (3) yields the plane wave expansion

$$p(r,\theta,\phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} i^n\, B_n^m(k)\, j_n(kr)\, Y_n^m(\theta,\phi) \tag{6}$$

which is the same as equation (1), where

$$A_n^m(k) = i^n B_n^m(k) \tag{7}$$

Thus, the plane wave coefficients as typically used in Ambisonics can be simply converted to the general coefficients in equation (1), and vice versa.
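The substitution step can be spelled out as follows. This is a sketch that assumes the spherical harmonics of equation (2) are orthonormal over the unit sphere, which is consistent with the term "normalized" but is not stated explicitly in the text; the shorthand Ω_i for the direction (θ_i, ϕ_i) and dΩ_i for the surface element sin θ_i dθ_i dϕ_i is introduced here only for brevity.

```latex
\begin{aligned}
p &= \frac{1}{4\pi}\int \Psi(\Omega_i)\, e^{i\vec{k}\cdot\vec{r}}\, d\Omega_i \\
  &= \frac{1}{4\pi}\int
     \Big[\sum_{n',m'} B_{n'}^{m'}(k)\, Y_{n'}^{m'}(\Omega_i)\Big]
     \Big[4\pi \sum_{n,m} i^n j_n(kr)\, Y_n^m(\theta,\phi)\, Y_n^m(\Omega_i)^{*}\Big]\, d\Omega_i \\
  &= \sum_{n,m} i^n j_n(kr)\, Y_n^m(\theta,\phi)
     \sum_{n',m'} B_{n'}^{m'}(k)
     \underbrace{\int Y_{n'}^{m'}(\Omega_i)\, Y_n^m(\Omega_i)^{*}\, d\Omega_i}_{\delta_{nn'}\,\delta_{mm'}} \\
  &= \sum_{n=0}^{\infty}\sum_{m=-n}^{n} i^n B_n^m(k)\, j_n(kr)\, Y_n^m(\theta,\phi)
\end{aligned}
```

The factor of 4π in equation (4) cancels the 1/(4π) in equation (3), and orthonormality collapses the double sum over (n′, m′), leaving equation (6).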
In Ambisonics, the sound field in equation (6) is described in terms of real spherical harmonics obtained from the real and imaginary parts of equation (2). Following the terminology in the 2D case, the complex spherical harmonics may be termed “phase modes” and the real spherical harmonics may be termed “amplitude modes.” It should be understood by those skilled in the art that the plane wave expansion (as shown in equation (6)) is equivalent to equation (1). Further, it should also be understood by those skilled in the art that other expansions in terms of real spherical harmonics may also be equivalent, and that the various conclusions presented in detail herein may apply equally to these other descriptions.
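To make the distinction between the complex and real harmonics concrete, the following Python sketch evaluates equation (2) numerically and returns its real and imaginary parts. It is only an illustration under stated assumptions: the function names are hypothetical, the associated Legendre function is computed without the Condon-Shortley phase factor, and no additional scaling is applied to the real and imaginary parts, since the text does not specify either convention.

```python
import cmath
import math


def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x) for 0 <= m <= n and |x| <= 1.

    Assumption: the Condon-Shortley phase (-1)^m is omitted.
    """
    # Closed form for the diagonal term: P_m^m(x) = (2m-1)!! (1 - x^2)^(m/2).
    p_prev = 1.0
    root = math.sqrt(max(0.0, 1.0 - x * x))
    for k in range(1, m + 1):
        p_prev *= (2 * k - 1) * root
    if n == m:
        return p_prev
    # First off-diagonal term: P_{m+1}^m(x) = x (2m+1) P_m^m(x).
    p_curr = x * (2 * m + 1) * p_prev
    # Upward recurrence in degree:
    # (l - m) P_l^m = x (2l - 1) P_{l-1}^m - (l + m - 1) P_{l-2}^m.
    for l in range(m + 2, n + 1):
        p_prev, p_curr = p_curr, (
            x * (2 * l - 1) * p_curr - (l + m - 1) * p_prev
        ) / (l - m)
    return p_curr


def spherical_harmonic(n, m, theta, phi):
    """Normalized complex spherical harmonic following equation (2)."""
    am = abs(m)
    norm = math.sqrt(
        (2 * n + 1) / (4.0 * math.pi)
        * math.factorial(n - am) / math.factorial(n + am)
    )
    return norm * assoc_legendre(n, am, math.cos(theta)) * cmath.exp(1j * m * phi)


def real_and_imaginary_parts(n, m, theta, phi):
    """Unscaled real and imaginary parts of equation (2) (scaling is an assumption)."""
    y = spherical_harmonic(n, m, theta, phi)
    return y.real, y.imag


if __name__ == "__main__":
    print(spherical_harmonic(0, 0, 0.3, 1.2))
    print(real_and_imaginary_parts(1, 1, math.pi / 2, math.pi / 4))
```

Because equation (2) uses |m| in both the normalization and the Legendre function, the harmonic for −m is the complex conjugate of the harmonic for +m under these assumptions.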
The coefficients in equation (1) can be determined using, for example, a solid spherical baffle or a continuous distribution of directional (e.g., outward-facing) microphones (typically cardioid). The complex sound pressure on the surface of an open or rigid sphere of radius a, with the incident field from equation (1), has the generic form

$$p_S(a,\theta,\phi,k) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} b_n(ka)\, A_n^m(k)\, Y_n^m(\theta,\phi) \tag{8}$$

where, for example,

$$b_n(ka) = \begin{cases} \dfrac{-i}{(ka)^2\, h_n'(ka)}, & \text{rigid sphere} \\[2ex] j_n(ka) - i\, j_n'(ka), & \text{open cardioid sphere} \end{cases} \tag{9}$$

Here, $h_n(\cdot)$ is the spherical Hankel function of the second kind and the prime denotes differentiation with respect to the argument (it should be noted that other arrays will produce other $b_n$ functions). The sound field coefficients are preferably obtained by multiplying $p_S(a,\theta,\phi,k)$ by the complex conjugate of a desired spherical harmonic function and integrating over the sphere

$$A_n^m(k) = \frac{1}{4\pi\, b_n(ka)} \int_0^{\pi} \int_0^{2\pi} p_S(a,\theta,\phi,k)\, Y_n^m(\theta,\phi)^{*}\, \sin\theta \, d\theta \, d\phi \tag{10}$$
The summation in equation (1) can be limited to a maximum order $N \approx \lceil kr \rceil$ for a given maximum radius r and maximum wave number k, where $\lceil\cdot\rceil$ denotes rounding up to the nearest integer. In this case, there are a total of $(N+1)^2$ terms in the expansion of the sound field. Each term corresponds to an audio signal $a_n^m(t)$ that represents the frequency-dependent expansion term $A_n^m(k)$ in the time domain. Hence, a total of $(N+1)^2$ audio signals are required to represent the Nth-order approximation of the sound field.
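The truncation rule and the resulting signal count can be checked with a few lines of Python. This is a minimal sketch: the function names, the speed of sound of 343 m/s, and the example frequency and radius are assumptions chosen for illustration and do not come from the text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def truncation_order(max_freq_hz, max_radius_m, c=SPEED_OF_SOUND_M_S):
    """Maximum order N = ceil(k r), with wave number k = 2 pi f / c."""
    k = 2.0 * math.pi * max_freq_hz / c
    return math.ceil(k * max_radius_m)


def num_signals(order):
    """Number of audio signals for an Nth-order expansion: (N + 1)^2."""
    return (order + 1) ** 2


if __name__ == "__main__":
    # Hypothetical case: accurate reproduction up to 1 kHz within 0.1 m.
    n = truncation_order(1000.0, 0.1)
    print(n, num_signals(n))
```

Because the order grows linearly with kr and the signal count grows with its square, doubling either the frequency limit or the radius roughly quadruples the number of signals.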
Discrete Spherical Arrays
In practice, spherical arrays are implemented using a discrete array of M microphone elements. The design of spherical arrays involves the selection of a sphere of sufficient size to record the sound field, the selection of a number of microphone elements, M, and a sampling scheme to position these microphones on the surface of the sphere such that the spherical harmonics can be generated by a discrete approximation with sufficient accuracy. Typically, the number of microphones must be greater than $(N+1)^2$, and the microphones must be placed regularly over the whole surface of the sphere to minimize the error in estimating the spherical harmonic coefficients up to order N.
A consequence of the discrete array is that the sound pressure on the sphere cannot be unambiguously determined for frequencies where the microphones are greater than half a wavelength apart. There is thus a maximum frequency at which the array can operate correctly, known as the spatial aliasing frequency or, following the equivalent sampling theorem for 1D signals, the spatial Nyquist frequency. The aliasing frequency can be approximately determined for a sphere as follows. If the sphere has radius a, then its surface area is $4\pi a^2$. For a uniform geometry, each microphone occupies an area of approximately $4\pi a^2/M$, so the approximate spacing between microphones is $2a\sqrt{\pi/M}$. Setting this spacing equal to half a wavelength gives the spatial Nyquist frequency

$$f_{\mathrm{Nyq3}} \approx \frac{c}{4a} \sqrt{\frac{M}{\pi}} \tag{11}$$

where c is the speed of sound.
For example, an array of M=32 microphones on a sphere of radius 0.1 meters (m) produces a spatial Nyquist frequency of 2.7 kHz. A spatial Nyquist frequency of 8 kHz requires M=275 microphones. Hence, large numbers of microphones are required to produce high spatial Nyquist frequencies. This means that the construction of spherical arrays with sufficient size for recording 3D fields over audio frequency ranges is challenging.
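The figures quoted above can be approximately reproduced from equation (11) with the following Python sketch. The function names and the speed of sound of 343 m/s are assumptions; the text does not state which value was used, so the computed microphone count may differ slightly from the quoted one.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed; the source does not give a value


def spatial_nyquist_hz(num_mics, radius_m, c=SPEED_OF_SOUND_M_S):
    """Equation (11): frequency at which the spacing equals half a wavelength."""
    spacing_m = 2.0 * radius_m * math.sqrt(math.pi / num_mics)
    return c / (2.0 * spacing_m)


def mics_for_nyquist(target_hz, radius_m, c=SPEED_OF_SOUND_M_S):
    """Equation (11) solved for M and rounded up to a whole microphone."""
    return math.ceil(math.pi * (4.0 * radius_m * target_hz / c) ** 2)


if __name__ == "__main__":
    print(round(spatial_nyquist_hz(32, 0.1)))  # close to 2.7 kHz
    print(mics_for_nyquist(8000.0, 0.1))       # in the region of 275
```

The required microphone count scales with the square of both the target frequency and the sphere radius, which is why full audio-band 3D arrays become impractical quickly.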
Spherical Harmonics Decomposition of 2D Sound Fields
Most surround sound reproduction systems are 2D and produce a desired sound field in the horizontal plane. This is simpler and more practical than the installation of 3D arrays. Furthermore, it has been shown that human spatial acuity is greatest for sound sources in the horizontal plane. In some instances, reproduction arrays are used which provide capability for producing elevational cues. The recording and reproduction of 2D sound fields, or sound fields with greater accuracy in the horizontal plane, can therefore produce more efficient and perceptually relevant results.
Spherical harmonics can be used to record and reproduce 2D sound fields. It has been shown that the sectorial spherical harmonics, for which n=|m| and which have significant magnitudes only in the horizontal plane, are sufficient to allow reproduction of the sound pressure in the horizontal plane. The fraction of the total audio signals used for 2D sound recording and reproduction (e.g., using the 2.5D approach) is $(2N+1)/(N+1)^2$, which is illustrated in the graphical representation 100 of FIG. 1.
It is clear from representation 100 that as the order of the recorded sound field increases, the sectorial components become a small fraction (percentage) of the total audio channels. For example, 10th order sound fields use 21 sectorial signals to represent the horizontal plane and a further 100 channels to include elevational information. When the sound reproduction system is a 2D array, this means that 83% of the audio signals are unnecessary.
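The 10th-order figures can be verified directly from the counts $2N+1$ and $(N+1)^2$. The short Python sketch below does so; the function names are hypothetical and serve only this illustration.

```python
def channel_split(order):
    """Return (sectorial, non_sectorial, total) signal counts for order N."""
    total = (order + 1) ** 2
    sectorial = 2 * order + 1
    return sectorial, total - sectorial, total


def sectorial_fraction(order):
    """Fraction of signals that are sectorial: (2N + 1) / (N + 1)^2."""
    sectorial, _, total = channel_split(order)
    return sectorial / total


if __name__ == "__main__":
    sectorial, non_sectorial, total = channel_split(10)
    print(sectorial, non_sectorial, total)
    print(f"{1.0 - sectorial_fraction(10):.0%} of signals carry only elevational detail")
```

For N = 10 this gives 21 sectorial signals out of 121, so about 83% of the signals are not needed by a 2D reproduction array.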
Spherical harmonics can also be used to record and reproduce sound fields with greater resolution in the horizontal plane and limited resolution in elevation. These “mixed-order Ambisonics” approaches record a high order of sectorial harmonics but a restricted subset of the non-sectorial harmonics.
Mixed-order Ambisonics typically uses rigid spherical microphone arrays and requires determining a transducer layout that allows a given set of mixed-order spherical harmonics to be estimated with minimal error.
One existing approach uses a cylindrical microphone with the cylinder axis oriented along the horizontal x-axis. The cylinder has multiple circular arrays that allow a cylindrical decomposition of the sound field where the resolution in elevation is governed by the number of microphones in each ring and the resolution in azimuth is governed by the number of rings and the spacing between them. Since the number of rings can be set independently from the number of microphones per ring, the azimuth and elevation resolutions may be independently controlled.