In its simplest form, audio data takes the form of a single channel of data representing sound characteristics such as frequency and volume; this is known as a mono signal. Stereo audio data, which comprises two channels of audio data and therefore includes, to a limited extent, directional characteristics of the sound it represents, has been a highly successful audio data format. More recently, audio formats that may include more than two channels of audio data and that include directional characteristics of the represented sound in two or three dimensions, such as surround sound formats, have become increasingly popular.
The term “spatial audio data” is used herein to refer to any data which includes information relating to directional characteristics of the sound it represents. Spatial audio data can be represented in a variety of different formats, each of which has a defined number of audio channels, and requires a different interpretation in order to reproduce the sound represented. Examples of such formats include stereo, 5.1 surround sound and formats such as Ambisonic B-Format and Higher Order Ambisonic (HOA) formats, which use a spherical harmonic representation of the soundfield. In first-order B-Format, sound field information is encoded into four channels, typically labelled W, X, Y and Z, with the W channel representing an omnidirectional signal level and the X, Y and Z channels representing directional components in three dimensions. HOA formats use more channels, which may, for example, result in a larger sweet area (i.e. the area in which the user hears the sound substantially as intended) and more accurate soundfield reproduction at higher frequencies. Ambisonic data can be created from a live recording using a Soundfield microphone, mixed in a studio using ambisonic panpots, or generated by gaming software, for example.
Ambisonic formats, and some other formats, use a spherical harmonic representation of the sound field. Spherical harmonics are the angular portion of a set of orthonormal solutions of Laplace's equation.
The spherical harmonics can be defined in a number of ways. A real-valued form of the spherical harmonics can be defined as follows:
$$
X_{l,m}(\theta,\phi) = \sqrt{\frac{(2l+1)\,(l-|m|)!}{2\pi\,(l+|m|)!}}\; P_l^{|m|}(\cos\theta)
\begin{cases}
\sin(|m|\phi) & m < 0 \\
1/\sqrt{2} & m = 0 \\
\cos(|m|\phi) & m > 0
\end{cases}
\qquad \text{(i)}
$$
where $l \ge 0$ and $-l \le m \le l$; $l$ and $m$ are often known respectively as the “order” and “index” of the particular spherical harmonic, and the $P_l^{|m|}$ are the associated Legendre polynomials. Further, for convenience, we re-index the spherical harmonics as $Y_n(\theta,\phi)$, where $n \ge 0$ packs the values of $l$ and $m$ in a sequence that encodes lower orders first. We use:
$$
n = l(l+1) + m \qquad \text{(ii)}
$$
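As an illustration only, equations (i) and (ii) can be sketched in Python as below. The function names are hypothetical, and the associated Legendre functions are computed without the Condon-Shortley phase factor, which is an assumption since the text does not state a sign convention.

```python
import math

def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l, omitting the Condon-Shortley phase
    (an assumed convention; the text does not specify one)."""
    # Start from P_m^m(x) = (2m-1)!! * (1 - x^2)^(m/2).
    s = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * s
    if l == m:
        return pmm
    # P_{m+1}^m(x) = x * (2m+1) * P_m^m(x).
    pm1 = x * (2 * m + 1) * pmm
    # Upward recurrence in the order:
    # (k-m) P_k^m = x (2k-1) P_{k-1}^m - (k+m-1) P_{k-2}^m.
    for k in range(m + 2, l + 1):
        pmm, pm1 = pm1, (x * (2 * k - 1) * pm1 - (k + m - 1) * pmm) / (k - m)
    return pm1

def real_sh(l, m, theta, phi):
    """Real spherical harmonic X_{l,m}(theta, phi) following equation (i)."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) * math.factorial(l - am)
                     / (2 * math.pi * math.factorial(l + am)))
    if m < 0:
        angular = math.sin(am * phi)
    elif m == 0:
        angular = 1.0 / math.sqrt(2.0)
    else:
        angular = math.cos(am * phi)
    return norm * assoc_legendre(l, am, math.cos(theta)) * angular

def pack_index(l, m):
    """Single index n = l(l+1) + m of equation (ii)."""
    return l * (l + 1) + m

def unpack_index(n):
    """Inverse of pack_index: recover (l, m) from n."""
    l = math.isqrt(n)
    return l, n - l * (l + 1)
```

With this packing, order 0 occupies n = 0 and order 1 occupies n = 1, 2, 3 (for m = -1, 0, 1), so lower orders always come first.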
These $Y_n(\theta,\phi)$ can be used to represent any piecewise continuous function $f(\theta,\phi)$ which is defined over the whole of a sphere, such that:
$$
f(\theta,\phi) = \sum_{i=0}^{\infty} a_i\, Y_i(\theta,\phi) \qquad \text{(iii)}
$$
Because the spherical harmonics $Y_i(\theta,\phi)$ are orthonormal under integration over the sphere, it follows that the $a_i$ can be found from:
$$
a_i = \int_{0}^{2\pi} \int_{-1}^{1} Y_i(\theta,\phi)\, f(\theta,\phi)\; d(\cos\theta)\, d\phi \qquad \text{(iv)}
$$
which can be solved analytically or numerically.
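A minimal numerical sketch of equation (iv) follows, restricted to the four harmonics of orders 0 and 1 for brevity. The closed forms in `first_order_sh` are equation (i) evaluated for l = 0 and l = 1 (assuming no Condon-Shortley phase), and the midpoint-rule quadrature and all function names are illustrative choices rather than anything prescribed by the text.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)   # normalisation for l = 0
C1 = math.sqrt(3.0 / (4.0 * math.pi)) # normalisation for l = 1

def first_order_sh(theta, phi):
    """Y_0..Y_3 in the n = l(l+1)+m ordering (theta is the polar angle)."""
    st, ct = math.sin(theta), math.cos(theta)
    return [C0,                      # n=0: l=0, m=0
            C1 * st * math.sin(phi), # n=1: l=1, m=-1
            C1 * ct,                 # n=2: l=1, m=0
            C1 * st * math.cos(phi)] # n=3: l=1, m=+1

def project(f, n_u=200, n_phi=64):
    """Approximate a_i = integral of Y_i * f over d(cos theta) d(phi)
    with a midpoint rule on a regular (cos theta, phi) grid."""
    du = 2.0 / n_u
    dphi = 2.0 * math.pi / n_phi
    a = [0.0, 0.0, 0.0, 0.0]
    for j in range(n_u):
        u = -1.0 + (j + 0.5) * du          # u = cos(theta)
        theta = math.acos(u)
        for k in range(n_phi):
            phi = (k + 0.5) * dphi
            w = f(theta, phi) * du * dphi  # function value times cell area
            for i, y in enumerate(first_order_sh(theta, phi)):
                a[i] += y * w
    return a

# Example: f = 2 + 3 sin(theta) cos(phi) has only n=0 and n=3 components.
coeffs = project(lambda t, p: 2.0 + 3.0 * math.sin(t) * math.cos(p))
```

For this test function the analytic coefficients are $a_0 = 2\sqrt{4\pi}$ and $a_3 = 3\sqrt{4\pi/3}$, with $a_1 = a_2 = 0$, so the numerical result can be checked against them.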
A series such as that shown in equation (iii) can be used to represent a soundfield around a central listening point at the origin in the time or frequency domains. Truncating the series of equation (iii) at some limiting order $L$ gives an approximation to the function $f(\theta,\phi)$ using a finite number of components. Such a truncated approximation is typically a smoothed form of the original function:
$$
f(\theta,\phi) \approx \sum_{i=0}^{(L+1)^2 - 1} a_i\, Y_i(\theta,\phi) \qquad \text{(v)}
$$
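The upper limit in equation (v) implies that a representation truncated at order L carries (L+1)^2 coefficients. A small sketch, with hypothetical helper names, makes the channel counts concrete:

```python
def num_channels(order):
    """Number of coefficients a_i retained when truncating at order L."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2

def truncate(coeffs, order):
    """Keep only the coefficients a_0 .. a_{(L+1)^2 - 1}.

    Assumes coeffs is ordered by n = l(l+1) + m, so that lower orders
    come first and truncation is a simple prefix."""
    return list(coeffs[:num_channels(order)])

# Channel counts for orders 0 to 3.
counts = [num_channels(L) for L in range(4)]
```

Because the packing of equation (ii) places lower orders first, reducing the order of a stream amounts to discarding its trailing channels.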
The representation can be interpreted so that the function $f(\theta,\phi)$ represents the directions from which plane waves are incident, so a plane wave source incident from a particular direction $(\theta,\phi)$ is encoded as:
$$
a_i = 4\pi\, Y_i(\theta,\phi) \qquad \text{(vi)}
$$
Further, the output of a number of sources can be summed to synthesise a more complex soundfield. It is also possible to represent curved wave fronts arriving at the central listening point, by decomposing a curved wavefront into plane waves.
Thus the truncated $a_i$ series of equation (vi), representing any number of sound components, can be used to approximate the behaviour of the soundfield at a point in time or frequency. Typically a time series of such $a_i(t)$ is provided as an encoded spatial audio stream for playback, and a decoder algorithm is then used to reconstruct sound according to physical or psychoacoustic principles for a new listener. Such spatial audio streams can be acquired by recording techniques and/or by sound synthesis. The four-channel Ambisonic B-Format representation can be shown to be a simple linear transformation of the $L=1$ truncated series (v).
Alternatively, the time series can be transformed into the frequency domain, for instance by windowed Fast Fourier Transform techniques, providing the data in the form $a_i(\omega)$, where $\omega = 2\pi f$ and $f$ is frequency. The $a_i(\omega)$ values are typically complex in this context.
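The following sketch shows one way a single frame of each channel might be taken to the frequency domain. It uses a Hann window and a direct discrete Fourier transform written out in plain Python for self-containment; a practical system would use an FFT routine, and the window choice, frame layout and function names here are all assumptions.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def windowed_dft(frame):
    """Complex spectrum (bins 0 .. n/2) of one windowed frame of real samples."""
    n = len(frame)
    x = [w * s for w, s in zip(hann(n), frame)]
    return [sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n // 2 + 1)]

def to_frequency_domain(channels):
    """Transform each channel's frame independently.

    channels[i] is assumed to hold the samples a_i(t) for one frame."""
    return [windowed_dft(ch) for ch in channels]

def angular_frequency(bin_index, frame_len, sample_rate):
    """omega = 2*pi*f for the centre frequency of a DFT bin."""
    return 2.0 * math.pi * bin_index * sample_rate / frame_len

# Two channels carrying the same tone (4 cycles per 32-sample frame)
# at different gains.
n = 32
tone = [math.cos(2.0 * math.pi * 4 * k / n) for k in range(n)]
spectra = to_frequency_domain([tone, [0.5 * s for s in tone]])
```

Each entry of `spectra` is a list of complex values, one per frequency bin, which corresponds to the complex $a_i(\omega)$ described above.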
Further, a mono audio stream $m(t)$ can be encoded to a spatial audio stream as a plane wave incident from direction $(\theta,\phi)$ using the equation:
$$
a_i(t) = 4\pi\, Y_i(\theta,\phi)\, m(t) \qquad \text{(vii)}
$$
which can be written as a time-dependent vector $\mathbf{a}(t)$.
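Equation (vii), together with the summation of sources mentioned earlier, can be sketched for the first-order case as follows. The closed-form harmonics are equation (i) evaluated for l = 0 and l = 1 (assuming no Condon-Shortley phase); the function names and the list-of-vectors stream layout are illustrative assumptions.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))

def first_order_sh(theta, phi):
    """Y_0..Y_3 in the n = l(l+1)+m ordering (theta is the polar angle)."""
    st, ct = math.sin(theta), math.cos(theta)
    return [C0, C1 * st * math.sin(phi), C1 * ct, C1 * st * math.cos(phi)]

def encode_plane_wave(mono, theta, phi):
    """Apply a_i(t) = 4*pi * Y_i(theta, phi) * m(t) to every sample.

    Returns one 4-element vector a(t) per input sample."""
    gains = [4.0 * math.pi * y for y in first_order_sh(theta, phi)]
    return [[g * sample for g in gains] for sample in mono]

def mix(*streams):
    """Sum several encoded streams of equal length, channel by channel."""
    return [[sum(vals) for vals in zip(*frames)] for frames in zip(*streams)]

# A short mono signal placed in the horizontal plane at phi = 0,
# and a second copy placed at phi = pi/2, mixed into one soundfield.
mono = [1.0, -0.5]
front = encode_plane_wave(mono, math.pi / 2, 0.0)
side = encode_plane_wave(mono, math.pi / 2, math.pi / 2)
scene = mix(front, side)
```

Because the encoding is linear in $m(t)$, mixing encoded streams is equivalent to encoding each source separately and adding the results.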
Before playback, the spatial audio data must be decoded to provide a speaker feed, that is, data for each individual speaker used to play back the sound. This decoding may be performed prior to writing the decoded data on e.g. a DVD for supply to the consumer; in this case, it is assumed that the consumer will use a predetermined speaker arrangement including a predetermined number of speakers. In other cases the spatial audio data may be decoded “on the fly” during playback.
Methods of decoding spatial audio data such as ambisonic audio data typically involve calculating, for each of the speakers in a given speaker arrangement, a speaker output such that the speakers together reproduce the soundfield represented by the spatial audio data. The calculation may be performed in either the time domain or the frequency domain, perhaps using time domain filters for separate high frequency and low frequency decoding. At any given time all speakers are typically active in reproducing the soundfield, irrespective of the direction of the source or sources of the soundfield. This requires accurate set-up of the speaker arrangement and has been observed to lack stability with respect to speaker position, particularly at higher frequencies.
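To illustrate why all speakers tend to be active, the sketch below uses a simple projection (sometimes called sampling) decoder, which evaluates the truncated series at each speaker direction. This is one common textbook approach chosen for illustration, not necessarily the decoding method any particular system uses; the 1/N scaling, the first-order restriction and the function names are assumptions.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))

def first_order_sh(theta, phi):
    """Y_0..Y_3 in the n = l(l+1)+m ordering (theta is the polar angle)."""
    st, ct = math.sin(theta), math.cos(theta)
    return [C0, C1 * st * math.sin(phi), C1 * ct, C1 * st * math.cos(phi)]

def decode_projection(a, speaker_dirs):
    """One feed per speaker: the series sum_i a_i Y_i evaluated at the
    speaker direction, divided by the number of speakers (assumed scaling)."""
    n = len(speaker_dirs)
    return [sum(ai * yi for ai, yi in zip(a, first_order_sh(theta, phi))) / n
            for theta, phi in speaker_dirs]

# Unit plane wave from phi = 0 in the horizontal plane, encoded as
# a_i = 4*pi*Y_i, decoded to a square of four horizontal speakers.
source = [4.0 * math.pi * y for y in first_order_sh(math.pi / 2, 0.0)]
square = [(math.pi / 2, k * math.pi / 2) for k in range(4)]
feeds = decode_projection(source, square)
```

Under these assumptions each feed equals $(1 + 3\cos\gamma)/N$, where $\gamma$ is the angle between the source and the speaker, so the speakers beside and behind the source still receive non-zero (and, behind, negative) signals.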
It is known to apply transforms to spatial audio data that alter spatial characteristics of the soundfield represented. For example, it is possible to rotate or mirror an entire sound field in the ambisonic format by applying a matrix transformation to a vector representation of the ambisonic channels.
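A minimal sketch of such matrix transforms for a first-order coefficient vector is given below. It assumes the channel ordering n = l(l+1)+m used above, with the polar angle measured from the vertical axis, and shows a rotation about that axis and a mirror that maps azimuth φ to -φ; the function names are hypothetical.

```python
import math

C0 = 1.0 / math.sqrt(4.0 * math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))

def encode(theta, phi):
    """a_i = 4*pi*Y_i(theta, phi) for orders 0 and 1, ordered by n = l(l+1)+m."""
    st, ct = math.sin(theta), math.cos(theta)
    y = [C0, C1 * st * math.sin(phi), C1 * ct, C1 * st * math.cos(phi)]
    return [4.0 * math.pi * v for v in y]

def apply_matrix(matrix, a):
    """Matrix-vector product giving the transformed coefficient vector."""
    return [sum(row[j] * a[j] for j in range(len(a))) for row in matrix]

def rotation_about_vertical(alpha):
    """Matrix that moves every source from azimuth phi to phi + alpha.

    Only the m = -1 (index 1) and m = +1 (index 3) channels mix."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0,   c, 0.0,   s],
            [0.0, 0.0, 1.0, 0.0],
            [0.0,  -s, 0.0,   c]]

def mirror_azimuth():
    """Matrix that maps azimuth phi to -phi by negating the m = -1 channel."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, -1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

a = encode(1.0, 0.3)
rotated = apply_matrix(rotation_about_vertical(0.5), a)
mirrored = apply_matrix(mirror_azimuth(), a)
```

Rotating the encoded vector by 0.5 radians gives the same coefficients as encoding the source directly at azimuth 0.8, which is a convenient check on the sign conventions.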
It is an object of the present invention to provide methods of and systems for manipulating and/or decoding audio data, to enhance the listening experience for the listener. It is a further object of the present invention to provide methods and systems for manipulating and decoding spatial audio data which do not place an undue burden on the audio system being used.