Higher Order Ambisonics (HOA) is a multi-channel sound field representation [4], and HOA signals are multi-channel audio signals. The playback of certain multi-channel audio signal representations, particularly HOA representations, on a particular loudspeaker set-up requires a special rendering, which usually consists of a matrixing operation. After decoding, the Ambisonics signals are “matrixed”, i.e. mapped to new audio signals corresponding to actual spatial positions, e.g. of loudspeakers. Usually there is a high cross-correlation between the single channels.
A problem is that it is experienced that coding noise is increased after the matrixing operation. The reason appears to be unknown in the prior art. This effect also occurs when the HOA signals are transformed to the spatial domain, e.g. by a Discrete Spherical Harmonics Transform (DSHT), prior to compression with perceptual coders.
A usual method for the compression of Higher Order Ambisonics audio signal representations is to apply independent perceptual coders to the individual Ambisonics coefficient channels [7]. In particular, the perceptual coders only consider coding noise masking effects which occur within each individual single-channel signals. However, such effects are typically non-linear. If matrixing such single-channels into new signals, noise unmasking is likely to occur. This effect also occurs when the Higher Order Ambisonics signals are transformed to the spatial domain by the Discrete Spherical Harmonics Transform prior to compression with perceptual coders [8].
The transmission or storage of such multi-channel audio signal representations usually demands for appropriate multi-channel compression techniques. Usually, a channel independent perceptual decoding is performed before finally matrixing the I decoded signals {circumflex over ({circumflex over (x)})}i(l), i=1, . . . , I, into J new signals {circumflex over (ŷ)}j(l), j=1, . . . , J. The term matrixing means adding or mixing the decoded signals {circumflex over ({circumflex over (x)})}i(l) in a weighted manner. Arranging all signals {circumflex over ({circumflex over (x)})}i(l), i=1, . . . , I, as well as all new signals {circumflex over (ŷ)}j(l), j=1, . . . , J in vectors according to{circumflex over ({circumflex over (x)})}(l):=[{circumflex over ({circumflex over (x)})}1(l) . . . {circumflex over ({circumflex over (x)})}l(l)]T {circumflex over (ŷ)}(l):=[{circumflex over (ŷ)}1(l) . . . {circumflex over (ŷ)}J(l)]T the term “matrixing” origins from the fact that {circumflex over (ŷ)}(l) is, mathematically, obtained from {circumflex over ({circumflex over (x)})}(l) through a matrix operation{circumflex over (ŷ)}(l)=A{circumflex over ({circumflex over (x)})}(l)where A denotes a mixing matrix composed of mixing weights. The terms “mixing” and “matrixing” are used synonymously herein. Mixing/matrixing is used for the purpose of rendering audio signals for any particular loudspeaker setups.
The particular individual loudspeaker set-up on which the matrix depends, and thus the maxtrix that is used for matrixing during the rendering, is usually not known at the perceptual coding stage.