Higher Order Ambisonics (HOA) offers one possibility to represent three-dimensional sound among other techniques like wave field synthesis (WFS) or channel based approaches like 22.2. In contrast to channel based methods, however, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. This flexibility, however, is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to head-phones.
HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels.
The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O=(V+1)2. For example, typical HOA representations using order N=4 require O=25 HOA (expansion) coefficients. According to the previously made considerations, the total bit rate for the transmission of HOA representation, given a desired single-channel sampling rate fs and the number of bits Nb per sample, is determined by O·fs·Nb. Consequently, transmitting an HOA representation of order N=4 with a sampling rate of fs=48 kHz employing Nb=16 bits per sample results in a bit rate of 19.2 MBits/s, which is very high for many practical applications, e.g. for streaming.
Compression of HOA sound field representations is proposed in patent applications EP 12306569.0 and EP 12305537.8. Instead of perceptually coding each one of the HOA coefficient sequences individually, as it is performed e.g. in E. Hellerud, I. Burnett, A. Solvang and U. P. Svensson, “Encoding Higher Order Ambisonics with AAC”, 124th AES Convention, Amsterdam, 2008, it is attempted to reduce the number of signals to be perceptually coded, in particular by performing a sound field analysis and decomposing the given HOA representation into a directional and a residual ambient component. The directional component is in general supposed to be represented by a small number of dominant directional signals which can be regarded as general plane wave functions. The order of the residual ambient HOA component is reduced because it is assumed that, after the extraction of the dominant directional signals, the lower-order HOA coefficients are carrying the most relevant information.