Higher Order Ambisonics (HOA) offers one possibility to represent three-dimensional sound, among other techniques like wave field synthesis (WFS) or channel based approaches like the one known as “22.2”. In contrast to channel based methods, a HOA representation offers the advantage of being independent of a specific loudspeaker set-up. This flexibility comes at the expense of a decoding process that is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
HOA is based on the representation of the so-called spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be understood as consisting of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following.
The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O=(N+1)². For example, typical HOA representations using order N=4 require O=25 HOA (expansion) coefficients. According to the above considerations, the total bit rate for the transmission of a HOA representation, given a desired single-channel sampling rate fs and the number of bits Nb per sample, is determined by O·fs·Nb. Consequently, transmitting a HOA representation of order N=4 with a sampling rate of fs=48 kHz employing Nb=16 bits per sample results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications such as streaming. Thus, a compression of HOA representations is highly desirable.
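The bit rate figure above can be checked with a short calculation. The following Python sketch is for illustration only; the function names hoa_num_coefficients and hoa_pcm_bit_rate are hypothetical and are not taken from any HOA library. It assumes uncompressed PCM transmission of all O coefficient sequences at the same sampling rate and word length.

```python
def hoa_num_coefficients(order: int) -> int:
    """Number of HOA coefficient sequences O = (N + 1)^2 for maximum order N."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def hoa_pcm_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate O * fs * Nb in bits per second (illustrative helper)."""
    return hoa_num_coefficients(order) * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Tabulate the quadratic growth for fs = 48 kHz and Nb = 16 bits per sample.
    for n in range(1, 7):
        rate = hoa_pcm_bit_rate(n, 48_000, 16)
        print(f"N={n}: O={hoa_num_coefficients(n)}, {rate / 1e6:.3f} Mbit/s")
```

For N=4 the sketch yields O=25 and 19,200,000 bit/s, matching the 19.2 Mbit/s stated above.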
Various approaches for compression of HOA sound field representations have been proposed in [4, 5, 6]. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional and a residual ambient component. The final compressed representation comprises, on the one hand, a number of quantized signals, resulting from the perceptual coding of so-called directional and vector-based signals as well as relevant coefficient sequences of the ambient HOA component. On the other hand, it comprises additional side information related to the quantized signals, which is necessary for the reconstruction of the HOA representation from its compressed version.
A reasonable minimum number of quantized signals for the approaches [4, 5, 6] is eight. Hence, the data rate with one of these methods is typically not lower than 256 kbit/s, assuming a data rate of 32 kbit/s for each individual perceptual coder. For certain applications, such as audio streaming to mobile devices, this total data rate might be too high. Thus, there is a demand for HOA compression methods addressing distinctly lower data rates, e.g. 128 kbit/s.
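The data rate budget discussed above can be illustrated with a small calculation. The Python sketch below uses hypothetical helper names and makes a simplifying assumption that is not part of the cited approaches: side information is ignored and every perceptual coder runs at the same fixed rate.

```python
def total_coder_rate_kbps(num_signals: int, rate_per_coder_kbps: int) -> int:
    """Total rate of the perceptually coded signals in kbit/s (side information ignored)."""
    return num_signals * rate_per_coder_kbps


def max_signals_within_budget(budget_kbps: int, rate_per_coder_kbps: int) -> int:
    """Largest number of coded signals that fits a given budget (side information ignored)."""
    return budget_kbps // rate_per_coder_kbps


if __name__ == "__main__":
    print(total_coder_rate_kbps(8, 32))        # eight signals at 32 kbit/s each
    print(max_signals_within_budget(128, 32))  # signals that fit into 128 kbit/s
```

Under these assumptions, eight signals require 256 kbit/s, whereas a budget of 128 kbit/s would accommodate only four signals at 32 kbit/s each, which is below the stated reasonable minimum of eight.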