Higher Order Ambisonics (HOA) is one way to represent three-dimensional sound, alongside other techniques such as wave field synthesis (WFS) and channel-based approaches such as 22.2. In contrast to channel-based methods, the HOA representation has the advantage of being independent of a specific loudspeaker set-up. This flexibility comes at the expense of a decoding process, which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA may also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. In the following, these time domain functions are referred to interchangeably as HOA coefficient sequences or as HOA channels.
The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, specifically O=(N+1)². For example, typical HOA representations using order N=4 require O=25 HOA (expansion) coefficients. Since each coefficient is transmitted as one time domain channel, the total bit rate for the transmission of an HOA representation, given a desired single-channel sampling rate fS and the number of bits Nb per sample, is determined by O·fS·Nb. Consequently, transmitting an HOA representation of order N=4 with a sampling rate of fS=48 kHz and Nb=16 bits per sample results in a bit rate of 19.2 Mbit/s, which is very high for many practical applications, for example streaming. Thus, compression of HOA representations is highly desirable.
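The arithmetic above can be checked with a short Python sketch. The function names are hypothetical and chosen only for illustration; the formulas O=(N+1)² and O·fS·Nb are the ones stated in the text.

```python
def hoa_num_coefficients(order: int) -> int:
    """Number of HOA coefficient sequences O for a maximum order N: O = (N + 1)^2."""
    if order < 0:
        raise ValueError("HOA order must be non-negative")
    return (order + 1) ** 2


def hoa_raw_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate in bit/s: O * fS * Nb."""
    return hoa_num_coefficients(order) * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Print how the channel count and raw bit rate grow with the order N
    # for fS = 48 kHz and Nb = 16 bits per sample.
    for n in range(1, 7):
        rate_mbit = hoa_raw_bit_rate(n, 48_000, 16) / 1e6
        print(f"N={n}: O={hoa_num_coefficients(n)}, raw rate={rate_mbit:.3f} Mbit/s")
```

For N=4, fS=48 kHz and Nb=16 this reproduces the figures given above: 25 coefficient sequences and 19.2 Mbit/s.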
The compression of HOA sound field representations was proposed in EP 2665208 A1, EP 2743922 A1 and International application PCT/EP2013/059363, cf. ISO/IEC DIS 23008-3, MPEG-H 3D audio, July 2014. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional and a residual ambient component. On the one hand, the final compressed representation is assumed to consist of a number of quantised signals, resulting from the perceptual coding of directional and vector-based signals as well as relevant coefficient sequences of the ambient HOA component. On the other hand, it is assumed to comprise additional side information related to the quantised signals, which is necessary for the reconstruction of the HOA representation from its compressed version.
A reasonable minimum number of quantised signals is 8 for the approaches in EP 2665208 A1, EP 2743922 A1 and International application PCT/EP2013/059363. Hence, the data rate with one of these methods is typically not lower than 256 kbit/s, assuming a data rate of 32 kbit/s for each individual perceptual coder. For certain applications, such as audio streaming to mobile devices, this total data rate might be too high, which makes HOA compression methods for significantly lower data rates, e.g. 128 kbit/s, desirable.
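The lower bound quoted above is simply the number of quantised signals multiplied by the per-coder rate. The following Python sketch makes this explicit. It is an illustration only: the function names are hypothetical, the side information is ignored, and applying the same 32 kbit/s per-coder rate to a 128 kbit/s budget is an assumption made here, not a figure taken from the cited applications.

```python
def perceptual_coding_rate_kbps(num_signals: int, per_coder_kbps: int) -> int:
    """Total rate of the perceptually coded signals in kbit/s (side information not counted)."""
    return num_signals * per_coder_kbps


def max_signals_for_budget(budget_kbps: int, per_coder_kbps: int) -> int:
    """Largest number of signals that fits a rate budget at a fixed per-coder rate (assumption)."""
    return budget_kbps // per_coder_kbps


if __name__ == "__main__":
    # 8 quantised signals at 32 kbit/s each, as stated in the text.
    print(perceptual_coding_rate_kbps(8, 32), "kbit/s")
    # Signals that would fit into 128 kbit/s if the per-coder rate stayed at 32 kbit/s.
    print(max_signals_for_budget(128, 32), "signals")
```

Under these assumptions, a 128 kbit/s target leaves room for fewer quantised signals than the minimum of 8 named above, which is why a different compression approach is needed at such rates.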
European patent application EP 14306077.0 describes a method for the low bit-rate compression of HOA representations of sound fields that uses a smaller number of quantised signals, which are essentially a small subset of the original HOA representation. To replicate the missing HOA coefficients, prediction parameters are obtained for different frequency bands in order to predict additional directional HOA components from the quantised signals.