Higher Order Ambisonics (HOA) offers one possibility to represent three-dimensional sound, among other techniques like wave field synthesis (WFS) or channel-based approaches like the 22.2 multichannel audio format. In contrast to channel-based methods, the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. This flexibility, however, comes at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up. Compared to the WFS approach, where the number of required loudspeakers is usually very large, HOA signals may also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion. Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function. Hence, without loss of generality, the complete HOA sound field representation actually can be assumed to consist of O time domain functions, where O denotes the number of expansion coefficients. These time domain functions will be equivalently referred to as HOA coefficient sequences or as HOA channels in the following.
The spatial resolution of the HOA representation improves with a growing maximum order N of the expansion. Unfortunately, the number of expansion coefficients O grows quadratically with the order N, in particular O=(N+1)². For example, typical HOA representations using order N=4 require O=25 HOA (expansion) coefficients. According to the previously made considerations, the total bit rate for the transmission of an HOA representation, given a desired single-channel sampling rate fS and the number of bits Nb per sample, is determined by O·fS·Nb. Consequently, transmitting an HOA representation of order N=4 with a sampling rate of fS=48 kHz employing Nb=16 bits per sample results in a bit rate of 25·48000·16 bit/s = 19.2 Mbit/s, which is very high for many practical applications such as streaming. Thus, compression of HOA representations is highly desirable.
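The arithmetic above can be checked with a minimal sketch. It assumes uncompressed PCM transmission of all HOA channels, with the coefficient count given by the square of (N+1) and the bit rate by the product of coefficient count, sampling rate and bits per sample. The function names are hypothetical and chosen only for illustration.

```python
def hoa_coefficient_count(order: int) -> int:
    """Number of HOA coefficient sequences O for a maximum order N: O = (N + 1)^2."""
    if order < 0:
        raise ValueError("HOA order must be non-negative")
    return (order + 1) ** 2


def hoa_pcm_bit_rate(order: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Uncompressed bit rate in bit/s: O * fS * Nb."""
    return hoa_coefficient_count(order) * sample_rate_hz * bits_per_sample


if __name__ == "__main__":
    # Tabulate how the channel count and raw bit rate grow with the order,
    # using the sampling rate and word length from the example in the text.
    for n in range(1, 7):
        channels = hoa_coefficient_count(n)
        rate_mbit = hoa_pcm_bit_rate(n, 48_000, 16) / 1e6
        print(f"N={n}: O={channels:2d} channels, {rate_mbit:.3f} Mbit/s")
```

For N=4, fS=48 kHz and Nb=16, the functions return 25 coefficients and 19,200,000 bit/s, matching the figures stated above. The loop makes the quadratic growth visible: each increment of the order N adds 2N+1 further channels.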
The compression of HOA sound field representations is proposed in WO 2013/171083 A1, EP 13305558.2 and PCT/EP2013/075559. These approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component. On the one hand, the final compressed representation is assumed to consist of a number of quantised signals, resulting from the perceptual coding of the directional signals and of the relevant coefficient sequences of the ambient HOA component. On the other hand, it is assumed to comprise additional side information related to the quantised signals, which is necessary for the reconstruction of the HOA representation from its compressed version.
An important part of that side information is a description of a prediction of portions of the original HOA representation from the directional signals. Since for this prediction the original HOA representation is assumed to be equivalently represented by a number of spatially dispersed general plane waves impinging from spatially uniformly distributed directions, the prediction is referred to as spatial prediction in the following.
The coding of such side information related to spatial prediction is described in ISO/IEC JTC1/SC29/WG11, N14061, “Working Draft Text of MPEG-H 3D Audio HOA RM0”, November 2013, Geneva, Switzerland. However, this state-of-the-art coding of the side information is rather inefficient.