Higher Order Ambisonics (HOA) is one of several techniques for representing three-dimensional sound, alongside wave field synthesis (WFS) and channel-based approaches such as 22.2. In contrast to channel-based methods, the HOA representation has the advantage of being independent of a specific loudspeaker set-up. This flexibility, however, comes at the expense of a rendering process that is required to play back the HOA representation on a particular loudspeaker set-up. Compared to WFS, where the number of required loudspeakers is usually very large, HOA can also be rendered to set-ups consisting of only a few loudspeakers. A further advantage of HOA is that the same signal representation that is rendered to loudspeakers can also be used without any modification for binaural rendering to headphones. HOA is based on the idea of equivalently representing the sound pressure in a source-free listening area by a superposition of general plane waves from all possible directions of incidence. Evaluating the contributions of all general plane waves to the sound pressure at the center of the listening area, i.e. the origin of the coordinate system used, yields a time- and direction-dependent function, which is then, for each time instant, expanded into a series of so-called Spherical Harmonics functions. The weights of the expansion, regarded as functions over time, are referred to as HOA coefficient sequences and constitute the actual HOA representation. The HOA coefficient sequences are conventional time-domain signals, with the particularity that their value ranges differ from one another. In general, the series of Spherical Harmonics functions comprises an infinite number of summands, whose knowledge theoretically allows a perfect reconstruction of the represented sound field.
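For reference, the expansion described above can be written as follows. The symbols are chosen here for illustration only: c(t, Ω) denotes the time- and direction-dependent function, Ω a direction of incidence, S_n^m the Spherical Harmonics function of order n and degree m, and c_n^m(t) the corresponding HOA coefficient sequence.

$$
c(t,\Omega) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} c_n^m(t)\, S_n^m(\Omega)
$$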
In practice, however, to arrive at a manageable, finite number of signals, the series is truncated, resulting in a representation of a certain order N. This order determines the number O of summands of the expansion, given by O = (N+1)², since each order n contributes 2n+1 Spherical Harmonics functions. The truncation affects the spatial resolution of the HOA representation, which improves with growing order N. A typical HOA representation of order N=4 thus consists of O=25 HOA coefficient sequences.
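The relation between the truncation order N and the number O of coefficient sequences can be checked with a few lines of Python. This is a minimal sketch; the function name `num_coefficients` is hypothetical and merely counts the 2n+1 Spherical Harmonics functions per order.

```python
def num_coefficients(order: int) -> int:
    """Number O of HOA coefficient sequences for truncation order N."""
    if order < 0:
        raise ValueError("order must be non-negative")
    # Order n contributes 2n+1 Spherical Harmonics functions (degrees m = -n..n).
    total = sum(2 * n + 1 for n in range(order + 1))
    # The sum of the first N+1 odd numbers equals (N+1)^2.
    assert total == (order + 1) ** 2
    return total


for N in range(7):
    print(f"N={N}: O={num_coefficients(N)}")
```

The quadratic growth of O with N is what drives the bit rate discussed next.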
According to these considerations, the total bit rate for transmitting an HOA representation, given a desired single-channel sampling rate f_S and N_b bits per sample, is O·f_S·N_b. Consequently, transmitting an HOA representation of order N=4 (i.e. O=25) with a sampling rate of f_S=48 kHz and N_b=16 bits per sample results in a bit rate of 25·48,000·16 bit/s = 19.2 Mbit/s, which is very high for many practical applications such as streaming. Compression of HOA representations is therefore highly desirable.
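The uncompressed bit rate O·f_S·N_b can be reproduced with the following sketch; the function name `hoa_bit_rate` is hypothetical and only evaluates the formula above.

```python
def hoa_bit_rate(order: int, fs_hz: int, bits_per_sample: int) -> int:
    """Uncompressed HOA bit rate O * f_S * N_b in bit/s."""
    num_coeffs = (order + 1) ** 2  # O = (N+1)^2 coefficient sequences
    return num_coeffs * fs_hz * bits_per_sample


rate = hoa_bit_rate(order=4, fs_hz=48_000, bits_per_sample=16)
print(f"{rate / 1e6:.1f} Mbit/s")  # prints 19.2 Mbit/s
```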
Previously, the compression of HOA sound field representations was proposed in [2,3,4] and was recently adopted by the MPEG-H 3D audio standard [1, Ch.12 and Annex C.5]. The main idea of this compression technique is to perform a sound field analysis and to decompose the given HOA representation into a predominant sound component and a residual ambient component. The final compressed representation comprises, on the one hand, a number of quantized signals resulting from the perceptual coding of the predominant sound signals and of relevant coefficient sequences of the ambient HOA component. On the other hand, it comprises additional side information related to the quantized signals, which is necessary for reconstructing the HOA representation from its compressed version.
One important criterion for using the mentioned HOA compression technique of the MPEG-H 3D audio standard within consumer electronics devices, be it in software or in hardware, is the computational efficiency of its implementation. In particular, for the playback of compressed HOA representations, the efficiency of both the HOA decompressor, which reconstructs the HOA representation from its compressed version, and the HOA renderer, which creates the loudspeaker signals from the reconstructed HOA representation, is highly relevant. To address this issue, the MPEG-H 3D audio standard contains an informative annex (see [1, Annex G]) describing how to combine the HOA decompressor and the HOA renderer to reduce the computational demand in the case that the intermediately reconstructed HOA representation is not required. However, in the current version of the MPEG-H 3D audio standard, this description is very difficult to comprehend and appears not to be fully correct. Further, it addresses only the case where certain HOA coding tools are disabled, namely the spatial prediction for the predominant sound synthesis [1, Sec. 12.4.2.4.3] and the computation of the HOA representation of vector-based signals [1, Sec. 12.4.2.4.4] in case the vectors representing their spatial distribution have been coded in a special mode (i.e. CodedVVecLength=1).
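The potential saving from combining decompressor and renderer stems from the linearity of both stages. The following Python sketch illustrates this under a strongly simplified, assumed model that is not the procedure of [1, Annex G]: the reconstructed HOA block is taken to be the product of hypothetical spatial vectors `V` with the predominant sound signals `X`, plus a few ambient coefficient sequences `A_sub` at hypothetical indices `amb_idx`. The rendering matrix `D` and all dimensions are placeholders as well, and the coding tools named above (spatial prediction, special vector coding modes) are ignored. Rendering the reconstructed block and rendering directly with the pre-combined matrix `D·V` give the same loudspeaker signals.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


rng = random.Random(0)
N, L, K, T = 4, 5, 2, 64        # order, loudspeakers, predominant signals, samples (assumed)
O = (N + 1) ** 2                # 25 HOA coefficient sequences
amb_idx = [0, 1, 2, 3]          # hypothetical indices of transmitted ambient sequences
n_amb = len(amb_idx)

D = rand_matrix(L, O, rng)      # hypothetical rendering matrix (L x O)
V = rand_matrix(O, K, rng)      # hypothetical spatial vectors (O x K)
X = rand_matrix(K, T, rng)      # decoded predominant sound signals (K x T)
A_sub = rand_matrix(n_amb, T, rng)  # decoded ambient coefficient sequences (n_amb x T)

# Two-step: reconstruct the full HOA block C (O x T), then render it.
C = matmul(V, X)
for row, i in enumerate(amb_idx):
    C[i] = [c + a for c, a in zip(C[i], A_sub[row])]
W_two_step = matmul(D, C)

# Combined: fold D into V once per frame and use only the columns of D
# that belong to the transmitted ambient sequences; C is never formed.
DV = matmul(D, V)                                # L x K
D_amb = [[r[i] for i in amb_idx] for r in D]     # L x n_amb
W_combined = add(matmul(DV, X), matmul(D_amb, A_sub))

max_err = max(abs(p - q) for rp, rq in zip(W_two_step, W_combined)
              for p, q in zip(rp, rq))
# Multiplication counts per frame for both variants.
two_step_mults = O * K * T + L * O * T
combined_mults = L * O * K + L * K * T + L * n_amb * T
print(f"max deviation: {max_err:.2e}")
print(f"multiplications: two-step {two_step_mults}, combined {combined_mults}")
```

Whether the combined variant is actually cheaper depends on the dimensions; with few predominant signals and ambient sequences relative to O, as assumed here, it avoids most of the work of forming and rendering the full O-channel block.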