Compression of Higher-Order Ambisonics (HOA) content has not been deeply explored in the scientific literature. Therefore, this section introduces an exemplary state-of-the-art monolithic architecture for self-contained compression of HOA content. It has been verified by extensive testing that this architecture enables high-quality coding of high-resolution spatial sound scenes at medium-level (e.g. 256 kbit/s) to high-level (e.g. 1.5 Mbit/s) data rates. The background information provided in this section is necessary for understanding the hierarchical concepts built upon this architecture.
FIG. 1 illustrates the concept for self-contained HOA compression from an encoder perspective. Note that the numbers and parameters provided in the figure are exemplary. For instance, the codec architecture is shown here for encoding of 4th order HOA content (N=4), which requires (N+1)²=25 equivalent audio channels for a full 3D representation. The same concept can be used for encoding of any HOA order from N=1 upwards. Likewise, the number of 8 extracted “audio channels” after dimensionality reduction is exemplary and is meant to indicate the order of magnitude; however, a number of 8 (on average) has been found suitable when encoding HOA content of order N=4.
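The channel count quoted above follows directly from the HOA order. As a minimal sketch (the function name `hoa_channel_count` is chosen here for illustration only), the relation (N+1)² can be checked as follows:

```python
def hoa_channel_count(order: int) -> int:
    """Number of equivalent audio channels of a full 3D HOA representation."""
    if order < 0:
        raise ValueError("HOA order must be non-negative")
    return (order + 1) ** 2


# Channel counts for the orders N=1 to N=4
counts = {n: hoa_channel_count(n) for n in range(1, 5)}
```

For N=4 this gives the 25 channels of the example, which the dimensionality reduction then brings down to about 8 signals.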
The encoding process is divided into two stages which are to some extent independent from each other. The first stage 10 is a dimensionality reduction stage. It analyzes the input HOA content and reduces the signal dimension by decomposing it into a lower number of dominant sound components. The somewhat abstract term “sound components” is used because the resulting signals do not necessarily correspond to sound objects, specific spatial directions or ambience, although they can in fact do so in special cases.
From information theory it is known that, at least for complex audio scenes, the information provided at the output of this stage 10 is systematically less than the input information. The dimensionality reduction stage 10 operates in such a manner that (1) the information loss is minimized, by exploiting inherent redundancy of the input audio scene as much as possible, and that (2) irrelevancy is reduced, i.e. the output signal still carries enough information such that the perceptual difference of a reconstructed audio scene compared to the input content is minimized. This stage 10 employs time-variant and signal-adaptive signal processing. The number of its output signals can be adaptive as well, depending on the parameterization as well as on signal characteristics.
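The text does not specify how the number of output signals of stage 10 is chosen. Purely as an illustration of a signal-adaptive count, the following sketch assumes that some prior analysis has produced one energy value per candidate component and keeps the fewest components that retain a given share of the total energy, up to a cap. The function name, the 95% threshold and the cap of 8 are assumptions made for this example, not part of the described codec.

```python
def select_component_count(energies, energy_fraction=0.95, max_components=8):
    """Pick how many dominant components to keep for one frame.

    energies: per-component energies from a hypothetical prior analysis step.
    Returns the smallest count that retains `energy_fraction` of the total
    energy, limited to `max_components`.
    """
    ranked = sorted(energies, reverse=True)
    total = sum(ranked)
    if total <= 0.0:
        return 0  # silent frame: nothing to transmit
    kept = 0.0
    for count, energy in enumerate(ranked, start=1):
        kept += energy
        if kept >= energy_fraction * total or count == max_components:
            return count
    return len(ranked)
```

A frame dominated by a few strong components yields a small count, while a diffuse frame with evenly spread energy runs into the cap, which mirrors the adaptive behavior described above.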
The second encoding stage 11 comprises a bank of several (in this case 8) parallel perceptual encoders for monaural audio signals. These encoders encode the individual dominant sound components and operate using the principles of time-frequency coding that have been well-established since the 1990s. For instance, a bank of MPEG-4 Advanced Audio Coding (AAC) encoders could be utilized at the second encoding stage 11. The encoder implementations need to be slightly modified in order to enable the global coder control block to influence certain parameters of these core codecs such as average bit rate, window switching behavior, size of bit reservoir, behavior of spectral band replication, etc. This architecture has been chosen since it minimizes the design effort required for implementing an HOA codec by facilitating, to the maximum extent possible, the reuse of existing codec implementations and corresponding optimizations.
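One of the parameters named above is the average bit rate of each core encoder. The following sketch shows one conceivable way for a control block to split a total bit rate across the parallel encoders in proportion to per-component weights. The function `allocate_bit_rates`, the weights and the minimum rate are hypothetical; the source does not state how the rate is actually distributed.

```python
def allocate_bit_rates(total_bps, weights, min_bps=0):
    """Split total_bps across encoders in proportion to weights.

    Every encoder receives at least min_bps; the returned integer rates
    sum exactly to total_bps.
    """
    n = len(weights)
    if n == 0:
        return []
    if total_bps < n * min_bps:
        raise ValueError("total bit rate is too low for the minimum rates")
    spare = total_bps - n * min_bps
    weight_sum = sum(weights)
    if weight_sum <= 0:
        shares = [spare / n] * n  # no preference: split evenly
    else:
        shares = [spare * w / weight_sum for w in weights]
    rates = [min_bps + int(s) for s in shares]
    # Hand out the bits lost to truncation, largest fractional part first
    leftover = total_bps - sum(rates)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:max(leftover, 0)]:
        rates[i] += 1
    return rates
```

With 8 equally weighted components at 256 kbit/s, each encoder would receive 32 kbit/s; unequal weights shift rate toward the components judged more important.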
The operation of the full encoder is controlled by the coder control stage 12. Here, a perceptual audio scene analysis is performed which determines the parameters that are required in order to drive and control the other signal processing stages. In particular, this control instance is responsible for global optimization of data rate resources, and it is crucial for achieving a strong overall rate-distortion performance. Finally, the resulting bit streams of the second encoding stage 11 and the side information from the coder control stage 12 are multiplexed 13 into a single output bit stream.
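The multiplexing step can be pictured with a simple length-prefixed frame. The layout below (a one-byte stream count, a two-byte side information length, then each encoder payload preceded by its two-byte length) is a hypothetical format invented for this sketch; the actual bit stream syntax is not given in the text.

```python
import struct


def mux_frame(side_info: bytes, payloads: list) -> bytes:
    """Pack side information and per-encoder payloads into one frame."""
    parts = [struct.pack(">BH", len(payloads), len(side_info)), side_info]
    for payload in payloads:
        parts.append(struct.pack(">H", len(payload)))
        parts.append(payload)
    return b"".join(parts)


def demux_frame(frame: bytes):
    """Recover side information and payloads from a frame built by mux_frame."""
    count, side_len = struct.unpack_from(">BH", frame, 0)
    offset = struct.calcsize(">BH")
    side_info = frame[offset:offset + side_len]
    offset += side_len
    payloads = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", frame, offset)
        offset += 2
        payloads.append(frame[offset:offset + length])
        offset += length
    return side_info, payloads
```

Length prefixes allow the payload sizes to vary from frame to frame, which fits a design in which the control stage reallocates rate between the encoders over time.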