Traditionally, audio content of multi-channel format (for example, stereo, 5.1, 7.1, and the like) is created by mixing different audio signals in a studio, or generated by recording acoustic signals simultaneously in a real environment. More recently, object-based audio content has become more and more popular as it carries a number of audio objects and audio beds separately so that it can be rendered with much improved precision compared with traditional rendering methods. As used herein, the term “audio object” refers to individual audio elements that may exist for a defined duration of time but also has associated metadata describing spatial information such as the position, velocity, and size of each object. As used herein, the term “audio bed” or “bed” refers to audio channels that are meant to be reproduced in predefined and fixed speaker locations.
For example, cinema sound tracks may include many different sound elements corresponding to images on the screen, dialogs, noises, and sound effects that emanate from different places on the screen and combine with background music and ambient effects to create the overall auditory experience. Accurate playback requires the sounds to be reproduced in such a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement, and depth.
During transmission of audio signals, beds and objects can be sent separately and then used by a spatial reproduction system to recreate the artistic intent using a variable number of speakers in known physical locations. In some situations, there may be tens or even hundreds of individual audio objects contained in the audio content. Such object-based audio content has significantly increased the complexity of rendering audio data within playback systems.
The large number of audio signals in the object-based content poses new challenges for the coding and distribution of such content. In some distribution and transmission systems, a transmission capacity may be provided with large enough bandwidth available to transmit all audio beds and objects with little or no audio compression. However, in some cases such as distribution via Blu-ray disc, broadcast (cable, satellite and terrestrial), mobile (3G, 4G as well as 5G), or over-the-top (OTT, or the Internet), the available bandwidth is insufficient to transmit information concerning all of the beds and objects created by an audio mixer. While audio coding methods (lossy or lossless) may be applied to the audio to reduce the required bandwidth, transmission bandwidth is usually still a bottleneck, especially for those networks with very limited bandwidth resources such as 3G, 4G as well as 5G mobile systems.