Traditionally, audio content of multi-channel format (for example, stereo, 5.1, 7.1, and the like) is created by mixing different audio signals in a studio, or generated by recording acoustic signals simultaneously in a real environment. More recently, object-based audio content has become more and more popular as it carries a number of audio objects and audio beds separately so that it can be rendered with much improved precision compared with traditional rendering methods. As used herein, the term “audio object” or “object” refers to individual audio elements that may exist for a defined duration of time but also has associated metadata describing information related to the object, such as the spatial position, velocity, content type, object width, loudness, and the like. As used herein, the term “audio bed” or “bed” refers to audio channels that are meant to be reproduced in predefined and fixed speaker locations.
For example, cinema sound tracks may include many different sound elements corresponding to images on the screen, dialogs, noises, and sound effects that emanate from different places on the screen and combine with background music and ambient effects to create the overall auditory experience. Accurate playback requires the sounds to be reproduced in such a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement, and depth. Object-based audio systems represent a significant improvement over traditional channel-based audio systems that send audio content in the form of speaker feeds to individual speakers in a listening environment and are thus relatively limited with respect to spatial playback of specific audio objects.
During transmission of object-based audio content, beds and objects can be sent separately and then used by a spatial reproduction system to recreate the artistic intent using a variable number of speakers in known physical locations. In some situations, there may be tens or even hundreds of individual audio objects contained in the audio content. The large number of audio signals in the object-based content poses new challenges for various aspects related to processing of such content such as transmission, distribution, coding, and storage of such content.
For example, in some distribution and transmission systems, a transmission capacity may be provided with large enough bandwidth available to transmit all audio beds and objects with little or no audio compression. However, in some cases such as distribution via Blu-ray disc, broadcast (cable, satellite and terrestrial), mobile (3G, 4G as well as 5G), or over-the-top (OTT, or the Internet), the available bandwidth is insufficient to transmit information concerning all of the beds and objects created by an audio mixer. While audio coding methods (lossy or lossless) may be applied to the audio to reduce the required bandwidth, transmission bandwidth is usually still a bottleneck, especially for those networks with very limited bandwidth resources such as 3G, 4G as well as 5G mobile systems. High computational, transmission, and/or storage capacities may also be required for other aspects of processing such as coding and storage.
Therefore, it is desired to reduce the number of audio signals in the object-based content (for example, audio objects) in order to reduce computational complexity, transmission bandwidth requirements, and/or storage requirements.