Traditionally, multi-channel audio content (for example, stereo, 5.1, 7.1, and the like) is created by mixing different audio signals in a studio, or is generated by recording acoustic signals simultaneously in a real environment. More recently, object-based audio content has become increasingly popular because it carries audio objects and audio beds separately, which allows it to be rendered with much greater precision than traditional rendering methods. Audio objects are individual audio elements that may exist for a defined duration of time and that also carry spatial information, such as the position, velocity, and size of each object, in the form of metadata. Audio beds, or simply beds, are audio channels that are meant to be reproduced at predefined, fixed speaker locations.
For example, cinema soundtracks may include many different sound elements corresponding to on-screen images, dialogue, noises, and sound effects that emanate from different places on the screen and combine with background music and ambient effects to create the overall auditory experience. Accurate playback requires that sounds be reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement, and depth.
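To make the distinction between objects and beds concrete, the following sketch models both with Python dataclasses. All class and field names (`AudioObject`, `ObjectMetadata`, `AudioBed`, `ObjectBasedProgram`, `start_s`, `duration_s`) are hypothetical and chosen only for illustration; real object-based formats define their own metadata schemas, and the coordinate and size conventions noted in the comments are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ObjectMetadata:
    """Spatial metadata carried alongside an object's audio (illustrative fields)."""
    position: Vec3                          # assumed: normalized room coordinates
    velocity: Vec3 = (0.0, 0.0, 0.0)        # assumed: units per second
    size: float = 0.0                       # assumed: apparent extent on a 0..1 scale


@dataclass
class AudioObject:
    """An individual audio element that exists for a defined time span."""
    name: str
    start_s: float                          # when the object begins (seconds)
    duration_s: float                       # how long it exists (seconds)
    samples: List[float]                    # mono audio signal
    metadata: ObjectMetadata

    def is_active(self, t: float) -> bool:
        # Active on the half-open interval [start, start + duration).
        return self.start_s <= t < self.start_s + self.duration_s


@dataclass
class AudioBed:
    """A channel meant for a predefined, fixed speaker location (no time-varying metadata)."""
    channel: str                            # e.g. "L", "R", "C", "LFE"
    speaker_position: Vec3
    samples: List[float]


@dataclass
class ObjectBasedProgram:
    """Beds and objects carried separately, as described above."""
    beds: List[AudioBed] = field(default_factory=list)
    objects: List[AudioObject] = field(default_factory=list)

    def active_objects(self, t: float) -> List[AudioObject]:
        return [o for o in self.objects if o.is_active(t)]


# Usage: one fixed bed channel plus one moving object that exists from 2 s to 7 s.
program = ObjectBasedProgram(
    beds=[AudioBed("L", (0.0, 1.0, 0.0), [0.0] * 4)],
    objects=[
        AudioObject(
            "helicopter", 2.0, 5.0, [0.1] * 4,
            ObjectMetadata(position=(0.5, 0.2, 1.0), velocity=(0.1, 0.0, 0.0), size=0.2),
        )
    ],
)
print([o.name for o in program.active_objects(3.0)])  # ['helicopter']
```

Keeping the bed's speaker position as a plain field, while giving only objects a metadata record, mirrors the point that beds are tied to fixed locations whereas objects describe their own spatial properties.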
During transmission of audio signals, beds and objects can be sent separately and then used by a spatial reproduction system to recreate the artistic intent using a variable number of speakers in known physical locations. In some situations, audio content may contain tens or even hundreds of individual audio objects to be rendered. As a result, the advent of object-based audio has significantly increased the complexity of rendering audio within playback systems.
The large number of audio signals present in object-based content poses new challenges for the coding and distribution of such content. In some distribution and transmission systems, enough bandwidth may be available to transmit all audio beds and objects with little or no audio compression. In other cases, however, such as Blu-ray disc, broadcast (cable, satellite, and terrestrial), mobile (3G and 4G), and over-the-top (OTT) distribution, the available bandwidth is insufficient to transmit all of the bed and object information created by an audio mixer. While lossy or lossless audio coding may be applied to reduce the required bandwidth, coding alone may not reduce it sufficiently, particularly over very limited networks such as 3G and 4G mobile networks.
Some existing methods cluster the audio objects, reducing the input objects and beds to a smaller set of output clusters, which lowers computational complexity and storage requirements. However, accuracy may be compromised because these methods allocate objects to clusters only in a relatively coarse manner.
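As a rough illustration of what a coarse allocation can look like, the following sketch clusters objects by position with plain k-means and assigns each object entirely to a single cluster. It is not the method of any particular existing system: the function name `cluster_objects`, the deterministic initialization (first k objects as seeds), and the choice to form each cluster's signal by simply summing its members are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dist2(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance between two positions."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_objects(
    positions: Sequence[Vec3],
    signals: Sequence[Sequence[float]],
    k: int,
    iters: int = 10,
):
    """Hypothetical sketch: hard-assign each object to one of k clusters by position.

    Returns (centroids, labels, cluster_signals).
    """
    if not 0 < k <= len(positions):
        raise ValueError("k must be between 1 and the number of objects")
    centroids = [tuple(p) for p in positions[:k]]  # assumed init: first k objects
    labels = [0] * len(positions)
    for _ in range(iters):
        # Assignment step: each object goes entirely to its nearest centroid.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in positions]
        # Update step: move each centroid to the mean position of its members.
        for c in range(k):
            members = [p for p, lab in zip(positions, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Assumed downmix: a cluster's signal is the plain sum of its members' signals.
    n = len(signals[0])
    cluster_signals = [[0.0] * n for _ in range(k)]
    for sig, lab in zip(signals, labels):
        for i, s in enumerate(sig):
            cluster_signals[lab][i] += s
    return centroids, labels, cluster_signals


# Usage: four objects along one axis collapse into two clusters.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.9, 0.0, 0.0)]
signals = [[1, 0], [0, 1], [1, 1], [2, 2]]
centroids, labels, cluster_signals = cluster_objects(positions, signals, k=2)
print(labels)           # [0, 1, 0, 1]
print(cluster_signals)  # [[2.0, 1.0], [2.0, 3.0]]
```

Because every object lands in exactly one cluster and is rendered from that cluster's centroid, an object that sits between two centroids loses its true position entirely. This all-or-nothing assignment is one concrete way the accuracy loss described above can arise.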