Some entertainment systems (e.g., televisions and surround sound systems), high-fidelity speaker systems, headphones, and software applications may process object-based audio to utilize one or more spatialization technologies. For instance, an entertainment system may utilize a spatialization technology, such as Dolby Atmos, to generate a rich sound that enhances a user's experience of a multimedia presentation.
The spatial presentation of audio utilizes audio objects, which are audio signals with associated parametric source descriptions, such as position (e.g., three-dimensional coordinates), gain (e.g., volume level), and other parameters. Object-based audio is increasingly being used for many multimedia applications, such as digital movies, video games, simulators, streaming video and audio content, and three-dimensional video. The spatial presentation of audio may be particularly important in a home environment, where the number of reproduction speakers and their placement are generally limited or constrained.
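As a minimal sketch of the description above, an audio object could be modeled as a signal paired with its parametric source description. The names `AudioObject`, `samples`, `position`, `gain`, `extra`, and `apply_gain` are hypothetical and are not taken from any particular spatial audio format; treating gain as a linear scale factor is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AudioObject:
    """Hypothetical model: an audio signal plus its parametric source description."""
    samples: List[float]                   # the audio signal itself
    position: Tuple[float, float, float]   # three-dimensional coordinates (x, y, z)
    gain: float = 1.0                      # volume level, assumed to be a linear factor
    extra: Dict[str, float] = field(default_factory=dict)  # other parameters


def apply_gain(obj: AudioObject) -> List[float]:
    """Scale the object's signal by its gain parameter."""
    return [s * obj.gain for s in obj.samples]


# Example: one object placed at (1, 0, 2) and attenuated to half level.
obj = AudioObject(samples=[0.5, -0.25, 1.0], position=(1.0, 0.0, 2.0), gain=0.5)
scaled = apply_gain(obj)
```

Keeping the signal and its parameters in one record reflects the idea that the metadata travels with the audio rather than being fixed to a particular speaker channel.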
Some spatial audio formats utilize conventional channel-based speaker feeds to deliver audio to an endpoint device, such as a plurality of speakers or headphones. In addition, such a spatial audio format may utilize a separate audio objects feed that is used by an encoder to create an immersive three-dimensional audio reproduction over the plurality of speakers or headphones. In one example, the encoder combines at least one audio object (e.g., a positional trajectory object for a three-dimensional space, such as a room or other environment) with audio content to provide the immersive three-dimensional audio reproduction.
The conventional technique of providing a separate audio objects feed that includes the audio objects for a plurality of channel-based speaker feeds creates inefficiencies at the encoder, which combines the audio content and the audio objects for distribution to the plurality of speakers or headphones. For example, some digital cinema systems use up to 16 separate audio channels that are fed to individual speakers of a multimedia entertainment system. The separate audio objects feed transports the plurality of audio objects that are associated with each of the separate audio channels. The encoder must quickly and efficiently parse the separate audio objects feed to extract the plurality of audio objects. The encoder must then combine the extracted audio objects with the separate audio channels for reproduction using a digital cinema system or over headphones. The audio associated with the separate audio channels may be carried in codec frames. Each of the codec frames may have a plurality of audio objects (e.g., 3 to 5 audio objects) carried in the separate audio objects feed (i.e., an objects frame). Therefore, with 16 audio channels and up to 5 audio objects per codec frame, the encoder must be computationally capable of quickly and efficiently extracting up to 80 audio objects from the separate audio objects feed and combining the extracted audio objects with the separate audio channels. The extraction and combining performed by the encoder generally occur within a very short time duration (e.g., 32 ms).
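The figures in the example above can be checked with a short calculation. This is a sketch only: the names are hypothetical, the worst case assumes five audio objects for each of the 16 audio channels, and the per-object time budget assumes that the objects are handled one after another within a single 32 ms window, which the description does not state.

```python
# Values taken from the example in the preceding paragraph.
AUDIO_CHANNELS = 16          # separate audio channels
MAX_OBJECTS_PER_FRAME = 5    # upper end of the 3 to 5 objects per codec frame
WINDOW_MS = 32.0             # example time available for extraction and combining


def worst_case_objects(channels: int, objects_per_frame: int) -> int:
    """Largest number of audio objects the encoder may need to extract."""
    return channels * objects_per_frame


def per_object_budget_ms(window_ms: float, total_objects: int) -> float:
    """Average time per object, assuming strictly serial processing (an assumption)."""
    return window_ms / total_objects


total = worst_case_objects(AUDIO_CHANNELS, MAX_OBJECTS_PER_FRAME)
budget = per_object_budget_ms(WINDOW_MS, total)
print(f"{total} objects, about {budget:.1f} ms each")
```

Under these assumptions the encoder would have well under a millisecond, on average, to extract each audio object and combine it with its channel.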
The above-described conventional technique necessitates the use of significant computational resources by the encoder, which increases the implementation costs associated with multimedia entertainment systems. Furthermore, the conventional technique of providing the separate audio objects feed for the plurality of channel-based speaker feeds may not scale to the channel-based speaker feeds implemented by future multimedia entertainment systems.
It is with respect to these and other considerations that the disclosure made herein is presented.