At present, audio content is generally created and stored in channel-based formats. For example, stereo, surround 5.1, and surround 7.1 are channel-based formats for audio content. With developments in the multimedia industry, three-dimensional (3D) movies, television content, and other digital multimedia content are becoming increasingly popular. The traditional channel-based audio formats, however, are often incapable of generating audio content that is immersive and lifelike enough to keep pace with such progress. It is therefore desirable to expand multi-channel audio systems to create a more immersive sound field. One important approach to achieving this objective is adaptive audio content.
Compared with the conventional channel-based formats, adaptive audio content takes advantage of both audio channels and audio objects. The term “audio objects” as used herein refers to various audio elements or sound sources that exist for a defined duration in time. The audio objects may be dynamic or static. An audio object may be a human, an animal, or any other object serving as a sound source in the sound field. Optionally, the audio objects may have associated metadata, such as information describing the position, velocity, and size of an object. Use of the audio objects enables the adaptive audio content to provide a highly immersive experience and good acoustic effect, while allowing an operator such as a sound mixer to control and adjust the audio objects in a convenient manner. Moreover, by means of audio objects, discrete sound elements can be accurately controlled, irrespective of the specific playback speaker configuration. In addition, the adaptive audio content may further include channel-based portions called “audio beds” and/or any other audio elements. As used herein, the term “audio beds” or “beds” refers to audio channels that are meant to be reproduced in pre-defined, fixed locations. The audio beds may be considered as static audio objects and may have associated metadata as well. In this way, the adaptive audio content may also take advantage of the channel-based format, for example to represent complex audio textures.
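By way of illustration only, the relationship between audio objects, their optional metadata, and audio beds may be sketched as a simple data structure. The following Python sketch is not taken from any particular format or product. All class names, field names, units, and coordinate conventions in it are hypothetical, as is the rule that an object without velocity metadata is treated as static.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical convention: (x, y, z) coordinates in a normalized room.
Vec3 = Tuple[float, float, float]


@dataclass
class ObjectMetadata:
    """Optional metadata describing an audio object (names are illustrative)."""
    position: Vec3
    velocity: Vec3 = (0.0, 0.0, 0.0)
    size: float = 0.0


@dataclass
class AudioObject:
    """A sound source that exists for a defined duration in time."""
    name: str
    start_s: float      # assumed unit: seconds
    duration_s: float   # assumed unit: seconds
    metadata: Optional[ObjectMetadata] = None

    def is_static(self) -> bool:
        # Assumption: no metadata, or zero velocity, means a static object.
        return self.metadata is None or self.metadata.velocity == (0.0, 0.0, 0.0)


@dataclass
class AudioBed:
    """A channel meant to be reproduced at a pre-defined, fixed location."""
    channel_label: str
    position: Vec3


@dataclass
class AdaptiveAudioContent:
    """Adaptive content combines audio objects with channel-based beds."""
    objects: List[AudioObject] = field(default_factory=list)
    beds: List[AudioBed] = field(default_factory=list)


# Example: one moving object on top of a two-channel bed.
content = AdaptiveAudioContent(
    objects=[
        AudioObject(
            name="helicopter",
            start_s=12.0,
            duration_s=8.5,
            metadata=ObjectMetadata(
                position=(0.0, 1.0, 0.8),
                velocity=(0.2, 0.0, 0.0),
                size=0.1,
            ),
        )
    ],
    beds=[
        AudioBed("L", (-1.0, 1.0, 0.0)),
        AudioBed("R", (1.0, 1.0, 0.0)),
    ],
)
```

In this sketch, a bed differs from an object only in that its location is fixed and it has no duration or velocity, which mirrors the observation above that beds may be regarded as static audio objects.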
Adaptive audio content is generated in a quite different way from channel-based audio content. In order to obtain adaptive audio content, a dedicated processing flow has to be employed from the very beginning to create and process the audio signals. However, due to constraints in terms of physical devices and/or technical conditions, not all audio content providers are capable of generating such adaptive audio content. Many audio content providers can only produce and provide channel-based audio content. Furthermore, it is desirable to create a three-dimensional (3D) experience for channel-based audio content that has already been created and published. However, there is currently no solution capable of generating adaptive audio content by converting the large amount of existing conventional channel-based audio content.
In view of the foregoing, there is a need in the art for a solution for converting channel-based audio content into adaptive audio content.