Traditionally, audio content is created and stored in channel-based formats. As used herein, the term “audio channel” or “channel” refers to audio content that usually has a predefined physical location. For example, stereo, surround 5.1, surround 7.1 and the like are all channel-based formats for audio content. Recently, with developments in the multimedia industry, three-dimensional (3D) audio content has become increasingly popular in cinemas and homes. In order to create a more immersive sound field and to control discrete audio elements accurately, irrespective of the specific playback speaker configuration, many conventional playback systems need to be extended to support a new audio format that includes both audio channels and audio objects.
As used herein, the term “audio object” refers to an individual audio element that exists for a defined duration of time in the sound field. An audio object may be dynamic or static. For example, an audio object may be a human, an animal or any other object serving as a sound source in the sound field. Optionally, an audio object may have associated metadata, such as information describing the position, velocity, and size of the object. The use of audio objects enables audio content to deliver a highly immersive listening experience, while allowing an operator, such as an audio mixer, to control and adjust the audio objects in a convenient manner. During transmission, the audio objects and channels can be sent separately, and then used by a reproduction system on the fly to recreate the artistic intention adaptively, based on the configuration of the playback speakers. As an example, in a format known as “adaptive audio content,” or “upmixed audio signal,” there may be one or more audio objects and one or more “audio beds”. As used herein, the term “audio beds” or “beds” refers to audio channels that are meant to be reproduced in pre-defined, fixed locations.
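For illustration only, the terms defined above can be pictured as a simple data model. The following is a minimal sketch in Python, not a format defined by this document: every class name, field name, unit, and the coordinate convention is a hypothetical choice made for the example, and treating an object as static when it has zero velocity (or no metadata) is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical convention: (x, y, z) coordinates in the sound field.
Vector3 = Tuple[float, float, float]


@dataclass
class ObjectMetadata:
    """Optional metadata of an audio object: position, velocity, size."""
    position: Vector3
    velocity: Vector3 = (0.0, 0.0, 0.0)
    size: float = 0.0


@dataclass
class AudioObject:
    """An individual audio element that exists for a defined duration."""
    samples: List[float]   # audio signal of the object
    start_time: float      # seconds (assumed unit)
    duration: float        # seconds (assumed unit)
    metadata: Optional[ObjectMetadata] = None

    def is_static(self) -> bool:
        # Assumption: an object without metadata, or with zero velocity,
        # is treated as static; otherwise it is treated as dynamic.
        return self.metadata is None or self.metadata.velocity == (0.0, 0.0, 0.0)


@dataclass
class AudioBed:
    """An audio channel meant to be reproduced at a pre-defined, fixed location."""
    channel_name: str      # e.g. "L" or "R" (hypothetical labels)
    samples: List[float]


@dataclass
class AdaptiveAudioContent:
    """One or more audio objects together with one or more audio beds."""
    objects: List[AudioObject] = field(default_factory=list)
    beds: List[AudioBed] = field(default_factory=list)


# Usage: one moving object on top of a two-channel bed.
content = AdaptiveAudioContent(
    objects=[
        AudioObject(
            samples=[0.0, 0.1],
            start_time=1.0,
            duration=2.5,
            metadata=ObjectMetadata(
                position=(0.5, 0.0, 1.0),
                velocity=(0.1, 0.0, 0.0),
                size=0.2,
            ),
        )
    ],
    beds=[AudioBed("L", [0.0, 0.0]), AudioBed("R", [0.0, 0.0])],
)
```

Keeping the objects and the beds in separate lists mirrors the description above, in which the two are sent separately and combined by the reproduction system according to the playback speaker configuration.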
In general, object-based audio content is generated in a way quite different from traditional channel-based audio content. Although the new object-based format allows the creation of a more immersive listening experience with the aid of audio objects, the channel-based audio format, especially the final-mixing audio format, still prevails in the movie sound ecosystem, for example, in the chains of sound creation, distribution and consumption. As a result, given traditional channel-based content, in order to provide end users with immersive experiences similar to those provided by audio objects, there is a need to extract audio objects from the traditional channel-based content.