Traditionally, audio content is created and stored in channel-based formats. As used herein, the term “audio channel” or “channel” refers to audio content that usually has a predefined physical location. For example, stereo, surround 5.1, 7.1 and the like are all channel-based formats for audio content. Recently, with developments in the multimedia industry, three-dimensional (3D) movies and television content have become increasingly popular in cinemas and homes. In order to create a more immersive sound field and to control discrete audio elements accurately, irrespective of the specific playback speaker configuration, many conventional multichannel systems have been extended to support a new format that includes both channels and audio objects.
As used herein, the term “audio object” refers to an individual audio element that exists for a defined duration in time in the sound field. An audio object may be dynamic or static. For example, audio objects may be humans, animals or any other elements serving as sound sources. During transmission, audio objects and channels can be sent separately, and then used by a reproduction system on the fly to recreate the artistic intent adaptively, based on the configuration of the playback speakers. As an example, in a format known as “adaptive audio content,” there may be one or more audio objects and one or more “channel beds,” which are channels to be reproduced in predefined, fixed locations.
In general, object-based audio content is generated quite differently from traditional channel-based audio content. Due to constraints in terms of physical devices and/or technical conditions, however, not all audio content providers are capable of generating adaptive audio content. Moreover, although the new object-based format allows creation of a more immersive sound field with the aid of audio objects, the channel-based audio format still prevails in the movie sound ecosystem, for example, in the chains of sound creation, distribution and consumption. As a result, given traditional channel-based content, in order to provide end users with immersive experiences similar to those provided by audio objects, there is a need to extract audio objects from traditional channel-based content. At present, however, no solution is known to be capable of accurately and efficiently extracting audio objects from conventional channel-based audio content.
In view of the foregoing, there is a need in the art for a solution for audio object extraction from channel-based audio content.