1. Technical Field
The present invention relates to sound imaging systems, and more specifically relates to a system and method for creating a multi-channel sound image using video image information.
2. Related Art
As new multimedia technologies such as streaming video, interactive web content, surround sound and high definition television enter and dominate the marketplace, efficient mechanisms for delivering high quality multimedia content have become increasingly important. In particular, the ability to deliver rich audio/visual information, often over a limited bandwidth channel, remains an ongoing challenge.
One of the problems associated with existing audio/visual applications involves the limited audio data made available. Specifically, audio data is often generated or delivered via only one (i.e., mono), or at most two (i.e., stereo) audio channels. However, in order to create a realistic experience, multiple audio channels are preferred. One way to achieve additional audio channels is to split up the existing channel or channels. Existing methods of splitting audio content include mono-to-stereo conversion systems, and systems that re-mix the available audio channels to create new channels. U.S. Pat. No. 6,005,946, entitled “Method and Apparatus For Generating A Multi-Channel Signal From A Mono Signal,” issued on Dec. 21, 1999, which is hereby incorporated by reference, teaches such a system.
Unfortunately, such systems often fail to provide an accurate sound image that matches the accompanying video image. Ideally, a sound image should provide a virtual sound stage in which each audio source sounds like it is coming from its actual location in the three dimensional space being shown in the accompanying video image. In the above-mentioned prior art systems, if the original sound recording did not account for the spatial relation of the sound sources, a correct sound image is impossible to re-create. Accordingly, a need exists for a system that can create a robust multi-channel sound image from a limited (e.g., mono or stereo) audio source.
The present invention addresses the above-mentioned needs, as well as others, by providing an audio-visual information system that can generate a three-dimensional (3-D) sound image from a mono audio signal by analyzing the accompanying visual information. In a first aspect, the invention provides a sound imaging system for generating multi-channel audio data from an audio/video signal having an audio component and a video component, the system comprising: a system for associating sound sources within the audio component to video objects within the video component of the audio/video signal; a system for determining position information of each sound source based on a position of the associated video object in the video component; and a system for assigning sound sources to audio channels based on the position information of each sound source.
In a second aspect, the invention provides a program product stored on a recordable medium, which when executed generates multi-channel audio data from an audio/video signal having an audio component and a video component, the program product comprising: program code configured to associate sound sources within the audio component to video objects within the video component of the audio/video signal; program code configured to determine position information of each sound source based on a position of the associated video object in the video component; and program code configured to assign sound sources to audio channels based on the position information of each sound source.
In a third aspect, the invention provides a decoder having a sound imaging system for generating multi-channel audio data from an audio/video signal having an audio component and a video component, the decoder comprising: a system for extracting sound sources from the audio component; a system for extracting video objects from the video component; a system for matching sound sources to video objects; a system for determining position information of each sound source based on a position of the matched video object in the video component; and a system for assigning sound sources to audio channels based on the position information of each sound source.
In a fourth aspect, the invention provides a method of generating multi-channel audio data from an audio/video signal having an audio component and a video component, the method comprising the steps of: associating sound sources within the audio component to video objects within the video component of the audio/video signal; determining position information of each sound source based on a position of the associated video object in the video component; and assigning sound sources to audio channels based on the position information of each sound source.
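By way of illustration only, the three steps of the method (associating, determining position, and assigning) may be sketched as follows. This is a minimal sketch under stated assumptions and not a definitive implementation: the names `SoundSource`, `VideoObject`, `CHANNELS` and the helper functions are hypothetical, sound sources and video objects are assumed to have already been extracted, matching by a shared label is a stand-in for whatever association technique is actually used, and position is reduced to a single normalized horizontal coordinate.

```python
from dataclasses import dataclass


@dataclass
class VideoObject:
    label: str  # hypothetical identifier for an extracted video object
    x: float    # assumed horizontal position: 0.0 (left edge) to 1.0 (right edge)


@dataclass
class SoundSource:
    label: str  # hypothetical identifier for an extracted sound source


# Assumed three-channel layout: channel name -> nominal horizontal position.
CHANNELS = {"left": 0.0, "center": 0.5, "right": 1.0}


def associate(sources, objects):
    """Step 1: associate each sound source with a video object.

    Matching on an identical label is an assumption made for this sketch.
    Sources without a matching object are left unassociated.
    """
    by_label = {obj.label: obj for obj in objects}
    return {s.label: by_label[s.label] for s in sources if s.label in by_label}


def position_of(obj):
    """Step 2: take the sound source position from its associated video object."""
    return obj.x


def assign_channel(x):
    """Step 3: assign the channel whose nominal position is nearest to x."""
    return min(CHANNELS, key=lambda name: abs(CHANNELS[name] - x))


def generate_channel_map(sources, objects):
    """Run the three steps and return a mapping of source label -> channel name."""
    matches = associate(sources, objects)
    return {label: assign_channel(position_of(obj)) for label, obj in matches.items()}


if __name__ == "__main__":
    sources = [SoundSource("dog"), SoundSource("car"), SoundSource("narrator")]
    objects = [VideoObject("dog", 0.1), VideoObject("car", 0.9)]
    print(generate_channel_map(sources, objects))
```

In this sketch a sound source with no associated video object (the "narrator" above) is simply omitted from the result; how such a source is routed is left open here.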