The present invention relates to the field of video processing, and in particular to systems for preparing multimedia presentations based on video content.
Video data contains both visual and audio information that is time based, i.e. organised in time sequence. Consequently, even a small segment or clip of video contains a large amount of information. With the advent of digital technology, compression techniques have been applied to digital video to remove redundant information and thereby reduce the storage capacity and bandwidth required to store and transmit it. Nonetheless, even with compression, the storage capacity and bandwidth requirements remain high.
Ordinarily, video is prepared for delivery over a channel, or medium, with high transmission capacity. However, with the wide proliferation of digital technology and the popularity of dial-up Internet connections among the general public, there is increasing demand for video information to be delivered over lower capacity channels such as the Internet. Transmitting video information via such low capacity channels presents special challenges.
These challenges are further compounded by the proliferation of devices such as cellular phones, palm-top computers and TV set-top boxes, which can be used to display or present digital information. These devices have significantly differing audio, visual and text presentation capabilities. Not uncommonly, such devices handle audio at different bit rates and frequency ranges. Further, their visual displays have different colour depths, and their text displays may be limited in the number of lines and characters. For example, multimedia personal computers generally have high presentation capabilities, while cellular phones have low capabilities for displaying visual information.
Thus, a need clearly exists for a system of processing full-motion video content for presentation on a variety of devices that have widely differing audio and visual/text display capabilities and different bandwidth requirements.
In accordance with a first aspect of the invention, there is disclosed a method of converting a video into multiple markup language presentations for different devices and users. The method includes the step of creating a video database containing shot and key frame information of the video. It also includes the step of generating at least one of audio, visual and textual content for presentation dependent upon display capabilities of the different devices and user specified criteria. The generating step includes the sub-steps of: if the presentation is to contain visual content, determining a heuristic measure for a desired image for display on the different devices, the heuristic measure being dependent upon either a Display Dependent Significance Measure (DDSM), or the DDSM and a user supplied significance measure; if the presentation is to contain visual content, ranking and selecting one or more images from the video to be displayed dependent upon the heuristic measure; if the presentation is to contain audio content, extracting an audio stream from the video; and if the presentation is to contain textual content, selecting the textual content from video annotation and/or a transcript associated with the video. The method also includes the step of creating multiple static and/or dynamic markup language documents dependent upon the display capabilities of the different devices and the user specified criteria for different presentations on the different devices, each document containing at least a portion of the generated audio, visual and textual content catering for a presentation on a corresponding device.
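By way of illustration only, the ranking and selecting sub-steps might be sketched as follows. The weighted blend of the DDSM with the user supplied significance measure, and the names KeyFrame, heuristic_measure, select_images and user_weight, are assumptions made for this sketch; the text above does not prescribe how the two measures are combined.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeyFrame:
    frame_id: int
    ddsm: float                                # DDSM of this frame for one target device
    user_significance: Optional[float] = None  # optional user supplied measure


def heuristic_measure(frame: KeyFrame, user_weight: float = 0.5) -> float:
    """Return the DDSM alone, or an assumed weighted blend with the user measure."""
    if frame.user_significance is None:
        return frame.ddsm
    return (1.0 - user_weight) * frame.ddsm + user_weight * frame.user_significance


def select_images(frames: List[KeyFrame], count: int,
                  user_weight: float = 0.5) -> List[KeyFrame]:
    """Rank key frames by heuristic measure, highest first, and keep the top `count`."""
    ranked = sorted(frames,
                    key=lambda f: heuristic_measure(f, user_weight),
                    reverse=True)
    return ranked[:count]
```

In this sketch a device that can show only two images would receive the two highest ranked key frames, and a frame that the user has marked as significant can overtake a frame with a higher DDSM.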
Preferably, the method includes the sub-step of, if the presentation is to contain visual content, converting the one or more selected images to different sizes and colour depths dependent upon respective display requirements of the devices.
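As a hedged sketch of this conversion, the arithmetic for fitting an image to a device display and for reducing its colour depth could look as follows. The function names fit_size and quantise_channel, the 8-bit source channel and the rule of never enlarging an image are assumptions of the sketch, not requirements stated above.

```python
from typing import Tuple


def fit_size(src_w: int, src_h: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale an image to fit within max_w x max_h, keeping its aspect ratio.

    Images that already fit are left at their original size (an assumption).
    """
    scale = min(max_w / src_w, max_h / src_h, 1.0)
    return max(1, round(src_w * scale)), max(1, round(src_h * scale))


def quantise_channel(value: int, bits: int) -> int:
    """Reduce an 8-bit channel value to `bits` bits, then map it back to 0..255."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    levels = (1 << bits) - 1              # highest code at the reduced depth
    code = round(value * levels / 255)    # nearest code at the reduced depth
    return round(code * 255 / levels)     # value as shown on the device
```

For instance, a 640 by 480 key frame destined for a 160 by 160 display would be reduced to 160 by 120, and on a 1-bit display each channel value collapses to either 0 or 255.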
Preferably, the method includes the sub-step of, if the presentation is to contain audio content, converting the audio stream to have different sampling rates, number of channels, compression ratios and/or resolutions for different delivery audio channels.
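One possible way of matching a converted audio stream to a delivery channel is sketched below. The AudioProfile structure, the best_profile function and the policy of choosing the highest bit rate that fits the channel are assumptions made for illustration; the bit rate is simply the sampling rate multiplied by the number of channels and the resolution, divided by the compression ratio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AudioProfile:
    sample_rate_hz: int
    channels: int
    bits_per_sample: int       # resolution of each sample
    compression_ratio: float   # e.g. 8.0 means 8:1, 1.0 means uncompressed

    def bit_rate(self) -> float:
        """Bits per second needed to deliver audio in this profile."""
        return (self.sample_rate_hz * self.channels * self.bits_per_sample
                / self.compression_ratio)


def best_profile(profiles: List[AudioProfile],
                 capacity_bps: float) -> Optional[AudioProfile]:
    """Pick the highest bit rate profile that fits the channel, or None if none fits."""
    fitting = [p for p in profiles if p.bit_rate() <= capacity_bps]
    return max(fitting, key=AudioProfile.bit_rate, default=None)
```

Under these assumptions, a 56,000 bit per second channel would be served by a 22,050 Hz mono 16-bit stream compressed 8:1 (44,100 bits per second) in preference to an uncompressed 8,000 Hz mono 8-bit stream (64,000 bits per second), which does not fit.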
Preferably, the method includes the sub-step of, if the presentation is to contain textual content, generating a timed transcript.
The presentations may include synchronous and/or asynchronous combinations of audio, visual, and text content. The combinations may include synchronised audio with visuals, synchronised audio with text, synchronised audio with text and visuals, synchronised text and visuals, and static text with visuals.
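The combinations listed above can be enumerated and matched to the content available for a given device, as in the following sketch. The selection rule, the function choose_combination and its boolean parameters are hypothetical; only the five combination names are taken from the text.

```python
from enum import Enum
from typing import Optional


class Combination(Enum):
    SYNC_AUDIO_VISUALS = "synchronised audio with visuals"
    SYNC_AUDIO_TEXT = "synchronised audio with text"
    SYNC_AUDIO_TEXT_VISUALS = "synchronised audio with text and visuals"
    SYNC_TEXT_VISUALS = "synchronised text and visuals"
    STATIC_TEXT_VISUALS = "static text with visuals"


def choose_combination(audio: bool, visuals: bool, text: bool,
                       can_synchronise: bool) -> Optional[Combination]:
    """Return the richest listed combination the device can present, if any."""
    if can_synchronise:
        if audio and text and visuals:
            return Combination.SYNC_AUDIO_TEXT_VISUALS
        if audio and visuals:
            return Combination.SYNC_AUDIO_VISUALS
        if audio and text:
            return Combination.SYNC_AUDIO_TEXT
        if text and visuals:
            return Combination.SYNC_TEXT_VISUALS
    elif text and visuals:
        return Combination.STATIC_TEXT_VISUALS
    return None  # the available content matches none of the listed combinations
```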
Preferably, the Display Dependent Significance Measure (DDSM) is dependent upon the information content of the image.
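The text above does not define how information content is measured. Purely as an assumed stand-in, the sketch below uses the Shannon entropy of an image's grey levels after they have been reduced to the depth of the target display, so that detail a device cannot reproduce does not contribute to the measure. The function name display_entropy and the 8-bit greyscale input are likewise assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def display_entropy(pixels: Sequence[int], display_bits: int) -> float:
    """Shannon entropy, in bits per pixel, of 8-bit grey levels as seen at display_bits."""
    if not 1 <= display_bits <= 8:
        raise ValueError("display_bits must be between 1 and 8")
    if not pixels:
        return 0.0
    shift = 8 - display_bits                      # drop bits the display cannot show
    counts = Counter(p >> shift for p in pixels)  # histogram at the display depth
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

With this stand-in, an image whose grey levels differ only in their low-order bits scores lower on a shallow display than on a deep one, which is one way a significance measure could depend on the display.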
In accordance with a second aspect of the invention, there is disclosed an apparatus for converting a video into multiple markup language presentations for different devices and users. The apparatus includes a device for creating a video database containing shot and key frame information of the video. It also includes a device for generating at least one of audio, visual and textual content for presentation dependent upon display capabilities of the different devices and user specified criteria. The generating device includes: a device for, if the presentation is to contain visual content, determining a heuristic measure for a desired image for display on the different devices, the heuristic measure being dependent upon either a Display Dependent Significance Measure (DDSM), or the DDSM and a user supplied significance measure; a device for, if the presentation is to contain visual content, ranking and selecting one or more images from the video to be displayed dependent upon the heuristic measure; a device for, if the presentation is to contain audio content, extracting an audio stream from the video; and a device for, if the presentation is to contain textual content, selecting the textual content from video annotation and/or a transcript associated with the video. The apparatus also includes a device for creating multiple static and/or dynamic markup language documents dependent upon the display capabilities of the different devices and the user specified criteria for different presentations on the different devices, each document containing at least a portion of the generated audio, visual and textual content catering for a presentation on a corresponding device.
In accordance with a third aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for converting a video into multiple markup language presentations for different devices and users. The computer program product includes a module for creating a video database containing shot and key frame information of the video. It also includes a module for generating at least one of audio, visual and textual content for presentation dependent upon display capabilities of the different devices and user specified criteria. The generating module includes: a module for, if the presentation is to contain visual content, determining a heuristic measure for a desired image for display on the different devices, the heuristic measure being dependent upon either a Display Dependent Significance Measure (DDSM), or the DDSM and a user supplied significance measure; a module for, if the presentation is to contain visual content, ranking and selecting one or more images from the video to be displayed dependent upon the heuristic measure; a module for, if the presentation is to contain audio content, extracting an audio stream from the video; and a module for, if the presentation is to contain textual content, selecting the textual content from video annotation and/or a transcript associated with the video. The computer program product also includes a module for creating multiple static and/or dynamic markup language documents dependent upon the display capabilities of the different devices and the user specified criteria for different presentations on the different devices, each document containing at least a portion of the generated audio, visual and textual content catering for a presentation on a corresponding device.