1. Field of the Invention
The present invention is related generally to video summarization techniques, and more particularly to methods and systems for creating video summaries by seamlessly integrating image, audio, and text features extracted from an input video.
2. Description of the Related Art
Lengthy articles, treatises, or other text documents often include abstracts, which help readers ascertain quickly, without a detailed analysis of the entire document, whether the document's contents are of interest. As with a text document, the content and nature of a video program often cannot be captured at a glance. It is therefore generally desirable to provide an abstract or summary of a long video program that conveys its overall content.
Recently, the explosive growth of the World-Wide Web (WWW or Web) has dramatically increased the number of on-line text and multimedia data collections. As this trend toward more on-line multimedia content continues, automatic data summarization techniques that assist users in quickly identifying the most relevant information from vast volumes of data are becoming more and more significant.
In this context, video summarization presents substantial challenges. It requires, first, summarization of both the image track and the audio track of a video program. Integrating the two summaries effectively and naturally presents an additional challenge.
Most kinds of video summarization can be classified into three categories: audio-centric summarization, image-centric summarization, and integrated audio-visual summarization. Certain types of video programming, such as news broadcasts, documentaries, and video seminars, do not have a strong correlation between the associated audio and image tracks. For such video categories, it is appropriate to employ an integrated audio-visual summarization approach that maximizes coverage of both audio and image content, while providing a loose audio and image alignment. On the other hand, other types of video programs, such as movies, dramas, and talk shows, may have a strong correlation between the audio and image tracks. For these types of video programs, synchronization between the audio presentation and the video images is critical; in these circumstances, it is appropriate to employ a summarization methodology that is either audio-centric or image-centric.
Conventional systems have failed to provide a comprehensive solution to the problem of effective and efficient summarization for these various types of video programming. Many video summarization systems and methods presently in use heuristically deem certain types of video content important, and create summaries by extracting this pre-identified content from the input video. Consequently, these video summarization systems and methods are highly domain- and application-specific, and are not capable of creating summaries based on users' individual needs, or of handling a wide variety of video programs.