1. Field of the Invention
Embodiments of the present invention relate to a high-speed video abstract generation method, medium, and system, and more particularly, to a method, medium, and system in which an event candidate section is detected based on audio information, a final event section is detected from the detected event candidate section based on visual information, and video abstract information of the detected final event section is generated.
2. Description of the Related Art
As a conventional method of summarizing the content of a provided video, U.S. Patent Publication No. 2004/0017389 discusses summarizing such content by events within the video, e.g., using replay, a live event, and a setup event. However, in this conventional technique, visual information and audio information are individually processed to summarize video data, resulting in slow processing speeds.
Similarly, as a multimedia content indexing method, U.S. Pat. No. 6,714,909 discusses using multi-modal information, such as visual, audio, and text information, from a provided news video to generate high-level news content information, e.g., through abstract generation, speaker recognition, and subject recognition. However, this conventional multimedia content indexing technique also processes visual information and audio information individually to generate abstract information, resulting in slow processing speeds.
Thus, as described above, conventional multi-modal information based summary techniques process visual information and audio information individually and generate an abstract by integrating the results of that processing, so the processing takes a relatively long time. In particular, when summarizing based on the visual information of HD video, for example, two hours of HD video amounts to approximately 15 GB to 18 GB of video data, so a very large processing capacity is required to process the visual information. Therefore, in these conventional techniques, detecting an event and generating an abstract become notably time consuming, which the present inventors have found to be an undesirable hindrance to the field.
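As a rough illustration of the data volume noted above, the figure of approximately 15 GB to 18 GB for two hours of HD video can be converted into an average data rate. The following is only a minimal sketch: the function name `average_bitrate_mbps` is hypothetical, and it assumes 1 GB = 10^9 bytes, which the source does not specify.

```python
def average_bitrate_mbps(size_gb: float, duration_hours: float) -> float:
    """Average data rate in Mbit/s for a video of the given size and duration.

    Assumption (not stated in the source): 1 GB = 10**9 bytes.
    """
    size_bits = size_gb * 1e9 * 8          # bytes -> bits
    duration_s = duration_hours * 3600     # hours -> seconds
    return size_bits / duration_s / 1e6    # bits/s -> Mbit/s


# Two hours of HD video at the sizes cited in the text.
for size_gb in (15, 18):
    rate = average_bitrate_mbps(size_gb, 2)
    print(f"{size_gb} GB over 2 h: about {rate:.1f} Mbit/s")
```

Under that assumption, the cited range corresponds to roughly 16.7 Mbit/s to 20 Mbit/s of visual data that a purely visual summarization technique would have to process continuously.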
Thus, the present inventors believe there is a need to improve processing speeds for generating video abstracts.