For many years, an increasing number of people have owned and used video recorders to make videos that capture their experiences and document their lives. Oddly, most of these videos are put into a storage box and rarely watched again.
Research in the field of video abstracting is growing. Video abstracting is the process of taking unedited video footage and combining shorter segments of that footage into a single abstract. Existing automatic video abstracting systems concentrate on feature films, documentaries, or newscasts. Currently, there are three main systems that produce videos as abstracts. The first is called video skimming. It aims mainly at abstracting documentaries and newscasts. Video skimming assumes that a transcript of the audio track is available. The video and the transcript are then aligned by word spotting. The audio track of the video skim is constructed by using language analysis (such as the term frequency-inverse document frequency, or TF-IDF, measure) to identify important words in the transcript. Audio clips around those words are then cut out. Video clips for the video skim are selected from the surrounding frames based on detected faces, text, and camera operations.
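To make the keyword-selection step of video skimming concrete, the following is a minimal Python sketch of TF-IDF scoring over a transcript. It is an illustration only, not the video skimming system's actual implementation. It assumes that the transcript has already been split into segments that serve as the "documents", and the function names (tokenize, tfidf_scores, top_keywords) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(segments):
    """Return one {word: tf-idf score} dict per transcript segment.

    tf  = occurrences of the word in the segment / tokens in the segment
    idf = log(number of segments / number of segments containing the word)
    Assumption: each transcript segment is treated as one "document".
    """
    tokenized = [tokenize(segment) for segment in segments]

    # Count, for every word, how many segments contain it at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_segments = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty segments
        scores.append({
            word: (count / total) * math.log(n_segments / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_keywords(segments, k=3):
    """Return the k highest-scoring words for each segment.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = []
    for segment_scores in tfidf_scores(segments):
        ordered = sorted(segment_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked.append([word for word, _ in ordered[:k]])
    return ranked


if __name__ == "__main__":
    transcript = [
        "the storm hit the coast",
        "the mayor spoke about the budget",
        "the storm damaged the coast road",
    ]
    for segment, words in zip(transcript, top_keywords(transcript, k=2)):
        print(segment, "->", words)
```

Words that occur in every segment, such as "the" in the sample transcript, receive a score of zero, so they are never preferred over words that are specific to one segment. A skimming system would then cut audio clips around the timestamps of the highest-scoring words, a step that depends on the word-spotting alignment and is not shown here.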
The second system is called MoCA Abstracting. It was explicitly designed to generate trailers of feature films. The MoCA Abstracting system performs an extensive video analysis of a feature film to segment it into shots or scenes and to detect special events, such as text appearing in the title sequence, close-up shots of main actors, explosions, and gunfire. This information is used to select the clips for the video abstract. During the final assembly, ordering and editing rules are presented. Since MoCA Abstracting relies heavily on special events such as explosions, gunfire, shot/reverse-shot dialogs, and actions, which are usually not present in home videos, it cannot be used to abstract home video.
The third system, by Saarela and Merialdo, does not perform any automatic content analysis. Instead, the authors assume that videos have been annotated, manually or automatically, with descriptors for various properties and relations of audio and video segments. Based on those descriptors, the authors try to define “optimal” summaries. They present constraints for video summaries and methods to evaluate the importance of a specific segment.
All three of these systems target feature films, documentaries, or newscasts. Since raw video footage such as home video is inherently different from broadcast video, new abstracting principles and algorithms are needed.