Widespread proliferation of personal video cameras has resulted in an astronomical amount of uncompelling home video. Many personal video camera owners accumulate a large collection of videos documenting important personal or family events. Despite their sentimental value, these videos are too tedious to watch. There are several factors detracting from the watchability of home videos.
First, many home videos consist of extended periods of inactivity or uninteresting activity, with only a small amount of interesting video. For example, a parent videotaping a child's soccer game may record several minutes of interesting video in which their own child makes a crucial play, such as scoring a goal, and hours of relatively uninteresting gameplay. The disproportionately large amount of uninteresting footage discourages parents from watching their videos on a regular basis. For acquaintances and distant relatives of the parents, the disproportionate amount of uninteresting video is unbearable.
Second, the poor sound quality of many home videos exacerbates the associated tedium. Even well-produced home video will appear amateurish without professional sound recording and post-production. Further, studies have shown that poor sound quality degrades the perceived video image quality. In W. R. Neuman, “Beyond HDTV: Exploring Subjective Responses to Very High Definition Television,” MIT Media Laboratory Report, July 1990, listeners judged identical video clips to be of higher quality when accompanied by higher-fidelity audio or a musical soundtrack.
Thus, it is desirable to condense large amounts of uninteresting video into a short video summary. Tools for editing video are well known in the art. Unfortunately, the sophistication of these tools makes them difficult for the average home video producer to use. Further, even simplified tools require extensive creative input by the user in order to precisely select and arrange the portions of video of interest. The time and effort required to provide the creative input necessary to produce a professional-looking video summary discourages the average home video producer.
In order to relieve the burden of editing video, many techniques have been proposed for automatically creating video summaries. However, these techniques are unsuitable for home video. In Christel, M., Smith, M., Taylor, C., and Winkler, D., “Evolving Video Skims into Useful Multimedia Abstractions,” Human Factors in Computing Systems, CHI 98 Conference Proceedings (Los Angeles, Calif.), New York: ACM, pp. 171–178, 1998; Pfeiffer, S., Lienhart, R., Fischer, S., and Effelsberg, W., “Abstracting Digital Movies Automatically,” Journal of Visual Communication and Image Representation, 7(4), pp. 345–353, December 1996; and Smith, M., and Kanade, T., “Video Skimming and Characterization through the Combination of Image and Language Understanding Techniques,” Proc. Computer Vision and Pattern Recognition, pp. 775–781, 1997, a text transcription of the video is used to determine video segments for video summaries. In home video, text transcription is normally unavailable.
Lienhart, R., “Abstracting Home Video Automatically,” Proc. ACM Multimedia '99 (Part 2), pp. 37–40, 1999, creates video digests by selecting portions of video shots with good quality and concatenating the selected portions. Audio considerations are not addressed.
In Suzuki, R. and Iwadate, Y., “Multimedia Montage—Counterpoint Synthesis of Movies,” Proc. IEEE Multimedia Systems '99, Vol. 1, pp. 433–438, 1999, the authors describe video editing tools for composing movies using heuristics derived from music theory. With these video editing tools, the resulting footage is well synchronized with sound. However, these video tools do not operate automatically; the user must manually edit the video.
It is desirable to have a method for producing video summaries that 1) accurately and concisely summarizes a longer video recording; 2) provides a compelling video presentation; 3) produces a professional-looking video presentation; 4) reduces or eliminates the detrimental effects of poor-quality audio; and 5) produces a video summary automatically, with little or no user input required.