Video is an effective way to capture a scene or an unfolding event. People often capture videos of birthday parties, weddings, travel and sports events. Unlike still images, video has the advantage of capturing evolving, unstructured events, such as natural facial expressions and human interactions (e.g. talking, mutual smiling, kissing, hugging, handshakes). It is often desirable to select individual frames from a sequence of video frames for display, or for use as content in printed books, in the same way that still images are used. In addition, sub-sections of the video sequence, known as segments, can be selected for display as a summary representation of the video sequence. A video segment comprises a series of sequential video frames of a video sequence.
With increasing demand for, and accessibility of, mobile phones and other consumer-oriented camera devices, more and more video data is being captured and stored. Hence, it is increasingly difficult to find relevant videos and/or to extract desirable frames from the videos for printing or display.
One method of selecting video frames determines desirable video segments or frames based solely on image quality measures, including photographic composition, colour distribution, blur, colour contrast, sharpness and exposure. Instead of performing image analysis directly on portable devices, an alternative method of selecting video frames or segments uses camera-specific parameters such as aperture, shutter speed, ISO, lens type and camera motion. More recent methods of selecting video frames attempt to extract high-level semantics from videos to facilitate video segment and image selection by identifying faces, objects, types of events, and human activities. In particular, for wedding and sporting videos, some methods detect camera flashes and audio features (e.g. music, applause and cheers) to identify important scenes, objects, and events in a video sequence.
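By way of illustration only, selection based on image quality measures might be sketched as follows. The sketch assumes greyscale frames supplied as nested lists of 0 to 255 pixel values, and it uses just two of the measures listed above: sharpness, approximated here by the variance of a 4-neighbour Laplacian, and exposure, approximated by the distance of mean brightness from mid-grey. The function names, the weights and the mid-grey target are hypothetical choices made for this example and are not drawn from any particular method.

```python
from statistics import mean, pvariance


def laplacian_variance(frame):
    """Sharpness proxy: variance of a 4-neighbour Laplacian over interior pixels."""
    height, width = len(frame), len(frame[0])
    responses = [
        frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1] + frame[y][x + 1]
        - 4 * frame[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def exposure_score(frame, target=128.0):
    """1.0 when mean brightness equals the (assumed) target, falling linearly to 0.0."""
    brightness = mean(pixel for row in frame for pixel in row)
    return max(0.0, 1.0 - abs(brightness - target) / target)


def select_best_frame(frames, sharpness_weight=1.0, exposure_weight=1.0):
    """Return the index of the frame with the highest combined quality score."""
    sharpness = [laplacian_variance(f) for f in frames]
    peak = max(sharpness) or 1.0  # avoid dividing by zero when every frame is flat
    scores = [
        sharpness_weight * s / peak + exposure_weight * exposure_score(f)
        for s, f in zip(sharpness, frames)
    ]
    return max(range(len(frames)), key=scores.__getitem__)


# Illustrative input: a featureless frame and a high-contrast checkerboard.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 if (x + y) % 2 else 0 for x in range(4)] for y in range(4)]
best = select_best_frame([flat, checker])
```

A score of this kind depends only on pixel statistics, so it has no notion of what a frame depicts.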