Selection of key video frames is useful in many applications. For example, it is often desirable to extract and present some subset of video data that can convey an accurate and recognizable summary or synopsis of the video. Key frame extraction algorithms are used to select a subset of the most informative frames from a video, with the goal of representing the most significant content of the video with a limited number of frames. Key frame extraction finds applications in several broad areas of video processing such as video summarization, creating chapter titles in DVDs, video indexing, and making prints from video. Summaries or synopses can also facilitate video sharing or help a user decide whether a full video is worth downloading or viewing. Key frame extraction is an active research area, and many approaches for extracting key frames from videos have been proposed.
Algorithms for creating a video summary by extracting key video frames are known in the art, but have shortcomings that are addressed by the present invention. Existing algorithms, such as that disclosed in U.S. Pat. No. 8,599,313 to Aaron T. Deever, which determines key video frames based primarily on inter-frame motion detection, suffer from at least two shortcomings: some do not consider quality metrics to aid in the selection of key frames, while others require extensive joint optimization of multiple metrics, an approach that is computationally expensive.
For instance, the method of U.S. Pat. No. 7,889,794 to J. Luo, et al., entitled "Extracting key frame candidates from video clip," analyzes a video clip to determine key frames by performing a global motion estimate on the video clip that indicates translation of a scene or camera. As an additional example, U.S. Pat. No. 7,184,100 to I. Wilf, et al., entitled "Method of selecting key-frames from a video sequence," teaches the selection of key frames from a video sequence by comparing each frame in the video sequence with the adjacent frames using both region and motion analysis.
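By way of illustration only, the general idea of comparing each frame with its adjacent frame can be sketched as follows. This sketch is not the method of any of the patents cited above: it replaces region and motion analysis with a simple mean absolute pixel difference, and the names Frame, mean_abs_diff, select_key_frames, and threshold are hypothetical names introduced for this example.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_key_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames that differ strongly from the preceding frame.

    The first frame is always kept; `threshold` is an assumed tuning value.
    """
    if not frames:
        return []
    keys = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keys.append(i)
    return keys
```

A selector of this kind responds only to inter-frame change. It has no notion of image quality or semantic content, which is the limitation of motion-based approaches discussed in this section.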
The prior art methods do not include or combine other non-motion-based metrics, such as image quality or semantic content of the video frames, to improve the quality of the key frame selection process. Integrating such metrics into these methods would require a new, complex, and time-consuming optimization process. Hence, there is a need to develop new strategies that improve the current algorithms using additional quality metrics. It is one object of the present invention to select key frames that are perceptually better than the key frames selected by conventional, motion-based methods, with only a slight impact on computational cost. It is a further object of the present invention to improve existing algorithms by incorporating new metrics without the need for new optimization of the algorithm or normalization of new features.