Video photography is becoming increasingly popular as the cost of digital video cameras continues to drop. A user will typically use a digital video camera to capture memorable moments such as a wedding or a vacation. Although digital video cameras are quite popular, analog video cameras can also be used to make videos that can be processed using digital image processing techniques. This requires converting the analog video into a digital medium for processing.
Video generally contains a great deal of data. Much of this data, however, is redundant in terms of content. Redundancy occurs because video uses a high frame rate (30 frames/second) to please the human eye. The human brain, however, can capture the same content at a much lower frame rate.
It is often desirable for a user to be able to quickly locate a specific section of video. By way of example, a video may contain portions of a vacation and a wedding. For the wedding sequence, the user may want to find the section of the video where the cake is being cut. Manually searching and analyzing the entire video can be tedious for the user because even short videos typically contain a large number of frames. For example, at 30 frames/second an hour-long video contains 108,000 frames, so locating the wedding cake sequence or shot manually could require viewing and analyzing over 100,000 frames.
To ease the task of locating a desired video shot or sequence (or simply, a video sequence), key frames can be used. In general, key frames are selected frames of the video that are representative of the content of a video sequence, and they help a user identify desired portions of a video. Key frames are the video equivalent of the index of a book. While the book index contains keywords referenced by page number, video key frames are frames of the video that are representative of the material contained in the video. A reader who wants to find information in the book about a particular subject or term looks in the index. Similarly, a user can find a particular subject contained in the video by searching the key frames of the video.
One problem with current key frame selection techniques is that there is no agreement on how to choose the “best” key frame for a video sequence, that is, the frame in the video sequence that is most representative of the video content of that sequence. This lack of agreement arises because selecting the “best” key frame is subjective. Some techniques select the middle frame of a video sequence, others select the first frame, while still others select the last frame. Another problem with current key frame selection techniques is that there is no agreement on the number of key frames that should be used to represent the video content of a video sequence.
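By way of illustration only, the position-based selection rules mentioned above (first, middle, or last frame) can be sketched as follows. The function name `positional_key_frame`, its parameters, and the use of inclusive frame indices are assumptions made for this sketch and do not correspond to any particular existing technique.

```python
def positional_key_frame(start, end, strategy="middle"):
    """Pick one key frame index from a video sequence by position alone.

    `start` and `end` are the inclusive frame indices of the sequence
    (a hypothetical convention chosen for this sketch).
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    if strategy == "first":
        return start
    if strategy == "last":
        return end
    if strategy == "middle":
        # Integer midpoint; rounds down for sequences of even length.
        return start + (end - start) // 2
    raise ValueError(f"unknown strategy: {strategy!r}")


# A 10-second sequence at 30 frames/second spans frames 0 through 299.
choices = {s: positional_key_frame(0, 299, s) for s in ("first", "middle", "last")}
```

Each rule returns a different frame for the same sequence, and none of them inspects the video content, which is why the choice among them remains subjective.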
Many of the existing key frame selection techniques use a threshold approach. In general, under the threshold approach, if a property (such as motion) of a frame within a video sequence is above a certain threshold amount, then the frame is considered a key frame. One problem with the threshold approach is that the threshold must be constantly adjusted and fine-tuned based on variables such as video content, camera type, and camera compression. For instance, one portion of a video may contain content that includes a sleeping baby, while another portion may contain high-action content such as a soccer game. Although a threshold can be fine-tuned for a specific type of video content, the threshold must be fine-tuned afresh when another type of video content is analyzed. This fine-tuning is tedious and time-consuming. Therefore, there exists a need for a key frame extraction technique that provides a more uniform and robust approach to the selection of video key frames.
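The sensitivity of the threshold approach described above can be illustrated with a minimal sketch. The function name `threshold_key_frames`, the normalized per-frame motion scores, and the threshold values are all hypothetical and chosen only for illustration; they are not measurements from any real video.

```python
def threshold_key_frames(motion_scores, threshold):
    """Return the indices of frames whose motion score exceeds the threshold."""
    return [i for i, score in enumerate(motion_scores) if score > threshold]


# Made-up motion scores (0.0 = no motion, 1.0 = maximum motion).
sleeping_baby = [0.01, 0.02, 0.05, 0.02, 0.01]
soccer_game = [0.60, 0.85, 0.70, 0.95, 0.65]

# A single threshold applied to both kinds of content.
threshold = 0.5
baby_keys = threshold_key_frames(sleeping_baby, threshold)    # no frame qualifies
soccer_keys = threshold_key_frames(soccer_game, threshold)    # every frame qualifies

# Re-tuning the threshold for the low-motion content yields a usable result.
baby_keys_retuned = threshold_key_frames(sleeping_baby, 0.03)
```

With one fixed threshold, the low-motion sequence produces no key frames while the high-action sequence marks every frame as a key frame, so the threshold has to be re-tuned for each type of content.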