The present invention is related to an apparatus which detects significant scenes of a source video and selects keyframes to represent each detected significant scene. The present invention additionally filters the selected keyframes and creates a visual index or a visual table of contents based on remaining keyframes.
Users will often record home videos or record television programs, movies, concerts, sports events, etc. on a tape for later or repeated viewing. Often, a video will have varied content or be of great length. However, a user may not write down what is on a recorded tape and may not remember what she recorded on a tape, DVD or other medium or where on a tape particular scenes, movies, or events are recorded. Thus, a user may have to sit and view an entire tape to remember what is on the tape.
Video content analysis uses automatic and semi-automatic methods to extract information that describes contents of the recorded material. Video content indexing and analysis extracts structure and meaning from visual cues in the video. Generally, a video clip is taken from a TV program or a home video.
In U.S. Ser. No. PHA 23252, of which the present application is a continuation-in-part, a method and device are described which detect scene changes or "cuts" in the video. At least one frame between detected cuts is then selected as a key frame to create a video index. In order to detect scene changes, a first frame is selected, a subsequent frame is compared to the first frame, and a difference calculation is made which represents the content difference between the two frames. The result of this difference calculation is then compared to a universal threshold or thresholds used for all categories of video. If the difference is above the universal threshold(s), it is determined that a scene change has occurred.
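By way of illustration only, the universal-threshold comparison described above may be sketched as follows. The frame representation (a flat sequence of numeric values), the sum-of-absolute-differences measure, the comparison of consecutive frames, and the choice of the middle frame of each scene as its key frame are all assumptions made for this sketch; the function names are hypothetical and are not taken from PHA 23252.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of numeric values
# (for example luminance samples or transform coefficients).
Frame = Sequence[float]


def frame_difference(a: Frame, b: Frame) -> float:
    """Sum of absolute differences, one possible content-difference measure."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_cuts(frames: List[Frame], threshold: float) -> List[int]:
    """Return the index of each frame that begins a new scene.

    A cut is declared when the difference between a frame and the
    preceding frame exceeds the single, universal threshold.
    """
    cuts = []
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts


def select_keyframes(num_frames: int, cuts: List[int]) -> List[int]:
    """Pick one key frame per scene (here, the middle frame; an assumption)."""
    if num_frames == 0:
        return []
    boundaries = [0] + cuts + [num_frames]
    return [(start + end - 1) // 2
            for start, end in zip(boundaries, boundaries[1:])]


frames = [[0, 0], [1, 0], [10, 10], [10, 11]]
cuts = detect_cuts(frames, threshold=5.0)
keyframes = select_keyframes(len(frames), cuts)
```

Because a single threshold is applied regardless of content, the number of cuts, and hence of key frames, depends entirely on how much consecutive frames differ.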
In PHA 23252, a universal threshold or thresholds are chosen which are intended to be optimal for all types of video. The problem with such an approach is that a visual index of a video which contains high action, such as an action movie, will be quite large, whereas a visual index of a video with little action, such as the news, will be quite small. This is because in a high action movie, where objects are moving across a scene, the content difference between two consecutive frames may be large. In such a case, comparing the content difference to a universal threshold will result in a "cut" being detected even though the two frames may be within the same scene. The more cuts or scene changes are perceived, the more key frames are selected, and vice versa. Accordingly, an action movie ends up having far too many key frames to represent the movie.
Accordingly it is an object of the invention to provide a system which will create a visual index for a video source which was previously recorded or while being recorded, which is useable and more accurate in selecting significant keyframes by varying the number of keyframes chosen from a video based on the category of the video.
The present invention further presents a video analysis system supporting visual content extraction for source video which may include informative and/or entertainment programs such as news, serials, weather, sports or any type of home recorded video, broadcast material, prerecorded material etc.
The present invention further presents a new apparatus for video cut detection, static scene detection, and keyframe filtering to provide for more useable visual images in the visual index by comparing two frames of video and determining whether the differences between the frames are above or below a certain threshold, the threshold being dependent on the category of the video. If the differences are above the selected threshold then this point in the video is determined to be a scene "cut". The frames within the scene cuts can then be filtered to select key frames for a video index. If the differences are below a static scene threshold for a number of frames it is determined to be a static sequence.
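As a minimal sketch of the category-dependent comparison described above, the following example classifies a sequence of precomputed frame-to-frame differences into cuts and static sequences. The threshold values, the category names used as keys, the `Thresholds` structure, and the function name `classify` are hypothetical and chosen only for illustration; the text does not specify them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Thresholds:
    cut: float       # a difference above this marks a scene cut
    static: float    # a difference below this counts toward a static run
    static_run: int  # consecutive low differences needed for a static sequence


# Hypothetical values; real thresholds would be tuned per category.
CATEGORY_THRESHOLDS = {
    "action": Thresholds(cut=40.0, static=2.0, static_run=3),
    "news": Thresholds(cut=15.0, static=1.0, static_run=3),
}


def classify(differences: List[float],
             category: str) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return (cuts, static_sequences) for the given video category.

    differences[i] is the content difference between frame i and frame i + 1.
    Each cut is the index of the first frame of a new scene; each static
    sequence is an inclusive (first_frame, last_frame) pair.
    """
    t = CATEGORY_THRESHOLDS[category]
    cuts: List[int] = []
    static_sequences: List[Tuple[int, int]] = []
    run_start = None
    for i, d in enumerate(differences):
        if d > t.cut:
            cuts.append(i + 1)
        if d < t.static:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= t.static_run:
                static_sequences.append((run_start, i))
            run_start = None
    # Close a static run that extends to the end of the video.
    if run_start is not None and len(differences) - run_start >= t.static_run:
        static_sequences.append((run_start, len(differences)))
    return cuts, static_sequences


diffs = [0.5, 0.5, 0.5, 20.0, 30.0, 50.0]
action_result = classify(diffs, "action")
news_result = classify(diffs, "news")
```

With these illustrative values, the same differences yield one cut under the higher "action" threshold and three cuts under the lower "news" threshold, which is the effect the category-dependent threshold is meant to control.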
The present invention may detect significant scenes and select keyframes of source video, based on calculations using DCT coefficients and comparisons to various thresholds where the thresholds vary based on the category of video, e.g. action, news, music, or even the category of portions of a video.
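One way such a DCT-based calculation could look is sketched below. It assumes, purely for illustration, that each frame is available as a sequence of blocks whose DCT coefficients are listed with the DC coefficient first, and that the difference is the sum of absolute differences over the first few coefficients of corresponding blocks. The function name `dct_difference` and the parameter `num_coeffs` are hypothetical; the text does not specify which coefficients are compared or how.

```python
from typing import Sequence

# Assumption: one block is a sequence of DCT coefficients, DC coefficient first.
Block = Sequence[float]
DctFrame = Sequence[Block]


def dct_difference(a: DctFrame, b: DctFrame, num_coeffs: int = 1) -> float:
    """Sum of absolute differences over the first num_coeffs coefficients
    of each pair of corresponding blocks."""
    total = 0.0
    for block_a, block_b in zip(a, b):
        for ca, cb in zip(block_a[:num_coeffs], block_b[:num_coeffs]):
            total += abs(ca - cb)
    return total


frame_a = [[100.0, 3.0], [50.0, -2.0]]
frame_b = [[90.0, 1.0], [80.0, -2.0]]
dc_only = dct_difference(frame_a, frame_b)
dc_and_first_ac = dct_difference(frame_a, frame_b, num_coeffs=2)
```

The resulting value would then be compared to the thresholds selected for the category of the video, or of the portion of the video, being analyzed.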
Additionally, it is an object of the invention to calculate these thresholds based on a video category provided by an electronic program guide or, alternatively, to have the electronic program guide provide the thresholds themselves rather than calculating them.
Furthermore it is an object of the invention to enter these thresholds manually.
It is yet another object of the invention to provide the thresholds in the encoded video.
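The several sources of thresholds named above (a category from an electronic program guide, thresholds supplied by the guide itself, manual entry, and thresholds carried in the encoded video) could be combined as in the following sketch. The order of precedence, the dictionary keys `cut_threshold` and `category`, the numeric values, and the fallback value are all assumptions made for illustration; the text does not prescribe any of them.

```python
from typing import Dict, Optional

# Hypothetical per-category cut thresholds and fallback value.
DEFAULT_CUT_THRESHOLDS: Dict[str, float] = {
    "action": 40.0,
    "news": 15.0,
    "music": 30.0,
}
FALLBACK_THRESHOLD = 25.0


def resolve_cut_threshold(manual: Optional[float] = None,
                          embedded: Optional[float] = None,
                          epg: Optional[dict] = None) -> float:
    """Choose a cut threshold from the available sources.

    Assumed precedence: manual entry, then a threshold carried in the
    encoded video, then a threshold supplied by the program guide, then a
    threshold derived from the guide's category, then a fallback.
    """
    if manual is not None:
        return manual
    if embedded is not None:
        return embedded
    if epg:
        if "cut_threshold" in epg:
            return float(epg["cut_threshold"])
        category = epg.get("category")
        if category in DEFAULT_CUT_THRESHOLDS:
            return DEFAULT_CUT_THRESHOLDS[category]
    return FALLBACK_THRESHOLD


from_category = resolve_cut_threshold(epg={"category": "news"})
from_guide = resolve_cut_threshold(epg={"cut_threshold": 12})
from_user = resolve_cut_threshold(manual=9.0, epg={"category": "action"})
```

Any other precedence would serve equally well; the point of the sketch is only that the threshold applied to a given video need not be fixed in advance.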