1. Field of the Invention
The invention relates generally to digital image processing and, in particular, to detecting video shot boundaries.
2. Description of Related Art
Video cameras are becoming increasingly popular as they become more widely available at lower prices. A video camera records sequential images as "frames." A frame is a representation of an image at an instant of time. Typically, each frame represents the image at a different instant. When several frames are recorded at sequential instants and shown to the human eye in quick succession, the eye perceives motion in the video segment (i.e., a sequence of frames). Video (i.e., moving pictures) normally contains a great deal of motion, including object motion, such as a bird flying, and camera motion, such as panning, zooming, and tilting.
For various types of video processing (e.g., classifying videos so that they can be searched, or searching for a video segment within a video), it is useful to segment a video into physical units, which are referred to as "shots." A shot is a video segment that represents one continuous action. Shots may be clustered to form more semantically significant units, such as scenes or sequences. These scenes may then be used for story-based video structuring (e.g., scenes may be organized into a movie format). Each shot may be described by one or more representative frames, which may be referred to as key frames. Once key frames are identified, they may be used to classify videos, which enables searching for a particular video (e.g., when renting videos), or to search for a particular video segment (e.g., a segment that shows a bird flying) within a video. In one embodiment, a shot is an unbroken sequence of frames captured with one camera, and a shot boundary is the border between two shots. A shot boundary may occur at an abrupt break, which appears as an instantaneous change, or at a gradual transition. One type of gradual transition is a fade-out/fade-in effect in which the view changes from, for example, a scene of a building to a scene of a person. During such a transition, the building may appear to fade out while the person appears to fade into the image.
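The difference between an abrupt break and a gradual transition can be illustrated with a simple pairwise frame-difference test. The following is a minimal, hypothetical sketch, not the invention's method and not the step variable technique cited below: frames are assumed to be small grayscale arrays, the gradual transition is modeled as a linear cross-fade, and the names `mean_abs_diff`, `blend`, and `flag_boundaries` and the threshold values are illustrative assumptions.

```python
from typing import List

# Hypothetical representation: a grayscale frame as rows of pixel intensities in [0, 255].
Frame = List[List[float]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Linear cross-fade (1 - alpha) * a + alpha * b, an assumed model of a gradual transition."""
    return [[(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def flag_boundaries(frames: List[Frame], threshold: float) -> List[int]:
    """Return each index i whose frame pair (i - 1, i) differs by more than threshold."""
    return [i for i in range(1, len(frames))
            if mean_abs_diff(frames[i - 1], frames[i]) > threshold]


if __name__ == "__main__":
    building = [[200.0] * 4 for _ in range(4)]  # stand-in for the "building" scene
    person = [[40.0] * 4 for _ in range(4)]     # stand-in for the "person" scene

    # Abrupt break: one large difference between a single pair of frames.
    abrupt = [building] * 3 + [person] * 3
    print(flag_boundaries(abrupt, 50.0))    # [3]

    # Gradual transition over 8 steps: each pair differs by only 20.
    gradual = [blend(building, person, k / 8) for k in range(9)]
    print(flag_boundaries(gradual, 50.0))   # []  (transition missed)
    print(flag_boundaries(gradual, 10.0))   # [1, 2, 3, 4, 5, 6, 7, 8]
```

With a single fixed threshold, the gradual transition is either missed entirely or flagged at every frame pair it spans, and a threshold low enough to catch it would also tend to fire on ordinary object or camera motion. This is one way a naive automated detector can produce the false alarms discussed in the following paragraph.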
In some cases, individuals review a video and manually select shots and representative key frames. This is a very time-consuming process. Automated techniques for identifying shots and selecting key frames also exist, but they typically produce many false alarms (i.e., they identify a pair of frames as a shot boundary when it is not actually a shot boundary). One example is the step variable technique described in "Efficient Scene Change Detection and Camera Motion Annotation for Video Classification," by Wei Xiong and John Chung-Mong Lee, Computer Vision and Image Understanding, Vol. 71, No. 2, pp. 166–181, August 1998, and in "Automatic Video Data Structuring Through Shot Partitioning and Key-Frame Computing," by Wei Xiong, John Chung-Mong Lee, and Rui-Hua Ma, Machine Vision and Applications, Springer-Verlag, Vol. 10, pp. 51–65, 1997, each of which is incorporated herein by reference in its entirety.