An increasing number of people own and use camcorders to make videos that capture their experiences and document events in their lives. One of the primary problems with consumer home video acquisition devices such as camcorders is that they are linear-based devices, and a single recording, either digital or on tape, may contain multiple “events” (e.g., birthday party, soccer game, vacation video, etc.). Each event may in turn consist of multiple “shots” (i.e., the sequence of contiguous video frames between the time when the camera is instructed to start recording and when it is instructed to stop recording). Moreover, each shot may consist of one or more scenes. Unfortunately, the linear nature of typical video recordings often makes it difficult to find and play back a segment of the video showing a specific event, scene, or shot.
It is usually more convenient for the user if a long video is divided into a number of shorter segments that the user can access directly. Ideally, the video should be divided at the points where natural discontinuities occur. Natural discontinuities include discontinuities in time (e.g., gaps in the recorded digital video (DV) time code) as well as discontinuities in content (e.g., scene changes). If the recording on a DV tape is continuous, for example, the time code should increment by a predictable value from frame to frame. If the recording is not continuous (e.g., the user stops recording and then records again later), there will be a gap in the time code that is larger than the normal frame-to-frame increment. Such gaps correspond to discontinuity points in time. Similarly, if there is no sudden motion or lighting change, the video content generally remains continuous as well. A sudden change in the video content may suggest the occurrence of some event in the video. Such sudden changes correspond to discontinuity points in content. A time- or content-based discontinuity point in a video is sometimes referred to as a shot boundary, and the portion of a video between two consecutive shot boundaries is considered to be a shot.
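The two kinds of discontinuity described above can be illustrated with a minimal sketch. It assumes that each frame's DV time code has already been converted to an absolute frame number, so that a continuous recording increments by exactly one per frame. It also assumes that each frame is summarized by a normalized color histogram. The function names, the histogram representation, and the threshold value are hypothetical and chosen only for illustration. This is not the detection method of any particular application.

```python
from typing import List, Sequence, Tuple


def find_time_code_gaps(time_codes: Sequence[int], expected_step: int = 1) -> List[int]:
    """Return indices i where the time code jumps from frame i-1 to frame i
    by more than the normal frame-to-frame increment (a time discontinuity).

    Assumption: time codes are absolute frame numbers, not raw DV HH:MM:SS:FF."""
    return [i for i in range(1, len(time_codes))
            if time_codes[i] - time_codes[i - 1] > expected_step]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_content_changes(histograms: Sequence[Sequence[float]],
                         threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i differs sharply from frame i-1
    (a content discontinuity). The threshold is a hypothetical tuning value."""
    return [i for i in range(1, len(histograms))
            if histogram_distance(histograms[i - 1], histograms[i]) > threshold]


def split_into_shots(num_frames: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn shot-boundary frame indices into (start, end_exclusive) shot ranges."""
    cuts = sorted({b for b in boundaries if 0 < b < num_frames})
    starts = [0] + cuts
    ends = cuts + [num_frames]
    return list(zip(starts, ends))


if __name__ == "__main__":
    # Eight frames: recording paused after frame 3 (time code jumps 3 -> 10),
    # and the content changes abruptly at frame 6.
    time_codes = [0, 1, 2, 3, 10, 11, 12, 13]
    histograms = [[1.0, 0.0]] * 6 + [[0.0, 1.0]] * 2
    boundaries = find_time_code_gaps(time_codes) + find_content_changes(histograms)
    print(split_into_shots(len(time_codes), boundaries))  # [(0, 4), (4, 6), (6, 8)]
```

Both detectors only report candidate boundaries. Merging them and converting the result into frame ranges is kept separate, so either signal can be used alone.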
Known video playback, browsing, and editing applications, such as multimedia editing applications (MEAs), let a user bring versatility to such linear video recordings on a personal computer. The user captures or transfers the video onto the computer and then manually segments the digital video file into events of the user's choosing. Some MEAs make this easier for the user by attempting to automatically detect shot boundaries within a particular video file. The MEA may then segment the video file into shots that are displayed in a library, allowing the user to manually select shots and combine them to form recordings of events of the user's choosing.
Conventional MEAs use various methods to detect shot boundaries within a particular video. Unfortunately, as is known to those skilled in the art, these methods do not achieve the desired level of both precision and recall across a wide range of videos.
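Precision is the fraction of detected shot boundaries that are true boundaries. Recall is the fraction of true boundaries that are detected. A detector tuned to find more boundaries usually raises recall at the cost of precision, and vice versa. The following is a minimal sketch of how the two measures could be computed for one video. It assumes that a detected boundary counts as correct if it lies within a hypothetical `tolerance` of an unmatched ground-truth boundary, with greedy one-to-one matching. Other evaluation protocols match boundaries differently.

```python
from typing import List, Sequence, Tuple


def boundary_precision_recall(detected: Sequence[int],
                              truth: Sequence[int],
                              tolerance: int = 0) -> Tuple[float, float]:
    """Compute (precision, recall) for detected shot-boundary frame indices.

    Assumption: each ground-truth boundary can match at most one detection,
    and a match requires |detected - truth| <= tolerance frames."""
    unmatched: List[int] = sorted(truth)
    true_positives = 0
    for d in sorted(detected):
        # Greedily pair d with the first unmatched true boundary within tolerance.
        for j, t in enumerate(unmatched):
            if abs(d - t) <= tolerance:
                true_positives += 1
                del unmatched[j]
                break
    # Convention assumed here: an empty denominator yields 0.0.
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Two of three detections are correct, and two of three true cuts are found.
    print(boundary_precision_recall([4, 6, 20], [4, 7, 30], tolerance=1))
```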