A video sequence is made up of a set of video frames separated by a certain temporal distance. The video frames within a given ‘scene’ have temporal coherence. Scene cuts are introduced at various points in the video sequence by a number of factors, such as video editing effects, changing camera angles, and artistic effects, as well as by scene changes themselves.
FIG. 1 illustrates a series of video frames containing two scenes, in which the scene change from the first scene to the second scene is abrupt. In an abrupt scene change, the frames on either side of the scene change are completely different. For example, a first scene of a video may show a person approaching a building from the outside, after which the video changes scenes to a view, from the inside, of the person entering the building. In this instance, the last frame of the first scene and the first frame of the second scene are radically different. Other scene changes happen gradually over a number of video frames. These gradual scene changes may include fades, wipes, and dissolves, among others. Embodiments of the invention are directed to the former type, that is, abrupt scene changes.
Several approaches exist for detecting scene changes, and they may be broadly classified into one of two categories: those that analyze compressed video streams and those that analyze uncompressed video streams. The latter type is also called analysis in the uncompressed pixel domain, and is the category to which the invention is directed.
One popular method of detecting scene changes in uncompressed video streams is to use an intensity histogram. In this method, the histogram difference between two consecutive video frames is computed. This difference is then compared against a threshold to decide whether a scene cut occurred between the two frames. A potential drawback of such an approach is the difficulty of choosing the threshold. For some video sequences a global threshold applied to all of the frames yields better results, while for others a local threshold is better. Another possible limitation of the histogram approach is that it has difficulty distinguishing between two images that have different structure but similar pixel values.
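By way of illustration only, the conventional histogram comparison described above may be sketched as follows. The sketch is not taken from any particular prior-art system. The function names, the use of 8-bit grayscale frames held as non-empty nested lists, the sum of absolute bin differences as the difference measure, and the threshold value of 0.5 are all assumptions made for the example.

```python
from typing import List, Sequence

# Assumption: a frame is a non-empty grid of 8-bit grayscale values (0..255).
Frame = Sequence[Sequence[int]]


def intensity_histogram(frame: Frame, bins: int = 256) -> List[float]:
    """Return the normalized intensity histogram of a frame."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // 256] += 1  # map 0..255 onto 0..bins-1
            total += 1
    return [count / total for count in counts]


def histogram_difference(frame_a: Frame, frame_b: Frame, bins: int = 256) -> float:
    """Sum of absolute bin differences; ranges from 0.0 to 2.0."""
    hist_a = intensity_histogram(frame_a, bins)
    hist_b = intensity_histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return each index i at which a cut is declared between frames[i-1] and frames[i].

    The single global threshold is a hypothetical value chosen for this example.
    """
    return [
        i
        for i in range(1, len(frames))
        if histogram_difference(frames[i - 1], frames[i]) > threshold
    ]


# Two dark frames followed by two bright frames: one abrupt change.
dark = [[10] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
print(detect_cuts([dark, dark, bright, bright]))  # [2]

# Different structure, identical pixel statistics: the histograms match.
split = [[0, 0, 255, 255] for _ in range(4)]
checker = [[0, 255, 0, 255] if r % 2 == 0 else [255, 0, 255, 0] for r in range(4)]
print(histogram_difference(split, checker))  # 0.0
```

The final two lines illustrate the second limitation noted above: a frame split into a black half and a white half and a checkerboard frame contain the same numbers of black and white pixels, so their histogram difference is zero and no threshold could separate them.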
Embodiments of the invention address these and other limitations of the prior art.