MPEG-based video sequences can be divided into scenes that may vary from less than a second to minutes or more in length. Detection of these divisions, or scene cuts, makes possible a wide variety of value-added features. For example, a frame can be selected from each scene to create a storyboard summarizing the contents of a video recording. Further, a video editor can be used to manipulate the video recording on a scene-by-scene basis, for example, by re-ordering scenes or applying image-processing techniques to each frame in a scene.
MPEG video sequences include three types of frames: Intra-Frames (I), Inter-Frames (P), and Bi-Directional Frames (B). I frames encode a still image using a method similar to JPEG encoding. P frames are predicted from a previous I or P frame. B frames are predicted both from a previous I or P frame and a next I or P frame. These three types of frames are encoded using a Discrete Cosine Transform (DCT), which organizes redundancy in spatial directions within frames. However, for I frames, the DCT information is derived directly from an image sample, whereas for the P and B frames, the DCT information is derived from a residual error after prediction.
Each frame is divided into a plurality of macroblocks. Each macroblock includes information related to a plurality of luminance blocks, e.g., Y1, Y2, Y3 and Y4, and a plurality of chrominance blocks, e.g., one U and one V in a YUV system. Each of these blocks includes a plurality of pels, or picture elements, e.g., an 8×8 block.
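The macroblock layout described above can be illustrated with a minimal sketch. It assumes the common arrangement in which four 8×8 luminance blocks cover a 16×16 pel area and share one U and one V block; the names `Macroblock`, `empty_block`, and `macroblocks_per_frame` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK_SIZE = 8  # pels per block side, matching the 8x8 example in the text

Block = List[List[int]]  # one block: BLOCK_SIZE rows of BLOCK_SIZE pel values


def empty_block() -> Block:
    """Return an 8x8 block filled with zeros."""
    return [[0] * BLOCK_SIZE for _ in range(BLOCK_SIZE)]


@dataclass
class Macroblock:
    # Four luminance blocks (Y1..Y4) and one block each for U and V.
    luma: List[Block] = field(
        default_factory=lambda: [empty_block() for _ in range(4)]
    )
    u: Block = field(default_factory=empty_block)
    v: Block = field(default_factory=empty_block)

    def pel_count(self) -> int:
        """Total number of stored samples across all six blocks."""
        blocks = self.luma + [self.u, self.v]
        return sum(len(row) for block in blocks for row in block)


def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of macroblocks needed to cover a frame (assumed 16x16 pel area each)."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols * rows
```

Under these assumptions, one macroblock carries 6 × 64 = 384 samples, and a 352×288 frame is covered by 22 × 18 = 396 macroblocks.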
When video has been encoded into an MPEG-like bitstream, scene cut detection can be done without fully decoding the bitstream, which increases the speed of video processing. Additional information, such as macroblock encoding types, motion vectors, and DCT coefficients, may also be extracted from the bitstream without fully decoding it.
One method for scene cut detection is performed as follows:
1. For I frames, mean-square differences between DCT coefficients are determined;
2. For P frames, the proposed method determines the number of forward-predicted macroblocks;
3. For B frames, the lesser of the number of forward-coded macroblocks and the number of backward-coded macroblocks is counted; and
4. A minimum is then determined in a plot of these numbers versus frame number.
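The P-frame and B-frame steps of this method, together with the search for a minimum, can be sketched as follows. This is an illustrative reading and not the cited method's actual implementation: the `FrameStats` record, the `max_score` cutoff, and the simple neighbor comparison are assumptions, and the I-frame mean-square DCT difference is left out because it is measured on a different scale.

```python
from typing import List, NamedTuple, Optional


class FrameStats(NamedTuple):
    """Hypothetical per-frame counts extracted from the bitstream."""
    frame_type: str        # "I", "P", or "B"
    forward_mbs: int       # forward-predicted macroblocks
    backward_mbs: int = 0  # backward-predicted macroblocks (B frames only)


def prediction_score(frame: FrameStats) -> Optional[int]:
    """Score a frame by how strongly it is predicted from its neighbors."""
    if frame.frame_type == "P":
        return frame.forward_mbs
    if frame.frame_type == "B":
        return min(frame.forward_mbs, frame.backward_mbs)
    return None  # I frames are not scored in this sketch


def candidate_cuts(frames: List[FrameStats], max_score: int) -> List[int]:
    """Return frame indices where the score has a local minimum <= max_score."""
    scored = []
    for index, frame in enumerate(frames):
        score = prediction_score(frame)
        if score is not None:
            scored.append((index, score))

    cuts = []
    for k in range(1, len(scored) - 1):
        index, score = scored[k]
        lower_than_prev = score < scored[k - 1][1]
        not_above_next = score <= scored[k + 1][1]
        if lower_than_prev and not_above_next and score <= max_score:
            cuts.append(index)
    return cuts


frames = [
    FrameStats("I", 0),
    FrameStats("B", 300, 290),
    FrameStats("B", 310, 295),
    FrameStats("P", 320),
    FrameStats("B", 20, 305),
    FrameStats("B", 300, 15),
    FrameStats("P", 40),
    FrameStats("B", 290, 300),
    FrameStats("B", 295, 305),
]
cuts = candidate_cuts(frames, max_score=50)
```

With this made-up data, the score dips to 15 at frame index 5, so `cuts` is `[5]`. The cutoff `max_score` is a tuning parameter introduced here to suppress shallow minima.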
In another proposed method for scene cut detection: for I frames, a difference between color histograms built from DC coefficients is used, and combined with information about the ratio of the number of macroblocks without motion compensation to the number with motion compensation. The proposed method looks for a peak in a plot versus frame number. For B frames, the ratio of forward to backward predictions is determined. In all cases, a local adaptive threshold technique is implemented to identify peaks.
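The two ingredients named here, a histogram difference built from DC coefficients and a local adaptive threshold for finding peaks, can be sketched as below. The bin count, the assumed 0 to 255 value range, the window size, and the `factor` multiplier are all illustrative assumptions, as are the function names; the cited method may define its histograms and threshold differently.

```python
from typing import List, Sequence


def dc_histogram(dc_values: Sequence[int], bins: int = 8,
                 max_value: int = 256) -> List[int]:
    """Histogram of DC values, assumed to lie in the range 0..max_value-1."""
    hist = [0] * bins
    for value in dc_values:
        index = int(value) * bins // max_value
        index = max(0, min(index, bins - 1))  # clamp out-of-range values
        hist[index] += 1
    return hist


def histogram_difference(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adaptive_peaks(signal: Sequence[float], window: int = 2,
                   factor: float = 3.0) -> List[int]:
    """Indices whose value exceeds `factor` times the mean of nearby values."""
    peaks = []
    for i, value in enumerate(signal):
        lo = max(0, i - window)
        hi = min(len(signal), i + window + 1)
        neighbors = [signal[j] for j in range(lo, hi) if j != i]
        if not neighbors:
            continue
        local_mean = sum(neighbors) / len(neighbors)
        # Require the value to be the largest in its window as well.
        if value > factor * local_mean and value == max(signal[lo:hi]):
            peaks.append(i)
    return peaks


hist_a = dc_histogram([0, 31, 32, 255])
hist_b = dc_histogram([40, 250, 251, 252])
diff = histogram_difference(hist_a, hist_b)
peaks = adaptive_peaks([1, 2, 1, 40, 2, 1, 1])
```

Because the threshold is computed from the values around each sample, a peak is judged relative to local activity rather than against one fixed global level.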
Yet another method makes use of histograms for all frames (I, P, and B) built from DC coefficients (the zero-frequency terms of the DCT) with motion compensation.
However, no known system or method currently exists for scene cut detection based on global examination of all of the predictions within a subgroup of a group of pictures (GOP).