1. Field of Invention
The invention pertains to video processing, and more specifically to determining scene changes between video frames.
2. Description of Related Art
Motion picture video content data is generally captured, stored, transmitted, processed, and output as a series of still image frames. Small frame-by-frame data content changes are perceived as motion when the output is directed to a viewer at sufficiently close time intervals. A large data content change between two adjacent frames is perceived as a scene change (e.g., a change from an indoor to an outdoor scene, a change in camera angle, an abrupt change in illumination within the image, and so forth).
Encoding and compression schemes take advantage of small frame-by-frame video content data changes to reduce the amount of data needed to store, transmit, and process video data content. The amount of data required to describe the changes is less than the amount of data required to describe the original still image. Under standards developed by the Moving Picture Experts Group (MPEG), for example, a group of frames begins with an intra-coded frame (I-frame) in which encoded video content data corresponds to visual attributes (e.g., luminance, chrominance) of the original still image. Subsequent frames in the group of frames (predictive coded frames (P-frames); bi-directional coded frames (B-frames)) are encoded based on changes from earlier frames in the group. New groups of frames, and thus new I-frames, are begun at regular time intervals to prevent, for instance, noise from inducing false video content data changes. New groups of frames, and thus new I-frames, are also begun at scene changes when the video content data changes are large because fewer data are required to describe a new still image than to describe the large changes between the adjacent still images. Therefore, during video content data encoding, it is important to identify scene changes between adjacent video content data frames.
Several schemes exist to identify scene changes between two video content data frames. Motion-based schemes compare vector motion for blocks of picture elements (pixels) between two frames to identify scene changes. Histogram-based schemes map, for instance, the distribution of pixel luminance data for the two frames and compare the distributions to identify scene changes. Discrete cosine transform (DCT)-based schemes map pixel data to a frequency domain distribution for the two frames and compare the distributions to identify scene changes. Motion-, histogram-, and DCT-based schemes require relatively high data processing power (typically measured in millions of instructions per second (MIPS)) since a large average number of instructions per pixel is required to carry out these schemes.
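By way of illustration only, a histogram-based comparison of the kind described above might be sketched as follows. The bin count, the 8-bit luminance range, and the use of a sum of absolute bin differences as the distance measure are assumptions made for this sketch; the function names are hypothetical and are not taken from any particular encoder.

```python
from typing import List, Sequence


def luminance_histogram(frame: Sequence[int], bins: int = 16,
                        max_value: int = 255) -> List[int]:
    """Count how many pixels fall into each luminance bin.

    Assumes integer luminance samples in the range 0..max_value.
    """
    counts = [0] * bins
    for pixel in frame:
        index = min(pixel * bins // (max_value + 1), bins - 1)
        counts[index] += 1
    return counts


def histogram_distance(frame_a: Sequence[int], frame_b: Sequence[int],
                       bins: int = 16) -> int:
    """Sum of absolute differences between the two frames' histograms."""
    hist_a = luminance_histogram(frame_a, bins)
    hist_b = luminance_histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))
```

Even in this reduced form, each pixel requires a bin computation and a counter update for every frame, followed by a separate pass over the bins, which is consistent with the higher per-pixel instruction count attributed to such schemes.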
Another scheme to identify scene changes is to determine a pixel-by-pixel video content data difference between two frames. For one frame (n) composed of pixels P(n) and another frame (n+1) composed of pixels P(n+1), the aggregate difference is given by:
Σ(|P(n) − P(n+1)|)
where the sum is taken over corresponding pixels of the two frames. The aggregate difference is compared to a threshold value, and aggregate differences greater than the threshold value are considered scene changes. An advantage of this frame difference scheme is that a relatively low number of data processing instructions per pixel is required (e.g., an average of 3-4 instructions per pixel), and therefore the frame difference scheme requires relatively fewer MIPS than the motion-, histogram-, and DCT-based schemes described above.
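By way of illustration only, the frame difference scheme might be sketched as follows. The sketch assumes each frame is supplied as a flat sequence of integer pixel values of equal length; the function names and the representation of the frames are hypothetical, and the threshold is left to the caller because no particular value is specified here.

```python
from typing import Sequence


def aggregate_difference(frame_a: Sequence[int],
                         frame_b: Sequence[int]) -> int:
    """Sum of absolute pixel-by-pixel differences between two frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


def is_scene_change(frame_a: Sequence[int], frame_b: Sequence[int],
                    threshold: int) -> bool:
    """Report a scene change when the aggregate difference exceeds the threshold."""
    return aggregate_difference(frame_a, frame_b) > threshold
```

Each pixel contributes one subtraction, one absolute value, and one accumulation, which is in keeping with the small per-pixel instruction count noted above.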
The low MIPS requirement of this approach allows a data processor running at a moderate clock rate, for example, 40 MHz, to perform real-time video content data encoding. However, a significant disadvantage of the frame difference scheme is that large changes in only a small portion of a frame are falsely identified as scene changes. In practice, this frame difference scheme yields a large number of false scene change indications, each of which begins an unnecessary new group of frames, and the efficiency of video content data compression suffers as a result.
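The false indication problem can be seen in a small numerical example. The frame size, pixel values, and threshold below are hypothetical values chosen only for illustration: a large change confined to one tenth of the frame produces the same aggregate difference as a modest change spread across the whole frame, so a single threshold cannot tell the two cases apart.

```python
def aggregate_difference(frame_a, frame_b):
    """Sum of absolute pixel-by-pixel differences between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))


PIXELS = 100
THRESHOLD = 1500  # hypothetical threshold, chosen only for this example

base = [50] * PIXELS

# Large change (200 levels) confined to 10 of the 100 pixels,
# e.g. a small bright object appearing in an otherwise static scene.
local_change = [250] * 10 + [50] * 90

# Modest change (20 levels) in every pixel of the frame.
global_change = [70] * PIXELS

local_score = aggregate_difference(base, local_change)    # 10 * 200
global_score = aggregate_difference(base, global_change)  # 100 * 20

print(local_score, global_score, local_score > THRESHOLD)
```

Both scores equal 2000 and both exceed the example threshold, even though only the second frame pair differs across the entire image.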
Therefore, a need exists for a video content data encoding scheme that accurately identifies scene changes so as to maximize video content data compression, while requiring only a small per-pixel instruction overhead. The present invention fulfills that need and others, while overcoming the inherent drawbacks of previous approaches.