This invention relates to a method of processing a video stream to detect changes, for example a cut between scenes.
In video terminology, a video stream consists of a number of frames that are displayed successively to create the illusion of motion. A sequence of frames can be considered to form a “scene”, which is considered to be a continuous action in space and time (i.e. with no camera breaks). A “cut” is a discontinuity between scenes. A cut is sharp if it can be located between two frames and gradual if it takes place over a sequence of frames. A keyframe is a frame that represents a whole scene. It can either be calculated or selected from the frames of the scene it represents.
There are many situations where it is desirable to select a cut. For example, selecting keyframes to transmit over a network, save onto a hard disk, or use to browse a video can reduce the bandwidth, storage capacity and time required compared with handling the whole video data. However, video segmentation is a difficult process in view of the various types of camera breaks and different operations that can take place.
Video parameters include intensity, red-green-blue (RGB), hue-value-chroma (HVC), and a motion vector. A traditional approach for detecting a cut is to compare one or more of these parameters, such as intensity, of the corresponding pixels in a pair of consecutive frames. If the number of pixels whose intensity values have changed from one frame to the next exceeds a certain threshold, a cut is presumed. However, such an approach results in low detection rates: it detects false cuts and misses real cuts. False cuts may result from camera operations, object movements or flashes within a video clip, while missed cuts may result from gradual scene changes.
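The traditional approach described above can be illustrated by the following minimal sketch. It assumes that each frame is a list of rows of intensity values; the function names and the per-pixel tolerance `pixel_threshold` are hypothetical and are introduced only for illustration (a tolerance of zero counts any change in intensity).

```python
def changed_pixel_count(frame_a, frame_b, pixel_threshold):
    """Count corresponding pixels whose intensity differs by more than pixel_threshold."""
    return sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > pixel_threshold
    )


def naive_cut_positions(frames, pixel_threshold, count_threshold):
    """Return each index i where a cut is presumed between frames[i - 1] and frames[i]."""
    return [
        i
        for i in range(1, len(frames))
        if changed_pixel_count(frames[i - 1], frames[i], pixel_threshold) > count_threshold
    ]


# Four 2x2 intensity frames (illustrative values): two dark, then two bright.
dark = [[10, 12], [11, 10]]
bright = [[200, 210], [205, 199]]
frames = [dark, dark, bright, bright]
cuts = naive_cut_positions(frames, pixel_threshold=20, count_threshold=2)
```

Because only consecutive frames are compared against a single threshold, a flash can exceed the threshold without any change of scene, while a gradual transition may never exceed it.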
EP 696 016 A describes a cut detection method wherein a scene changing ratio is computed taking into account the frame difference between temporally spaced images as well as temporally successive images. EP 660 327 describes a method for detecting abrupt and gradual scene changes wherein matching is performed between a current frame and a Dth previous frame. Neither of these patents satisfactorily solves the problems outlined above.
An object of the invention is to alleviate the afore-mentioned problems.
According to the present invention there is provided a method of processing a video stream, comprising the steps of selecting first pairs of frames in the video stream with a predetermined temporal spacing; selecting second pairs of frames in the video stream, said second pairs of frames having a longer temporal spacing than said first pairs of frames; for each of said first and second pairs of frames, determining a difference value representing the degree of change between the first and second frames of the pair and generating a particular logic level depending on whether this difference value exceeds a predetermined threshold; determining the change in interframe difference value for successive pairs of frames for each of said first and second pairs of frames and comparing said change with a threshold to generate additional logic levels dependent on the change in interframe difference values for said successive frame pairs; and comparing the generated logic levels with a decision map to identify cuts in the video stream.
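The steps recited above may be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: frames are taken to be flat lists of intensity values, the spacings and thresholds are arbitrary, the same thresholds are used for both sets of pairs, the change in difference value is taken as an absolute value, and the contents of `DECISION_MAP` are purely hypothetical placeholders, since the entries of the decision map are not given in this passage.

```python
def difference(frame_a, frame_b):
    """Difference value: number of pixels whose intensity differs (flat-list frames)."""
    return sum(1 for a, b in zip(frame_a, frame_b) if a != b)


def logic_levels(frames, t, spacing, diff_threshold, change_threshold):
    """Logic levels for the pair (frames[t - spacing], frames[t]).

    Returns (level, change_level): level is 1 if the difference value exceeds
    diff_threshold; change_level is 1 if the difference value has changed by
    more than change_threshold relative to the previous pair of the same spacing.
    """
    d_now = difference(frames[t - spacing], frames[t])
    d_prev = difference(frames[t - spacing - 1], frames[t - 1])
    level = int(d_now > diff_threshold)
    change_level = int(abs(d_now - d_prev) > change_threshold)  # absolute change: an assumption
    return level, change_level


# HYPOTHETICAL decision map, keyed by
# (short_level, short_change, long_level, long_change).
# These entries are placeholders chosen for illustration only.
DECISION_MAP = {
    (1, 1, 1, 1): "sharp cut",
    (0, 0, 1, 1): "gradual cut",
}


def detect_cuts(frames, short_spacing=1, long_spacing=3,
                diff_threshold=2, change_threshold=2):
    """Return (frame index, label) for each position the decision map marks as a cut."""
    results = []
    for t in range(long_spacing + 1, len(frames)):
        short_bits = logic_levels(frames, t, short_spacing, diff_threshold, change_threshold)
        long_bits = logic_levels(frames, t, long_spacing, diff_threshold, change_threshold)
        label = DECISION_MAP.get(short_bits + long_bits, "no cut")
        if label != "no cut":
            results.append((t, label))
    return results


# Eight 4-pixel frames (illustrative values): scene A, then scene B from index 4.
A = [0, 0, 0, 0]
B = [9, 9, 9, 9]
detected = detect_cuts([A, A, A, A, B, B, B, B])
```

In this example the short-spaced and long-spaced pairs both exceed their thresholds at index 4, and both difference values have just risen, so the placeholder map reports a sharp cut there and nowhere else.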
The degree of change may be represented by the number of pixels for which a particular value, such as intensity, has changed. Alternatively, the difference value may be arrived at by, for example, taking the root mean square of the differences in pixel values. In this case, the difference in intensity value of each corresponding pair of pixels is determined, the results are squared and summed, and the square root of the sum is taken. This rms value can then be compared to a threshold. A value other than intensity, for example hue, can be chosen for the value.
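The second difference measure can be sketched as below, assuming frames given as lists of rows of intensity values; the function names are hypothetical. The first function follows the steps as described (square the pixel differences, sum them, take the square root). The second additionally divides by the number of pixels before taking the root, which is the usual definition of a root mean square; which normalisation is intended is an assumption, and the two differ only by a constant factor for a fixed frame size.

```python
import math


def root_sum_square_difference(frame_a, frame_b):
    """Square each pixel difference, sum the squares, and take the square root."""
    total = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
    return math.sqrt(total)


def rms_difference(frame_a, frame_b):
    """Same measure normalised by the pixel count (conventional root mean square)."""
    pixel_count = sum(len(row) for row in frame_a)
    return root_sum_square_difference(frame_a, frame_b) / math.sqrt(pixel_count)


# Two 2x2 intensity frames (illustrative values).
frame_a = [[0, 0], [0, 0]]
frame_b = [[3, 4], [0, 0]]
rss = root_sum_square_difference(frame_a, frame_b)
rms = rms_difference(frame_a, frame_b)
```

Either value can then be compared with a threshold to generate the logic level for the frame pair.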
By this method, gradual cuts between scenes can be more accurately detected and the occurrence of false detections can be reduced.
In a preferred embodiment, the change in difference value between each of the first and second pairs of frames and the corresponding previous pairs is determined, and additional logic levels are generated that depend on whether the change in difference values exceeds a predetermined threshold. The additional logic levels are also compared with the decision map to assist in identifying the cuts. This additional step enhances the detection process.
The invention also provides video processing apparatus comprising means for selecting first pairs of frames in the video stream with a predetermined temporal spacing; means for selecting second pairs of frames in the video stream, said second pairs of frames having a longer temporal spacing than said first pairs of frames; means for determining, for each of said first and second pairs of frames, a difference value representing the degree of change between the first and second frames of the pair and generating a particular logic level depending on whether this difference value exceeds a predetermined threshold; characterized in that it further comprises means for computing the change in interframe difference value for successive pairs of frames for each of said first and second pairs of frames and comparing said change with a threshold to generate additional logic levels dependent on the change in interframe difference values for said successive frame pairs, and means for comparing the generated logic levels with a decision map to identify cuts in the video stream.