In the video field, a scene (also called a shot) can be defined as a set of consecutive images taken by a single camera that forms a continuous action in time and space. The passage from one scene to another is called a transition. Video editing tools make it possible to create a variety of transition effects. The most common are abrupt transitions (see example in FIG. 1), fades (see example in FIG. 2) and cross-fades (see example in FIG. 3). The last two effects are also called gradual transitions. Unlike an abrupt transition, which joins the last image of a shot directly to the first image of the next shot with no particular effect, gradual transitions are progressive and extend over several successive images.
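The three transition types can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the text: each image is a flat list of pixel intensities, a fade is a linear blend toward a uniform (black) image, and a cross-fade is a linear blend between the overlapping images of two shots. The function names and the linear weighting are hypothetical and chosen for illustration only.

```python
def blend(frame_a, frame_b, alpha):
    """Linear mix of two frames given as equal-length lists of pixel intensities."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(frame_a, frame_b)]


def abrupt_transition(shot_a, shot_b):
    """The last image of shot_a is followed directly by the first image of shot_b."""
    return shot_a + shot_b


def fade_out(shot, n, level=0.0):
    """Blend the last n images of a shot toward a uniform frame (black by default)."""
    uniform = [level] * len(shot[0])
    start = len(shot) - n
    faded = [blend(shot[start + i], uniform, (i + 1) / n) for i in range(n)]
    return shot[:start] + faded


def cross_fade(shot_a, shot_b, n):
    """Overlap the last n images of shot_a with the first n images of shot_b."""
    start = len(shot_a) - n
    mixed = [blend(shot_a[start + i], shot_b[i], (i + 1) / (n + 1)) for i in range(n)]
    return shot_a[:start] + mixed + shot_b[n:]


# Two synthetic shots of three uniform 4-pixel images each (assumed test data).
shot_a = [[100.0] * 4 for _ in range(3)]
shot_b = [[200.0] * 4 for _ in range(3)]

cut = abrupt_transition(shot_a, shot_b)   # 6 images, intensity jumps from 100 to 200
fade = fade_out(shot_a, 2)                # last 2 images darken progressively
dissolve = cross_fade(shot_a, shot_b, 2)  # 2 intermediate images mix both shots
```

In this model the abrupt transition changes the content between two adjacent images, whereas the gradual transitions spread the change over n images.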
For a video encoder, transitions can present difficulties. The compression efficiency of MPEG-type video codecs comes from exploiting the strong correlation between the successive images of a sequence. Transitions significantly reduce, or even eliminate, this correlation, which results in a much higher coding cost. In addition, visually disruptive artefacts can occur. In this context, the automatic identification of transitions makes it possible to parameterise the video encoder so as to maximise the quality/speed ratio.
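The loss of correlation at a transition can be made concrete with a small sketch. It uses the mean absolute difference between consecutive images as a rough stand-in for inter-image correlation; this measure, the function names and the pixel values are assumptions made for illustration and are not the method of the invention or of any cited reference.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def frame_differences(frames):
    """Difference between each image and its predecessor."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(frames, frames[1:])]


# Three images of one shot followed by two images of another (assumed test data).
frames = [
    [10, 10, 10, 10],
    [11, 10, 10, 11],
    [10, 11, 11, 10],
    [200, 190, 210, 205],  # first image after an abrupt transition
    [201, 190, 209, 205],
]

diffs = frame_differences(frames)
print(diffs)  # [0.5, 1.0, 190.75, 0.5]
```

Within a shot the difference stays small, so an image is well predicted from its predecessor. At the transition the difference is large, which is the situation in which prediction from the previous image is of little help and the coding cost rises.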
Numerous transition detection methods have been proposed. Generally, the targeted application is not video coding but video analysis. One of the most common applications is the automatic indexing of video content, which requires, as a preliminary step, the segmentation of video sequences into scenes. The following references can be cited:
[1] Lienhart R., “Comparison of Automatic Shot Boundary Detection Algorithms”, Proc. Image and Video Processing VII 1999, SPIE 3656-29, January 1999.
[2] Bescos J, Cisneros G, Martinez J, Menendez J, Cabrera J, “A Unified Model for Techniques on Video-Shot Transition Detection”, IEEE Transactions on Multimedia, Vol. 7, No. 2, April 2005, pp. 293-307.
[3] Covell, et al. “Video processing system including advanced scene break detection methods for fades, dissolves and flashes”, patent U.S. Pat. No. 6,721,361 (Apr. 13, 2004).
[4] Bozdagi, et al. “Feature based hierarchical video segmentation”, patent U.S. Pat. No. 6,493,042 (Dec. 10, 2002).
[5] Shin, et al. “High accurate and real-time gradual scene change detector and method thereof”, patent U.S. Pat. No. 6,381,278 (Apr. 30, 2002).
[6] Shahraray “Method and apparatus for detecting abrupt and gradual scene changes in image sequences”, patent U.S. Pat. No. 6,055,025 (Apr. 25, 2000).
The methods cited above all have limitations with respect to the application targeted by the present invention, namely video coding. These limitations concern reliability, complexity or processing delay. By complexity, we mean the computing cost of the method. The delay is not necessarily linked to the complexity: it is the latency between the time at which an image is received and the time at which the method delivers a result for that image. The delay generally depends on the number of images needed to perform the calculations.
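The distinction between delay and complexity can be sketched as follows. The example assumes a hypothetical detector whose decision for an image requires a fixed number of following images (`lookahead`). The name `delayed_decisions` and the `decide` callback are illustrative only and do not correspond to any of the cited methods.

```python
from collections import deque


def delayed_decisions(frames, lookahead, decide):
    """Yield (arrival_index, frame_index, decision) tuples.

    The decision for image i is only available once image i + lookahead
    has been received, whatever the computing cost of `decide`.
    Images within `lookahead` of the end of the sequence get no decision
    in this simplified sketch.
    """
    window = deque()
    for arrival, frame in enumerate(frames):
        window.append(frame)
        if len(window) == lookahead + 1:
            yield arrival, arrival - lookahead, decide(list(window))
            window.popleft()


# A trivial decision function: the delay stays at 2 images even though
# the computation itself is negligible.
results = list(delayed_decisions(range(6), 2, lambda window: window[0]))
print(results)  # [(2, 0, 0), (3, 1, 1), (4, 2, 2), (5, 3, 3)]
```

Here the delay is fixed by the size of the window, not by the cost of the calculation performed on it, which is why a method of modest complexity may still introduce a significant latency.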
The methods presented in [1] are unreliable. The method presented in [2] is very reliable, but much more complex. The methods presented in [3], [4], [5] and [6] have a more reasonable complexity, but may involve a significant delay. Specifically, the delay caused by [3] and [5] is at least equal to the duration of the gradual transitions detected. The method presented in [4] operates in two passes (a very common approach), which is unacceptable for the desired application. Finally, in [6], the effectiveness of the method relies on the artificial introduction of a delay. In addition, the use of a temporal filtering step can further increase this delay.