In this Internet era, the amount of digital video data that consumers have access to is increasing by leaps and bounds. This video data can take the form of commercial DVDs and VCDs, personal camcorder recordings, off-air recordings onto HDD and DVR systems, video downloads on a personal computer, mobile phone, PDA, or portable player, and the like. To manage these video libraries, new automatic video management technologies are being developed that give users efficient access to their video content and provide functionalities such as video categorization, summarization, searching, and the like.
The realization of such functionalities relies on the analysis and understanding of individual videos. The first step of the analysis is segmentation of the video into its constituent shots. A shot can be defined as a sequence of video frames captured by one camera without interruption. A video generally consists of many shots joined together, and various video editing effects are used depending on how the shots are joined. For example, an hour of a TV program typically contains 1000 shots. The video editing effects include abrupt shot transitions and gradual shot transitions. An abrupt shot transition (generally referred to as a hard cut) is a technique in which the current picture changes abruptly into another picture. A gradual shot transition is a technique in which a picture changes gradually into another picture by means of a fade, dissolve, wipe, or other special effect.
A common example of the gradual shot transition is the fade, in which the intensity of a shot gradually drops, ending at a black monochromatic frame (fade-out), or the intensity of a black monochromatic frame gradually increases until the actual shot becomes visible at its normal intensity (fade-in). Fades to and from black are more common, but fades involving monochromatic frames of other colors are also used. Another example of the gradual shot transition is the dissolve, which can be envisaged as a combined fade-out and fade-in. The dissolve involves two shots that overlap for a number of frames, during which the first shot gradually dims and the second shot gradually becomes more distinct.
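The fade and the dissolve described above can be modeled, as a minimal sketch, by linear blending of pixel intensities. The sketch assumes grayscale frames stored as nested lists and treats each shot as a single static frame, which real shots are not; the names `blend`, `solid`, `dissolve`, `fade_out`, and `fade_in` are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = List[List[float]]


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    """Return (1 - alpha) * a + alpha * b, computed pixel by pixel."""
    return [
        [(1.0 - alpha) * pa + alpha * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def solid(height: int, width: int, value: float = 0.0) -> Frame:
    """Build a monochromatic frame (black by default)."""
    return [[value] * width for _ in range(height)]


def dissolve(first: Frame, second: Frame, n_frames: int) -> List[Frame]:
    """Generate n_frames transition frames in which `first` dims while
    `second` becomes more distinct. The endpoints are included, so the
    sequence starts at `first` and ends at `second`."""
    if n_frames < 2:
        raise ValueError("a transition needs at least two frames")
    return [blend(first, second, i / (n_frames - 1)) for i in range(n_frames)]


def fade_out(shot: Frame, n_frames: int, color: float = 0.0) -> List[Frame]:
    """Fade-out: a dissolve from the shot into a monochromatic frame."""
    return dissolve(shot, solid(len(shot), len(shot[0]), color), n_frames)


def fade_in(shot: Frame, n_frames: int, color: float = 0.0) -> List[Frame]:
    """Fade-in: a dissolve from a monochromatic frame into the shot."""
    return dissolve(solid(len(shot), len(shot[0]), color), shot, n_frames)
```

Under this simplified model a fade is a special case of the dissolve in which one of the two shots is a monochromatic frame, which matches the description of the dissolve as a combined fade-out and fade-in. Passing a nonzero `color` gives a fade to or from a monochromatic frame of another intensity.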
Presently, various methods and systems are used for identifying and utilizing the different frame characteristics of a video sequence, such as scene changes, fade-ins, fade-outs, and dissolves. Some of the present methods and systems compare segmentation mask maps between two successive video frames. In addition, an object tracking technique is employed as a complement to handle scene rotation without any extra overhead. Some methods and systems use a two-phase reject-to-refine strategy. According to this strategy, the frames are first tested against the mean absolute frame difference (MAFD) with a relaxed threshold. The frames that pass are then further examined using combined metrics: the signed difference of mean absolute frame difference (SDMAFD), the absolute difference of frame variance (ADFV), and the MAFD after normalization. This approach can be referred to as a histogram equalization process. Some other methods and systems combine intensity and motion information to detect scene changes. Most of these approaches have high overhead. Further, these methods and systems are complex and not very effective in detecting scene changes.
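The two-phase reject-to-refine strategy can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation of any particular prior-art system: the text does not say how the refinement metrics are combined, so the sketch simply accepts a frame if either SDMAFD or ADFV exceeds its threshold, and it omits the normalization step. All function names and threshold values are hypothetical.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def mafd(prev: Frame, curr: Frame) -> float:
    """Mean absolute frame difference between two equally sized frames."""
    total, count = 0.0, 0
    for row_p, row_c in zip(prev, curr):
        for p, c in zip(row_p, row_c):
            total += abs(c - p)
            count += 1
    return total / count


def variance(frame: Frame) -> float:
    """Population variance of the pixel intensities of one frame."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


def scene_change_candidates(
    frames: Sequence[Frame],
    relaxed_threshold: float = 10.0,  # hypothetical value
    sdmafd_threshold: float = 5.0,    # hypothetical value
    adfv_threshold: float = 50.0,     # hypothetical value
) -> List[int]:
    """Return indices i where a scene change is suspected between
    frames[i - 1] and frames[i]."""
    # MAFD of frame i relative to frame i - 1; frame 0 has no predecessor.
    mafds = [0.0] + [mafd(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    variances = [variance(f) for f in frames]

    cuts = []
    for i in range(1, len(frames)):
        # Phase 1 (reject): discard frames whose MAFD is below the relaxed threshold.
        if mafds[i] < relaxed_threshold:
            continue
        # Phase 2 (refine): examine the surviving frames with further metrics.
        sdmafd = mafds[i] - mafds[i - 1]                # signed difference of MAFD
        adfv = abs(variances[i] - variances[i - 1])     # abs. difference of frame variance
        # Assumption: the combination rule is a simple OR of the two tests.
        if sdmafd >= sdmafd_threshold or adfv >= adfv_threshold:
            cuts.append(i)
    return cuts
```

For instance, given three identical flat frames followed by two identical frames with very different content, the function flags only the index of the first frame after the change. Even this reduced sketch visits every pixel of every frame more than once, which gives a sense of the overhead mentioned above.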
In light of the above discussion, there is a need for a method and system that overcomes the above-stated disadvantages.