During the production of media such as a news video, a press report, a commercial, or a film, a tremendous amount of media data of various types is generated and subsequently edited. The management of such media data is essential to the production procedure and its efficiency. One common approach to this management is the use of metadata, which can be simply defined as data about data. A metadata item either describes a single piece of information about the media data or is a collection of varied information.
Among the various types of metadata, temporal metadata, which describes the temporal features of media data, is important and often used. In the case of video data, temporal segmentation metadata, which clarifies the structure of a video, is especially useful for the management and arrangement of the video. Temporal segmentation metadata can usually be acquired by detecting shot boundaries in a video, which can be accomplished by various techniques known in the field. A shot boundary is a cut or a fade in the video, and a shot is the segment between two consecutive shot boundaries. Several shots taken at the same set can be grouped together to form a scene of the video. The structure of a video is generally described by such shots and scenes.
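The shot and scene structure described above can be illustrated with a minimal sketch. It assumes that shot boundaries are given as frame indices and that each shot carries a label naming the set at which it was taken. The names `Shot`, `Scene`, `shots_from_boundaries`, `group_into_scenes`, and the set labels are hypothetical and are introduced only for illustration. Grouping only adjacent shots with the same label is likewise an assumption and not a prescribed method.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass(frozen=True)
class Shot:
    start: int  # index of the first frame in the shot (inclusive)
    end: int    # index one past the last frame in the shot (exclusive)


@dataclass(frozen=True)
class Scene:
    shots: tuple  # consecutive Shot objects grouped into one scene


def shots_from_boundaries(boundaries: Sequence[int], total_frames: int) -> List[Shot]:
    """Split the frame range [0, total_frames) at each shot boundary."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Keep only boundaries strictly inside the video, deduplicated and sorted.
    inner = sorted({b for b in boundaries if 0 < b < total_frames})
    edges = [0, *inner, total_frames]
    return [Shot(s, e) for s, e in zip(edges, edges[1:])]


def group_into_scenes(shots: Sequence[Shot], set_labels: Sequence[str]) -> List[Scene]:
    """Merge runs of consecutive shots that share the same (hypothetical) set label."""
    if len(shots) != len(set_labels):
        raise ValueError("one set label is required per shot")
    scenes = []
    # groupby merges only adjacent equal labels, so a later return to the
    # same set starts a new scene rather than joining the earlier one.
    for _, run in groupby(zip(shots, set_labels), key=lambda pair: pair[1]):
        scenes.append(Scene(tuple(shot for shot, _ in run)))
    return scenes
```

In this sketch, a video is represented as a list of scenes, each holding an ordered tuple of shots, which mirrors the two-level temporal structure described above.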
For the detection of shot boundaries in a video, satisfactory results can generally be obtained with existing techniques. For example, shot metadata can be generated from information such as an Edit Decision List (EDL). However, the detection of scenes in a video usually produces many errors, which are troublesome and need to be corrected. In addition, there are cases where metadata about the temporal structure of the video is missing and thus has to be generated, for example when analog video archives are digitized, when the target videos have no metadata about the temporal structure, or when the temporal metadata is lost during production.