An exemplary device for extracting features from moving images and collating them is described in “Image Signature Robust to Caption Superimposition for Video Sequence Identification” (Non-Patent Document 1). FIG. 7 is a block diagram showing the device described in Non-Patent Document 1.
A block unit feature extraction unit 1000 extracts features in block units from a first video to be input, and outputs a first feature to a matching unit 1030. Another block unit feature extraction unit 1010 extracts features in block units from a second video to be input, and outputs a second feature to the matching unit 1030. A weighting coefficient calculation unit 1020 calculates a weighting value of each of the blocks based on a learning video to be input, and outputs a weighting coefficient to the matching unit 1030. The matching unit 1030 compares the first feature output from the block unit feature extraction unit 1000 with the second feature output from the block unit feature extraction unit 1010 using the weighting coefficient output from the weighting coefficient calculation unit 1020, and outputs the matching result.
Next, operation of the device shown in FIG. 7 will be described.
In the block unit feature extraction unit 1000, each frame of the input first video is divided into blocks, and a feature for identifying the video is calculated from each block. Specifically, the block unit feature extraction unit 1000 determines the edge type of each block and uses that type as the feature of the block. For each frame, the block unit feature extraction unit 1000 then forms a feature vector consisting of the edge types of the respective blocks, and outputs the feature vectors calculated for the frames to the matching unit 1030 as the first feature.
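The block-unit extraction described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1: the text does not specify how the edge type is determined, so the gradient-comparison rule, the three labels ("none", "vertical", "horizontal"), the threshold value, and the function names `block_edge_type` and `frame_feature_vector` are all assumptions made for this example. A frame is assumed to be a two-dimensional list of luminance values.

```python
def block_edge_type(block, threshold=10.0):
    """Classify one block as 'none', 'vertical', or 'horizontal'.

    Hypothetical rule (an assumption, not taken from the source): compare the
    mean absolute pixel difference along rows against that along columns.
    """
    rows, cols = len(block), len(block[0])
    # Differences between horizontally adjacent pixels respond to vertical edges.
    dx = sum(abs(block[r][c + 1] - block[r][c])
             for r in range(rows) for c in range(cols - 1))
    # Differences between vertically adjacent pixels respond to horizontal edges.
    dy = sum(abs(block[r + 1][c] - block[r][c])
             for r in range(rows - 1) for c in range(cols))
    dx /= max(1, rows * (cols - 1))
    dy /= max(1, (rows - 1) * cols)
    if max(dx, dy) < threshold:
        return "none"
    return "vertical" if dx >= dy else "horizontal"


def frame_feature_vector(frame, block_rows, block_cols):
    """Divide a frame into block_rows x block_cols blocks and return the
    list of per-block edge types in raster order."""
    height, width = len(frame), len(frame[0])
    bh, bw = height // block_rows, width // block_cols
    features = []
    for br in range(block_rows):
        for bc in range(block_cols):
            block = [row[bc * bw:(bc + 1) * bw]
                     for row in frame[br * bh:(br + 1) * bh]]
            features.append(block_edge_type(block))
    return features
```

Applying `frame_feature_vector` to every frame of a video would give the sequence of feature vectors that the text calls the first (or second) feature.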
Operation of the block unit feature extraction unit 1010 is the same as that of the block unit feature extraction unit 1000. The block unit feature extraction unit 1010 calculates the second feature from the input second video, and outputs the acquired second feature to the matching unit 1030.
On the other hand, the weighting coefficient calculation unit 1020 calculates beforehand, using a learning video, the probability that a caption is inserted in each block of a frame. Then, based on the calculated probability, the weighting coefficient calculation unit 1020 calculates a weighting coefficient for each block. Specifically, the weighting coefficient is calculated such that the weight becomes higher as the probability of a caption being superposed becomes lower, in order to improve robustness to caption superposition. The acquired weighting coefficients are output to the matching unit 1030.
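As a rough sketch of this weighting step: the text states only that the weight rises as the caption probability falls, so the specific mapping used here (a weight proportional to one minus the probability, normalized to sum to 1) is an assumption chosen for illustration. The per-frame caption masks and the function names `caption_probabilities` and `weighting_coefficients` are likewise hypothetical.

```python
def caption_probabilities(caption_masks):
    """Estimate, per block, the fraction of learning-video frames in which
    a caption covers that block.

    caption_masks: one list per frame, holding a 0/1 flag per block
    (hypothetical input format, assumed for this sketch).
    """
    num_frames = len(caption_masks)
    num_blocks = len(caption_masks[0])
    return [sum(mask[i] for mask in caption_masks) / num_frames
            for i in range(num_blocks)]


def weighting_coefficients(probabilities):
    """Map caption probabilities to weights that grow as probability falls.

    Assumed mapping: w_i proportional to (1 - p_i), normalized to sum to 1.
    """
    raw = [1.0 - p for p in probabilities]
    total = sum(raw)
    if total == 0:
        # Every block is always captioned: fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]
```

With this mapping, a block that is never captioned in the learning video receives the largest weight, and a block that is always captioned receives a weight of zero.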
The matching unit 1030 compares the first feature output from the block unit feature extraction unit 1000 with the second feature output from the block unit feature extraction unit 1010, using the weighting coefficient output from the weighting coefficient calculation unit 1020. Specifically, the matching unit 1030 compares the features of the blocks at the same position in the two frames, and calculates a block-unit score that is 1 if the features are the same and 0 if they are not. The matching unit 1030 then calculates the weighted sum of the acquired block-unit scores using the weighting coefficients, and takes the result as the matching score (frame-unit similarity) of the frame. The matching unit 1030 performs these processes on the respective frames to thereby acquire a matching result between the first video and the second video.
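The frame-unit matching score described above is a weighted count of agreeing blocks, which can be sketched as follows. The function names `frame_matching_score` and `video_matching_scores` are hypothetical, and the sketch assumes that the two videos are already aligned frame by frame and that each feature vector lists one edge-type label per block.

```python
def frame_matching_score(features_a, features_b, weights):
    """Weighted sum of block-unit scores, where a block scores 1 if the two
    features at that position are the same and 0 otherwise."""
    if not (len(features_a) == len(features_b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    return sum(w for a, b, w in zip(features_a, features_b, weights) if a == b)


def video_matching_scores(video_a, video_b, weights):
    """Frame-unit scores for two frame-aligned sequences of feature vectors."""
    return [frame_matching_score(fa, fb, weights)
            for fa, fb in zip(video_a, video_b)]
```

If the weights are normalized to sum to 1, a score of 1 means that every block agrees, and a disagreement in a low-weight block (one likely to carry a caption) lowers the score only slightly.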
Through these processes, moving images can be matched while reducing the influence of caption superposition in the portions where that influence may be large, so that high matching accuracy is achieved even when captions are superposed.
[Non-Patent Document 1] Kota Iwamoto, Eiji Kasutani, Akio Yamada, “Image Signature Robust to Caption Superimposition for Video Sequence Identification”, Proceedings of International Conference on Image Processing (ICIP2006), 2006
[Non-Patent Document 2] Eiji Kasutani, Ryoma Oami, Akio Yamada, Takami Sato, and Kyoji Hirata, “Video Material Archive System for Efficient Video Editing Based on Media Identification”, Proceedings of International Conference on Multimedia and Expo (ICME2004), pp. 727-730, 2004
Besides the caption superposition described above, there are other causes that lower the matching accuracy of videos. For example, scenes fading to black frames appear commonly in many different videos, so such scenes reduce the matching accuracy. Further, features cannot be acquired stably from frames having almost uniform values, so such frames also reduce the matching accuracy. If similar (almost identical) video segments that can occur even between independent videos, such as scenes fading to black frames, and video segments whose features have low reliability, such as frames having almost uniform values, are compared in the same manner as ordinary segments, excessive detection or omission of detection may result. This causes a problem of low matching accuracy. Such a problem cannot be solved by the art described in Non-Patent Document 1, which does not consider the characteristics of the videos themselves that are the matching targets.