One known similar segment retrieval technique retrieves (discriminates) time segments having similar features from a series of per-frame feature vectors representing time-series data such as moving image data or acoustic data. In this method, similar time segments are specified by sequentially matching the two feature vector series to be compared, that is, by calculating a similarity or a distance, in frame units. For example, Non-Patent Document 1 describes discriminating a similar segment by performing a distance calculation in frame units, using the Color Layout Descriptor defined in ISO/IEC 15938-3 as the feature vector of each frame.
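As an illustration only, frame-unit matching can be sketched as follows. The sketch is not taken from Non-Patent Document 1: the L1 distance, the sliding-window search, the mean-distance threshold, and the names `frame_distance` and `find_similar_segments` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """L1 distance between two per-frame feature vectors (metric is an assumption)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_similar_segments(
    query: List[Vector], reference: List[Vector], threshold: float
) -> List[Tuple[int, float]]:
    """Slide the query series over the reference series, comparing frame by frame.

    Returns (start_index, mean_distance) for every offset whose mean
    per-frame distance does not exceed the threshold.
    """
    n = len(query)
    if n == 0:
        return []
    hits = []
    for start in range(len(reference) - n + 1):
        # Every offset requires one distance calculation per query frame.
        total = sum(
            frame_distance(q, reference[start + i]) for i, q in enumerate(query)
        )
        mean_distance = total / n
        if mean_distance <= threshold:
            hits.append((start, mean_distance))
    return hits
```

In this sketch the number of distance calculations grows with the product of the two series lengths, since each candidate offset is compared against every frame of the query.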
Performing matching between the feature vector series to be compared in frame units, however, requires a long retrieval time. To speed up the retrieval, another method has been proposed in which a feature vector representing a time segment (referred to as a time segment representative feature vector) is generated for each time segment including a plurality of frames, and matching is performed using the generated time segment representative feature vectors rather than in frame units.
For example, Non-Patent Document 2 describes generating, as a time segment representative feature vector, a histogram feature from the feature vectors included in a time segment. Specifically, to obtain the feature vector of each frame of a moving image, the frame image is divided into a plurality of sub-images, and the color component values (R component, G component, and B component) of each sub-image are used as the features. The feature vectors of the frames included in a time segment are quantized, and the time segment representative feature vector is generated as a histogram indicating the appearance frequency of the respective quantization indexes.
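A minimal sketch of such a histogram feature is given below. The quantizer is an assumption for illustration: each component is uniformly quantized into `levels` steps and the steps are combined into one index, which is not necessarily the quantization used in Non-Patent Document 2. The names `quantize` and `segment_histogram` and the value range of 0 to 255 are likewise hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def quantize(frame: Sequence[float], levels: int = 4, max_value: float = 256.0) -> int:
    """Map one frame feature vector to a single quantization index.

    Uniform per-component quantization is an assumption made for this sketch.
    """
    index = 0
    for value in frame:
        # Clamp so that a value equal to max_value falls in the top level.
        level = min(int(value * levels / max_value), levels - 1)
        index = index * levels + level
    return index


def segment_histogram(frames: List[Sequence[float]], levels: int = 4) -> Dict[int, int]:
    """Count how often each quantization index appears within the time segment."""
    return dict(Counter(quantize(frame, levels) for frame in frames))
```

With this representation, a time segment of any length is summarized by one histogram, so two segments can be compared with a single histogram comparison instead of one comparison per frame.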
Non-Patent Document 3 and Non-Patent Document 4 describe selecting a key frame within a time segment and directly using the feature vector of the selected key frame as the time segment representative feature vector. In these documents, a shot of a moving image is used as the time segment: a key frame is selected from the shot, and its feature vector is used as the time segment representative feature vector.
Non-Patent Document 5 describes that from feature vectors of a plurality of frames included in a time segment, mean values or median values for respective dimensions of the feature vectors are calculated, and a feature vector constituted of the calculated mean values or the median values is used as a time segment representative feature vector.

Non-Patent Document 1: Eiji Kasutani, Ryoma Oami, Akio Yamada, Takami Sato, and Kyoji Hirata, “Video Material Archive System for Efficient Video Editing based on Media Identification”, Proc. on ICME (International Conference on Multimedia and Expo) 2004, Vol. 1, pp. 727-730, June 2004.
Non-Patent Document 2: Kunio Kashino, Takayuki Kurozumi, Hiroshi Murase, “A Quick Search Method for Audio and Video Signals Based on Histogram Pruning”, IEEE Transactions on Multimedia, Vol. 5, No. 3, September 2003.
Non-Patent Document 3: Anil Jain, Aditya Vailaya, and Wei Xiong, “Query by Video Clip”, Proc. on ICPR (International Conference on Pattern Recognition), Vol. 1, pp. 16-20, August 1998.
Non-Patent Document 4: Yusuke Uchida, Masaru Sugano, Akio Yoneyama, “A Study on Content Based Copy Detection Using Color Layout”, Proc. on IMPS (Image Media Processing Symposium) 2008, Proceedings, pp. 69-70, October 2008.
Non-Patent Document 5: Eiji Kasutani, Akio Yamada, “Acceleration of Video Identification Process Using Group-of-Frame Feature”, Proc. on FIT (Forum on Information Technology) 2003, pp. 85-86, 2003.
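The per-dimension mean or median calculation described for Non-Patent Document 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of that document: the function name `representative_vector` and the `use_median` flag are hypothetical, and all frames are assumed to have feature vectors of the same dimension.

```python
from statistics import mean, median
from typing import List, Sequence


def representative_vector(
    frames: List[Sequence[float]], use_median: bool = False
) -> List[float]:
    """Reduce the frames of one time segment to a single feature vector.

    Each output dimension is the mean (or median) of that dimension
    over all frames in the segment.
    """
    if not frames:
        raise ValueError("a time segment must contain at least one frame")
    reduce = median if use_median else mean
    # zip(*frames) regroups the values dimension by dimension.
    return [float(reduce(dimension)) for dimension in zip(*frames)]
```

For a segment with frame vectors `[1, 10]`, `[2, 20]`, and `[9, 30]`, the mean variant yields `[4.0, 20.0]` and the median variant yields `[2.0, 20.0]`.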