1. Field of the Invention
The present invention relates to the field of interactively processing video for the purpose of automatically locating specific content. Specifically, the present invention pertains to the field of interactively defining training images and displaying similarity search results.
2. Discussion of the Related Art
Most state-of-the-art systems for video retrieval first segment video into shots and then create one or more keyframes for each shot. Video segment retrieval then reduces to image retrieval based on keyframes. More complex conventional systems average color and temporal variation across the query segment, but still perform retrieval based on keyframes in the segmented video. Other conventional systems have been designed to find video sequences that exactly match the query, for example instant replays.
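The keyframe-based pipeline described above can be illustrated with a minimal Python sketch. The gray-level histogram, the L1 shot-boundary test, the `threshold` value, and the choice of the first frame of each shot as its keyframe are illustrative assumptions rather than details of any particular conventional system, and the helper names (`gray_hist`, `segment_shots`, `keyframe_retrieve`) are hypothetical.

```python
import numpy as np

def gray_hist(frame, bins=16):
    # Normalized gray-level histogram of an 8-bit grayscale frame (assumed representation).
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def segment_shots(frames, threshold=0.5):
    # Start a new shot wherever consecutive histograms differ (L1) by more than threshold.
    starts = [0]
    for t in range(1, len(frames)):
        d = np.abs(gray_hist(frames[t]) - gray_hist(frames[t - 1])).sum()
        if d > threshold:
            starts.append(t)
    return starts

def keyframe_retrieve(query, frames, threshold=0.5):
    # Hypothetical keyframe choice: the first frame of each shot.
    # Shots are ranked by histogram L1 distance between the query image and each keyframe.
    starts = segment_shots(frames, threshold)
    q = gray_hist(query)
    dists = [np.abs(q - gray_hist(frames[s])).sum() for s in starts]
    return [starts[i] for i in np.argsort(dists)]
```

Once keyframes are chosen, the video itself plays no further role in the ranking, which is the reduction to still-image retrieval noted above.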
There has been much work on still-image retrieval by similarity. Retrieval based on color histogram similarity has been described, and several image similarity measures have been based on wavelet decompositions. In the wavelet-based measures, quantizing and truncating the higher-order coefficients reduces the dimensionality, and the similarity distance measure is simply a count of bitwise similarity. However, this approach has apparently not been used with the discrete cosine transform or the Hadamard transform. All known image retrieval-by-similarity systems require a single image as a query and do not naturally generalize to image groups or classes. Although video queries have also been studied extensively, much of that literature focuses on query formalisms while presupposing an existing analysis or annotation of the video.
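As an illustration of the wavelet-based approach, the following minimal Python sketch computes a standard 2-D Haar decomposition, keeps only the `k` largest-magnitude detail coefficients, quantizes them to their signs, and scores similarity by counting matching retained signs. The Haar basis, the value of `k`, sign quantization, and the helper names are assumptions made for illustration; published wavelet-based systems differ in their details. The sketch assumes square grayscale images whose side is a power of two.

```python
import numpy as np

def haar_1d(v):
    # Full 1-D Haar decomposition by repeated pairwise averaging and differencing.
    v = v.astype(float)
    n = len(v)
    while n > 1:
        half = n // 2
        avg = (v[0:n:2] + v[1:n:2]) / 2.0
        diff = (v[0:n:2] - v[1:n:2]) / 2.0
        v[:half] = avg
        v[half:n] = diff
        n = half
    return v

def haar_2d(img):
    # Standard 2-D decomposition: transform every row, then every column.
    out = np.apply_along_axis(haar_1d, 1, img)
    return np.apply_along_axis(haar_1d, 0, out)

def signature(img, k=20):
    # Truncate to the k largest-magnitude detail coefficients and quantize each to its sign.
    c = haar_2d(img).ravel()
    c[0] = 0.0  # drop the overall average so only detail coefficients are kept
    keep = np.argsort(np.abs(c))[-k:]
    sig = np.zeros(c.shape, dtype=np.int8)
    sig[keep] = np.sign(c[keep]).astype(np.int8)
    return sig

def similarity(sig_a, sig_b):
    # Count retained coefficients whose quantized signs agree (a bitwise-style match count).
    return int(np.count_nonzero((sig_a != 0) & (sig_a == sig_b)))
```

Because the signature is a sparse vector of signs, it could be packed into bits and compared with bitwise operations; the transform in `haar_2d` is the only part that would change if a different basis were substituted.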
Due to the high cost of video processing, little work has been done on rapid similarity measures. Analysis of individual image frames using a combination of color histograms and pixel-domain template matching has been attempted, but the templates must be hand-tailored to each application and therefore do not generalize. Another technique uses a distance metric based on statistical properties, such as the mean and standard deviation of gray levels in regions of the frames.
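The region-statistics distance can be sketched as follows, assuming 2-D grayscale frames split into a `grid` by `grid` array of blocks whose mean and standard deviation are compared by Euclidean distance. The grid size, the Euclidean combination, and the function names are hypothetical choices; the description above does not specify how the statistics are combined.

```python
import numpy as np

def region_stats(frame, grid=4):
    # Mean and standard deviation of gray levels in each block of a grid x grid partition.
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = frame[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].astype(float)
            feats.extend([block.mean(), block.std()])
    return np.array(feats)

def frame_distance(a, b, grid=4):
    # Assumed combination: Euclidean distance between the two feature vectors.
    return float(np.linalg.norm(region_stats(a, grid) - region_stats(b, grid)))
```

Computing a few block statistics per frame is far cheaper than template matching, which is the appeal of such measures, although the statistics discard most spatial detail within each block.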
Other conventional approaches include queries by sketch, possibly enhanced with motion attributes. As for using actual video clips as queries, the few reports in the literature include a system in which video “shots” are represented by still images for both query and retrieval, and a system in which video segments are characterized by average color and the temporal variation of color histograms. In a similar approach, shots are first found automatically and then compared using a color histogram similarity measure. Matching video sequences using the temporal correlation of extremely reduced frame image representations has also been attempted. While this can find repeated instances of video shots, for example “instant replays” of sporting events, it is not clear how well it generalizes to video that is not substantially similar. Video similarity has also been computed as the Euclidean distance between short windows of frame distances, where the frame distances are determined from eigen projections of the images. This appears to find similar regions in the test video, but it may not generalize well because it depends on the video used to calculate the eigen projection. Video indexing using color histogram matching and image correlation has been attempted, though it is not clear whether the correlation could be computed rapidly enough for most interactive applications. Hidden Markov model video segmentation using motion features has been studied, but it neither uses image features directly nor uses image features for image similarity matching.
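The temporal-correlation approach to matching video sequences can be sketched as follows: each frame is block-averaged to a tiny `size` by `size` image, the reduced frames of a query clip are concatenated, and the query is slid along the target video to find the offset with the highest correlation coefficient. The reduction size, the use of the Pearson correlation coefficient, and the function names are illustrative assumptions rather than the cited system's exact method. Frames are assumed to be 2-D grayscale arrays whose dimensions are divisible by `size`.

```python
import numpy as np

def reduce_frame(frame, size=4):
    # Block-average a frame down to size x size and flatten it to a vector.
    h, w = frame.shape
    return frame.reshape(size, h // size, size, w // size).mean(axis=(1, 3)).ravel()

def correlation(a, b):
    # Pearson correlation coefficient; returns 0.0 when either vector is constant.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def best_match(query, target, size=4):
    # Slide the reduced query clip over the reduced target video; return (offset, score).
    q = np.concatenate([reduce_frame(f, size) for f in query])
    reduced = [reduce_frame(f, size) for f in target]
    n = len(query)
    scores = [correlation(q, np.concatenate(reduced[s:s + n]))
              for s in range(len(target) - n + 1)]
    best = int(np.argmax(scores))
    return best, scores[best]
```

Exact repeats of the same footage score close to 1, which suits instant replays; footage that is only semantically similar has no reason to correlate at this reduced pixel level, consistent with the generalization concern noted above.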