1. Field of the Invention
The present invention relates to video data compression and indexing and, more particularly, to semantic compression and indexing of video data.
2. Description of Related Art
Video compression and indexing are crucial in multimedia applications. In recent years, a number of video compression and indexing techniques have been developed. One exemplary technique is key frame selection, which selects key frames as indices for video data. The indices are then used for browsing, searching, retrieval and comparison of video data. Current key frame selection techniques are based on video segmentation, frame clustering, or some hybrid thereof.
In Zhang et al., Video Parsing and Browsing Using Compressed Data, Multimedia Tools and Applications, Vol. 1, pages 89-111 (1995), an exemplary video segmentation technique is disclosed. In this technique, one or more representative key frames are selected for each segmented structural video unit and used as indices for video data.
However, video indexing and summarization methods based on video segmentation are tuned to highly structured and professionally edited commercial products. Typically, these products have camera shots that are rather short (on the order of four seconds), scene changes that are well-defined and frequent (about every 90 seconds or less), and changes in content and cinematography (“montage”) that are visually appealing. The explicit and implicit rules by which such products are constructed are a great aid in the automated analysis and summarization of such videos. For semi-edited or unedited videos such as instructional videos, however, segmentation-based key frame selection is no longer appropriate, because such videos have no salient structural units and whatever structural units can be found do not represent meaningful semantic segments.
In Zhuang et al., Adaptive Key Frame Extraction Using Unsupervised Clustering, IEEE International Conference on Image Processing, pages 866-70 (1998), an exemplary video indexing technique based on clustering is disclosed. Clustering techniques avoid segmentation preprocessing; however, most video key frame clustering methods depend heavily on thresholds that determine the size of each cluster, the number of key frames, or the level of key frames in a key frame hierarchy. Since these thresholds vary greatly among different video genres, or even within the same video genre, they are difficult to choose. Furthermore, most clustering-based methods are expensive with respect to time and storage.
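The threshold sensitivity noted above can be illustrated with a minimal sketch. It is not the method of the cited reference; it assumes a generic greedy clustering in which each frame is represented by a hypothetical feature vector, joins the nearest existing cluster when its Euclidean distance to that cluster's centroid is within a threshold, and otherwise starts a new cluster. The function names, the choice of the first frame of each cluster as the key frame, and the sample values are all assumptions made for illustration.

```python
from math import sqrt


def cluster_frames(features, threshold):
    """Greedily cluster frame feature vectors; return lists of frame indices."""
    centroids = []  # running mean feature vector of each cluster
    clusters = []   # frame indices assigned to each cluster
    for idx, feat in enumerate(features):
        best, best_dist = None, None
        # Find the nearest existing centroid (Euclidean distance).
        for c, cen in enumerate(centroids):
            d = sqrt(sum((a - b) ** 2 for a, b in zip(feat, cen)))
            if best_dist is None or d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            # Close enough: join the cluster and update its running mean.
            clusters[best].append(idx)
            n = len(clusters[best])
            centroids[best] = [
                cen + (a - cen) / n for cen, a in zip(centroids[best], feat)
            ]
        else:
            # Too far from every cluster: start a new one.
            centroids.append(list(feat))
            clusters.append([idx])
    return clusters


def key_frames(features, threshold):
    """Pick the first frame of each cluster as its key frame (an assumption)."""
    return [cluster[0] for cluster in cluster_frames(features, threshold)]


# Hypothetical one-dimensional frame features, for illustration only.
frames = [[0.0], [0.1], [0.2], [1.0], [1.1], [5.0]]
for t in (0.05, 0.5, 10.0):
    print(t, key_frames(frames, t))
```

With these sample values, the same six frames yield six, three, or one key frame depending solely on the threshold, which is why a threshold suited to one video may be unsuitable for another.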
Therefore, the key frame selection techniques known heretofore suffer from a common drawback in that they are either tuned to highly structured products or expensive in time and storage. Accordingly, there remains a need for an inexpensive technique for compressing and indexing semi-edited or unedited video data. There also remains a need for semantically compressing video data at dynamically changing rates so that the data are accessible to a wide variety of platforms and connections, including some whose capacities are severely limited but can change dynamically. Moreover, there remains a need for video indexing and summarization techniques that are user-tunable, particularly in domains in which there is little formal shot structure and a high amount of frame-to-frame redundancy.