With recent advances in digital technology, multimedia data, such as high-quality video or music, can be generated more easily and quickly than before. Such multimedia data generally requires considerable storage capacity and has a long playing time. Accordingly, various technologies have been required to efficiently store, search and read such multimedia data, and related research has been conducted. As a result, the size of such multimedia data can be considerably reduced through international compression standards, such as the Moving Picture Experts Group (MPEG)-1, MPEG-2 and MPEG-4 standards, and research into MPEG-7, which allows multimedia data to be efficiently read and searched, is being conducted.
In particular, a technology that allows a video having a long playing time to be browsed at high speed is referred to as a “video abstract.” A video abstract formed of still images is referred to as a “video summary,” and a video abstract including video and related audio information is referred to as “video skimming.”
Since the video summary uses only still images, it can be generated faster than the video skimming. On the other hand, the video skimming can provide more natural screens to a user by using audio and textual information.
The video summary is a set of representative still images that suitably represents the contents of a video, and methods of generating it are classified according to how the representative still images are selected.
A method of extracting the representative still images at regular periods is disadvantageous in that some representative still images may be missed, because the scenes that the representative still images should capture are not distributed at regular intervals within the video.
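The limitation of extraction at regular periods can be illustrated with a minimal sketch. The function names, the frame counts, and the representation of a scene as an inclusive frame range are assumptions made for illustration only; they are not taken from any particular conventional method.

```python
def sample_at_regular_period(num_frames: int, period: int) -> list[int]:
    """Return the frame indices picked when one frame is taken every `period` frames."""
    if period <= 0:
        raise ValueError("period must be positive")
    return list(range(0, num_frames, period))


def missed_scenes(scenes: list[tuple[int, int]], sampled: list[int]) -> list[tuple[int, int]]:
    """Return the scenes (inclusive frame ranges) that contain no sampled frame."""
    sampled_set = set(sampled)
    return [
        (start, end)
        for (start, end) in scenes
        if not any(frame in sampled_set for frame in range(start, end + 1))
    ]


# Hypothetical video of 100 frames sampled every 30 frames.
sampled = sample_at_regular_period(100, 30)

# Two hypothetical scenes of interest; the first falls between sampling points.
scenes = [(10, 20), (55, 65)]
missed = missed_scenes(scenes, sampled)
```

In this sketch the scene spanning frames 10 to 20 lies entirely between two sampling points, so no still image represents it, whereas the scene spanning frames 55 to 65 happens to contain a sampled frame.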
A method of extracting one still image for each shot of a video is disadvantageous in that the number and temporal distribution of the representative still images are determined by the number and temporal distribution of the shots. That is, an excessively large or an excessively small number of still images may be selected, depending on the number of shots.
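The dependence on the number of shots can be sketched as follows. The sketch assumes that the shot boundaries have already been detected and are given as starting frame indices, and it picks the middle frame of each shot; both the function name and the middle-frame rule are illustrative assumptions rather than a description of a specific conventional method.

```python
def one_frame_per_shot(shot_starts: list[int], num_frames: int) -> list[int]:
    """Pick the middle frame of each shot.

    `shot_starts` holds the first frame index of every shot in ascending
    order; the last shot ends at frame `num_frames - 1`.
    """
    bounds = list(shot_starts) + [num_frames]
    return [
        (bounds[i] + bounds[i + 1] - 1) // 2  # midpoint of the inclusive frame range
        for i in range(len(shot_starts))
    ]


# Hypothetical 300-frame video with three shots.
few_shots = one_frame_per_shot([0, 100, 250], 300)

# The same length of video cut into 30 short shots yields 30 still images.
many_shots = one_frame_per_shot(list(range(0, 300, 10)), 300)
```

The size of the summary is fixed by the editing of the video rather than by the user: three shots produce three still images and thirty shots produce thirty, regardless of how many images are actually wanted.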
Conventional methods of extracting various feature values from a video and nonlinearly extracting representative still images from the resulting feature space are characterized in that the calculation time is long or the calculation speed varies irregularly with the contents of the video.
The conventional methods described above have a problem in that the processing time is too long to provide a video summary to a user at high speed, or in that it is difficult to predict the processing time required to generate the video summary.