It is often desirable to search for, and play back, a desired part of a video application composed of a large amount of varied video data, such as a television program recorded on a video recorder.
As a typical image extraction technique for extracting desired visual content, a story board has been proposed, which is a panel formed from a sequence of images defining the main scenes in a video application. Namely, a story board is prepared by decomposing video data into so-called shots and displaying a representative image of each shot. Most image extraction techniques automatically detect and extract shots from video data, as disclosed, for example, in “G. Ahanger and T. D. C. Little: A Survey of Technologies for Parsing and Indexing Digital Video, Journal of Visual Communication and Image Representation 7: 28–4, 1996”.
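As a purely illustrative aid, the preparation of a story board described above may be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of the cited survey: a frame is assumed to be a flat sequence of 8-bit gray levels, a shot boundary is assumed wherever the gray-level histograms of consecutive frames differ by more than a threshold, and the middle frame of each shot is assumed to serve as its representative image. The names `histogram`, `detect_shots` and `story_board`, the bin count and the threshold value are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is a flat sequence of 8-bit gray levels (0..255).
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return the normalized gray-level histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = float(len(frame)) or 1.0  # avoid division by zero for empty frames
    return [c / total for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 distance between two normalized histograms (range 0..1)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def detect_shots(frames: Sequence[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Split frames into shots, returned as inclusive (start, end) index pairs.

    Assumption: a cut is declared whenever consecutive frames differ by
    more than `threshold`; gradual transitions are not handled.
    """
    if not frames:
        return []
    shots = []
    start = 0
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        if histogram_distance(previous, current) > threshold:
            shots.append((start, i - 1))
            start = i
        previous = current
    shots.append((start, len(frames) - 1))
    return shots


def story_board(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return the index of one representative (middle) frame per shot."""
    return [(start + end) // 2 for start, end in detect_shots(frames, threshold)]
```

With such a sketch, every detected shot contributes one image to the story board, so the number of listed images grows in direct proportion to the number of shots in the video data.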
It should be noted that a typical half-hour TV program, for example, contains hundreds of shots. Therefore, with the above conventional image extraction technique of G. Ahanger and T. D. C. Little, the user has to examine a story board listing an enormous number of extracted shots, and understanding such a story board is a great burden to the user. Consider also a dialogue scene in which, for example, two persons are talking. In the dialogue, the camera shoots the two persons alternately, switching each time one of them speaks. Therefore, many of the shots extracted by the conventional image extraction technique are redundant. The shots contain much useless information since they are at too low a level to serve as units from which a video structure is to be extracted. Thus, the conventional image extraction technique cannot be said to be convenient for the user.
In addition to the above, further image extraction techniques have been proposed, as disclosed, for example, in “A. Merlino, D. Morey and M. Maybury: Broadcast News Navigation Using Story Segmentation, Proceedings of ACM Multimedia 97, 1997” and the Japanese Unexamined Patent Publication No. 10-136297. However, these techniques can be used only with highly specialized knowledge of limited content genres such as news and football games. These conventional image extraction techniques can assure a good result when applied to such limited genres but are of no use for any other genre. This limitation to special genres makes it difficult for the techniques to come into wide use.
Further, still another image extraction technique has been proposed, as disclosed in the U.S. Pat. No. 5,708,767, for example. This technique extracts a so-called story unit. However, this conventional image extraction technique is not fully automated, and thus the user's intervention is required to determine which shots have the same content. Also, this technique requires complicated computation for signal processing and is applicable only to video information.
Furthermore, still another image extraction technique has been proposed, as in the Japanese Unexamined Patent Publication No. 9-214879, for example, in which shots are identified by a combination of shot detection and silent period detection. However, this conventional technique can be used only when a silent period coincides with a boundary between shots.
Moreover, yet another image extraction technique has been proposed, as disclosed, for example, in “H. Aoki, S. Shimotsuji and O. Hori: A Shot Classification Method to Select Effective Key-Frames for Video Browsing, IPSJ Human Interface SIG Notes, 7:43–50, 1996” and the Japanese Unexamined Patent Publication No. 9-93588, in which repeated similar shots are detected to reduce the redundancy of the depiction in a story board. However, this conventional image extraction technique is applicable only to visual information, not to audio information.
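For illustration only, the detection of repeated similar shots may be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the method of the cited publications: each shot is assumed to be already summarized by a visual feature vector (for example, a color histogram), and a shot is assumed to repeat an earlier one when the L1 distance between their feature vectors does not exceed a threshold. The names `group_similar_shots` and `reduced_story_board` and the threshold value are hypothetical.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def group_similar_shots(features: Sequence[Sequence[float]],
                        threshold: float = 0.2) -> List[int]:
    """Assign a group label to each shot.

    Assumption: a shot joins the first existing group whose exemplar
    (the first shot placed in that group) lies within `threshold`;
    otherwise the shot starts a new group.
    """
    exemplars: List[Sequence[float]] = []
    labels: List[int] = []
    for feature in features:
        for label, exemplar in enumerate(exemplars):
            if l1_distance(feature, exemplar) <= threshold:
                labels.append(label)
                break
        else:
            exemplars.append(feature)
            labels.append(len(exemplars) - 1)
    return labels


def reduced_story_board(features: Sequence[Sequence[float]],
                        threshold: float = 0.2) -> List[int]:
    """Return the index of the first shot of each group, in temporal order."""
    seen = set()
    kept = []
    for index, label in enumerate(group_similar_shots(features, threshold)):
        if label not in seen:
            seen.add(label)
            kept.append(index)
    return kept
```

In a dialogue scene in which two persons are shot alternately, such grouping keeps one image per person instead of one image per shot. Because the feature vectors in this sketch are purely visual, the grouping takes no account of audio information.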