The present invention relates to a signal processing method for detecting and analyzing a pattern reflecting the semantics underlying a signal, and to a video signal processor for detecting and analyzing a visual and/or audio pattern reflecting the semantics underlying a video signal.
It is often desirable to search a video application composed of a large amount of varied video data, such as a television program recorded on a video recorder, for a desired part to be played back.
A typical image extraction technique for extracting desired visual content is the story board, a panel formed from a sequence of images that define the main scenes of a video application. A story board is prepared by decomposing video data into so-called shots and displaying a representative image of each shot. Most image extraction techniques automatically detect and extract shots from video data, as disclosed, for example, in “G. Ahanger and T. D. C. Little: A Survey of Technologies for Parsing and Indexing Digital Video, Journal of Visual Communication and Image Representation 7: 28-4, 1996”.
It should be noted that a typical half-hour TV program, for example, contains hundreds of shots. With the conventional image extraction technique of G. Ahanger and T. D. C. Little, the user therefore has to examine a story board listing an enormous number of extracted shots, and understanding such a story board is a great burden on the user. Consider also a dialogue scene in which, for example, two persons are talking. In such a scene, the camera alternates between the two persons each time one of them speaks, so many of the shots extracted by the conventional image extraction technique are redundant. The shots also carry much useless information, since they are at too low a level to serve as the objects from which a video structure is extracted. Thus, the conventional image extraction technique cannot be said to be convenient for the user.
In addition to the above, further image extraction techniques have been proposed, as disclosed, for example, in “A. Merlino, D. Morey and M. Maybury: Broadcast News Navigation Using Story Segmentation, Proceedings of ACM Multimedia 97, 1997” and Japanese Unexamined Patent Publication No. 10-136297. However, these techniques can be used only with highly specialized knowledge of limited genres of content, such as news and football games. They can assure a good result when applied to such limited genres but are of no use for other genres. This limitation to special genres makes it difficult for the techniques to come into wide use.
Further, still another image extraction technique has been proposed, as disclosed, for example, in U.S. Pat. No. 5,708,767. This technique extracts a so-called story unit. However, it is not fully automated, and the user's intervention is required to determine which shots have the same content. This technique also requires complicated computation for signal processing and is applicable only to video information.
Furthermore, still another image extraction technique has been proposed, as in Japanese Unexamined Patent Publication No. 9-214879, for example, in which shots are identified by a combination of shot detection and silent period detection. However, this conventional technique can be used only when the silent period coincides with a boundary between shots.
Moreover, yet another image extraction technique has been proposed, as disclosed, for example, in “H. Aoki, S. Shimotsuji and O. Hori: A Shot Classification Method to Select Effective Key-Frames for Video Browsing, IPSJ Human Interface SIG Notes, 7:43-50, 1996” and Japanese Unexamined Patent Publication No. 9-93588, in which repeated similar shots are detected to reduce the redundancy of the depiction in a story board. However, this conventional image extraction technique is applicable only to visual information, not to audio information.
Further, the conventional image extraction techniques can detect only a so-called local video structure, or a general video structure that depends on special knowledge.
Accordingly, it is an object of the present invention to overcome the above-mentioned drawbacks of the prior art by providing a signal processing method and a video signal processor that can extract a high-level video structure from a variety of video data.
The above object can be attained by providing a signal processing method for detecting and analyzing a pattern reflecting the semantics of the content of a signal, the method including, according to the present invention, steps of: extracting, from a segment consisting of a sequence of consecutive frames that together form the signal, at least one feature which characterizes the properties of the segment; calculating, using the extracted feature, a criterion for measurement of a similarity between a pair of segments for every extracted feature, and measuring a similarity between a pair of segments according to the similarity measurement criterion; and detecting, using the feature and similarity determination criterion, a similarity chain consisting of two or more of the segments that are similar to each other.
In the above signal processing method according to the present invention, a basic structure pattern of similar segments in the signal is detected.
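The three steps above (feature extraction, similarity measurement, and similarity chain detection) can be illustrated by the following minimal sketch. It is not the implementation of the invention: the frame representation, the histogram feature, the L1 distance, the threshold value, the greedy grouping rule, and all function names are assumptions chosen only to make the flow of the steps concrete.

```python
from typing import List, Sequence

# Assumption: a frame is a flat sequence of sample values in [0, 1),
# and a segment is a list of consecutive frames.
Frame = Sequence[float]
Segment = List[Frame]


def extract_feature(segment: Segment, bins: int = 4) -> List[float]:
    """Step 1 (illustrative): normalized histogram of all samples in the segment."""
    hist = [0.0] * bins
    count = 0
    for frame in segment:
        for value in frame:
            index = min(int(value * bins), bins - 1)
            hist[index] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def dissimilarity(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Step 2 (illustrative): L1 distance between features; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


def detect_similarity_chains(segments: List[Segment],
                             threshold: float = 0.2) -> List[List[int]]:
    """Step 3 (illustrative): greedily link each segment to the first chain
    whose most recent member is within the dissimilarity threshold."""
    features = [extract_feature(s) for s in segments]
    chains: List[List[int]] = []
    for i, feature in enumerate(features):
        for chain in chains:
            if dissimilarity(features[chain[-1]], feature) <= threshold:
                chain.append(i)
                break
        else:
            chains.append([i])  # no similar chain found: start a new one
    # A similarity chain consists of two or more mutually similar segments.
    return [c for c in chains if len(c) >= 2]


if __name__ == "__main__":
    dark = [[0.1, 0.1], [0.1, 0.1]]     # e.g. shots of speaker A
    bright = [[0.9, 0.9], [0.9, 0.9]]   # e.g. shots of speaker B
    other = [[0.5, 0.5], [0.5, 0.5]]    # an unrelated shot
    print(detect_similarity_chains([dark, bright, dark, bright, other]))
    # prints [[0, 2], [1, 3]]
```

In this toy input, the alternating segments of a dialogue-like sequence fall into two chains, while the unrelated segment belongs to no chain because it has no similar partner.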
The above object can also be attained by providing a video signal processor for detecting and analyzing a visual and/or audio pattern reflecting the semantics of the content of a supplied video signal, the apparatus including, according to the present invention: means for extracting, from a visual and/or audio segment consisting of a sequence of consecutive visual and/or audio frames that together form the video signal, at least one feature which characterizes the properties of the visual and/or audio segment; means for calculating, using the extracted feature, a criterion for measurement of a similarity between a pair of visual segments and/or audio segments for every extracted feature, and measuring a similarity between a pair of visual segments and/or audio segments according to the similarity measurement criterion; and means for detecting, using the feature and similarity determination criterion, a similarity chain consisting of two or more of the visual and/or audio segments that are similar to each other.
In the above video signal processor according to the present invention, a basic structure pattern of similar visual and/or audio segments in the video signal is detected.