The present invention relates to a method and apparatus for retrieving a broadcast video or a video included in a data base, and in particular to a video retrieval method and apparatus capable of retrieving a video at high speed by using a feature of a video as a clue.
In recent multimedia information processing systems, various kinds of information such as videos and texts can be stored and displayed to users. To retrieve them, however, there is no alternative but to use language-based clues such as keywords, and those keywords must first be assigned. Assigning keywords to the individual frames of a video is extremely laborious. Furthermore, since keywords are assigned freely by the data base constructor, they become useless when the user's viewpoint differs from the constructor's. Videos also call for retrieval based on features of the images themselves, not on keywords alone. Retrieval that uses an image feature as a clue requires a technique that can quickly match the features of a video containing an enormous number of frames against the features of an enquiry video. Heretofore, no matching technique applicable to videos has existed. In conventional video retrieval, therefore, the only practical means is for the user to search for a video visually while playing it back with fast-forward and rewind functions. Consequently, even if videos are digitized and stored to form a data base, they cannot be used efficiently. Furthermore, no video retrieval system exists that can catch a specific scene in a video that is being broadcast.
An object of the present invention is to provide a video retrieval method and apparatus capable of quickly matching a feature of a target video against a feature of an enquiry video without the work of assigning keywords for video retrieval. The target video may be either a video that is being broadcast or a video included in a data base.
In order to achieve the above described object, a retrieval method according to the present invention includes the steps of: giving each representative frame image specified by a user a code or a code string as its name, and registering beforehand the code strings of a plurality of representative frame images as enquiry video names; inputting a target video; extracting representative frame images from the video; calculating one or more features from the whole or a part of each frame image; assigning a corresponding code to each feature; giving the frame image the code, or a code string arranged in a predetermined order, as its name, and producing a video name for the video in the form of a frame name string corresponding to the string of frames; and matching the enquiry video names against the video name by using the frame name strings, and outputting a result.
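The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes that the features of each representative frame have already been calculated and normalised to [0, 1). It also uses a hypothetical eight-letter alphabet with equal-width ranges, and it matches with ordinary substring search.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical code alphabet: one letter per equal-width range of a feature in [0, 1).
ALPHABET = "ABCDEFGH"


def assign_code(value: float) -> str:
    """Map a feature value to the code of the range it falls in."""
    index = min(int(value * len(ALPHABET)), len(ALPHABET) - 1)
    return ALPHABET[max(index, 0)]


def frame_name(features: Sequence[float]) -> str:
    """Name a representative frame by its codes, in a fixed feature order."""
    return "".join(assign_code(v) for v in features)


def video_name(frames: Sequence[Sequence[float]]) -> str:
    """Concatenate frame names in frame order to form the video name."""
    return "".join(frame_name(f) for f in frames)


def match(video: str, enquiries: Dict[str, str], width: int) -> List[Tuple[int, str]]:
    """Return (frame index, enquiry label) for every occurrence of an enquiry
    name in the video name, keeping only hits aligned to frame boundaries."""
    hits = []
    for label, name in enquiries.items():
        start = video.find(name)
        while start != -1:
            if start % width == 0:  # reject hits straddling two frame names
                hits.append((start // width, label))
            start = video.find(name, start + 1)
    return sorted(hits)
```

For example, a frame with features (0.3, 0.55) is named `CE`. An enquiry `CEGA` that is found at character offset 2 of the video name `AHCEGACEGA` is reported as starting at frame 1. The modulo check matters because frame names are concatenated: without it, a substring search could report a hit that begins in the middle of one frame name.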
As the representative frame image, either the head frame image at a scene change or a frame image inputted at fixed intervals is adopted. The feature is calculated from a digitized image area or from the time length between scene changes in the frame string. For code assignment, the value of each feature is divided into predetermined ranges, and the code to be assigned is determined by the range to which the feature belongs. Existing character codes, such as alphabetic characters or Chinese characters, are used as the codes. Furthermore, if a feature lies near the boundary of a range at the time of assignment, a supplementary code is added.
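The range-based code assignment with a supplementary code can be sketched as follows. Several choices here are assumptions made for illustration, and the text fixes none of them. These are the four ranges, the 0.03 margin, and the mean grey level of a rectangular region as the feature. The same applies to representing each assignment as a (code, supplementary code) pair.

```python
import bisect
from typing import Optional, Sequence, Tuple

# Hypothetical partition of a feature in [0, 1] into four ranges A, B, C, D.
BOUNDARIES = [0.25, 0.5, 0.75]   # upper edges of ranges A, B and C
CODES = "ABCD"
MARGIN = 0.03                    # assumed width of the "near a boundary" zone


def region_feature(pixels: Sequence[Sequence[int]], top: int, left: int,
                   height: int, width: int) -> float:
    """Mean 8-bit grey level of a rectangular part of the frame, scaled to [0, 1]."""
    total = sum(pixels[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width * 255)


def assign_code(value: float) -> Tuple[str, Optional[str]]:
    """Return (code, supplementary code). The supplementary code names the
    neighbouring range when the value lies within MARGIN of a boundary."""
    index = bisect.bisect_right(BOUNDARIES, value)
    code = CODES[index]
    supplement = None
    if index > 0 and value - BOUNDARIES[index - 1] < MARGIN:
        supplement = CODES[index - 1]      # just above the lower edge
    elif index < len(BOUNDARIES) and BOUNDARIES[index] - value < MARGIN:
        supplement = CODES[index + 1]      # just below the upper edge
    return code, supplement
```

A value of 0.49 is assigned `B` with supplementary code `C`, while 0.1 is simply `A`. Assignment is a binary search over a handful of boundaries, which is consistent with the text's point that the work per frame is small.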
The enquiry video name is likewise produced from the codes of a representative frame string. When the user specifies the relevant frame images, the enquiry video name is produced semi-automatically. When there are a plurality of enquiry videos, an attribute name is added to each enquiry video name.
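Semi-automatic production of enquiry names might look like the following sketch. The class and its method names are hypothetical. So is the assumption that the user selects a contiguous run of already-named representative frames by index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class EnquiryRegistry:
    """Hypothetical store of enquiry video names keyed by attribute name."""
    names: Dict[str, str] = field(default_factory=dict)

    def register(self, frame_names: Sequence[str], first: int, last: int,
                 attribute: str) -> str:
        """Build an enquiry name from the frames the user selected
        (indices first..last inclusive) and store it under `attribute`."""
        if not 0 <= first <= last < len(frame_names):
            raise ValueError("selected frames are out of range")
        if attribute in self.names:
            raise ValueError(f"attribute {attribute!r} is already registered")
        name = "".join(frame_names[first:last + 1])
        self.names[attribute] = name
        return name

    def attributes_for(self, name: str) -> List[str]:
        """Report which registered enquiries (by attribute) a name belongs to."""
        return [a for a, n in self.names.items() if n == name]
```

The user only points at frames. The codes themselves come from the same assignment used for target videos, which is what makes the process semi-automatic rather than manual keyword entry.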
Video name matching is conducted each time a representative frame image is extracted, and only when the time length between representative frame images is within a predetermined range. The matching itself is a comparison of code strings. Where a supplementary code is present, the comparison also accepts the different code that the supplementary code marks as possible.
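An incremental matcher that applies both conditions could be sketched as below. These are assumptions, not details from the text: the class name, the (code, supplementary code) pair representation, and checking only the interval that ends at the newest frame. Enquiry names are given as lists of frame names without supplementary codes.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Code = Tuple[str, Optional[str]]   # (code, supplementary code or None)


def code_matches(observed: Code, wanted: str) -> bool:
    """Accept the assigned code, or the neighbouring code that the
    supplementary code marks as possible."""
    code, supplement = observed
    return wanted == code or (supplement is not None and wanted == supplement)


class StreamMatcher:
    """Hypothetical incremental matcher, fed one representative frame at a time."""

    def __init__(self, enquiries: Dict[str, List[str]],
                 min_gap: float, max_gap: float) -> None:
        self.enquiries = enquiries          # attribute name -> frame names
        self.min_gap = min_gap              # permitted interval between
        self.max_gap = max_gap              # representative frames (seconds)
        longest = max(len(names) for names in enquiries.values())
        self.history: Deque[List[Code]] = deque(maxlen=longest)
        self.last_time: Optional[float] = None

    def push(self, timestamp: float, frame: List[Code]) -> List[str]:
        """Record a newly extracted representative frame and return the
        attribute names of enquiries whose final frame ends here."""
        gap = None if self.last_time is None else timestamp - self.last_time
        self.last_time = timestamp
        self.history.append(frame)
        if gap is None or not (self.min_gap <= gap <= self.max_gap):
            return []                       # interval out of range: no matching
        recent = list(self.history)
        hits = []
        for attribute, names in self.enquiries.items():
            if len(names) > len(recent):
                continue
            tail = recent[-len(names):]
            if all(len(obs) == len(want)
                   and all(code_matches(o, w) for o, w in zip(obs, want))
                   for obs, want in zip(tail, names)):
                hits.append(attribute)
        return hits
```

Matching work happens only inside `push`, which the caller invokes only when a representative frame is extracted. The bounded `deque` keeps just as many frames as the longest enquiry needs.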
Finally, upon successful matching, the output result includes at least one of time information, a video name and a video attribute name.
When the retrieval target is a video that is being broadcast and the video to be matched is a commercial consisting of a plurality of scenes, the matching output is at least one of the broadcast time, the commercial name and the sponsor name.
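A minimal sketch of the record produced on a successful match in the commercial case is shown below. The field names, the `report` helper and the sponsor lookup table are hypothetical. The text specifies only that at least one of broadcast time, commercial name and sponsor name is output.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class MatchReport:
    """One successful match of a commercial in a broadcast."""
    broadcast_time: datetime
    commercial_name: str
    sponsor_name: str


def report(start: datetime, offset_s: float, commercial: str,
           sponsors: Dict[str, str]) -> MatchReport:
    """Convert a match found `offset_s` seconds after `start` into a report;
    `sponsors` is a hypothetical attribute table keyed by commercial name."""
    return MatchReport(start + timedelta(seconds=offset_s), commercial,
                       sponsors.get(commercial, "unknown"))
```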
When the retrieval target is a video in a data base, representative frame images are extracted from the video when it is stored on a storage medium. One or more features are calculated from the whole or a part of each frame, and a corresponding code is assigned to each feature. Each frame image is given the code, or a code string arranged in a predetermined order, as its name, and a video name is produced in the form of a frame name string corresponding to the string of frames. This video name is stored as index information, and the index information on the storage medium is then matched against a string of video names prepared beforehand.
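Indexing at storage time and searching the index later could look like the sketch below. The in-memory dictionary is an assumption that stands in for the index on the storage medium, and the function names are hypothetical. Comparing lists of frame names instead of one concatenated string is one simple way to keep matches aligned to frame boundaries.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical index: video id -> [(time in seconds, frame name), ...],
# built once when each video is stored.
Index = Dict[str, List[Tuple[float, str]]]


def build_index_entry(index: Index, video_id: str,
                      frames: Sequence[Tuple[float, str]]) -> None:
    """Store frame names at storage time so retrieval needs no feature work."""
    index[video_id] = list(frames)


def search(index: Index, enquiry: Sequence[str]) -> List[Tuple[str, float]]:
    """Return (video id, start time) for every run of stored frame names that
    equals the enquiry, compared frame by frame."""
    wanted = list(enquiry)
    n = len(wanted)
    hits = []
    for video_id, entries in index.items():
        names = [name for _, name in entries]
        for i in range(len(names) - n + 1):
            if names[i:i + n] == wanted:
                hits.append((video_id, entries[i][0]))
    return hits
```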
The apparatus for implementing the retrieval method described above includes the following means: video inputting means; means for extracting representative frame images of the video; means for calculating one or more features from the whole or a part of each frame image; and means for assigning a corresponding code to each feature. It further includes means for giving each frame image the code, or a code string arranged in a predetermined order, as its name and for producing a video name in the form of a frame name string corresponding to the string of frames, and means for matching the video name against the enquiry video names.
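The means listed above map naturally onto a pipeline of pluggable stages. The sketch below is one possible arrangement, not the claimed apparatus. The scene-change test, feature function and code assignment are hypothetical caller-supplied placeholders. The first input frame is treated as a head frame, and every frame name is assumed to have the same number of codes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Sequence, Tuple

Image = Sequence[Sequence[int]]   # digitized grey-level frame (assumed format)


@dataclass
class RetrievalApparatus:
    """One field per 'means' of the apparatus, each a hypothetical plug-in."""
    is_scene_change: Callable[[Image, Image], bool]   # representative-frame means
    features: Callable[[Image], List[float]]          # feature calculating means
    assign: Callable[[float], str]                    # code assigning means
    enquiries: Dict[str, str]                         # attribute -> enquiry name

    def run(self, video: Iterable[Tuple[float, Image]]) -> Iterator[Tuple[float, str]]:
        """Video inputting means: consume (time, frame) pairs and yield
        (time, attribute) whenever a registered enquiry name matches."""
        keep = max(len(name) for name in self.enquiries.values())
        video_name = ""
        previous: Optional[Image] = None
        for t, frame in video:
            head = previous is None or self.is_scene_change(previous, frame)
            previous = frame
            if not head:
                continue
            # Video-name producing means: append this frame's name and keep only
            # as many trailing characters as the longest enquiry needs.
            video_name += "".join(self.assign(v) for v in self.features(frame))
            video_name = video_name[-keep:]
            # Matching means: with fixed-width frame names, a suffix match
            # is automatically aligned to frame boundaries.
            for attribute, enquiry in self.enquiries.items():
                if video_name.endswith(enquiry):
                    yield t, attribute
```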
Under the method described above, the representative frame images are limited to head frame images at scene changes or frames inputted at predetermined intervals. This keeps video names from growing to a large number of characters full of similar code strings, which shortens the matching time. Furthermore, because matching is performed on names derived from features of the video, video retrieval reduces to simple character string matching, as in text retrieval, and is therefore fast. In conventional video matching, the degree of similarity typically has to be determined by an evaluation function computed numerically between features, which requires a long calculation time. In the present invention, that calculation is unnecessary at matching time, so in principle the present method is faster than the conventional one. Moreover, the feature is a simple one calculated from a digitized image or from the time length between frames, so it can be calculated in real time. Code assignment is equally simple: it consists only of determining which range a feature belongs to, so the time required for assignment is short. Because existing character codes are assigned, a general purpose character string matching mechanism can be used, and no special matching mechanism needs to be developed for video retrieval.
When a feature value lies close to the range of an adjacent code at the time of code assignment, a supplementary code is added to widen the permissible range at the time of matching. This prevents deterioration of performance such as retrieval omissions.
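Because codes are ordinary characters, the supplementary-code tolerance can be expressed with a general purpose regular expression engine such as Python's standard `re` module. This sketch assumes that the supplementary codes are attached to the enquiry name and that every frame name has the same number of codes. The helper names are hypothetical.

```python
import re
from typing import List, Optional, Sequence, Tuple

Code = Tuple[str, Optional[str]]   # (code, supplementary code or None)


def to_pattern(enquiry: Sequence[Code]) -> str:
    """Turn codes into a regular expression; a supplementary code widens
    that position to a character class, e.g. ('C', 'D') -> '[CD]'."""
    parts = []
    for code, supplement in enquiry:
        if supplement:
            parts.append("[" + re.escape(code) + re.escape(supplement) + "]")
        else:
            parts.append(re.escape(code))
    return "".join(parts)


def find_frames(target: str, enquiry: Sequence[Code], width: int) -> List[int]:
    """Return frame indices where the enquiry starts in the target video name.
    The lookahead lets finditer report overlapping hits, and the modulo test
    keeps only hits that begin on a frame-name boundary."""
    pattern = re.compile("(?=" + to_pattern(enquiry) + ")")
    return [m.start() // width for m in pattern.finditer(target)
            if m.start() % width == 0]
```

With the enquiry `[CD]EGA`, both `CEGA` and `DEGA` in the target are accepted. No video-specific matching code is involved beyond building the pattern.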
When there are a plurality of enquiry video names, attribute names of the videos are added to them. By looking at the attribute name of a video, the user can easily tell which of the enquiry videos has matched.
Video name matching is conducted only when a representative frame image has been extracted, and only when the time length between the frames is within a predetermined range. This reduces the number of matching operations.
Because the output result of successful matching includes at least one of time information, a video name and a video attribute name, the user can easily collect and organize video retrieval results afterwards.
When the retrieval target is a video that is being broadcast and the video to be matched is a commercial consisting of a plurality of scenes, at least one of the broadcast time, the commercial name and the sponsor name is output as the retrieval result. Statistical information can therefore be derived automatically, such as the number of times each kind of commercial is broadcast linked with the audience ratings at the time of broadcast. When the retrieval target is a video in a data base, frame names are added beforehand as index information, so feature calculation can be omitted at retrieval time and matching becomes faster still.
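Once matches are reported as records, the statistics mentioned above reduce to simple counting. The hour-based report tuple and the `ratings` table are hypothetical stand-ins for real broadcast logs and audience rating data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Report = Tuple[int, str, str]   # (broadcast hour 0-23, commercial, sponsor), assumed


def summarize(reports: Iterable[Report], ratings: Dict[int, float]
              ) -> Tuple[Counter, Counter, Dict[str, float]]:
    """Count broadcasts per commercial and per sponsor, and average the
    (hypothetical) hourly audience rating over each commercial's airings."""
    per_cm: Counter = Counter()
    per_sponsor: Counter = Counter()
    rating_sum: Dict[str, float] = defaultdict(float)
    for hour, cm, sponsor in reports:
        per_cm[cm] += 1
        per_sponsor[sponsor] += 1
        rating_sum[cm] += ratings.get(hour, 0.0)
    avg_rating = {cm: rating_sum[cm] / n for cm, n in per_cm.items()}
    return per_cm, per_sponsor, avg_rating
```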
An apparatus for implementing the retrieval method described above includes video inputting means, means for extracting a representative frame, means for calculating a feature, means for assigning a code, means for producing a video name, and means for matching video names. Because this processing can run in real time on general purpose workstations having a video input function, inexpensive video retrieval apparatuses can be realized.