The present invention relates to a database management system for electronically storing and reusing video files, and more particularly to a video information retrieval method and apparatus, and a recording medium, that are effective for registering and retrieving content-based coded video files by means of a database.
When desired video information is to be retrieved from a database in which video files are stored, retrieval is more effective when it is based not only on bibliographical information of the video such as the heading, the copyright holder and the date of preparation or photography, but also on specific details of the objects appearing in the video itself.
In recent database management systems, a still image is retrieved not only on the basis of keywords associated with the image but also by using an image analysis program in combination with the keywords. In such a system, when image data is registered, distinctive features of the image such as overall tint, local tint and edge information are extracted in advance by the image analysis program in the form of image feature vectors. At retrieval, the similarities between image feature vectors are compared. Either the retrieval is judged to succeed or fail by treating an image whose similarity is larger than a threshold as coinciding with the retrieval condition and an image whose similarity is smaller than the threshold as not coinciding, or the retrieved results are arranged in order of similarity and presented to the user as a list, so that the efficiency of retrieval is improved. Such a database management system is disclosed in, for example, JP-A-7-21198.
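The threshold judgment and the similarity-ordered list described above can be illustrated by the following minimal sketch. It assumes cosine similarity as the similarity measure, which the cited publication is not stated to use; the function names `cosine_similarity` and `retrieve`, the image identifiers, and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve(query, registered, threshold):
    """Return (image_id, similarity) pairs whose similarity is larger than
    the threshold, arranged in descending order of similarity."""
    scored = [(image_id, cosine_similarity(query, vector))
              for image_id, vector in registered.items()]
    hits = [(image_id, s) for image_id, s in scored if s > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits


# Hypothetical feature vectors extracted at registration time.
registered = {
    "image_a": [1.0, 0.0, 0.0],
    "image_b": [0.0, 1.0, 0.0],
    "image_c": [1.0, 1.0, 0.0],
}
results = retrieve([1.0, 0.0, 0.0], registered, threshold=0.5)
```

In this sketch the two retrieval styles are combined: the threshold removes non-coinciding images, and the remaining images are listed in order of similarity.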
The image retrieval technique utilizing image feature vectors as disclosed in the above publication can also be applied to the video management field. For example, of the series of frame images that constitute a video scene, a frame image representative of the scene is extracted as a representative image. The extracted representative images are processed in the same manner as in the above-mentioned management of still images to calculate their feature vectors, so that a user's desired video scene can readily be retrieved through retrieval of the representative images. The retrieval using the similarities of the representative images can be combined with the indirect video retrieval performed so far, which is based on bibliographical information of the video and comment information associated with sections of the video, to thereby realize retrieval that more closely approximates the actual video images.
Such a conventional video retrieval method, which evaluates similarities using feature vectors of the representative images of the video, has the following difficulties. Video retrieval using the representative images of a video scene considers only still images, each captured at a single instant during actual reproduction of the scene. Because video, unlike a still image, contains motion, the selected representative image sometimes shows a picture different from the video scene the user has in mind, and the retrieval using the representative images is sometimes unsuccessful. For example, suppose that a video scene in which an object moves from right to left and disappears into the background is registered in a database. If the representative image of the scene happens to be acquired when the object is positioned at the right end, it is difficult to retrieve the scene when the user designates a different position of the object, for example the left end, and performs retrieval of the video. Further, when some object happens to appear in the representative image of a video scene, the object impedes the retrieval if the scene is retrieved on the basis of its background only.
One coding system for video files constitutes each video file from a plurality of video streams, in which the background and the objects (hereinafter both referred to as contents) are separated and coded as respective video streams in order to enhance the compression efficiency and the reusability of the video files. In this coding system, when a video file is reproduced, the video streams are combined and reproduced as one video. Even when the video files are content-based coded as described above, retrieval remains difficult in the same manner as before, depending on the selected representative image, if the video data is registered by the conventional system on the basis of the reproduced video of the combined contents.
It is an object of the present invention to provide a video information retrieval method and apparatus in which, in a database system for managing content-based coded video files, a user can set a retrieval condition that pays attention to the individual contents appearing in a video and thereby retrieve a video scene.
According to an aspect of the present invention, there is provided a video retrieval method implemented by a computer in order to register video files in a database and retrieve any section of any video file. A computer constituting a video information retrieval system receives a video file including a plurality of video streams, each coded for a respective content that is a constituent element of the video. The video file is analyzed and each of the plurality of video streams is extracted as a video element object. Annotation information describable as a coincidence condition in retrieval is extracted for each of the extracted video element objects, and the video element objects and the annotation information are registered in the database. When the user retrieves a video, coincidence with a retrieval condition is judged for each video element object on the basis of the retrieval condition that the user designates for each constituent element of the video. A set operation is performed on the appearance time sections of the video element objects coincident with the retrieval condition to thereby define a video scene constituting a retrieved result, which is presented to the user. According to a preferred aspect of the present invention, in response to the user's designation of a video scene, a video file including the designated video scene is acquired from the database and presented to the user. According to the present invention, a background video and a subject video of a video scene are extracted separately and recorded in the database, and when the user retrieves a desired video scene, the background video and the subject video recorded in the database are retrieved separately so that a video scene close to the image the user has in mind is obtained from the two kinds of retrieved information.
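The set operation on appearance time sections can be illustrated by the following minimal sketch. It assumes that each appearance time section is a (start, end) pair in seconds and that each list is sorted and free of overlaps; the function names `intersect_sections` and `union_sections` and the sample values are hypothetical, and the specification does not limit the set operation to these two.

```python
def intersect_sections(a, b):
    """Intersect two lists of (start, end) time sections.

    Each list is assumed to be sorted by start time with no overlaps.
    The result holds the sections in which both inputs are present."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever section finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def union_sections(a, b):
    """Merge two lists of (start, end) time sections into their union."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical sections in which a background and an object each
# coincide with their retrieval conditions.
background_sections = [(0.0, 10.0), (20.0, 30.0)]
object_sections = [(5.0, 25.0)]
both = intersect_sections(background_sections, object_sections)
either = union_sections(background_sections, object_sections)
```

An intersection corresponds to requiring that both the background condition and the object condition hold at the same time, while a union corresponds to accepting either.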
According to another aspect of the present invention, there is provided a retrieval apparatus for registering video files in the database and retrieving any section of any video file. The retrieval apparatus comprises analysis means for stream-analyzing a video file including a plurality of video streams, each coded for a respective content that is a constituent element and combined upon reproduction for display, and for extracting each of the plurality of video streams as a video element object, extraction means for extracting annotation information describable as a coincidence condition in retrieval for each of the video element objects extracted by the analysis means, registration means for registering the video element object information and the annotation information in the database, retrieval means for judging coincidence for each video element object on the basis of a retrieval condition for each constituent element of a video designated by a user, and output means for performing a set operation on the appearance time sections of the coincident video element objects to define a video scene constituting a retrieved result and present it to the user.
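The cooperation of the registration means, the retrieval means and the output means can be illustrated by the following minimal sketch. It assumes an in-memory list in place of the database, exact matching of annotation values as the coincidence judgment, and one appearance time section per video element object; the names `VideoElementObject`, `Registry` and `find_scenes`, the labels "background" and "object", and the annotation keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VideoElementObject:
    file_id: str   # video file that the stream belongs to
    kind: str      # hypothetical label: "background" or "object"
    start: float   # start of the appearance time section, in seconds
    end: float     # end of the appearance time section, in seconds
    annotation: dict = field(default_factory=dict)


class Registry:
    """In-memory stand-in for the database (an assumption of this sketch)."""

    def __init__(self):
        self.objects = []

    def register(self, obj):
        self.objects.append(obj)

    def match(self, kind, condition):
        """Judge coincidence per video element object: every item of the
        condition must equal the corresponding annotation value."""
        return [o for o in self.objects
                if o.kind == kind
                and all(o.annotation.get(k) == v for k, v in condition.items())]


def find_scenes(registry, background_condition, object_condition):
    """Return (file_id, start, end) for each time section in which a
    coincident background and a coincident object appear together."""
    scenes = []
    for bg in registry.match("background", background_condition):
        for ob in registry.match("object", object_condition):
            if bg.file_id != ob.file_id:
                continue
            start, end = max(bg.start, ob.start), min(bg.end, ob.end)
            if start < end:
                scenes.append((bg.file_id, start, end))
    return scenes


registry = Registry()
registry.register(VideoElementObject("v1", "background", 0.0, 60.0, {"tint": "blue"}))
registry.register(VideoElementObject("v1", "object", 20.0, 40.0, {"label": "car"}))
registry.register(VideoElementObject("v2", "object", 0.0, 10.0, {"label": "car"}))
scenes = find_scenes(registry, {"tint": "blue"}, {"label": "car"})
```

Because the conditions are judged per video element object, a scene can be retrieved from its background alone by matching only background objects, regardless of which objects happen to overlap it.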
In a preferable aspect of the present invention, a reference image to be referred to upon retrieval is extracted from the video objects in the registration processing, and an image feature vector representative of a feature of the reference image is acquired from the reference image. The reference image and the image feature vector are stored in relation to the video object. The reference image is extracted by comparing images at regular intervals in the video streams constituting the video objects and, when images having high similarity continue, extracting the image at the head of those images as the reference image. The image feature vector includes analyzed results for overall tint, local tint and edge information of the reference image.
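The extraction of reference images can be illustrated by the following minimal sketch. It assumes that frames are sampled every `step` frames, that each sampled frame is compared with the previously sampled frame, and that a run counts as continued when it holds at least `min_run` sampled frames; the function name `extract_reference_indices`, these parameters, and the toy similarity function are hypothetical and are not taken from the specification.

```python
def extract_reference_indices(frames, step, threshold, similarity, min_run=2):
    """Return the frame indices chosen as reference images.

    Frames are sampled every `step` frames. While consecutive sampled
    frames have a similarity of at least `threshold`, they form one run,
    and the frame at the head of each run of at least `min_run` sampled
    frames is taken as a reference image."""
    sampled = list(range(0, len(frames), step))
    references = []
    run_head = None
    run_length = 0
    for pos, idx in enumerate(sampled):
        if run_head is not None and \
                similarity(frames[sampled[pos - 1]], frames[idx]) >= threshold:
            run_length += 1  # the run of similar images continues
        else:
            if run_head is not None and run_length >= min_run:
                references.append(run_head)
            run_head = idx   # a new run starts at this frame
            run_length = 1
    if run_head is not None and run_length >= min_run:
        references.append(run_head)
    return references


def toy_similarity(a, b):
    """Toy similarity on scalar 'frames', for illustration only."""
    return 1.0 - abs(a - b)


# Scalars stand in for frame images in this example.
frames = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 5.0]
reference_indices = extract_reference_indices(
    frames, step=1, threshold=0.9, similarity=toy_similarity)
```

In practice the similarity function would compare image feature vectors such as those built from overall tint, local tint and edge information, and the feature vector of each reference image would be stored together with the video object.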
The above and other objects and novel features of the present invention will be apparent from the following description of the specification and the accompanying drawings.